April 01, 2002 Re: Bubble sort bechmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Culver | Hmm. The cpu specs advertise that complex addressing modes don't add extra time. Perhaps this is not true. -Walter "John Culver" <jculver@btinternet.spamless.com> wrote in message news:a8844q$2c0c$1@digitaldaemon.com... > Hi, > From my testing I doubt the speed difference is in the code generated in DoSort() : > > When I compiled with the following 2 fixes: > in DoSort() loop should be : > for (I = 0; I<(SORT_SIZE-1); I++) // I=0..I<SORT_SIZE produces incorrect > results and reads and writes unallocated memory which could introduce unknown delays > and in TestSort() loop should be : > for (i=SORT_ITER; i>0; i--) // strictly should be i>0 to generate SORT_ITER > loops rather than SORT_ITER+1 > > I get the following results for my system : Athlon 1.33 / Win98 SE (yuk!) / DMC 8.25 > > With the code as is (fixed) I get approx. 5500ms () > If I manually unroll the i loop in TestSort() the appropriate 10 time I consistently > get execution times of only 4000ms !!! > > Note that DoSort() and InitSort() have 100% identical code in this case (according to > obj2asm). > > So the unrolled code is noticably more efficient, so it looks like the identical and > apparently efficent code in DoSort() is being stuffed up by something else. Perhaps my > Athlons instruction translation techniques are doing something very different due to > the context (remember it's NOT really an x86 processor - it is really a risc86 faking > it), perhaps it's caches are messed up by the CS or SP alignment ?? It certainly > doesn't look like the low speed is the compiler generating poor code, as the same code > generates 2 very different speeds in only subtly different contexts. > > Unfortunately, these days if you do a one task computational benchmark you are more > likely to discover some subtle feature of your processor, not of your compiler. > > > JohnC > PS To clarify things : I love AMD products (well processors and chipsets) > > > ======= int.c (revised, and with brutal UNROLL option - see #define UNROLL ... > #include <stdio.h> > #include <stdlib.h> > #include <time.h> > > #define SORT_ITER 10 > #define SORT_SIZE 10000 > void TestSort (void); > void InitSort (int gaiTab[]); > void DoSort(int gaiTab[]); > > // #define UNROLL 1 > > void main (void) > { > TestSort(); > } > > void TestSort (void) > { > int i; > int *aiTab; > clock_t clkStart, clkStop; > > printf("Testing Int -> Bubble sort "); > aiTab=(int *) malloc(SORT_SIZE*sizeof(int)); > clkStart=clock(); > #ifndef UNROLL > for (i=SORT_ITER; i>0; i--) // strictly should be i>0 > #endif > { > InitSort(aiTab); > DoSort(aiTab); > #ifdef UNROLL > InitSort(aiTab); > DoSort(aiTab); > InitSort(aiTab); > DoSort(aiTab); > InitSort(aiTab); > DoSort(aiTab); > InitSort(aiTab); > DoSort(aiTab); > InitSort(aiTab); > DoSort(aiTab); > InitSort(aiTab); > DoSort(aiTab); > InitSort(aiTab); > DoSort(aiTab); > InitSort(aiTab); > DoSort(aiTab); > InitSort(aiTab); > DoSort(aiTab); > #endif > } > clkStop=clock(); > printf("%d ms.\n", (((clkStop-clkStart)*1000)/CLK_TCK)); > free(aiTab); > > } > > void InitSort (int paiTab[]) > { > int iCont; > > for (iCont=SORT_SIZE; iCont>=0; iCont--) > paiTab[iCont]=SORT_SIZE-iCont; > } > > > void DoSort (int paiTab[]) > { > int Swap; > int Temp,I; > > do > { > Swap = 0; > for (I = 0; I<(SORT_SIZE-1); I++) // I=0..I<SORT_SIZE produces incorrect results > and reads and writes unallocated memory > if (paiTab[I] > paiTab[I+1]) > { > Temp = paiTab[I]; > paiTab[I] = paiTab[I+1]; > paiTab[I+1] = Temp; > Swap = 1; > } > } > while (Swap); > } > ======= > > > "Javier GutiƩrrez" <nikkho@nospam.hotmail.com> wrote in message news:a882ek$2atl$1@digitaldaemon.com... > > Above is the one generated by C++ Builder 6, and VC++ .NET. > > The only think I see is the offset calculation, Borland adds 4 to the > > offset, while DMC adds 1, and mul it in the Mov. As far as I know, it should > > result in the same speed... > > But in fact Borland code is faster, 6168 ms against 7390 ms for DMC. > > > > As VC++ .NET, it seems the loop has been unrolled... Maybe this is the > > great advantage from 3374 ms... > > Why DMC have not unrolled it? > > > > > > C++ Builder 6 > > -------------------------------------------------------------------------- > > > |
Copyright © 1999-2021 by the D Language Foundation