Bubble sort bechmark (page 2)

April 01, 2002
Re: Bubble sort bechmark
Posted by Walter
in reply to John Culver
Permalink
Walter
Posted in reply to John Culver
Permalink
Hmm. The cpu specs advertise that complex addressing modes don't add extra time. Perhaps this is not true. -Walter

"John Culver" <jculver@btinternet.spamless.com> wrote in message news:a8844q$2c0c$1@digitaldaemon.com...
> Hi,
>     From my testing I doubt the speed difference is in the code generated
in DoSort() :
>
>     When I compiled with the following 2 fixes:
>     in DoSort() loop should be :
>       for (I = 0; I<(SORT_SIZE-1); I++)      // I=0..I<SORT_SIZE produces
incorrect
> results and reads and writes unallocated memory which could introduce
unknown delays
>     and in TestSort() loop should be :
>       for (i=SORT_ITER; i>0; i--)  // strictly should be i>0  to generate
SORT_ITER
> loops rather than SORT_ITER+1
>
> I get the following results for my system : Athlon 1.33 / Win98 SE (yuk!)
/ DMC 8.25
>
> With the code as is (fixed) I get approx.  5500ms ()
> If I manually unroll the i loop in TestSort() the appropriate 10 time I
consistently
> get  execution times of only 4000ms !!!
>
> Note that DoSort() and InitSort() have 100% identical code in this case
(according to
> obj2asm).
>
> So the unrolled code is noticably more efficient, so it looks like the
identical and
> apparently efficent code in DoSort() is being stuffed up by something
else. Perhaps my
> Athlons instruction translation techniques are doing something very
different due to
> the context (remember it's NOT really an x86 processor - it is really a
risc86 faking
> it), perhaps it's caches are messed up by the CS or SP alignment ?? It
certainly
> doesn't look like the low speed is the compiler generating poor code, as
the same code
> generates 2 very different speeds in only subtly different contexts.
>
> Unfortunately, these days if you do a one task computational benchmark you
are more
> likely to discover some subtle feature of your processor, not of your
compiler.
>
>
> JohnC
> PS To clarify things : I love AMD products (well processors and chipsets)
>
>
> ======= int.c (revised, and with brutal UNROLL option - see #define UNROLL
...
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
>
> #define SORT_ITER 10
> #define SORT_SIZE 10000
> void TestSort (void);
> void InitSort (int gaiTab[]);
> void DoSort(int gaiTab[]);
>
> // #define UNROLL 1
>
> void main (void)
> {
>  TestSort();
> }
>
> void TestSort (void)
> {
>  int i;
>  int *aiTab;
>  clock_t clkStart, clkStop;
>
>  printf("Testing Int -> Bubble sort ");
>  aiTab=(int *) malloc(SORT_SIZE*sizeof(int));
>  clkStart=clock();
> #ifndef UNROLL
> for (i=SORT_ITER; i>0; i--)  // strictly should be i>0
> #endif
>     {
>   InitSort(aiTab);
>   DoSort(aiTab);
> #ifdef UNROLL
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
> #endif
>     }
>       clkStop=clock();
>       printf("%d ms.\n", (((clkStop-clkStart)*1000)/CLK_TCK));
>  free(aiTab);
>
> }
>
> void InitSort (int paiTab[])
> {
>    int iCont;
>
>    for (iCont=SORT_SIZE; iCont>=0; iCont--)
>     paiTab[iCont]=SORT_SIZE-iCont;
> }
>
>
> void DoSort (int paiTab[])
> {
>  int Swap;
>  int Temp,I;
>
>  do
>  {
>   Swap = 0;
>   for (I = 0; I<(SORT_SIZE-1); I++)      // I=0..I<SORT_SIZE produces
incorrect results
> and reads and writes unallocated memory
>   if (paiTab[I] > paiTab[I+1])
>   {
>    Temp = paiTab[I];
>    paiTab[I] = paiTab[I+1];
>    paiTab[I+1] = Temp;
>    Swap = 1;
>   }
>  }
>  while (Swap);
> }
> =======
>
>
> "Javier Gutiérrez" <nikkho@nospam.hotmail.com> wrote in message news:a882ek$2atl$1@digitaldaemon.com...
> >     Above is the one generated by C++ Builder 6, and VC++ .NET.
> >     The only think I see is the offset calculation, Borland adds 4 to
the
> > offset, while DMC adds 1, and mul it in the Mov. As far as I know, it
should
> > result in the same speed...
> >     But in fact Borland code is faster, 6168 ms against 7390 ms for DMC.
> >
> >     As VC++ .NET, it seems the loop has been unrolled... Maybe this is
the
> > great advantage from 3374 ms...
> >     Why DMC have not unrolled it?
> >
> >
> > C++ Builder 6
>
> --------------------------------------------------------------------------
>
>
>
Forums