April 01, 2002
Hmm. The cpu specs advertise that complex addressing modes don't add extra time. Perhaps this is not true. -Walter

"John Culver" <jculver@btinternet.spamless.com> wrote in message news:a8844q$2c0c$1@digitaldaemon.com...
> Hi,
>     From my testing I doubt the speed difference is in the code generated
in DoSort() :
>
>     When I compiled with the following 2 fixes:
>     in DoSort() loop should be :
>       for (I = 0; I<(SORT_SIZE-1); I++)      // I=0..I<SORT_SIZE produces
incorrect
> results and reads and writes unallocated memory which could introduce
unknown delays
>     and in TestSort() loop should be :
>       for (i=SORT_ITER; i>0; i--)  // strictly should be i>0  to generate
SORT_ITER
> loops rather than SORT_ITER+1
>
> I get the following results for my system : Athlon 1.33 / Win98 SE (yuk!)
/ DMC 8.25
>
> With the code as is (fixed) I get approx.  5500ms ()
> If I manually unroll the i loop in TestSort() the appropriate 10 time I
consistently
> get  execution times of only 4000ms !!!
>
> Note that DoSort() and InitSort() have 100% identical code in this case
(according to
> obj2asm).
>
> So the unrolled code is noticably more efficient, so it looks like the
identical and
> apparently efficent code in DoSort() is being stuffed up by something
else. Perhaps my
> Athlons instruction translation techniques are doing something very
different due to
> the context (remember it's NOT really an x86 processor - it is really a
risc86 faking
> it), perhaps it's caches are messed up by the CS or SP alignment ?? It
certainly
> doesn't look like the low speed is the compiler generating poor code, as
the same code
> generates 2 very different speeds in only subtly different contexts.
>
> Unfortunately, these days if you do a one task computational benchmark you
are more
> likely to discover some subtle feature of your processor, not of your
compiler.
>
>
> JohnC
> PS To clarify things : I love AMD products (well processors and chipsets)
>
>
> ======= int.c (revised, and with brutal UNROLL option - see #define UNROLL
...
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
>
> #define SORT_ITER 10
> #define SORT_SIZE 10000
> void TestSort (void);
> void InitSort (int gaiTab[]);
> void DoSort(int gaiTab[]);
>
> // #define UNROLL 1
>
> void main (void)
> {
>  TestSort();
> }
>
> void TestSort (void)
> {
>  int i;
>  int *aiTab;
>  clock_t clkStart, clkStop;
>
>  printf("Testing Int -> Bubble sort ");
>  aiTab=(int *) malloc(SORT_SIZE*sizeof(int));
>  clkStart=clock();
> #ifndef UNROLL
> for (i=SORT_ITER; i>0; i--)  // strictly should be i>0
> #endif
>     {
>   InitSort(aiTab);
>   DoSort(aiTab);
> #ifdef UNROLL
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
>   InitSort(aiTab);
>   DoSort(aiTab);
> #endif
>     }
>       clkStop=clock();
>       printf("%d ms.\n", (((clkStop-clkStart)*1000)/CLK_TCK));
>  free(aiTab);
>
> }
>
> void InitSort (int paiTab[])
> {
>    int iCont;
>
>    for (iCont=SORT_SIZE; iCont>=0; iCont--)
>     paiTab[iCont]=SORT_SIZE-iCont;
> }
>
>
> void DoSort (int paiTab[])
> {
>  int Swap;
>  int Temp,I;
>
>  do
>  {
>   Swap = 0;
>   for (I = 0; I<(SORT_SIZE-1); I++)      // I=0..I<SORT_SIZE produces
incorrect results
> and reads and writes unallocated memory
>   if (paiTab[I] > paiTab[I+1])
>   {
>    Temp = paiTab[I];
>    paiTab[I] = paiTab[I+1];
>    paiTab[I+1] = Temp;
>    Swap = 1;
>   }
>  }
>  while (Swap);
> }
> =======
>
>
> "Javier GutiƩrrez" <nikkho@nospam.hotmail.com> wrote in message news:a882ek$2atl$1@digitaldaemon.com...
> >     Above is the one generated by C++ Builder 6, and VC++ .NET.
> >     The only think I see is the offset calculation, Borland adds 4 to
the
> > offset, while DMC adds 1, and mul it in the Mov. As far as I know, it
should
> > result in the same speed...
> >     But in fact Borland code is faster, 6168 ms against 7390 ms for DMC.
> >
> >     As VC++ .NET, it seems the loop has been unrolled... Maybe this is
the
> > great advantage from 3374 ms...
> >     Why DMC have not unrolled it?
> >
> >
> > C++ Builder 6
>
> --------------------------------------------------------------------------
>
>
>


1 2
Next ›   Last »