Compiler optimizations (I'm baffled) (page 4)

Bruno Medeiros wrote:
> Thomas Kuehne wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Bruno Medeiros schrieb am 2006-05-03:
>>> Walter Bright wrote:
>>>> Craig Black wrote:
>>>>> This is
>>>>> because integer division is essentially floating point division under the
>>>>> hood.
>>> I ran these tests and I got basically the same results (the int division is slower). I am very intrigued and confused. Can you (or someone else) explain briefly why this is so?
>>> One would think it would be the other way around (float being slower) or at least the same speed.

It's true.

For Pentium, IDIV takes 46 cycles, while FDIV takes 39.
For PPro, PII and PIII, DIV takes 39, while FDIV takes 38.
Not much difference.

However, for P4 (and probably Athlon XP is similar),
FDIV has a latency of 43 cycles, while DIV has a latency of 50, plus it
executes 21 microcodes!

It's not quite a factor of two, but it's close.

In short -- Intel killed integer division on the P4.

It looks to me that your double division is actually encoded as float division,
i.e. 32 bits. I am not too familiar with this syntax but I believe double
precision would look like this:

fldd	LC0
fdivd	-12(%ebp)
fstpd	-8(%ebp)

I am not surprised that the FPU version is faster than the int one. You will
probably find that the Intel version is not the same as the AMD one either.

In article <e3co5o$jth$1@digitaldaemon.com>, Bruno Medeiros says...
>
>Thomas Kuehne wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Bruno Medeiros schrieb am 2006-05-03:
>>> Walter Bright wrote:
>>>> Craig Black wrote:
>>>>> This is
>>>>> because integer division is essentially floating point division under the
>>>>> hood.
>>> I ran these tests and I got basically the same results (the int division
>>> is slower). I am very intrigued and confused. Can you (or someone else)
>>> explain briefly why this is so?
>>> One would think it would be the other way around (float being slower) or
>>> at least the same speed.
>>
>>
>> The code doesn't necessarily show that int division is slower than float multiplication.
>>
>> What CPU are we talking about?
>>
>> A naive interpretation of the "benchmark" assumes a single execution pipe that does floating point and integer operations in sequence ...
>>
>> Even assuming a single pipe: Why is the SSE version faster?
>>
>> Does the benchmark measure the speed of int division against float multiplication?
>>
>> Does the benchmark measure the throughput of int division against float multiplication?
>>
>> Does the benchmark measure the throughput of int division of a set of numbers through a constant factor against float multiplication of the same set through (1 / constant factor)?
>>
>> Thomas
>>
>>
>>
>> -----BEGIN PGP SIGNATURE-----
>>
>> iD8DBQFEWRDO3w+/yD4P9tIRAs8lAJ9q62J8zf8U0HWzxtxQmMWasuU4ngCgwA21
>> 4M5nb9Z8ZXHevJiwylY/wGM=
>> =QSyS
>> -----END PGP SIGNATURE-----
>
>Hum, yes I should have been more specific. I only ran (a modified
>version of) the latest test, which measured the throughput of int
>division against double division (I hope...).
>Let me just put the code:
>
>#include <stdio.h>
>#include <time.h>
>
>//typedef double divtype;
>typedef int divtype;
>
>int main()
>{
>   clock_t start = clock();
>
>
>   divtype result = 0;
>   divtype div=1;
>
>   for(int max = 100000000; div < max; div++)
>   {
>     result = (42 / div);
>   }
>
>
>   clock_t finish = clock();
>   double duration = (double)(finish - start) / CLOCKS_PER_SEC;
>   printf("[%f] %2.2f seconds\n", double(result),duration);
>}
>
>------------------------------------
>I ran the tests with GCC, with both -O0 and -O2, on an Athlon XP, and in both cases the typedef double divtype version was about twice as fast. The assembly code I get for line 17 is the following:
>
>*** INT:
>
>.stabn 68,0,17,LM6-_main
>LM6:
>	movl	$42, %edx
>	movl	%edx, %eax
>	sarl	$31, %edx
>	idivl	-12(%ebp)
>	movl	%eax, -8(%ebp)
>
>*** DOUBLE:
>
>.stabn 68,0,17,LM6-_main
>LM6:
>	flds	LC0
>	fdivs	-12(%ebp)
>	fstps	-8(%ebp)
>
>
>I have little idea what it is that it's doing.
>
>--
>Bruno Medeiros - CS/E student
>http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

It depends how big the dividend and divisor are. Smaller values can take much less time than larger ones.

In article <e3cqjr$sek$1@digitaldaemon.com>, Don Clugston says...
>
>Bruno Medeiros wrote:
>> Thomas Kuehne wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Bruno Medeiros schrieb am 2006-05-03:
>>>> Walter Bright wrote:
>>>>> Craig Black wrote:
>>>>>> This is
>>>>>> because integer division is essentially floating point division
>>>>>> under the
>>>>>> hood.
>>>> I ran these tests and I got basically the same results (the int
>>>> division is slower). I am very intrigued and confused. Can you (or
>>>> someone else) explain briefly why this is so?
>>>> One would think it would be the other way around (float being slower)
>>>> or at least the same speed.
>
>It's true.
>
>For Pentium, IDIV takes 46 cycles, while FDIV takes 39.
>For PPro, PII and PIII, DIV takes 39, while FDIV takes 38.
>Not much difference.
>
>However, for P4 (and probably Athlon XP is similar),
>FDIV has a latency of 43 cycles, while DIV has a latency of 50, plus it
>executes 21 microcodes!
>
>It's not quite a factor of two, but it's close.
>
>In short -- Intel killed integer division on the P4.

I think I remember you posting a while ago that you have quite an old machine :)

In article <e3bdub$1dsu$3@digitaldaemon.com>, Walter Bright says...
>
>On my machine, integer is still faster.
>
>int: 3.7
>double: 4.2
>
>Craig Black wrote:
>> This is probably because you have SSE optimizations disabled.
>> This one works even without SSE. This shows that integer division is
>> slower. Also, if you change the division to multiplication, you will notice
>> that integer multiplication is faster, which is what you would expect.
>>
>> #include <stdio.h>
>> #include <conio.h>
>> #include <time.h>
>>
>> //typedef int divtype;
>> typedef double divtype; // This one is faster
>>
>> int main()
>> {
>>     divtype result = 0;
>>
>>     clock_t start, finish;
>>     double duration;
>>
>>     start = clock();
>>     divtype max = 100000000;
>>     for(divtype div=1; div<max; div++)
>>     {
>>         divtype i = max - div;
>>         result += i / div;
>>     }
>>     finish = clock();
>>     duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>
>>     printf("[%f] %2.1f seconds\n",double(result),duration);
>> }
>>
>>

May 04, 2006

Re: Compiler optimizations (I'm baffled)

Posted by Don Clugston
in reply to pmoore

pmoore wrote:
> It looks to me that your double division is actually encoded as float division,
> i.e. 32 bits. I am not too familiar with this syntax but I believe double
> precision would look like this:
> 
> fldd	LC0
> fdivd	-12(%ebp)
> fstpd	-8(%ebp)


That would be faster, actually (provided the operand is aligned sensibly). The operand size only determines the width of the load from memory. To reduce the precision of the division itself (which increases the speed), you have to change the precision-control bits in the FPU control word.
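
The precision setting mentioned above lives in the precision-control field (bits 8 and 9) of the x87 FPU control word. A minimal sketch of switching it to single precision and back, assuming glibc's `<fpu_control.h>` on an x86 target (the function name `demo_precision_control` and the `HAVE_X87_CW` macro are made up for illustration; on other platforms the sketch does nothing and returns false):

```cpp
#include <cstdint>  // also pulls in the libc feature macros such as __GLIBC__

// Assumption: glibc on x86/x86-64, which provides <fpu_control.h>.
#if defined(__GLIBC__) && (defined(__i386__) || defined(__x86_64__))
#include <fpu_control.h>
#define HAVE_X87_CW 1
#else
#define HAVE_X87_CW 0
#endif

// Switches the x87 precision-control field to single precision, reads it
// back, then restores the original control word.
// Returns true if the reduced setting was observed, false if unsupported.
bool demo_precision_control() {
#if HAVE_X87_CW
    fpu_control_t saved;
    _FPU_GETCW(saved);                              // fnstcw: read the current control word

    // _FPU_EXTENDED (0x300) doubles as the mask for the 2-bit precision field.
    fpu_control_t reduced = (saved & ~_FPU_EXTENDED) | _FPU_SINGLE;
    _FPU_SETCW(reduced);                            // fldcw: x87 results now round to 24-bit mantissa

    fpu_control_t check;
    _FPU_GETCW(check);

    _FPU_SETCW(saved);                              // always restore the caller's setting
    return (check & _FPU_EXTENDED) == _FPU_SINGLE;
#else
    return false;
#endif
}
```

Note that this only affects x87 instructions such as `fdiv`. Code compiled to use SSE2 scalar instructions (the default for doubles on x86-64) is not affected by this field.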

> I am not surprised that the FPU version is faster than the int one. You will
> probably find that the Intel version is not the same as the AMD one either.

That's definitely true.
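
To compare the two on a particular CPU, something along these lines could be used. This is only a sketch of the quoted benchmark, not a rigorous measurement: the names `divide_loop` and `seconds_for` are made up here, the `volatile` store is just one way to stop the compiler from dropping the divisions, and in the `double` case the int-to-double conversion of the divisor is timed along with the division.

```cpp
#include <cassert>
#include <chrono>

// Divides 42 by every value in [1, max) and returns the last quotient.
template <typename T>
T divide_loop(int max) {
    assert(max > 1);
    volatile T sink = 0;  // volatile store keeps each division from being optimized away
    for (int d = 1; d < max; ++d) {
        sink = static_cast<T>(42) / static_cast<T>(d);
    }
    return sink;
}

// Wall-clock seconds spent in divide_loop<T>(max).
template <typename T>
double seconds_for(int max) {
    auto start = std::chrono::steady_clock::now();
    divide_loop<T>(max);
    auto finish = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(finish - start).count();
}
```

Calling `seconds_for<int>(100000000)` and `seconds_for<double>(100000000)` from `main` and printing both values gives numbers comparable to the ones posted in this thread. Expect them to differ between CPU models and compiler flags.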

> 
> 
> In article <e3co5o$jth$1@digitaldaemon.com>, Bruno Medeiros says...
>> Thomas Kuehne wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Bruno Medeiros schrieb am 2006-05-03:
>>>> Walter Bright wrote:
>>>>> Craig Black wrote:
>>>>>>  This is
>>>>>> because integer division is essentially floating point division under the
>>>>>> hood.
>>>> I ran these tests and I got basically the same results (the int division is slower). I am very intrigued and confused. Can you (or someone else) explain briefly why this is so?
>>>> One would think it would be the other way around (float being slower) or at least the same speed.
>>>
>>> The code doesn't necessarily show that int division is slower than float
>>> multiplication.
>>>
>>> What CPU are we talking about?
>>>
>>> A naive interpretation of the "benchmark" assumes a single execution
>>> pipe that does floating point and integer operations in sequence ...
>>>
>>> Even assuming a single pipe: Why is the SSE version faster?
>>>
>>> Does the benchmark measure the speed of int division against float
>>> multiplication? 
>>>
>>> Does the benchmark measure the throughput of int division against float
>>> multiplication? 
>>>
>>> Does the benchmark measure the throughput of int division of a set of
>>> numbers through a constant factor against float multiplication of the
>>> same set through (1 / constant factor)?
>>>
>>> Thomas
>>>
>>>
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>>
>>> iD8DBQFEWRDO3w+/yD4P9tIRAs8lAJ9q62J8zf8U0HWzxtxQmMWasuU4ngCgwA21
>>> 4M5nb9Z8ZXHevJiwylY/wGM=
>>> =QSyS
>>> -----END PGP SIGNATURE-----
>> Hum, yes I should have been more specific. I only ran (a modified version of) the latest test, which measured the throughput of int division against double division (I hope...).
>> Let me just put the code:
>>
>> #include <stdio.h>
>> #include <time.h>
>>
>> //typedef double divtype;
>> typedef int divtype;
>>
>> int main()
>> {
>>    clock_t start = clock();
>>
>>
>>    divtype result = 0;
>>    divtype div=1;
>>
>>    for(int max = 100000000; div < max; div++)
>>    {
>>      result = (42 / div);
>>    }
>>
>>
>>    clock_t finish = clock();
>>    double duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>    printf("[%f] %2.2f seconds\n", double(result),duration);
>> }
>>
>> ------------------------------------
>> I ran the tests with GCC, with both -O0 and -O2, on an Athlon XP, and in both cases the typedef double divtype version was about twice as fast. The assembly code I get for line 17 is the following:
>>
>> *** INT:
>>
>> .stabn 68,0,17,LM6-_main
>> LM6:
>> 	movl	$42, %edx
>> 	movl	%edx, %eax
>> 	sarl	$31, %edx
>> 	idivl	-12(%ebp)
>> 	movl	%eax, -8(%ebp)
>>
>> *** DOUBLE:
>>
>> .stabn 68,0,17,LM6-_main
>> LM6:
>> 	flds	LC0
>> 	fdivs	-12(%ebp)
>> 	fstps	-8(%ebp)
>>
>>
>> I have little idea what it is that it's doing.
>>
>> -- 
>> Bruno Medeiros - CS/E student
>> http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
> 
>
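
On the question of what the quoted INT sequence is doing: `idivl` divides the 64-bit value held in the register pair edx:eax by its 32-bit operand and leaves the quotient in eax, so the code first copies 42 into both registers and then uses `sarl $31` to turn edx into the sign extension of eax. The floating-point sequence is simply load, divide, store-and-pop. In AT&T syntax the `s` suffix marks 32-bit single operands, and the 64-bit double forms are spelled `fldl`, `fdivl` and `fstpl`. A rough C++ model of the integer sequence, where the function name `emulate_idivl` is made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Models movl/movl/sarl/idivl for a 32-bit dividend and divisor.
int32_t emulate_idivl(int32_t dividend, int32_t divisor) {
    assert(divisor != 0);        // the real instruction raises a divide error here

    int32_t edx = dividend;      // movl $42, %edx
    int32_t eax = edx;           // movl %edx, %eax
    edx = (edx < 0) ? -1 : 0;    // sarl $31, %edx: every bit becomes the sign bit

    // idivl treats edx:eax as one signed 64-bit dividend.
    uint64_t bits = (static_cast<uint64_t>(static_cast<uint32_t>(edx)) << 32)
                  | static_cast<uint32_t>(eax);
    int64_t wide = static_cast<int64_t>(bits);

    // The quotient ends up in eax (the remainder, ignored here, in edx).
    // Not modelled: the fault idivl raises when the quotient overflows 32 bits.
    return static_cast<int32_t>(wide / divisor);
}
```

For example, `emulate_idivl(42, 5)` yields 8, matching the truncating division the C source asks for.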
