Compiler optimizations (page 3) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Compiler optimizations (page 3)

May 02, 2006

Re: Compiler optimizations

Posted by Walter Bright
in reply to Craig Black

Walter Bright

Posted in reply to Craig Black

Craig Black wrote:
> These two functions show the difference that I'm talking about.  When
> benchmarking, pass in the same values for the divisor (pick a value between
> 2 and 30), and pass in a large enough value for total so that it will take
> some time for the CPU to finish.  Suprisingly, the second one is faster.

Multiplication is a much simpler operation than division, which is why it's faster. Secondly, the reason compilers don't do the optimization below is that it gives different results (different rounding).

If you're really interested in optimization, I suggest using a disassembler on the object files, it will answer a lot of questions.

> int divTest1(int divisor, int total)
> {
>   int sum = 0;
>   for(int i = 0; i < total; i++)
>   {
>     int quotient = i / divisor;
>     sum += quotient;
>   }
> }
> 
> int divTest2(int divisor, int total)
> {
>   int sum = 0;
>   double idiv = 1.0 / divisor;
>   for(int i = 0; i < total; i++)
>   {
>     int quotient = i * idiv;
>     sum += quotient;
>   }
> }

May 02, 2006

Re: Compiler optimizations

Posted by Walter Bright
in reply to Craig Black

Walter Bright

Posted in reply to Craig Black

Craig Black wrote:
>  This is
> because integer division is essentially floating point division under the
> hood.

Not exactly. What is true is that both kinds of division are based on the same 'shift and subtract' method (the same algorithm taught in 3rd grade for doing long division). If you're really interested in exactly how integer and floating point division work, get the Digital Mars CD which has complete source code to an IEEE 754 floating point emulator, as well as code to do integer division for long's.

www.digitalmars.com/shop.html

May 03, 2006

Re: Compiler optimizations (I'm baffled)

Posted by Bruno Medeiros
in reply to Walter Bright

Bruno Medeiros

Posted in reply to Walter Bright

Walter Bright wrote:
> Craig Black wrote:
>>  This is
>> because integer division is essentially floating point division under the
>> hood.
> 
> Not exactly. What is true is that both kinds of division are based on the same 'shift and subtract' method (the same algorithm taught in 3rd grade for doing long division). If you're really interested in exactly how integer and floating point division work, get the Digital Mars CD which has complete source code to an IEEE 754 floating point emulator, as well as code to do integer division for long's.
> 
> www.digitalmars.com/shop.html

I ran these tests and I got basicly the same results (the int division is slower). I am very intrigued and confused. Can you (or someone else) explain briefly why this is so?
One would think it would be the other way around (float being slower) or at least the same speed.

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

May 03, 2006

Re: Compiler optimizations (I'm baffled)

Posted by Walter Bright
in reply to Bruno Medeiros

Walter Bright

Posted in reply to Bruno Medeiros

Bruno Medeiros wrote:
> I ran these tests and I got basicly the same results (the int division is slower). I am very intrigued and confused. Can you (or someone else) explain briefly why this is so?
> One would think it would be the other way around (float being slower) or at least the same speed.

I don't know. I'm surprised at such results. Perhaps it's because Intel has put a lot of effort into their FPU they haven't into integer math.

May 03, 2006

Re: Compiler optimizations (I'm baffled)

Posted by Thomas Kuehne
in reply to Bruno Medeiros

Thomas Kuehne

Posted in reply to Bruno Medeiros

Attachments:

pgp.sig

Bruno Medeiros schrieb am 2006-05-03:
> Walter Bright wrote:
>> Craig Black wrote:
>>>  This is
>>> because integer division is essentially floating point division under the
>>> hood.
>
> I ran these tests and I got basicly the same results (the int division
> is slower). I am very intrigued and confused. Can you (or someone else)
> explain briefly why this is so?
> One would think it would be the other way around (float being slower) or
> at least the same speed.

The code doesn't necessarily show that int division is slower than float multiplication.

What CPU are we talking about?

A naive interpretation of the "benchmark" assumes a single execution pipe that does floating point and integer operations in sequence ...

Even assuming a single pipe: Why is the SSE version faster?

Does the benchmark measure the speed of int division against float multiplication?

Does the benchmark measure the throughput of int division against float multiplication?

Does the benchmark measure the throughput of int division of a set of numbers through a constant factor against float multiplication of the same set through (1 / constant factor)?

Thomas

May 03, 2006

Re: Compiler optimizations (I'm baffled)

Posted by Dave
in reply to Bruno Medeiros

Dave

Posted in reply to Bruno Medeiros

Bruno Medeiros wrote:
> Walter Bright wrote:
>> Craig Black wrote:
>>>  This is
>>> because integer division is essentially floating point division under the
>>> hood.
>>
>> Not exactly. What is true is that both kinds of division are based on the same 'shift and subtract' method (the same algorithm taught in 3rd grade for doing long division). If you're really interested in exactly how integer and floating point division work, get the Digital Mars CD which has complete source code to an IEEE 754 floating point emulator, as well as code to do integer division for long's.
>>
>> www.digitalmars.com/shop.html
> 
> I ran these tests and I got basicly the same results (the int division is slower). I am very intrigued and confused. Can you (or someone else) explain briefly why this is so?
> One would think it would be the other way around (float being slower) or at least the same speed.
> 

If you take a look at the assembly for div(), idiv is being executed twice - once for the division and once for the modulo. If you replace the modulo with the explicit calculation then the perf. will improve a lot because imul & sub is faster than idiv.

It'd be nice if dmd would do this optimization for us because calculating the integer quotient and subsequent remainder for the same dividend and divisor would seem to be pretty common operations.

May 03, 2006

Re: Compiler optimizations (I'm baffled)

Posted by Dave
in reply to Dave

Dave

Posted in reply to Dave

Dave wrote:
> Bruno Medeiros wrote:
>> Walter Bright wrote:
>>> Craig Black wrote:
>>>>  This is
>>>> because integer division is essentially floating point division under the
>>>> hood.
>>>
>>> Not exactly. What is true is that both kinds of division are based on the same 'shift and subtract' method (the same algorithm taught in 3rd grade for doing long division). If you're really interested in exactly how integer and floating point division work, get the Digital Mars CD which has complete source code to an IEEE 754 floating point emulator, as well as code to do integer division for long's.
>>>
>>> www.digitalmars.com/shop.html
>>
>> I ran these tests and I got basicly the same results (the int division is slower). I am very intrigued and confused. Can you (or someone else) explain briefly why this is so?
>> One would think it would be the other way around (float being slower) or at least the same speed.
>>
> 
> If you take a look at the assembly for div(), idiv is being executed twice - once for the division and once for the modulo. If you replace the modulo with the explicit calculation then the perf. will improve a lot because imul & sub is faster than idiv.
>

Obviously I was talking about the original pi.d implementation that's using the int modulo operator in the div() function:

quotient = b / divisor;
//remainder = b % divisor;
remainder = b - quotient * divisor; // a lot faster

> It'd be nice if dmd would do this optimization for us because calculating the integer quotient and subsequent remainder for the same dividend and divisor would seem to be pretty common operations.

Don't get me wrong - dmd does great for integer stuff IMHO, this is just one small issue I've noticed...

May 03, 2006

Re: Compiler optimizations

Posted by Walter Bright
in reply to Craig Black

Walter Bright

Posted in reply to Craig Black

On my machine, the int version is 1.7 seconds, the double is 3.2. I am unable to reproduce your results.

Craig Black wrote:
>> The second one is faster because you cheat.
> 
> Nope. Try this one. On my computer the floating point division is twice as
> fast.  I believe this is due to the overhead on converting the divisor from
> int to double before performing the division.
> 
> #include <stdio.h>
> #include <conio.h>
> #include <time.h>
> 
> //typedef int divtype;
> typedef double divtype; // this one is faster
> 
> int main()
> {
>    int result = 0;
> 
>    clock_t start, finish;
>    double  duration;
> 
>    start = clock();
>    for(divtype div=1; div<10000; div++)
>    {
>       for(int i=0; i<10000; i++)
>       {
>         result += i / div;
>       }
>    }
>    finish = clock();
>    duration = (double)(finish - start) / CLOCKS_PER_SEC;
> 
>    printf("[%i] %2.1f seconds\n",result,duration);
> }
> 
>

May 03, 2006

Re: Compiler optimizations

Posted by Walter Bright
in reply to Craig Black

Walter Bright

Posted in reply to Craig Black

On my machine, integer is still faster.

int: 3.7
double: 4.2

Craig Black wrote:
> This is probably because you have SSE optimizations disabled.
> This one works even without SSE.  This shows that integer division is
> slower.  Also, if you change the division to multiplication, you will notice
> that integer multiplication is faster, which is what you would expect.
> 
> #include <stdio.h>
> #include <conio.h>
> #include <time.h>
> 
> //typedef int divtype;
> typedef double divtype; // This one is faster
> 
> int main()
> {
>    divtype result = 0;
> 
>    clock_t start, finish;
>    double  duration;
> 
>    start = clock();
>    divtype max = 100000000;
>    for(divtype div=1; div<max; div++)
>    {
>      divtype i = max - div;
>      result += i / div;
>    }
>    finish = clock();
>    duration = (double)(finish - start) / CLOCKS_PER_SEC;
> 
>    printf("[%f] %2.1f seconds\n",double(result),duration);
> }
> 
>

May 04, 2006

Re: Compiler optimizations (I'm baffled)

Posted by Bruno Medeiros
in reply to Thomas Kuehne

Bruno Medeiros

Posted in reply to Thomas Kuehne

Thomas Kuehne wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Bruno Medeiros schrieb am 2006-05-03:
>> Walter Bright wrote:
>>> Craig Black wrote:
>>>>  This is
>>>> because integer division is essentially floating point division under the
>>>> hood.
>> I ran these tests and I got basicly the same results (the int division is slower). I am very intrigued and confused. Can you (or someone else) explain briefly why this is so?
>> One would think it would be the other way around (float being slower) or at least the same speed.
> 
> 
> The code doesn't necessarily show that int division is slower than float
> multiplication.
> 
> What CPU are we talking about?
> 
> A naive interpretation of the "benchmark" assumes a single execution
> pipe that does floating point and integer operations in sequence ...
> 
> Even assuming a single pipe: Why is the SSE version faster?
> 
> Does the benchmark measure the speed of int division against float
> multiplication? 
> 
> Does the benchmark measure the throughput of int division against float
> multiplication? 
> 
> Does the benchmark measure the throughput of int division of a set of
> numbers through a constant factor against float multiplication of the
> same set through (1 / constant factor)?
> 
> Thomas
> 
> 
> 
> -----BEGIN PGP SIGNATURE-----
> 
> iD8DBQFEWRDO3w+/yD4P9tIRAs8lAJ9q62J8zf8U0HWzxtxQmMWasuU4ngCgwA21
> 4M5nb9Z8ZXHevJiwylY/wGM=
> =QSyS
> -----END PGP SIGNATURE-----

Hum, yes I should have been more specific. I only ran (a modified version of) the latest test, which measured the throughput of int division against double division (I hope...).
Let me just put the code:

#include <stdio.h>
#include <time.h>

//typedef double divtype;
typedef int divtype;

int main()
{
   clock_t start = clock();


   divtype result = 0;
   divtype div=1;

   for(int max = 100000000; div < max; div++)
   {
     result = (42 / div);
   }


   clock_t finish = clock();
   double duration = (double)(finish - start) / CLOCKS_PER_SEC;
   printf("[%f] %2.2f seconds\n", double(result),duration);
}

------------------------------------
I ran the tests with GCC, with both -O0 and -O2, on an Athlon XP, and it both cases the typedef double divtype version was about twice as fast. The assembly code I get for line 17 is the following:

*** INT:

.stabn 68,0,17,LM6-_main
LM6:
	movl	$42, %edx
	movl	%edx, %eax
	sarl	$31, %edx
	idivl	-12(%ebp)
	movl	%eax, -8(%ebp)

*** DOUBLE:

.stabn 68,0,17,LM6-_main
LM6:
	flds	LC0
	fdivs	-12(%ebp)
	fstps	-8(%ebp)


I have little idea what it is that it's doing.

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation