Thread overview | |||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
April 30, 2006 Compiler optimizations | ||||
---|---|---|---|---|
| ||||
So I've been playing around with the pi program. I ported it to C++ and I've been doing some benchmarks with MSVC. It seems that MSVC is a little faster when SSE2 instructions are enabled. It is slower when they are disabled. Which leads me to a question. Does the D backend make use of SSE 2 intstructions? I think the answer is yes but perhaps the optimizer is not quite as sophisticated as MSVC yet. Another question for someone who knows a lot about backend optimizations. I've noticed that integer division is faster when the the divisor is a constant. How does the compiler optimize this? Another thing. I know that integer division is essentially a floating point operation because there is no integer division instruction. I also read some on the internet and learned that Intel has a library that provides fixed-point math routines, which provides a division operation that outperforms floating-point division. Then I thought to myself, "Why don't compilers just use fixed-point division for dividing integers since it's faster?" I know some of you out there know way more than me so please all you smart people give me some input. -Craig |
April 30, 2006 Re: Compiler optimizations | ||||
---|---|---|---|---|
| ||||
Posted in reply to Craig Black | > Another thing. I know that integer division is essentially a floating point > operation because there is no integer division instruction. and what is "idiv" for you? (http://www.jegerlehner.ch/intel/opcode.html) > I also read > some on the internet and learned that Intel has a library that provides > fixed-point math routines, which provides a division operation that > outperforms floating-point division. Then I thought to myself, "Why don't > compilers just use fixed-point division for dividing integers since it's > faster?" a fixpoint floating somethimes outperforms an floating-point operating (with loss of precession) but it never outperforms an true integer division ciao dennis |
April 30, 2006 Re: Compiler optimizations | ||||
---|---|---|---|---|
| ||||
Posted in reply to dennis luehring | "dennis luehring" <dl.soluz@gmx.net> wrote in message news:e32c0m$1dsu$1@digitaldaemon.com... > > Another thing. I know that integer division is essentially a floating point > > operation because there is no integer division instruction. > > and what is "idiv" for you? > (http://www.jegerlehner.ch/intel/opcode.html) Please anyone, correct me if I'm wrong, but I think IDIV is a pseudo-instruction. 80x86 is notorious for those (as opposed to RISC). I read on the Intel website that intel processors do not have a hardware integer division instruction. Apparently it is somehow better to use the floating point instruction instead. The benchmarks that I've done are constistent with this. I found it was faster to perform a floating-point multiplication on an integer and then convert it back to an integer than just a single integer division. > > > I also read > > some on the internet and learned that Intel has a library that provides > > fixed-point math routines, which provides a division operation that > > outperforms floating-point division. Then I thought to myself, "Why don't > > compilers just use fixed-point division for dividing integers since it's faster?" > > a fixpoint floating somethimes outperforms an floating-point operating > (with loss of precession) > > but it never outperforms an true integer division > > ciao dennis Have you run benchmarks? -Craig |
April 30, 2006 Re: Compiler optimizations | ||||
---|---|---|---|---|
| ||||
Posted in reply to Craig Black | > Please anyone, correct me if I'm wrong, but I think IDIV is a > pseudo-instruction. 80x86 is notorious for those (as opposed to RISC). I > read on the Intel website that intel processors do not have a hardware > integer division instruction. Apparently it is somehow better to use the > floating point instruction instead. i don't know the truth - but why should all the compiler vendors around us (microsoft, intel, ...) produce the slower code if there is an faster solution? what is the reason (maybe it produce lager code - it this will produce more cache problems...) > The benchmarks that I've done are constistent with this. I found it was > faster to perform a floating-point multiplication on an integer and then > convert it back to an integer than just a single integer division. can you post your benchmark source? > Have you run benchmarks? no - but i can run your benchmark... |
April 30, 2006 Re: Compiler optimizations | ||||
---|---|---|---|---|
| ||||
Posted in reply to dennis luehring | "dennis luehring" <dl.soluz@gmx.net> wrote in message news:e32dfl$1f53$1@digitaldaemon.com... > > Please anyone, correct me if I'm wrong, but I think IDIV is a pseudo-instruction. 80x86 is notorious for those (as opposed to RISC). I > > read on the Intel website that intel processors do not have a hardware integer division instruction. Apparently it is somehow better to use the > > floating point instruction instead. > > i don't know the truth - but why should all the compiler vendors around us (microsoft, intel, ...) produce the slower code if there is an faster solution? what is the reason (maybe it produce lager code - it this will produce more cache problems...) I'm not saying that there is a faster solution, but there might be. Mainly, I'm just trying to learn more so that I will be better at optimizing my code. > > The benchmarks that I've done are constistent with this. I found it was faster to perform a floating-point multiplication on an integer and then convert it back to an integer than just a single integer division. > > can you post your benchmark source? > > > Have you run benchmarks? > > no - but i can run your benchmark... These two functions show the difference that I'm talking about. When benchmarking, pass in the same values for the divisor (pick a value between 2 and 30), and pass in a large enough value for total so that it will take some time for the CPU to finish. Suprisingly, the second one is faster. -Craig int divTest1(int divisor, int total) { int sum = 0; for(int i = 0; i < total; i++) { int quotient = i / divisor; sum += quotient; } } int divTest2(int divisor, int total) { int sum = 0; double idiv = 1.0 / divisor; for(int i = 0; i < total; i++) { int quotient = i * idiv; sum += quotient; } } |
April 30, 2006 Re: Compiler optimizations | ||||
---|---|---|---|---|
| ||||
Posted in reply to Craig Black | This doesn't really surprise me. If I am reading it correctly, the major difference comes from only doing the division once. If you did the division every time, the results might not be the same.
Craig Black wrote:
> These two functions show the difference that I'm talking about. When benchmarking, pass in the same values for the divisor (pick a value between 2 and 30), and pass in a large enough value for total so that it will take some time for the CPU to finish. Suprisingly, the second one is faster.
>
> -Craig
>
> int divTest1(int divisor, int total)
> {
> int sum = 0;
> for(int i = 0; i < total; i)
> {
> int quotient = i / divisor;
> sum = quotient;
> }
> }
>
> int divTest2(int divisor, int total)
> {
> int sum = 0;
> double idiv = 1.0 / divisor;
> for(int i = 0; i < total; i)
> {
> int quotient = i * idiv;
> sum = quotient;
> }
> }
|
April 30, 2006 Re: Compiler optimizations | ||||
---|---|---|---|---|
| ||||
Posted in reply to Craig Black | small change
> int divTest2(int divisor, int total)
> {
> int sum = 0;
> for(int i = 0; i < total; i++)
> {
> int quotient = i * ( 1.0 / divisor ); // !!!!!!!!
> sum += quotient;
> }
> }
|
April 30, 2006 Re: Compiler optimizations | ||||
---|---|---|---|---|
| ||||
Posted in reply to dennis luehring |
>
> > int divTest2(int divisor, int total)
> > {
> > int sum = 0;
> > for(int i = 0; i < total; i++)
> > {
> > int quotient = i * ( 1.0 / divisor ); // !!!!!!!!
> > sum += quotient;
> > }
> > }
I don't see what you are trying to say here.
-Craig
|
April 30, 2006 Re: Compiler optimizations | ||||
---|---|---|---|---|
| ||||
Posted in reply to James Pelcis | > This doesn't really surprise me. If I am reading it correctly, the major difference comes from only doing the division once. If you did the division every time, the results might not be the same.
I don't think you realize what is happening here ... we are dealing with integer division vs. floating point multiplication. With the floating point multiplication we have
1) conversion from int to a floating point
2) the floating point multiplication
3) convertion from floating point back to int
Although CPUs have gotten faster at converting to and from floating points, the conversion is not trivial to performance. What's suprising is that these three operations are faster than a singe integer division. This is because integer division is essentially floating point division under the hood.
-Craig
|
April 30, 2006 Re: Compiler optimizations | ||||
---|---|---|---|---|
| ||||
Posted in reply to dennis luehring | If you want to get a direct comparison between floating point and integer division, then use
> > int divTest2(double divisor, int total)
> > {
> > int sum = 0;
> > for(int i = 0; i < total; i++)
> > {
> > int quotient = i / divisor; // !!!!!!!!
> > sum += quotient;
> > }
> > }
|
Copyright © 1999-2021 by the D Language Foundation