November 11, 2008 Branch Prediction strange results | ||||
---|---|---|---|---|
| ||||
I have found an interesting small article about optimization, so I've tried the code in C and D, and I have found strange results (the D code shows timings opposite of the article).
This is the article, look at the "Branch Prediction" section:
http://www.ddj.com/184405848

The C code:
http://codepad.org/QSGIije4
And its asm (MinGW 4.2.1):
http://codepad.org/c7ZRiXGI

The similar D code:
http://codepad.org/slhcSJEA
Its asm (DMD 1.036):
http://codepad.org/AjlraEs9

There is also about 2X performance difference.

Bye,
bearophile
|
November 12, 2008 Re: Branch Prediction strange results | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile |
    if (i % 4 == 1) {
        if (i % 4 == 0) {
            counter1++;
        } else {
            counter2++;
        }
    } else {
        if (i % 4 == 2) {
            counter3++;
        } else {
            counter4++;
        }
    }

this is incorrect
|
November 12, 2008 Re: Branch Prediction strange results | ||||
---|---|---|---|---|
| ||||
Posted in reply to Kagamin | You didn't run your code, lol. |
November 12, 2008 Re: Branch Prediction strange results | ||||
---|---|---|---|---|
| ||||
Posted in reply to Kagamin | Kagamin:
> this is incorrect

Thank you for spotting the silly bug, I'll fix it now. (It seems it's easier to leave bugs in this kind of code because it does nothing useful.)

But note that in both programs:

    #define FIRST
    static if (1) {

So only the first part is run in both the D and C code, not the wrong one... So the code is like this:

    #include "stdio.h"

    int main() {
        int counter0 = 0, counter1 = 0, counter2 = 0, counter3 = 0;
        int i = 300000000;
        while (i--) {
            // 0.63 s
            if (i % 4 == 0) {
                counter0++;
            } else if (i % 4 == 1) {
                counter1++;
            } else if (i % 4 == 2) {
                counter2++;
            } else {
                counter3++;
            }
        }
        printf("%d %d %d %d\n", counter0, counter1, counter2, counter3);
        return 0;
    }

So the problem and timings of the first part I have shown are still correct :-)

Bye,
bearophile
|
November 12, 2008 Re: Branch Prediction strange results | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | bearophile wrote:
> I have found an interesting small article about optimization, so I've tried the code in C and D, and I have found strange results (the D code shows timings opposite of the article).
> This is the article, look at the "Branch Prediction" section:
> http://www.ddj.com/184405848
>
> The C code:
> http://codepad.org/QSGIije4
> And its asm (MinGW 4.2.1):
> http://codepad.org/c7ZRiXGI
>
> The similar D code:
> http://codepad.org/slhcSJEA
> Its asm (DMD 1.036):
> http://codepad.org/AjlraEs9
>
> There is also about 2X performance difference.
>
> Bye,
> bearophile
Are you running it on a Pentium 4? Pentium 4 has a *horrific* branch misprediction penalty (minimum 24 cycles, 45 uops). No other processor is nearly as bad, e.g. it's 15 cycles on Core2; it was just 4 cycles on PMMX.
|
November 12, 2008 Re: Branch Prediction strange results | ||||
---|---|---|---|---|
| ||||
Posted in reply to Don | Don:
> Are you running it on a Pentium 4? Pentium 4 has a *horrific* branch misprediction penalty (minimum 24 cycles, 45 uops). No other processor is nearly as bad, e.g. it's 15 cycles on Core2; it was just 4 cycles on PMMX.

Sorry, I am using a Core2 @ 2GHz.

The fixed C code with timings:

    #include "stdio.h"

    //#define FIRST

    int main() {
        int counter0 = 0, counter1 = 0, counter2 = 0, counter3 = 0;
        int i = 300000000;
        while (i--) {
    #ifdef FIRST
            // 0.63 s
            if (i % 4 == 0) {
                counter0++;
            } else if (i % 4 == 1) {
                counter1++;
            } else if (i % 4 == 2) {
                counter2++;
            } else {
                counter3++;
            }
    #else
            // 0.66 s
            if (i & 2) {
                if (i & 1) {
                    counter3++;
                } else {
                    counter2++;
                }
            } else {
                if (i & 1) {
                    counter1++;
                } else {
                    counter0++;
                }
            }
    #endif
        }
        printf("%d %d %d %d\n", counter0, counter1, counter2, counter3);
        return 0;
    }

Fixed D code with timings:

    void main() {
        int counter0, counter1, counter2, counter3;
        int i = 300000000;
        while (i--)
            static if (0) {
                // 1.24 s
                if (i % 4 == 0) {
                    counter0++;
                } else if (i % 4 == 1) {
                    counter1++;
                } else if (i % 4 == 2) {
                    counter2++;
                } else {
                    counter3++;
                }
            } else {
                // 1.01 s
                if (i & 2) {
                    if (i & 1) {
                        counter3++;
                    } else {
                        counter2++;
                    }
                } else {
                    if (i & 1) {
                        counter1++;
                    } else {
                        counter0++;
                    }
                }
            }
        printf("%d %d %d %d\n", counter0, counter1, counter2, counter3);
    }

As you can see, the C version (GCC 4.2.1-dw2) is up to about twice as fast as the D one, and it shows the scan as faster than the binary search, as the article I have linked says.

Bye,
bearophile
|