Loop optimization (page 4)

Walter Bright wrote:
> Don wrote:
>> bearophile wrote:
>>> kai:
>>>> Any ideas? Am I somehow not hitting a vital compiler optimization?
>>>
>>> DMD compiler doesn't perform many optimizations, especially on floating point computations.
>>
>> More precisely:
>> In terms of optimizations performed, DMD isn't too far behind gcc. But it performs almost no optimization on floating point. Also, the inliner doesn't yet support the newer D features (this won't be hard to fix) and the scheduler is based on Pentium1.
>
> Have to be careful when talking about floating point optimizations. For example,
>
> x/c => x * 1/c
>
> is not done because of roundoff error. Also,
>
> 0 * x => 0
>
> is also not done because it is not a correct replacement if x is a NaN.

The most glaring limitation of the FP optimiser is that it seems to never keep values in the FP stack. So that it will often do:
FSTP x
FLD x
instead of FST x
Fixing this would probably give a speedup of ~20% on almost all FP code, and would unlock the path to further optimisation.
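The two rewrites Walter rules out can be checked with a few lines of D. This is only a minimal sketch, not anything taken from DMD; the helper names countReciprocalMismatches and zeroTimes are made up for illustration. It counts how often x/c and x*(1/c) disagree over a range of integers, and shows that 0 * x stays NaN when x is NaN. The count for c = 3 depends on the compiler and on the precision used for intermediates, so no particular value is claimed for it.

```d
import std.math : isNaN;
import std.stdio : writeln;

/// Counts the x in 1 .. n (inclusive) for which x/c and x*(1/c) differ.
/// Hypothetical helper, for illustration only.
int countReciprocalMismatches(double c, int n)
{
    immutable double inv = 1.0 / c; // the reciprocal is rounded once here
    int mismatches = 0;
    foreach (i; 1 .. n + 1)
    {
        immutable double x = i;
        immutable double quotient = x / c;  // one rounding
        immutable double product = x * inv; // a second rounding on top of inv
        if (quotient != product)
            ++mismatches;
    }
    return mismatches;
}

/// Computes 0 * x; the result is NaN whenever x is NaN.
double zeroTimes(double x)
{
    return 0.0 * x;
}

void main()
{
    // 1/2 is exact in binary floating point, so the two forms always agree.
    writeln("c = 2: ", countReciprocalMismatches(2.0, 1000));
    // 1/3 is not exact, so the two forms are allowed to disagree.
    writeln("c = 3: ", countReciprocalMismatches(3.0, 1000));
    writeln("0 * 5.0 is NaN: ", isNaN(zeroTimes(5.0)));
    writeln("0 * NaN is NaN: ", isNaN(zeroTimes(double.nan)));
}
```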

On Fri, 14 May 2010 12:40:52 -0400, bearophile <bearophileHUGS@lycos.com> wrote:
> Steven Schveighoffer:
>> In C/C++, the default value for doubles is 0.
>
> I think in C and C++ the default value for doubles is "uninitialized" (that is anything).

You are probably right. All I did to figure this out is print out the first element of the array in my C++ version of kai's code. So it may be arbitrarily set to 0.

-Steve
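For contrast with the C and C++ behaviour discussed here: D default-initializes floating point values to NaN, so a freshly allocated double[] like the one in kai's test case holds NaNs unless it is filled explicitly. A minimal sketch (the function name freshArray is made up for illustration):

```d
import std.math : isNaN;
import std.stdio : writeln;

/// Allocates n doubles without assigning to them (hypothetical helper).
double[] freshArray(size_t n)
{
    return new double[n]; // every element is double.init, which is NaN
}

void main()
{
    auto foo = freshArray(4);
    writeln(isNaN(foo[0])); // true

    foo[] = 0.0; // explicit fill, matching what the C++ run happened to show
    writeln(foo[0] == 0.0); // true
}
```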

Hello Don,

> The most glaring limitation of the FP optimiser is that it seems to
> never keep values in the FP stack. So that it will often do:
> FSTP x
> FLD x
> instead of FST x
> Fixing this would probably give a speedup of ~20% on almost all FP
> code, and would unlock the path to further optimisation.

Does DMD have the groundwork for doing FP keyhole optimizations? That sounds like an easy one.

--
... <IXOYE><
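The kind of keyhole (peephole) rewrite being asked about can be sketched on a toy instruction list. The Instr struct and the foldStoreReload function below are hypothetical and have nothing to do with DMD's actual backend; the sketch only shows the pattern match: an FSTP to a location immediately followed by an FLD from the same location becomes a single FST.

```d
import std.stdio : writeln;

/// Toy x87-style instruction: a mnemonic plus one memory operand.
/// Hypothetical representation, not DMD's.
struct Instr
{
    string op;
    string operand;
}

/// Replaces every adjacent pair "FSTP m; FLD m" with a single "FST m".
Instr[] foldStoreReload(const Instr[] code)
{
    Instr[] result;
    size_t i = 0;
    while (i < code.length)
    {
        const bool isPair = i + 1 < code.length
            && code[i].op == "FSTP"
            && code[i + 1].op == "FLD"
            && code[i].operand == code[i + 1].operand;
        if (isPair)
        {
            // Store without popping, so the value stays on the FP stack.
            result ~= Instr("FST", code[i].operand);
            i += 2;
        }
        else
        {
            result ~= code[i];
            i += 1;
        }
    }
    return result;
}

void main()
{
    auto code = [Instr("FADD", "y"), Instr("FSTP", "x"),
                 Instr("FLD", "x"), Instr("FMUL", "z")];
    foreach (ins; foldStoreReload(code))
        writeln(ins.op, " ", ins.operand);
}
```

A real pass would also have to check that nothing else depends on the popped stack slot; the sketch only matches adjacent instructions.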

bearophile wrote:
> So I have added an extra "unsafe floating point" optimization:
>
> ldc -O3 -release -inline -enable-unsafe-fp-math -output-s test

In my view, such switches are bad news, because:

1. very few people understand the issues regarding wrong floating point optimizations

2. even those that do, are faced with a switch that doesn't really define what unsafe fp optimizations it is doing, so there's no way to tell how it affects their code

3. the behavior of such a switch may change over time, breaking one's carefully written code

4. most of those optimizations can be done by hand if you want to, meaning that then their behavior will be reliable, portable and correct for your application

5. in my experience with such switches, almost nobody uses them, and the few that do use them wrongly

6. they add clutter, complexity, confusion and errors to the documentation

7. they use it, their code doesn't work correctly, they blame the compiler/language and waste the time of the tech support people
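Point 4 can be illustrated with the x/c rewrite from earlier in the thread: a programmer who accepts the extra roundoff can apply it by hand, so the choice is visible in the source instead of depending on a switch. A minimal sketch, with the function names scaleByDivision and scaleByReciprocal made up for illustration:

```d
import std.stdio : writeln;

/// Divides every element by c: one division per element.
void scaleByDivision(double[] a, double c)
{
    foreach (ref v; a)
        v /= c;
}

/// Hand-applied rewrite: one division up front, then one multiply per element.
/// Results can differ from scaleByDivision in the last bit when 1/c is inexact.
void scaleByReciprocal(double[] a, double c)
{
    immutable double inv = 1.0 / c;
    foreach (ref v; a)
        v *= inv;
}

void main()
{
    auto a = [1.0, 2.0, 3.0];
    auto b = a.dup;
    scaleByDivision(a, 4.0);
    scaleByReciprocal(b, 4.0); // 1/4 is exact, so both versions agree here
    writeln(a);
    writeln(b);
}
```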

May 17, 2010

Re: Loop optimization

Posted by bearophile
in reply to Walter Bright


Walter Bright:

>In my view, such switches are bad news, because:<

The Intel compiler, Microsoft compiler, GCC and LLVM have a similar switch (fp:fast in the Microsoft compiler, -ffast-math on GCC, etc). So you might send your list of comments to the devs of each of those four compilers.

I have used the "unsafe fp" switch in LDC to make my small raytracers run faster, with good results. So I use it now and then where max precision is not important and small errors are not going to ruin the output.

I have asked the LLVM head developer to improve this optimization in LLVM, because in my opinion it's not aggressive enough, to put LLVM on par with GCC. So LDC too will probably get better at this in the future. This unsafe optimization is off by default, so if you don't like it you can avoid it. Its presence in LDC has caused me zero problems so far (because when I need safer/more precise results I don't use it).


>4. most of those optimizations can be done by hand if you want to, meaning that then their behavior will be reliable, portable and correct for your application<

This is true for any optimization.

Bye,
bearophile

bearophile wrote:
> Walter Bright:
>
>> In my view, such switches are bad news, because:<
>
> The Intel compiler, Microsoft compiler, GCC and LLVM have a similar switch
> (fp:fast in the Microsoft compiler, -ffast-math on GCC, etc). So you might
> send your list of comments to the devs of each of those four compilers.

If I agreed with everything other vendors did with their compilers, I wouldn't have built my own <g>.

On 05/17/2010 01:15 AM, Walter Bright wrote:
> bearophile wrote:
>> DMD compiler doesn't perform many optimizations,
>
> This is simply false. DMD does an excellent job with integer and pointer operations. It does a so-so job with floating point.

Interesting to note, relative to my earlier experience with D vs. C++ speed:
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.learn&artnum=19567

I'll have to try and put together a no-floating-point bit of code to make a comparison.

Best wishes,

-- Joe

kai wrote:
> Here is a boiled down test case:
>
> void main (string[] args)
> {
>     double [] foo = new double [cast(int)1e6];
>     for (int i=0;i<1e3;i++)
>     {
>         for (int j=0;j<1e6-1;j++)
>         {
>             foo[j]=foo[j]+foo[j+1];
>         }
>     }
> }
>
> Any ideas?

for (int j=0;j<1e6-1;j++)

The j<1e6-1 is a floating point operation. It should be redone as an int one:

j<1_000_000-1

Walter Bright:
> for (int j=0;j<1e6-1;j++)
>
> The j<1e6-1 is a floating point operation. It should be redone as an int one:
> j<1_000_000-1

The syntax "1e6" can represent an integer value of one million as perfectly and as precisely as "1_000_000", but traditionally in many languages the exponential syntax is used to represent floating point values only, I don't know why.
If the OP wants a short syntax to represent one million, this syntax can be used in D2:

foreach (j; 0 .. 10^^6)

Bye,
bearophile
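Putting the two suggestions together, kai's inner update can be written with integer bounds only. This is a minimal sketch rather than a tuned version: the function name runKernel is made up, and the array is filled with 1.0 purely as an assumption so that the result is easy to check by hand. The static asserts record the type difference the posts above are about.

```d
import std.stdio : writeln;

// 10 ^^ 6 is an int expression, while the literal 1e6 is a double.
static assert(is(typeof(10 ^^ 6) == int));
static assert(is(typeof(1e6) == double));
static assert(10 ^^ 6 == 1_000_000);

/// kai's update with integer loop bounds (hypothetical helper).
/// Elements start at 1.0, an assumption made here for checkable output.
double[] runKernel(size_t n, size_t passes)
{
    auto foo = new double[n];
    foo[] = 1.0;
    if (n < 2)
        return foo; // nothing to add, and n - 1 would wrap around for n == 0
    foreach (pass; 0 .. passes)
        foreach (j; 0 .. n - 1)
            foo[j] = foo[j] + foo[j + 1];
    return foo;
}

void main()
{
    writeln(runKernel(4, 1)); // [2, 2, 2, 1]
    // The sizes from the original post would be runKernel(10 ^^ 6, 10 ^^ 3).
}
```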
