Loop optimization (page 3)

Ali Çehreli:
> I don't see it in the spec. Is that an old or an unintended feature?

It's a compiler bug; don't use that bracket-less syntax in your programs. Don is fighting to fix such problems (and I have written several posts and bug reports on that stuff).

Bye,
bearophile

May 16, 2010

Re: Loop optimization

Posted by Don
in reply to strtr


strtr wrote:
> == Quote from Don (nospam@nospam.com)'s article
>> strtr wrote:
>>> == Quote from bearophile (bearophileHUGS@lycos.com)'s article
>>>> But the bigger problem in your code is that you are performing operations on
>>>> NaNs (that's the default initialization of FP values in D), and operations on NaNs
>>>> are usually quite a bit slower.
>>>
>>> I didn't know that. Is it the same for inf?
>> Yes, nan and inf are usually the same speed. However, it's very CPU
>> dependent, and even *within* a CPU! On Pentium 4, for example, for x87,
>> nan is 200 times slower than a normal value (!), but on Pentium 4 SSE
>> there's no speed difference at all between nan and normal. I think
>> there's no speed difference on AMD, but I'm not sure.
>> There's almost no documentation on it at all.
> 
> Thanks!
> NaNs being slower I can understand but inf might well be a value you want to use.

Yes. What's happened is that none of the popular programming languages support special IEEE values, so they're given very low priority by chip designers. In the Pentium 4 case, they're implemented entirely in microcode. A 200X slowdown is really significant.

However, the all-ones bit pattern 0xFFFF... is a NaN, and it is also how a small negative integer such as -1 is stored, so an uninitialized floating-point variable has quite a high probability of being a NaN. I'm certain there are a lot of C programs out there which are inadvertently using NaNs.
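The bit-pattern claim is easy to check. What follows is a minimal sketch in C, not code from the thread; it assumes `double` is a 64-bit IEEE 754 value, and the function names are made up for illustration. It copies an integer's bits into a double and asks `isnan` about the result.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double (memcpy avoids aliasing problems).
   Assumption: double is 64-bit IEEE 754 binary64. */
double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Nonzero if the storage of the integer n, read back as a double, is a NaN. */
int int_bits_are_nan(int64_t n)
{
    return isnan(double_from_bits((uint64_t)n)) != 0;
}

/* Hypothetical self-check illustrating the cases discussed above. */
void check_int_bits(void)
{
    assert(int_bits_are_nan(-1));   /* 0xFFFFFFFFFFFFFFFF: all ones */
    assert(int_bits_are_nan(-2));   /* other small negative integers too */
    assert(!int_bits_are_nan(0));   /* all zero bits: +0.0 */
    assert(!int_bits_are_nan(1));   /* a tiny subnormal, not a NaN */
}
```

Any 64-bit integer greater than -2^52 and below zero has all exponent bits set and a nonzero mantissa, which is why leftover small negative integers read back as NaN.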

Jérôme M. Berger wrote:
> div0 wrote:
>> Jérôme M. Berger wrote:
>>> That depends. In C/C++, the default value for any global variable
>>> is to have all bits set to 0 whatever that means for the actual data
>>> type.
>> No it's not, it's always uninitialized.
>>
> According to the C89 standard and onwards it *must* be initialized
> to 0. If it isn't then your implementation isn't standard compliant
> (needless to say, gcc, Visual, llvm, icc and dmc are all standard
> compliant, so you won't have any difficulty checking).

Ah, I only do C++, where the standard is to not initialise. I didn't know the two specs had diverged like that.

--
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk

"div0" <div0@users.sourceforge.net> wrote:
>>> Jérôme M. Berger wrote:
>>>> That depends. In C/C++, the default value for any global variable
>>>> is to have all bits set to 0 whatever that means for the actual data
>>>> type.
> Ah, I only do C++, where the standard is to not initialise.

No, in C++ all *global or static* variables are zero-initialized. By default, stack variables are default-initialized, which means that doubles on the stack can have any value (they are uninitialized).

The C function calloc is required to fill the newly allocated memory with an all-zero bit pattern; malloc is not required to initialize anything. Fresh heap areas given by malloc may have an all-zero bit pattern, but one should really make no assumptions about this.

--
Jouko

div0 wrote:
> Jérôme M. Berger wrote:
>> div0 wrote:
>>> Jérôme M. Berger wrote:
>>>> That depends. In C/C++, the default value for any global variable
>>>> is to have all bits set to 0 whatever that means for the actual data
>>>> type.
>>> No it's not, it's always uninitialized.
>>>
>> According to the C89 standard and onwards it *must* be initialized
>> to 0. If it isn't then your implementation isn't standard compliant
>> (needless to say, gcc, Visual, llvm, icc and dmc are all standard
>> compliant, so you won't have any difficulty checking).
>
> Ah, I only do C++, where the standard is to not initialise. I didn't know the two specs had diverged like that.
>

The specs haven't diverged and C++ has mostly the same behaviour as C where global variables are concerned. The only difference is that if the global variable is a class with a constructor, then that constructor gets called after the memory is zeroed out.

Jerome
--
mailto:jeberger@free.fr
http://jeberger.free.fr
Jabber: jeberger@jabber.fr
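The initialization rules argued over in this sub-thread can be checked directly. This is a small sketch in C rather than anything posted in the thread; the names are hypothetical. It looks at a file-scope variable, a function-local static, and memory returned by calloc. It deliberately does not read an uninitialized automatic variable or fresh malloc memory, since those values are indeterminate.

```c
#include <assert.h>
#include <stdlib.h>

/* File-scope object with no initializer: zero-initialized before main runs. */
double global_value;

/* A function-local static is zero-initialized as well;
   an automatic (stack) local in its place would hold an indeterminate value. */
double read_local_static(void)
{
    static double local_static;
    return local_static;
}

/* Returns 1 if every byte of n doubles from calloc is zero,
   0 if some byte is nonzero, -1 if the allocation failed. */
int calloc_block_is_zeroed(size_t n)
{
    double *p = calloc(n, sizeof *p);
    if (p == NULL)
        return -1;

    const unsigned char *bytes = (const unsigned char *)p;
    int all_zero = 1;
    for (size_t i = 0; i < n * sizeof *p; i++) {
        if (bytes[i] != 0) {
            all_zero = 0;
            break;
        }
    }
    free(p);
    return all_zero;
}

/* Hypothetical self-check tying the three cases together. */
void check_zero_initialization(void)
{
    assert(global_value == 0.0);
    assert(read_local_static() == 0.0);
    assert(calloc_block_is_zeroed(16) == 1);
}
```

On IEEE 754 machines an all-zero bit pattern is +0.0, so calloc'd doubles also compare equal to zero there; the C standard itself only promises the zero bytes.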

Don wrote:
> bearophile wrote:
>> kai:
>>> Any ideas? Am I somehow not hitting a vital compiler optimization?
>>
>> DMD compiler doesn't perform many optimizations, especially on floating point computations.
>
> More precisely:
> In terms of optimizations performed, DMD isn't too far behind gcc. But it performs almost no optimization on floating point. Also, the inliner doesn't yet support the newer D features (this won't be hard to fix) and the scheduler is based on Pentium1.

Have to be careful when talking about floating point optimizations. For example,

    x/c => x * (1/c)

is not done because of roundoff error. Also,

    0 * x => 0

is also not done because it is not a correct replacement if x is a NaN.
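To make the roundoff point concrete, here is a minimal sketch in C (not from the thread; function names are illustrative). It assumes plain IEEE 754 double arithmetic, as on x86-64 with SSE2, and no fast-math style compiler flags. Dividing by c and multiplying by a precomputed 1/c agree when 1/c is exactly representable, and can differ in the last bit when it is not.

```c
#include <assert.h>

/* The straightforward form: one correctly rounded division. */
double divide(double x, double c)
{
    return x / c;
}

/* The "optimized" form: 1/c is rounded first, then the product is rounded again. */
double multiply_by_reciprocal(double x, double c)
{
    double r = 1.0 / c;
    return x * r;
}

/* Hypothetical self-check. Assumes IEEE 754 doubles without excess precision. */
void check_reciprocal_rewrite(void)
{
    /* 1/8 is a power of two and exactly representable, so both forms agree. */
    assert(divide(3.0, 8.0) == multiply_by_reciprocal(3.0, 8.0));

    /* 1/10 is not exactly representable; the two roundings give a
       different last bit than the single division does. */
    assert(divide(3.0, 10.0) != multiply_by_reciprocal(3.0, 10.0));
}
```

The second case is the familiar 3 * 0.1 versus 3 / 10 discrepancy: a compiler that made this rewrite on its own would silently change program results.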

bearophile wrote:
> DMD compiler doesn't perform many optimizations,

This is simply false. DMD does an excellent job with integer and pointer operations. It does a so-so job with floating point.

There are probably over a thousand optimizations at all levels that dmd does with integer and pointer code.

Compare the generated code with and without -O. Even without -O, dmd does a long list of optimizations (such as common subexpression elimination).

Walter Bright:
> This is simply false. DMD does an excellent job with integer and pointer
> operations. It does a so-so job with floating point.
> There are probably over a thousand optimizations at all levels that dmd does
> with integer and pointer code.

You are of course right, I understand your feelings, I am stupid -.- I must be more precise in my posts.

You are right that dmd surely performs numerous optimizations. What I meant to say was a comparison with other compilers, particularly ldc. And even then generic words about a generic comparison aren't useful. So I am sorry.

Bye,
bearophile

On 5/16/2010 4:15 PM, Walter Bright wrote:
> bearophile wrote:
>> DMD compiler doesn't perform many optimizations,
>
> This is simply false. DMD does an excellent job with integer and pointer operations. It does a so-so job with floating point.
>
> There are probably over a thousand optimizations at all levels that dmd does with integer and pointer code.
>
> Compare the generated code with and without -O. Even without -O, dmd does a long list of optimizations (such as common subexpression elimination).

While it's false that DMD doesn't do many optimizations, it's true that it's behind more modern compiler optimizers. I've been working to fix some of the grossly bad holes in dmd's inliner, which is one area that's just obviously lacking (see bug 2008). But gcc and ldc (and likely msvc, though I lack any direct knowledge) are simply a decade or so ahead. It's not a criticism of dmd or a suggestion that the priorities are in the wrong place, just a point of fact. They've got larger teams of people and are spending significant time on just improving and adding optimizations.

Later,
Brad

Walter Bright:
> is not done because of roundoff error. Also,
> 0 * x => 0
> is also not done because it is not a correct replacement if x is a NaN.

I have done a little experiment, compiling this D1 code with LDC:

    import tango.stdc.stdio: printf;
    void main(char[][] args) {
        double x = cast(double)args.length;
        double y = 0 * x;
        printf("%f\n", y);
    }

I think the asm generated by ldc shows what you say:

ldc -O3 -release -inline -output-s test

    _Dmain:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $32, %esp
        movsd   .LCPI1_0, %xmm0
        movd    8(%ebp), %xmm1
        orps    %xmm0, %xmm1
        subsd   %xmm0, %xmm1
        pxor    %xmm0, %xmm0
        mulsd   %xmm1, %xmm0
        movsd   %xmm0, 4(%esp)
        movl    $.str, (%esp)
        call    printf
        xorl    %eax, %eax
        movl    %ebp, %esp
        popl    %ebp
        ret     $8

So I have added an extra "unsafe floating point" optimization:

ldc -O3 -release -inline -enable-unsafe-fp-math -output-s test

    _Dmain:
        subl    $12, %esp
        movl    $0, 8(%esp)
        movl    $0, 4(%esp)
        movl    $.str, (%esp)
        call    printf
        xorl    %eax, %eax
        addl    $12, %esp
        ret     $8

GCC has similar switches.

Bye,
bearophile
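For readers who want to see why the multiply has to stay in the safe build, the following is a small sketch in C (an illustration, not part of the experiment above; names are made up). It assumes IEEE 754 semantics and a build without unsafe floating-point flags. It shows the inputs for which `0 * x` is not simply `0`.

```c
#include <assert.h>
#include <math.h>

/* The expression a compiler may not fold to a constant 0 under strict IEEE rules. */
double zero_times(double x)
{
    return 0.0 * x;
}

/* Hypothetical self-check. Assumes strict IEEE 754 evaluation (no fast-math). */
void check_zero_times(void)
{
    /* Ordinary finite input: the result really is zero. */
    assert(zero_times(5.0) == 0.0);

    /* NaN propagates through the multiplication. */
    assert(isnan(zero_times(NAN)));

    /* 0 * infinity is an invalid operation and also yields NaN. */
    assert(isnan(zero_times(INFINITY)));

    /* A negative input gives negative zero: equal to 0.0, but the sign bit is set. */
    assert(zero_times(-1.0) == 0.0);
    assert(signbit(zero_times(-1.0)));
}
```

Besides the NaN case mentioned in the quote, infinities and the sign of zero are two more reasons the folded constant is not an exact replacement.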
