D for the Win

August 24, 2014

Re: D for the Win

Posted by Mike
in reply to ketmar

On Sunday, 24 August 2014 at 13:13:58 UTC, ketmar via Digitalmars-d-announce wrote:
> On Sun, 24 Aug 2014 12:51:10 +0000
> Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com>
> wrote:
>
> ps.
>> 6. This change (https://github.com/nsf/pnoise/commit/baadfe20c7ae6aa900cb0e4188aa9d20bea95918)
>
> with GDC has no effect at all.

If I undo all of Edmund Smith's changes from today, use C's floor, and remove all the excessive function attributes, I get this:

http://dpaste.dzfl.pl/1b564efb423e
=== gcc -O3:
       0.141484117 seconds time
=== D (dmd):
       0.446634464 seconds time
=== D (ldc2):
       0.191059330 seconds time
=== D (gdc):
       0.226455762 seconds time


Then, if I add only change #6 above and remove the excessive function attributes, I get this:
http://dpaste.dzfl.pl/f525adab909c
=== gcc -O3:
       0.137815809 seconds time
=== D (dmd):
       0.480525196 seconds time
=== D (ldc2):
       0.139659135 seconds time
=== D (gdc):
       0.131637220 seconds time

For GDC that's roughly 1.7 times faster (0.226 s down to 0.132 s), approaching twice as fast, which is significant to me.

Also, all those optimization flags should already be on with -O3.  Here are the flags I'm using:

gcc -std=c99 -O3 -o bin_test_c_gcc test.c -lm
dmd -ofbin_test_d_dmd -O -noboundscheck -inline -release test.d
ldc2 -O3 -ofbin_test_d_ldc test.d -release
gdc -O3 -o bin_test_d_gdc test.d -frelease

Maybe I'll make a pull request for it.  I don't think users should have to decorate their code like a Christmas tree and use a bunch of special compiler flags to get a well-behaved binary.

Mike

Mike:
> Then I add change only #6 above, and remove the excessive function attributes,
> Maybe I'll make a pull request for it. I don't think users should have to decorate their code like a Christmas tree

I don't agree: function attributes are not excessive, they are idiomatic in D.

Bye,
bearophile

On Sun, 24 Aug 2014 13:44:07 +0000
Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:

hm. with my "GDC 4.9.1 git HEAD", #6 has no effect at all.

p.s. it's unfair to specify "-msse3 -mfpmath=sse" for gcc and not for gdc. gdc can use these flags too! (yeah, the effect is great: the sse3 variant is ~2.5 times faster.)

On Sun, 24 Aug 2014 13:44:07 +0000
Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:

p.s. what i did is this:

    auto tm = Timer();
    tm.start;
    foreach (; 0..100) {
      auto n2d = Noise2DContext(0);
      foreach (i; 0..100) {
        foreach (y; 0..256) {
          foreach (x; 0..256) {
            auto v = n2d.get(x * 0.1f, y * 0.1f) * 0.5f + 0.5f;
            pixels[y*256+x] = v;
          }
        }
      }
    }
    tm.stop;
    writeln(tm.toString);

Timer is my simple timer class which uses MonoTime to measure intervals.
this shows ~22 seconds for both variants, with #6 and without #6.

and 57 seconds for variants without sse3 flags. ;-)
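ketmar's Timer class itself isn't posted. As a rough, hypothetical stand-in (the `start`/`stop`/`toString` members are assumed from how the snippet calls them; `interval` is added here for illustration, and none of this is his actual code), a MonoTime-based interval timer might look like this:

```d
import core.time : Duration, MonoTime;
import std.stdio : writeln;

// Hypothetical stand-in for the Timer used in the benchmark loop;
// not ketmar's actual class. MonoTime reads a monotonic clock, so the
// measured interval is unaffected by wall-clock adjustments.
struct Timer
{
    private MonoTime startedAt;
    private Duration elapsed;

    void start() { startedAt = MonoTime.currTime; }

    // Subtracting two MonoTime values yields a Duration.
    void stop() { elapsed = MonoTime.currTime - startedAt; }

    // Time between the most recent start() and stop() calls.
    Duration interval() const { return elapsed; }

    string toString() const { return elapsed.toString(); }
}

void main()
{
    auto tm = Timer();
    tm.start;
    // ... benchmark loop goes here ...
    tm.stop;
    writeln(tm.toString);
}
```

A monotonic clock is the natural choice for benchmarks like this one, since the system time can jump during a 20+ second run.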

On Sunday, 24 August 2014 at 14:04:22 UTC, ketmar via Digitalmars-d-announce wrote:
> Timer is my simple timer class which uses MonoTime to measure intervals.
> this shows ~22 seconds for both variants, with #6 and without #6.
>
> and 57 seconds for variants without sse3 flags. ;-)

I'm guessing the dependency is probably due to our configure/build of GDC. I'm using Arch Linux 64's default GDC from their repository. Perhaps it's configured in a way that has these optimizations on by default. It probably should.

Mike

On Sunday, 24 August 2014 at 14:09:03 UTC, Mike wrote:
> I'm guessing the dependency is probably due to our configure/build of GDC. I'm using Arch Linux 64's default GDC from their repository. Perhaps it's configured in a way that has these optimizations on by default. It probably should.

"dependency" --> "discrepancy"

On Sun, 24 Aug 2014 14:09:02 +0000
Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
> 64's default GDC

i think that 64-bit gcc/gdc turns sse optimisations on anyway, 'cause there are no x86_64-capable CPUs without sse. and i'm on 32-bit x86.

On 24 Aug 2014 14:09, "ketmar via Digitalmars-d-announce" <digitalmars-d-announce@puremagic.com> wrote:
>
> On Sun, 24 Aug 2014 12:51:10 +0000
> Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com>
> wrote:
>
> > 5. Using C's floor instead of D's floor. - very significant (why?)
> gcc/clang inlines floorf().
>
> gdc generates calls to floor() in both cases, C floor() is just faster.
> i.e. gdc fails to see that floor() can be converted to intrinsic.
>
> the same thing with DMD i believe.

That's because floor isn't an intrinsic. The crippling speed issue was the fact that floor computed and returned at real precision. On recent (Sandy Bridge?) CPUs, it was found that x87 does more harm than good. So I changed it to a template in Phobos (and did some nice tidy-ups in the process). This will be pulled down in the 2.066 merge.

Speed improvements were discussed in the PR and in the original pnoise thread. That said, it's very likely that a hand-optimised SSE3 assembly implementation in C's mathlib would still be faster.

Iain.
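To illustrate the precision point, here is a hypothetical sketch of a floor template that computes and returns in the argument's own type, so a `float` argument is never widened to `real`. The name `floorAt` is invented for this example; it is not the Phobos implementation from the PR, and it assumes finite inputs whose magnitude fits in a `long`.

```d
import std.stdio : writeln;
import std.traits : isFloatingPoint;

// Hypothetical floor that stays in the argument's type T (float, double
// or real). Not the Phobos implementation. Assumes x is finite and
// |x| < 2^63; NaN, infinities and the sign of -0.0 are not handled.
T floorAt(T)(T x) if (isFloatingPoint!T)
{
    // Casting to long truncates toward zero.
    const T truncated = cast(T) cast(long) x;
    // For negative non-integers truncation rounds up, so step down by one.
    return truncated > x ? truncated - 1 : truncated;
}

// The result type follows the argument type: float in, float out.
static assert(is(typeof(floorAt(1.5f)) == float));
static assert(is(typeof(floorAt(1.5)) == double));

void main()
{
    writeln(floorAt(2.7f));
    writeln(floorAt(-2.7f));
}
```

The template form is what lets a single definition serve `float`, `double` and `real` callers without forcing every call through the widest type.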

On Sun, 24 Aug 2014 16:16:43 +0100
Iain Buclaw via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
> That's because floor isn't an intrinsic. The crippling speed issue was the fact that floor computed and returned at real precision.

i'm testing on x86, and the difference between 'call floorf' and inlining is significant. gcc inlines the floorf() call, and gdc does not. i don't know anything about x86_64 though.

On 24 Aug 2014 16:26, "ketmar via Digitalmars-d-announce" <digitalmars-d-announce@puremagic.com> wrote:
>
> On Sun, 24 Aug 2014 16:16:43 +0100
> Iain Buclaw via Digitalmars-d-announce
> <digitalmars-d-announce@puremagic.com> wrote:
>
> > That's because floor isn't an intrinsic. The crippling speed issue was the fact that floor computed and returned at real precision.
> i'm testing on x86, and the difference between 'call floorf' and inlining is significant. gcc inlines floorf() call, and gdc does not.
>

"Inline" is not quite correct. Floor is a function recognised by the compiler, so if the backend knows an instruction for it, it will favour that intrinsic over calling an external function.

Iain
