August 24, 2014 Re: D for the Win | ||||
---|---|---|---|---|
| ||||
Posted in reply to ketmar | On Sunday, 24 August 2014 at 13:13:58 UTC, ketmar via Digitalmars-d-announce wrote:
> On Sun, 24 Aug 2014 12:51:10 +0000
> Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com>
> wrote:
>
> ps.
>> 6. This change (https://github.com/nsf/pnoise/commit/baadfe20c7ae6aa900cb0e4188aa9d20bea95918)
>
> with GDC has no effect at all.
If I undo all of Edmund Smith's changes from today, use C's floor, and remove all the excessive function attributes, I get this http://dpaste.dzfl.pl/1b564efb423e
=== gcc -O3:
0.141484117 seconds time
=== D (dmd):
0.446634464 seconds time
=== D (ldc2):
0.191059330 seconds time
=== D (gdc):
0.226455762 seconds time
Then I add change only #6 above, and remove the excessive function attributes, I get this: http://dpaste.dzfl.pl/f525adab909c
=== gcc -O3:
0.137815809 seconds time
=== D (dmd):
0.480525196 seconds time
=== D (ldc2):
0.139659135 seconds time
=== D (gdc):
0.131637220 seconds time
Approaching twice as fast for GDC. That's significant to me. Also, all those optimization flags should already be on with -O3. Here are the flags I'm using:
gcc -std=c99 -O3 -o bin_test_c_gcc test.c -lm
dmd -ofbin_test_d_dmd -O -noboundscheck -inline -release test.d
ldc2 -O3 -ofbin_test_d_ldc test.d -release
gdc -O3 -o bin_test_d_gdc test.d -frelease
Maybe I'll make a pull request for it. I don't think users should have to decorate their code like a Christmas tree and use a bunch of special compiler flags to get a well-behaved binary.
Mike
|
August 24, 2014 Re: D for the Win | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mike | Mike:
> Then I add change only #6 above, and remove the excessive function attributes,
> Maybe I'll make a pull request for it. I don't think users should have to decorate their code like a Christmas tree
I don't agree, function attributes are not excessive, they are idiomatic in D.
Bye,
bearophile
|
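For readers who have not seen the attributes under discussion, here is a minimal sketch of what an attribute-annotated helper looks like in D. The `lerp`, `smooth`, and `blend` functions are hypothetical stand-ins written for this illustration; they are not taken from the benchmark source.

```d
import std.stdio : writeln;

// Hypothetical helper, written only to illustrate the attribute syntax.
// pure: no access to mutable global state; nothrow: throws no Exception;
// @nogc: performs no GC allocation; @safe: no memory-unsafe operations.
float lerp(float a, float b, float t) pure nothrow @nogc @safe
{
    return a + (b - a) * t;
}

// A template function (note the empty template parameter list) needs no
// annotations: the compiler infers its attributes from the body.
float smooth()(float t)
{
    return t * t * (3.0f - 2.0f * t);
}

// This compiles only because compatible attributes were inferred for smooth.
float blend(float a, float b, float t) pure nothrow @nogc @safe
{
    return lerp(a, b, smooth(t));
}

void main()
{
    writeln(blend(0.0f, 10.0f, 0.5f));
}
```

The sketch only shows the syntax and the inference rule for templates; whether the annotations change the generated code for a given compiler is a separate question that the timings in this thread address.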
August 24, 2014 Re: D for the Win | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mike | On Sun, 24 Aug 2014 13:44:07 +0000 Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
hm. for my "GDC 4.9.1. git HEAD" #6 has no effect at all.
p.s. it's unfair to specify "-msse3 -mfpmath=sse" for gcc and not for gdc. gdc can use these flags too! (yeah, the effect is great: sse3 variant is ~2.5 times faster).
|
August 24, 2014 Re: D for the Win | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mike | On Sun, 24 Aug 2014 13:44:07 +0000 Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
p.s. what i did is this:
  auto tm = Timer();
  tm.start;
  foreach (; 0..100) {
    auto n2d = Noise2DContext(0);
    foreach (i; 0..100) {
      foreach (y; 0..256) {
        foreach (x; 0..256) {
          auto v = n2d.get(x * 0.1f, y * 0.1f) * 0.5f + 0.5f;
          pixels[y*256+x] = v;
        }
      }
    }
  }
  tm.stop;
  writeln(tm.toString);
Timer is my simple timer class which uses MonoTime to measure intervals.
this shows ~22 seconds for both variants, with #6 and without #6.
and 57 seconds for variants without sse3 flags. ;-)
|
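ketmar's Timer is not shown in the thread. As a rough idea of what such a helper could look like, here is a minimal sketch built on `core.time.MonoTime`. The struct layout and the `elapsed` member are assumptions made for illustration, and the loop in `main` is a placeholder workload, not the noise benchmark.

```d
import core.time : Duration, MonoTime;
import std.stdio : writeln;

// Hypothetical stand-in for the Timer mentioned in the post;
// the real one is not shown in the thread.
struct Timer
{
    private MonoTime begin;  // set by start()
    private Duration total;  // accumulated over start/stop pairs

    void start() { begin = MonoTime.currTime; }
    void stop() { total += MonoTime.currTime - begin; }
    Duration elapsed() const { return total; }
    string toString() const { return total.toString(); }
}

void main()
{
    auto tm = Timer();
    tm.start;
    // Placeholder workload so there is something to measure.
    float sum = 0;
    foreach (i; 0 .. 1_000_000)
        sum += i * 0.1f;
    tm.stop;
    // Printing the checksum keeps the loop from being optimised away.
    writeln(tm.toString, " (checksum ", sum, ")");
}
```

A monotonic clock is the usual choice for interval timing because it is not affected by wall-clock adjustments during the run.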
August 24, 2014 Re: D for the Win | ||||
---|---|---|---|---|
| ||||
Posted in reply to ketmar | On Sunday, 24 August 2014 at 14:04:22 UTC, ketmar via Digitalmars-d-announce wrote:
> On Sun, 24 Aug 2014 13:44:07 +0000
> Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com>
> wrote:
>
> p.s. what i did is this:
>
> auto tm = Timer();
> tm.start;
> foreach (; 0..100) {
> auto n2d = Noise2DContext(0);
> foreach (i; 0..100) {
> foreach (y; 0..256) {
> foreach (x; 0..256) {
> auto v = n2d.get(x * 0.1f, y * 0.1f) *
> 0.5f + 0.5f;
> pixels[y*256+x] = v;
> }
> }
> }
> }
> tm.stop;
> writeln(tm.toString);
>
> Timer is my simple timer class which uses MonoTime to measure intervals.
> this shows ~22 seconds for both variants, with #6 and without #6.
>
> and 57 seconds for variants without sse3 flags. ;-)
I'm guessing the dependency is probably due to our configure/build of GDC. I'm using Arch Linux 64's default GDC from their repository. Perhaps it's configured in a way that has these optimizations on by default. It probably should.
Mike
|
August 24, 2014 Re: D for the Win | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mike | On Sunday, 24 August 2014 at 14:09:03 UTC, Mike wrote:
> I'm guessing the dependency is probably due to our configure/build of GDC. I'm using Arch Linux 64's default GDC from their repository. Perhaps it's configured in a way that has these optimizations on by default. It probably should.
>
"dependency" --> "discrepancy"
|
August 24, 2014 Re: D for the Win | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mike | On Sun, 24 Aug 2014 14:09:02 +0000 Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
> 64's default GDC
i think that 64-bit gcc/gdc turns sse optimisations on anyway, 'cause there are no x86_64-capable CPUs without sse. and i'm on x86 arch.
|
August 24, 2014 Re: D for the Win | ||||
---|---|---|---|---|
| ||||
| On 24 Aug 2014 14:09, "ketmar via Digitalmars-d-announce" <digitalmars-d-announce@puremagic.com> wrote:
>
> On Sun, 24 Aug 2014 12:51:10 +0000
> Mike via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com>
> wrote:
>
> > 5. Using C's floor instead of D's floor. - very significant (why?)
> gcc/clang inlines floorf().
>
> gdc generates calls to floor() in both cases, C floor() is just faster.
> i.e. gdc fails to see that floor() can be converted to intrinsic.
>
> the same thing with DMD i believe.
That's because floor isn't an intrinsic. The crippling speed issue was the fact that floor computed and returned at real precision. On recent (sandybridge?) CPUs, it was found that x87 does more ill than good. So I changed it to a template in Phobos (and did some nice tidy-ups in the process). This will be pulled down in the 2.066 merge. Speed improvements were discussed in the PR and in the original pnoise thread.
Though it's very likely that a hand optimised SSE3 assembly implementation in C's mathlib might still be faster.
Iain.
|
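The precision point Iain makes can be checked at compile time. The sketch below assumes a Phobos release that already contains the change he describes, so that `std.math.floor` returns a result of the same type as its argument; with the older behaviour (everything computed and returned as `real`) the first two static asserts would be expected to fail.

```d
import std.math : floor;
import std.stdio : writeln;

void main()
{
    float  f = -2.5f;
    double d = -2.5;
    real   r = -2.5L;

    // Assumption: a Phobos version that includes the change described above.
    // The result type then follows the argument type, so a float stays a
    // float and is not widened to x87 real precision.
    static assert(is(typeof(floor(f)) == float));
    static assert(is(typeof(floor(d)) == double));
    static assert(is(typeof(floor(r)) == real));

    writeln(floor(f), " ", floor(d), " ", floor(r));
}
```

This only demonstrates the types involved; it says nothing about whether a given compiler turns the call into a single instruction, which is the separate question raised in the replies that follow.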
August 24, 2014 Re: D for the Win | ||||
---|---|---|---|---|
| ||||
| On Sun, 24 Aug 2014 16:16:43 +0100 Iain Buclaw via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
> That's because floor isn't an intrinsic. The crippling speed issue was the fact that floor computed and returned at real precision.
i'm testing on x86, and the difference between 'call floorf' and inlining is significant. gcc inlines floorf() call, and gdc does not.
i don't know anything about x86_64 though.
|
August 24, 2014 Re: D for the Win | ||||
---|---|---|---|---|
| ||||
| On 24 Aug 2014 16:26, "ketmar via Digitalmars-d-announce" <digitalmars-d-announce@puremagic.com> wrote:
>
> On Sun, 24 Aug 2014 16:16:43 +0100
> Iain Buclaw via Digitalmars-d-announce
> <digitalmars-d-announce@puremagic.com> wrote:
>
> > That's because floor isn't an intrinsic. The crippling speed issue was the fact that floor computed and returned at real precision.
> i'm testing on x86, and the difference between 'call floorf' and inlining is significant. gcc inlines floorf() call, and gdc does not.
>
Inline is not quite correct. Floor is a function recognised by the compiler, so if the backend knows an instruction for it, it will favour that intrinsic over calling an external function.
Iain
|