June 20, 2014
On Friday, 20 June 2014 at 13:46:26 UTC, Mattcoder wrote:
> On Friday, 20 June 2014 at 13:14:04 UTC, dennis luehring wrote:
>> write, printf etc. performance is benchmarked also - so not clear
>> if pnoise is super-fast but write is super-slow etc...
>
> Indeed, and on Windows (at least Windows 8), the size of the command window (CMD) drastically affects the result... for example: running this test with the console maximized takes 2.58s, while the same test in a small window takes 2.11s!

Before I wrote the above, I briefly ran the benchmark on my local (OS X) machine, and verified that the bulk of the time is indeed spent in the noise calculation loop (with stdout piped into /dev/null). Still, the LDC-compiled code is only about half as fast as the Clang-compiled version, and there is no good reason why it should be.

My new guess is a difference in inlining heuristics (note also that the Rust version uses inlining hints). The big difference between GCC and Clang might be a hint that the performance drop is caused by a rather minute difference in optimizer tuning.

Thus, we really need somebody to sit down with a profiler/disassembler and figure out what is going on.

David
June 20, 2014
On 6/20/14, 9:32 AM, Nick Treleaven wrote:
> Hi,
> A Perlin noise benchmark was quoted in this reddit thread:
>
> http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr
>
>
> It apparently shows the 3 main D compilers producing slower code than
> Go, Rust, gcc, clang, Nimrod:
>
> https://github.com/nsf/pnoise#readme
>
> I initially wondered about std.random, but got this response:
>
> "Yeah, but std.random is not used in that benchmark, it just initializes
> 256 random vectors and permutates 256 sequential integers. What spins in
> a loop is just plain FP math and array read/writes. I'm sure it can be
> done faster, maybe D compilers are bad at automatic inlining or
> something. "
>
> Obviously this is only one person's benchmark, but I wondered if people
> would like to check their code and suggest reasons for the speed deficit.

I just tried it with ldc and it's faster (faster than Go, slower than Nimrod). But this is still slower than other languages. And other languages keep the array bounds checks on...
June 20, 2014
Nick Treleaven:

> A Perlin noise benchmark was quoted in this reddit thread:
>
> http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr

This should be compiled with LDC2; it's more idiomatic and a little faster than the original D version:
http://dpaste.dzfl.pl/8d2ff04b62d3

I have already seen that if I manually inline Noise2DContext.get in main, the program gets faster (but not yet fast enough).

Bye,
bearophile
June 20, 2014
> http://dpaste.dzfl.pl/8d2ff04b62d3

Sorry for the awful tabs.

Bye,
bearophile
June 20, 2014
If I add this import in Noise2DContext.getGradients the run-time decreases a lot (I am now just two times slower than gcc with -Ofast):

import core.stdc.math: floor;

Bye,
bearophile
June 20, 2014
 GO BEAROPHILE YOU CAN DO IT

On Friday, 20 June 2014 at 15:24:38 UTC, bearophile wrote:
> If I add this import in Noise2DContext.getGradients the run-time decreases a lot (I am now just two times slower than gcc with -Ofast):
>
> import core.stdc.math: floor;
>
> Bye,
> bearophile

June 20, 2014
On Friday, 20 June 2014 at 15:24:38 UTC, bearophile wrote:
> If I add this import in Noise2DContext.getGradients the run-time decreases a lot (I am now just two times slower than gcc with -Ofast):
>
> import core.stdc.math: floor;
>
> Bye,
> bearophile

I was just about to post that if I cheat and replace the usage of floor(x) with cast(float)cast(int)x, ldc2 gets almost down to gcc speeds (119.6ms average over 100 full executions vs gcc's 102.7ms).

It stood out in the call graph - profile before you optimize.
June 20, 2014
So this is the best version so far:

http://dpaste.dzfl.pl/8dae9b359f27

I don't show the version with the manually inlined function.

(I have also seen that GCC generates slightly faster code on my CPU if I don't use SSE registers.)

Bye,
bearophile
June 20, 2014
On Friday, 20 June 2014 at 16:02:56 UTC, bearophile wrote:
> So this is the best so far version:
>
> http://dpaste.dzfl.pl/8dae9b359f27

Just one note, with the last version of DMD:

dmd -O -noboundscheck -inline -release pnoise.d
pnoise.d(42): Error: pure function 'pnoise.Noise2DContext.getGradients' cannot call impure function 'core.stdc.math.floor'
pnoise.d(43): Error: pure function 'pnoise.Noise2DContext.getGradients' cannot call impure function 'core.stdc.math.floor'

Matheus.
June 20, 2014
Am 20.06.2014 17:09, schrieb bearophile:
> Nick Treleaven:
>
>> A Perlin noise benchmark was quoted in this reddit thread:
>>
>> http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr
>
> This should be compiled with LDC2, it's more idiomatic and a
> little faster than the original D version:
> http://dpaste.dzfl.pl/8d2ff04b62d3
>
> I have already seen that if I inline Noise2DContext.get in the
> main manually the program gets faster (but not yet fast enough).
>
> Bye,
> bearophile
>

It does not make sense to "optimize" this example more and more - it should be fast in its original version (except for the missing finals on the virtual methods).