June 20, 2014
On Friday, 20 June 2014 at 18:29:35 UTC, Mattcoder wrote:
> On Friday, 20 June 2014 at 16:02:56 UTC, bearophile wrote:
>> So this is the best so far version:
>>
>> http://dpaste.dzfl.pl/8dae9b359f27
>
> Just one note, with the latest version of DMD:
>
> dmd -O -noboundscheck -inline -release pnoise.d
> pnoise.d(42): Error: pure function 'pnoise.Noise2DContext.getGradients' cannot call impure function 'core.stdc.math.floor'
> pnoise.d(43): Error: pure function 'pnoise.Noise2DContext.getGradients' cannot call impure function 'core.stdc.math.floor'
>
> Matheus.

Sorry, I forgot this:

Apart from the error above, which for now I'm working around with:

immutable float x0f = cast(int)x; //x.floor;
immutable float y0f = cast(int)y; //y.floor;

Just to get it to compile, your version here is twice as fast as the original one.

Matheus.
June 20, 2014
On Friday, 20 June 2014 at 18:32:22 UTC, dennis luehring wrote:
> it does not make sense to "optimize" this example more and more - it should be fast with the original version (except for the missing finals on the virtuals)

Oh please, let him continue, I'm really learning a lot from these optimizations.

Matheus.
June 20, 2014
Mattcoder:

> Just one note, with the latest version of DMD:

Yes, I know; at the top of the file I specified that it's for ldc2.

Bye,
bearophile
June 20, 2014
Mattcoder:

> Apart from the error above, which for now I'm working around with:
>
> immutable float x0f = cast(int)x; //x.floor;
> immutable float y0f = cast(int)y; //y.floor;
>
> Just to compile,

If you remove the calls to floor, you are sidestepping the main problem that needs fixing.
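As a side note, the cast is also not a drop-in replacement for floor in general. The sketch below is only an illustration (the helper name `truncFloor` is made up here and is not part of the benchmark): `cast(int)` truncates toward zero, while `floor` rounds toward negative infinity, so the two agree for non-negative inputs and differ for negative non-integers.

```d
import std.math : floor;
import std.stdio : writefln;

// Hypothetical helper mirroring the workaround: truncate via an int cast.
float truncFloor(float x)
{
    return cast(int) x; // truncates toward zero
}

void main()
{
    immutable float[] samples = [2.7f, 0.0f, -0.3f, -2.7f];
    foreach (x; samples)
        writefln("x = %s  cast(int): %s  floor: %s", x, truncFloor(x), floor(x));

    // Non-negative input: both give the same result.
    assert(truncFloor(2.7f) == floor(2.7f));
    // Negative non-integer input: the cast gives -2, floor gives -3.
    assert(truncFloor(-2.7f) == -2.0f);
    assert(floor(-2.7f) == -3.0f);
}
```

So the workaround only preserves the results as long as the coordinates passed in are never negative.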

Bye,
bearophile
June 20, 2014
dennis luehring:

> it does not make sense to "optimize" this example more and more - it should be fast with the original version

But the original code is not fast. So someone has to find what's broken. I have shown one of the broken parts that need fixing (floor on ldc2).

Also, the original code is not written in a fully idiomatic way, partly because unfortunately the "lazy" way to write D code today is not always the best/right way (example: you have to add tons of immutable/const and annotations, because immutability is not the default), so fixing the code is worthwhile.

Bye,
bearophile
June 20, 2014
Nick Treleaven:

> A Perlin noise benchmark was quoted in this reddit thread:

And a simple benchmark for D ranges/parallelism:

http://www.reddit.com/r/programming/comments/28mub4/clash_of_the_lambdas_comparing_lambda_performance/

Bye,
bearophile
June 21, 2014
On 20.06.2014 at 22:44, bearophile wrote:
> dennis luehring:
>
>> it does not make sense to "optimize" this example more and
>> more - it should be fast with the original version
>
> But the original code is not fast. So someone has to find what's
> broken. I have shown one of the broken parts that need fixing
> (floor on ldc2).
>
> Also, the original code is not written in a fully idiomatic way,
> partly because unfortunately the "lazy" way to write D code today
> is not always the best/right way (example: you have to add tons of
> immutable/const and annotations, because immutability is not the
> default), so fixing the code is worthwhile.
>
> Bye,
> bearophile
>

as long as you find out that it's a library thing

the C version is the fastest without any annotations or immutable/const - so what's the problem with D here? it can't (shouldn't) be that one needs to rework/change that much in such simple code to reach C speed
June 21, 2014
On Saturday, 21 June 2014 at 05:00:25 UTC, dennis luehring wrote:
> On 20.06.2014 at 22:44, bearophile wrote:
>> dennis luehring:
>>
>>> it does not make sense to "optimize" this example more and
>>> more - it should be fast with the original version
>>
>> But the original code is not fast. So someone has to find what's
>> broken. I have shown one of the broken parts that need fixing
>> (floor on ldc2).
>>
>> Also, the original code is not written in a fully idiomatic way,
>> partly because unfortunately the "lazy" way to write D code today
>> is not always the best/right way (example: you have to add tons of
>> immutable/const and annotations, because immutability is not the
>> default), so fixing the code is worthwhile.
>>
>> Bye,
>> bearophile
>>
>
> as long as you find out that it's a library thing
>
> the C version is the fastest without any annotations or immutable/const - so what's the problem with D here? it can't (shouldn't) be that one needs to rework/change that much in such simple code to reach C speed

bearophile's work is very valuable regardless of what the cause is, as it provides a pretty decent hint of what could be improved for anybody investigating the issue.

This is not to say that we wouldn't need to fix our compilers (in end user terms, i.e. compiler + standard library) to make those examples fast – zero-cost abstractions are one of the main strengths of D.

David
March 24, 2015
On Friday, 20 June 2014 at 12:32:39 UTC, Nick Treleaven wrote:
> Hi,
> A Perlin noise benchmark was quoted in this reddit thread:
>
> http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr
>
> It apparently shows the 3 main D compilers producing slower code than Go, Rust, gcc, clang, Nimrod:
>
> https://github.com/nsf/pnoise#readme
>
> I initially wondered about std.random, but got this response:
>
> "Yeah, but std.random is not used in that benchmark, it just initializes 256 random vectors and permutates 256 sequential integers. What spins in a loop is just plain FP math and array read/writes. I'm sure it can be done faster, maybe D compilers are bad at automatic inlining or something. "
>
> Obviously this is only one person's benchmark, but I wondered if people would like to check their code and suggest reasons for the speed deficit.

I saw this thread while searching for something on the site; it's been a few months since anyone posted.

I fixed the D flags; gdc is now about 15% faster than the second fastest entry in the benchmark (C with gcc), which obviously puts D in first place.
Some notes:

LDC is missing _tons_ of inlining opportunities, which kills it in comparison to GDC. I think GDC inlined pretty much everything. LDC is about 50% slower.

Also, AFAICT there's no fast-math switch for LDC (enabling this for GDC might actually be compromising it, though : ) )

I think LDC turns the floor in std.math into the same thing as the stdc one, but GDC does not. std.math.floor is still abysmally slow; I thought it was because it was still using reals, but that does not seem to be the case. GDC slows to a crawl (10-20x slower) if you replace the stdc floor with the one in std.math (just remove the alias).
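For anyone who wants to check the floor difference on their own compiler, here is a rough micro-benchmark sketch. It is not the benchmark's code: the names `cFloor`, `dFloor` and `sumFloors` are made up for this example, and it assumes a recent Phobos that provides `std.datetime.stopwatch`. It times the C runtime's `floorf` against `std.math.floor` over the same array; the numbers will vary with compiler, flags and machine.

```d
import core.stdc.math : cFloor = floorf; // C runtime floor for float
import std.math : dFloor = floor;        // Phobos floor
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Sum fn(x) over all inputs so the calls cannot be dropped as dead code.
double sumFloors(alias fn)(const float[] xs)
{
    double total = 0;
    foreach (x; xs)
        total += fn(x);
    return total;
}

void main()
{
    // Test data: 0.0, 0.1, 0.2, ... (arbitrary, chosen for this sketch).
    auto xs = new float[](1_000_000);
    foreach (i, ref x; xs)
        x = i * 0.1f;

    auto sw = StopWatch(AutoStart.yes);
    immutable a = sumFloors!cFloor(xs);
    immutable tC = sw.peek;

    sw.reset(); // restart timing for the second variant
    immutable b = sumFloors!dFloor(xs);
    immutable tD = sw.peek;

    writefln("core.stdc.math.floorf: %s (sum %s)", tC, a);
    writefln("std.math.floor:        %s (sum %s)", tD, b);
}
```

Building it with the same optimization flags as the benchmark, once per compiler, should show whether the gap described above still exists on a given toolchain.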

I thought this might be interesting to someone (i.e., LDC/GDC folks or Phobos math folks).

bye.