D outperformed by C++, what am I doing wrong? (page 3)

On Sunday, 13 August 2017 at 09:15:48 UTC, amfvcg wrote:
>
> Change the parameter for this array size to be taken from stdin and I assume that these optimizations will go away.

This is paramount for all of the testing, examining, and comparisons that are discussed in this thread.
Full information is given to the compiler, and you are basically testing the constant folding power of the compilers (not unimportant).

No runtime calculation is needed for the sum. Your program could be optimized to the following code:
```
void main()
{
    MonoTime beg = MonoTime.currTime;
    MonoTime end = MonoTime.currTime;
    writeln(end-beg);
    writeln(50000000);
}
```
So actually you should be more surprised that the reported time is not equal to near-zero (just the time between two `MonoTime.currTime` calls)!

Instead of `iota(1,1000000)`, you should initialize the array with random numbers with a randomization seed given by the user (e.g. commandline argument or stdin). Then, the program will actually have to do the runtime calculations that I assume you are expecting it to perform.

- Johan

On Sunday, 13 August 2017 at 09:41:39 UTC, Johan Engelen wrote:
> On Sunday, 13 August 2017 at 09:08:14 UTC, Petar Kirov [ZombineDev] wrote:
>> [...]
>
>> [...]
>
> Execution of sum_subranges is already O(1), because the calculation of the sum is delayed: the return type of the function is not `uint`, it is `MapResult!(sum, <range>)` which does a lazy evaluation of the sum.
>
> - Johan

Heh, yeah you're absolutely right. I was just about to correct myself, when I saw your reply. Don't know how I missed such an obvious thing :D
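To make the laziness point concrete, here is a minimal sketch. It is not the original `sum_subranges` code: `countedSum` and the global `calls` counter are hypothetical names added only to observe when the summing actually happens.

```d
import std.algorithm.iteration : map, sum;
import std.array : array;
import std.range : chunks;

// Hypothetical counter: records how often the mapped function really runs.
int calls = 0;

// Hypothetical helper: sums one chunk and bumps the counter.
auto countedSum(R)(R chunk)
{
    ++calls;
    return chunk.sum;
}

void main()
{
    auto data = [1, 2, 3, 4];

    // Building the pipeline only wraps the range; nothing is summed yet.
    auto lazySums = data.chunks(2).map!countedSum;
    assert(calls == 0);

    // Forcing evaluation (here with .array) is what triggers the work.
    auto result = lazySums.array;
    assert(calls > 0);
    assert(result == [3, 7]);
}
```

A benchmark that times only the construction of such a range therefore measures almost nothing; the timed region has to include whatever consumes the result.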

August 13, 2017

Re: D outperformed by C++, what am I doing wrong?

Posted by Petar Kirov [ZombineDev]
in reply to Johan Engelen

On Sunday, 13 August 2017 at 09:56:44 UTC, Johan Engelen wrote:
> On Sunday, 13 August 2017 at 09:15:48 UTC, amfvcg wrote:
>>
>> Change the parameter for this array size to be taken from stdin and I assume that these optimizations will go away.
>
> This is paramount for all of the testing, examining, and comparisons that are discussed in this thread.
> Full information is given to the compiler, and you are basically testing the constant folding power of the compilers (not unimportant).

I agree that in general this is not the right way to benchmark. However, I am specifically interested in the pattern matching / constant folding abilities of the compiler. I would have expected `sum(iota(1, N + 1))` to be replaced with `(N*(N+1))/2`. LDC already does this optimization in some cases. I have opened an issue for some of the rest: https://github.com/ldc-developers/ldc/issues/2271
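For reference, the equivalence in question can be checked with a short sketch. `closedFormSum` is a hypothetical name, and the snippet only verifies that the two expressions agree; it says nothing about what any particular compiler emits.

```d
import std.algorithm.iteration : sum;
import std.range : iota;

// Hypothetical helper: the closed form for 1 + 2 + ... + n.
ulong closedFormSum(ulong n)
{
    return n * (n + 1) / 2;
}

void main()
{
    // The range-based sum and the closed form should agree for every n,
    // including n == 0, where iota(1, 1) is empty and the sum is 0.
    foreach (ulong n; [0UL, 1, 10, 1_000, 1_000_000])
        assert(sum(iota(1UL, n + 1)) == closedFormSum(n));
}
```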

> No runtime calculation is needed for the sum. Your program could be optimized to the following code:
> ```
> void main()
> {
>     MonoTime beg = MonoTime.currTime;
>     MonoTime end = MonoTime.currTime;
>     writeln(end-beg);
>     writeln(50000000);
> }
> ```
> So actually you should be more surprised that the reported time is not equal to near-zero (just the time between two `MonoTime.currTime` calls)!

On Posix, `MonoTime.currTime`'s implementation uses clock_gettime(CLOCK_MONOTONIC, ...), which is quite a bit more involved than simply using the rdtsc instruction on x86. See: http://linuxmogeb.blogspot.bg/2013/10/how-does-clockgettime-work.html

On Windows, `MonoTime.currTime` uses QueryPerformanceCounter, which on Win 7 and later uses the rdtsc instruction, which makes it quite streamlined. In some testing I did several months ago QueryPerformanceCounter had really good latency and precision (though I forgot the exact numbers I got).
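If you want to see the timer overhead on your own machine, a rough sketch like the following may help. `averageCurrTimeNanos` is a hypothetical helper, and the result is only an approximation, since it also includes loop overhead.

```d
import core.time : MonoTime;
import std.stdio : writefln;

// Hypothetical helper: approximate cost of one MonoTime.currTime call, in nanoseconds.
double averageCurrTimeNanos(size_t samples)
{
    long sink = 0; // accumulates the readings so each call's result is consumed
    auto beg = MonoTime.currTime;
    foreach (i; 0 .. samples)
        sink += MonoTime.currTime.ticks;
    auto end = MonoTime.currTime;
    return (end - beg).total!"nsecs" / cast(double) samples;
}

void main()
{
    writefln("%.1f ns per call (approx.)", averageCurrTimeNanos(1_000_000));
}
```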

> Instead of `iota(1,1000000)`, you should initialize the array with random numbers with a randomization seed given by the user (e.g. commandline argument or stdin). Then, the program will actually have to do the runtime calculations that I assume you are expecting it to perform.
>

Agreed, though I think Phobos's unpredictableSeed does an ok job w.r.t. seeding, so unless you want to repeat the benchmark on the exact same dataset, something like this does a good job:

T[] generate(T)(size_t size)
{
    import std.algorithm.iteration : map;
    import std.range : array, iota;
    import std.random : uniform;

    return size.iota.map!(_ => uniform!T()).array;
}
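If you do want to repeat the benchmark on the exact same dataset, a variant along these lines could take the size and seed from the command line. This is only a sketch: `generateSeeded`, the argument order, and the default values are assumptions made for illustration.

```d
import core.time : MonoTime;
import std.algorithm.iteration : sum;
import std.conv : to;
import std.random : Random, uniform;
import std.stdio : writeln;

// Hypothetical helper: fills an array using a caller-supplied seed,
// so the same seed always yields the same data.
T[] generateSeeded(T)(size_t size, uint seed)
{
    auto rng = Random(seed);
    auto data = new T[size];
    foreach (ref x; data)
        x = uniform!T(rng);
    return data;
}

void main(string[] args)
{
    // Size and seed arrive at run time, so the compiler cannot fold the sum away.
    // The defaults below are arbitrary illustration values.
    size_t size = args.length > 1 ? args[1].to!size_t : 1_000_000;
    uint seed = args.length > 2 ? args[2].to!uint : 42;

    auto data = generateSeeded!uint(size, seed);

    auto beg = MonoTime.currTime;
    auto total = sum(data, 0UL); // accumulate in ulong to avoid uint wrap-around
    auto end = MonoTime.currTime;

    writeln(end - beg);
    writeln(total);
}
```

Printing `total` matters here: it keeps the sum observable, so the timed region cannot be discarded as dead code.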

On Sunday, 13 August 2017 at 06:09:39 UTC, amfvcg wrote:
> Hi all,
> I'm solving below task:
>
> given container T and value R return sum of R-ranges over T. An example:
> input : T=[1,1,1] R=2
> output : [2, 1]
>
> input : T=[1,2,3] R=1
> output : [1,2,3]
> (see dlang unittests for more examples)
>
>
> Below c++ code compiled with g++-5.4.0 -O2 -std=c++14 runs on my machine in 656 836 us.
> Below D code compiled with dmd v2.067.1 -O runs on my machine in ~ 14.5 sec.
>
> Each language has its own "way of programming", and as I'm a beginner in D - probably I'm running through bushes instead of highway. Therefore I'd like to ask you, experienced dlang devs, to shed some light on "how to do it dlang-way".
>
> ...
>

From time to time the forum gets questions like these. It would be nice if we had blog articles concerning high performance programming in D, a kind of best practice approach, everything from programming idioms, to compilers, and compiler flags.
