Modern C++ Lamentations

March 05, 2019
Re: Modern C++ Lamentations
Posted by Dukc
in reply to Steven Schveighoffer
On Friday, 4 January 2019 at 16:21:40 UTC, Steven Schveighoffer wrote:
> On 1/4/19 9:00 AM, Dukc wrote:
>>>
>> 
>> Isn't the main problem with performance of the Timon's range loop that it uses arbitrary-sized integers (BigInts)?
>
> Atila's version of that code doesn't use bigints: https://github.com/atilaneves/pythagoras/blob/master/range.d#L24
>
> The major problem with the D range implementation is that the compiler isn't able to find the optimization of hoisting the multiplication of the outer indexes out of the inner loop.
>
> See my responses to Atila in this thread.
>
> -Steve

I know I'm replying to an old thread, but I hope you're still interested enough to warrant the necrobump. I did some additional testing of the range version, and I think I found a way to get the compiler to find the quoted optimization without doing it manually.

You just have to move the filter that computes the squares to where the data is still nested, inside the innermost lambda:

import std.experimental.all;
import std.datetime.stopwatch : AutoStart, StopWatch;

// then: map each element to a range with `a`, then flatten the result (a flatMap).
alias then(alias a)=(r)=>map!a(r).joiner;
void main(){
    auto sw = StopWatch(AutoStart.no);
    int total;

    // Old version: the filter runs on the tuples after the nested ranges are flattened.
    if (true)
    {   sw.start;
        scope (success) sw.stop;
        auto triples=recurrence!"a[n-1]+1"(1L)
            .then!(z=>iota(1,z+1).then!(x=>iota(x,z+1).map!(y=>tuple(x,y,z))))
            .filter!((t)=>t[0]^^2+t[1]^^2==t[2]^^2)
            .until!(t=>t[2] >= 500);
        triples.each!((x,y,z){ total += x+y+z; });
    }

    writefln("Old loop time is %s microseconds", sw.peek.total!"usecs"); // 118_614
    sw.reset;

    // New version: the filter runs inside the innermost lambda, where x and z are still in scope.
    if (true)
    {   sw.start;
        scope (success) sw.stop;
        auto triples=recurrence!"a[n-1]+1"(1L)
            .then!(z=>iota(1,z+1).then!
            (   x=>iota(x,z+1)
                .map!(y=> tuple(x,y,z))
                .filter!((t)=>t[0]^^2+t[1]^^2==t[2]^^2))
            )
            .until!(t=>t[2] >= 500);
        triples.each!((x,y,z){ total += x+y+z; });
    }

    writefln("New loop time is %s microseconds", sw.peek.total!"usecs"); // 21_936
    writeln(total); // to force the compiler to do the calculations
    return;
}

No manual caching of the squares is needed, and the improvement is over 5x (118,614 vs. 21,936 microseconds, built with dub --build=release --compiler=ldc2), which should bring it close to the other versions in the blog entry.
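
For anyone who wants to see what the quoted optimization amounts to, here is a plain-loop sketch of the same search. The names sumNaive and sumHoisted are made up for illustration. This is not the range code above, and it is not what LDC literally emits; I'm only assuming the second form is roughly what the optimizer can arrive at once x and z are visible as loop invariants in the innermost scope.

```d
import std.stdio : writeln;

// Hypothetical helper: every square is recomputed for every candidate,
// which is how the filter sees the data once the triples are flattened.
long sumNaive(long limit)
{
    long total = 0;
    foreach (z; 1 .. limit)
        foreach (x; 1 .. z + 1)
            foreach (y; x .. z + 1)
                if (x * x + y * y == z * z)
                    total += x + y + z;
    return total;
}

// Hypothetical helper: the same search with the loop-invariant squares
// hoisted out of the inner loops by hand.
long sumHoisted(long limit)
{
    long total = 0;
    foreach (z; 1 .. limit)
    {
        immutable zz = z * z;              // invariant for both inner loops
        foreach (x; 1 .. z + 1)
        {
            immutable target = zz - x * x; // invariant for the y loop
            foreach (y; x .. z + 1)
                if (y * y == target)
                    total += x + y + z;
        }
    }
    return total;
}

void main()
{
    // Both forms must agree; only the amount of repeated work differs.
    assert(sumNaive(500) == sumHoisted(500));
    writeln(sumHoisted(500));
}
```

In the new range version the filter lambda closes over x and z directly, so the compiler has a chance to treat their squares the way sumHoisted does. In the old version they arrive as tuple fields after the joiner, which seems to hide that they are invariant.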