January 04, 2019
On Friday, 4 January 2019 at 22:33:57 UTC, Ethan wrote:
> And to add to this. *ANY* large project benefits from reducing to smaller DLLs and a corresponding static .lib to load them.
>
> The MSVC team has come around in the last five years to love game developers specifically because we create massive codebases that tax the compiler. Remember how I said our binary sizes were something up near 100MB on Quantum Break during one of my DConf talks? Good luck linking that in one hit.

Adding to my own post: I'm pretty sure the Windows UCRT (Universal C Runtime) is dynamically linked by default these days.
March 05, 2019
On Friday, 4 January 2019 at 16:21:40 UTC, Steven Schveighoffer wrote:
> On 1/4/19 9:00 AM, Dukc wrote:
>>>
>> 
>> Isn't the main problem with performance of the Timon's range loop that it uses arbitrary-sized integers (BigInts)?
>
> Atila's version of that code doesn't use bigints: https://github.com/atilaneves/pythagoras/blob/master/range.d#L24
>
> The major problem with the D range implementation is that the compiler isn't able to find the optimization of hoisting the multiplication of the outer indexes out of the inner loop.
>
> See my responses to Atila in this thread.
>
> -Steve

I'm replying to an old thread here, so I hope you're still interested enough to warrant the necrobump. I did some additional testing of the range version, and I think I found a way to get the compiler to find the quoted optimization without doing it manually.

You just have to move the squaring (that is, the filter) to where the data is still nested, inside the innermost `then` rather than after the ranges have been flattened:

import std.experimental.all;
import std.datetime.stopwatch : AutoStart, StopWatch;

alias then(alias a)=(r)=>map!a(r).joiner;
void main(){
    auto sw = StopWatch(AutoStart.no);
    int total;

    if (true)
    {   sw.start;
        scope (success) sw.stop;
        auto triples=recurrence!"a[n-1]+1"(1L)
            .then!(z=>iota(1,z+1).then!(x=>iota(x,z+1).map!(y=>tuple(x,y,z))))
            .filter!((t)=>t[0]^^2+t[1]^^2==t[2]^^2)
            .until!(t=>t[2] >= 500);
        triples.each!((x,y,z){ total += x+y+z; });
    }

    writefln("Old loop time is %s microseconds", sw.peek.total!"usecs"); // 118_614
    sw.reset;

    if (true)
    {   sw.start;
        scope (success) sw.stop;
        auto triples=recurrence!"a[n-1]+1"(1L)
            .then!(z=>iota(1,z+1).then!
            (   x=>iota(x,z+1)
                .map!(y=> tuple(x,y,z))
                .filter!((t)=>t[0]^^2+t[1]^^2==t[2]^^2))
            )
            .until!(t=>t[2] >= 500);
        triples.each!((x,y,z){ total += x+y+z; });
    }

    writefln("New loop time is %s microseconds", sw.peek.total!"usecs"); // 21_936
    writeln(total); // to force the compiler to do the calculations
    return;
}

See, no manual caching of the squares. The improvement is over 5x (118,614 vs. 21,936 microseconds on my machine, built with dub --build=release --compiler=ldc2), which should bring it close to the other versions in the blog entry.
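
For reference, this is roughly what "hoisting the multiplication out of the inner loop" looks like when written by hand as plain nested loops. It is only a sketch of the idea, not what LDC actually generates and not code from the blog entry; the function name `sumOfTriples` and its `limit` parameter are made up for the illustration.

```d
import std.stdio : writeln;

// Hypothetical helper (not from the thread): sums x+y+z over all
// Pythagorean triples with x <= y <= z and z < limit.
long sumOfTriples(long limit)
{
    long total = 0;
    foreach (long z; 1 .. limit)
    {
        immutable zz = z * z;       // depends only on z: computed once per z
        foreach (long x; 1 .. z + 1)
        {
            immutable xx = x * x;   // depends only on x: computed once per (z, x)
            foreach (long y; x .. z + 1)
            {
                // only y*y is left to compute in the innermost loop
                if (xx + y * y == zz)
                    total += x + y + z;
            }
        }
    }
    return total;
}

void main()
{
    // same bound as the until!(t => t[2] >= 500) in the range versions
    writeln(sumOfTriples(500));
}
```

In the "new" range version above, the filter sits inside the lambda where x and z are still plain captured variables, so the optimizer gets a chance to treat their squares as loop invariants in much the same way.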