May 25, 2020
On Monday, 25 May 2020 at 03:19:51 UTC, Adam D. Ruppe wrote:
> All this was done on a stock DMD64 D Compiler v2.091.1 on linux. Several enhancements were merged to dmd master recently that may change the results btw.
>
> quick test: simple impl: 2s, 1.57G. phobos impl: 2.3s, 1.97G
>
> hand impl: 1.1s, 725M. w/ static if: 1.2s, 775M
>
> Decent time improvement across the board... but the binary got bigger at 6.4M :( (prolly druntime's fault more than anything else but idk, im not focusing on that rn)

Could you also try with this version, please?

template staticMap(alias F, T...)
{
    mixin(staticMapHelper!(T.length));
}

private enum staticMapHelper(size_t length) = staticMapHelperGen!(length);

private string staticMapHelperGen(size_t length)()
{
    string res = "alias staticMap = AliasSeq!(";
    static foreach (i; 0 .. length)
    {
        if (i) res ~= ", ";
        res ~= "F!(T[" ~ i.stringof ~ "])";
    }
    res ~= ");";
    return res;
}

static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta.

I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.
May 25, 2020
On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:

> static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta.
>
> I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.

How could you miss the state-of-the-art?) Preallocate the result string and use the string counter hack!
http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dmd-and-static-foreach
May 25, 2020
On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:

>
> static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta.
>
> I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.

State-of-the-art is to preallocate the result string and use the string counter hack)) http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dmd-and-static-foreach

Stefan's work could make all this unnecessary but alas.
May 25, 2020
On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:
> private enum staticMapHelper(size_t length) = staticMapHelperGen!(length);

lol, also called it staticMapHelper when doing this myself!

Solid result though: 2s, 810MB.

I'm kinda surprised it beat the crap out of my mixin version.

Mine was 3s, 1.2G using a mixin expression (alias staticMap = mixin(helper()). Changing it to a mixin declaration (the alias is in the mixin, like yours) brought it to 2.2s, 924MB. Interesting. And changing the helper from a template to a function gives another small win: 2.2s but 906MB.


> static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta.

Yeah, format and to!string are both surprisingly heavy. Just using to!string(int) even in runtime stuff has a large hit.

May 25, 2020
On Monday, 25 May 2020 at 07:26:52 UTC, Max Samukha wrote:
> State-of-the-art is to preallocate the result string and use the string counter hack))

In this case, it doesn't matter much because the string is generated only once per unique length, then it is cached thanks to the template arg in the helper enum.

So the CTFE only actually happens once, however you do it this way.

The preallocation becomes important when the generated string becomes large because it avoids the compiler leaking intermediates and wasting a LOT of time realloc and copying again - when it is just a few hundred bytes like this it doesn't really matter.

When the CTFE expression finishes btw the compiler does free the memory pool. So even if you did this little thing over and over, it can be OK. But again when the generated string becomes very large, the peak memory can be prohibitive (it reaches tens of gigabytes of pure waste surprisingly quickly).

BTW on ctfe cache vs recalculate it depends on how much reuse you have. In my jni.d, the first version had an inline string replace in a loop. It took 30 GB of memory [!!!] and took several minutes to compile.

Simply moving that out and pregenerating a `static immutable` so CTFE was only invoked once, took it down to 3 GB and 25 seconds. Still slow, but huge, huge improvement.

But that's because there were only actually two variants of that string and it was fairly large. So caching made sense (static immutable btw is a bit better than a helper template holding the result - both cache but it is like a single variable vs a hash table; in fact that's how it is implemented). But if there were hundreds of variants instead of two, the cache itself would get very expensive.
May 25, 2020
On Monday, 25 May 2020 at 13:00:59 UTC, Adam D. Ruppe wrote:
> Solid result though: 2s, 810MB.

That was using the new dmd master, more specifically it was 1.6s that I rounded up. With the released dmd, 1.9s, 750MB.

either way, the mixin helps here - but that may not be true if there was a variety of lengths. This test did fixed length because I wanted to specifically prove that arg length isn't as important as number of instantiations here...

we should prolly test a variety of lengths too, it could change a lot.
May 25, 2020
On Monday, 25 May 2020 at 05:55:07 UTC, Walter Bright wrote:
> Thank you. This is good information. I'm curious how the new staticMap compares:
>
> https://github.com/dlang/phobos/pull/7490

Using dmd master: 1.8s 1.2G
Using released dmd: 2.1s, 1.1G

(I think something in dmd master has a higher fixed memory cost, so remember to compare apples and apples.)

So yes, the static if version is an improvement over what it has now, at least for these cases. We should probably make another test rig to see more variety...
May 25, 2020
On Monday, 25 May 2020 at 13:13:11 UTC, Adam D. Ruppe wrote:

>
> In this case, it doesn't matter much because the string is generated only once per unique length, then it is cached thanks to the template arg in the helper enum.

Yes, I overlooked the enum. My version of static map just calls the generating function directly.

>
> So the CTFE only actually happens once, however you do it this way.
>+ When the CTFE expression finishes btw the compiler does free
> the memory pool. So even if you did this little thing over and over, it can be OK. But again when the generated string becomes very large, the peak memory can be prohibitive (it reaches tens of gigabytes of pure waste surprisingly quickly).
>
> BTW on ctfe cache vs recalculate it depends on how much reuse you have. In my jni.d, the first version had an inline string replace in a loop. It took 30 GB of memory [!!!] and took several minutes to compile.
>
> Simply moving that out and pregenerating a `static immutable` so CTFE was only invoked once, took it down to 3 GB and 25 seconds. Still slow, but huge, huge improvement.
>
> But that's because there were only actually two variants of that string and it was fairly large. So caching made sense (static immutable btw is a bit better than a helper template holding the result - both cache but it is like a single variable vs a hash table; in fact that's how it is implemented).
> But if there were hundreds of variants instead of two, the cache itself would get very expensive.

Right. This is applicable to memoization in general.


May 25, 2020
On Sunday, 24 May 2020 at 23:40:48 UTC, Andrei Alexandrescu wrote:
> Consider staticMap's current version (there are a few upcoming changes that don't change the point):
>
> [...]

Manu's DIP does solve this problem.
In O(n) guaranteed.
May 25, 2020
On 5/25/20 1:55 AM, Walter Bright wrote:
> On 5/24/2020 8:19 PM, Adam D. Ruppe wrote:
>> I ran a test to compare the two implementations. First was comparing 10,000 references to the same instance.
> 
> Thank you. This is good information. I'm curious how the new staticMap compares:
> 
> https://github.com/dlang/phobos/pull/7490

This is reasonable. I've seen a *lot* worse in optimizing libraries. A *LOT*. Another possible improvement: binary search through the special cases.