Jump to page: 1 26  
Page
Thread overview
Divide & Conquer divides, but doesn't conquer
May 25, 2020
Timon Gehr
May 25, 2020
Timon Gehr
May 26, 2020
Walter Bright
May 25, 2020
Adam D. Ruppe
May 25, 2020
Walter Bright
May 25, 2020
Adam D. Ruppe
May 25, 2020
Walter Bright
May 25, 2020
Stefan Koch
May 27, 2020
Nick Treleaven
May 27, 2020
Stefan Koch
May 27, 2020
Stefan Koch
May 25, 2020
Stefan Koch
May 25, 2020
Adam D. Ruppe
May 25, 2020
Stefan Koch
May 25, 2020
FeepingCreature
May 25, 2020
Max Samukha
May 25, 2020
Stefan Koch
May 25, 2020
Stefan Koch
May 25, 2020
Stefan Koch
May 25, 2020
Adam D. Ruppe
May 25, 2020
Timon Gehr
May 25, 2020
Stefan Koch
May 25, 2020
Tove
May 25, 2020
Timon Gehr
May 25, 2020
Stefan Koch
May 25, 2020
Timon Gehr
May 25, 2020
Stefan Koch
May 25, 2020
Timon Gehr
May 25, 2020
Stefan Koch
May 25, 2020
Adam D. Ruppe
May 25, 2020
Stefan Koch
May 25, 2020
Walter Bright
May 25, 2020
Max Samukha
May 25, 2020
Adam D. Ruppe
May 25, 2020
Max Samukha
May 25, 2020
Adam D. Ruppe
May 25, 2020
Adam D. Ruppe
May 25, 2020
Max Samukha
May 25, 2020
Adam D. Ruppe
May 26, 2020
Max Samukha
May 25, 2020
FeepingCreature
May 25, 2020
Paul Backus
May 27, 2020
johnsmith101
May 25, 2020
Stefan Koch
May 24, 2020
Consider staticMap's current version (there are a few upcoming changes that don't change the point):

template staticMap(alias F, T...)
{
    static if (T.length == 0)
    {
        alias staticMap = AliasSeq!();
    }
    else static if (T.length == 1)
    {
        alias staticMap = AliasSeq!(F!(T[0]));
    }
    else
    {
        alias staticMap =
            AliasSeq!(
                staticMap!(F, T[ 0  .. $/2]),
                staticMap!(F, T[$/2 ..  $ ]));
    }
}

In the instantiation staticMap!(pred, A, B, C, D, E, F, G, H), recursion will instantiate:

staticMap!(pred, A, B, C, D)
staticMap!(pred, A, B)
staticMap!(pred, A)
staticMap!(pred, B)
staticMap!(pred, C, D)
staticMap!(pred, C)
staticMap!(pred, D)
staticMap!(pred, E. F, G, H)
staticMap!(pred, E, F)
staticMap!(pred, E)
staticMap!(pred, F)
staticMap!(pred, G, H)
staticMap!(pred, G)
staticMap!(pred, H)

Total: 15 instantiations including original. This is because a full tree with 8 leaves is being created. Generally, a tree with N leaves has about 2N - 1 nodes total (exactly so when N is a power of 2).

The more naive version would simply use linear decomposition:

template staticMap(alias F, T...)
{
    static if (T.length == 0)
    {
        alias staticMap = AliasSeq!();
    }
    else
    {
        alias staticMap =
            AliasSeq!(
                F!(T[0]),
                staticMap!(F, T[1 .. $]));
    }
}

There' no more need to handle the 1-element case, which is already a plus. But that's not the topic here. The instantiations created are:

staticMap!(pred, B, C, D, E, F, G, H)
staticMap!(pred, C, D, E, F, G, H)
staticMap!(pred, D, E, F, G, H)
staticMap!(pred, E, F, G, H)
staticMap!(pred, F, G, H)
staticMap!(pred, G, H)
staticMap!(pred, H)

Total: 8 instantiations including original. That's about half the current number of instantiations.

So it seems divide et impera doesn't quite help with certain recursive templates.
May 25, 2020
On 25.05.20 01:40, Andrei Alexandrescu wrote:
> ...
> 
> So it seems divide et impera doesn't quite help with certain recursive templates.

Assuming the cost of a template instantiation is linear in the number of template arguments, the first version has running time Θ(n log n) while the second version has running time Θ(n²).
May 24, 2020
On 5/24/20 8:44 PM, Timon Gehr wrote:
> On 25.05.20 01:40, Andrei Alexandrescu wrote:
>> ...
>>
>> So it seems divide et impera doesn't quite help with certain recursive templates.
> 
> Assuming the cost of a template instantiation is linear in the number of template arguments, the first version has running time Θ(n log n) while the second version has running time Θ(n²).

Of course. However, I'm assuming most of a template instantiation's cost is fixed.

There are other algorithms looking at in std.meta. Consider Reverse:

template Reverse(TList...)
{
    static if (TList.length <= 1)
    {
        alias Reverse = TList;
    }
    else
    {
        alias Reverse =
            AliasSeq!(
                Reverse!(TList[$/2 ..  $ ]),
                Reverse!(TList[ 0  .. $/2]));
    }
}

The recurrence formula for complexity is:

T(0) = k;
T(1) = k;
T(n) = 2 * T(n/2) + k

Again, I assume instantiating AliasSeq with two lists of any lengths has the same cost k. That makes complexity linear. So is the complexity of the simpler implementation:

template Reverse(TList...)
{
    static if (TList.length <= 1)
    {
        alias Reverse = TList;
    }
    else
    {
        alias Reverse =
            AliasSeq!(Reverse!(TList[1  .. $]), TList[0]);
    }
}

But this insantiates half the templates, so it goes easier on resources. This makes the use of the halving strategy not only an unnecessary distraction, but a less efficient choice.

There are other unnecessary (and/or unnecessarily baroque) artifacts in std.meta, such as EraseAllN. Far as I can tell it's unnecessary because NoDuplicates can be implemented without it (and if I'm not wrong, at the same complexity).

std.meta is due for a serious review.
May 24, 2020
Another one:

template Filter(alias pred, TList...)
{
    static if (TList.length == 0)
    {
        alias Filter = AliasSeq!();
    }
    else static if (TList.length == 1)
    {
        static if (pred!(TList[0]))
            alias Filter = AliasSeq!(TList[0]);
        else
            alias Filter = AliasSeq!();
    }
    else
    {
        alias Filter =
            AliasSeq!(
                Filter!(pred, TList[ 0  .. $/2]),
                Filter!(pred, TList[$/2 ..  $ ]));
    }
}

Why cut the problem size in half if it doesn't improve complexity?
May 25, 2020
On 25.05.20 03:25, Andrei Alexandrescu wrote:
> 
> Again, I assume instantiating AliasSeq with two lists of any lengths has the same cost k.

I don't think this is a realistic assumption with the current compiler implementation. Expression tuples are stored as arrays.
May 24, 2020
On 5/24/20 9:48 PM, Andrei Alexandrescu wrote:
> Another one:
> 
> template Filter(alias pred, TList...)
> {
>      static if (TList.length == 0)
>      {
>          alias Filter = AliasSeq!();
>      }
>      else static if (TList.length == 1)
>      {
>          static if (pred!(TList[0]))
>              alias Filter = AliasSeq!(TList[0]);
>          else
>              alias Filter = AliasSeq!();
>      }
>      else
>      {
>          alias Filter =
>              AliasSeq!(
>                  Filter!(pred, TList[ 0  .. $/2]),
>                  Filter!(pred, TList[$/2 ..  $ ]));
>      }
> }
> 
> Why cut the problem size in half if it doesn't improve complexity?

Because it doesn't work for large sets unless you do this.

If you do a "linear" recursive template, it fails after about 1000 levels. A divide-and-conquer will not get that deep.

So in essence, it's a solution for a compiler restriction, not for performance.

-Steve
May 24, 2020
On 5/24/20 10:32 PM, Steven Schveighoffer wrote:
> On 5/24/20 9:48 PM, Andrei Alexandrescu wrote:
>> Another one:
>>
>> template Filter(alias pred, TList...)
>> {
>>      static if (TList.length == 0)
>>      {
>>          alias Filter = AliasSeq!();
>>      }
>>      else static if (TList.length == 1)
>>      {
>>          static if (pred!(TList[0]))
>>              alias Filter = AliasSeq!(TList[0]);
>>          else
>>              alias Filter = AliasSeq!();
>>      }
>>      else
>>      {
>>          alias Filter =
>>              AliasSeq!(
>>                  Filter!(pred, TList[ 0  .. $/2]),
>>                  Filter!(pred, TList[$/2 ..  $ ]));
>>      }
>> }
>>
>> Why cut the problem size in half if it doesn't improve complexity?
> 
> Because it doesn't work for large sets unless you do this.
> 
> If you do a "linear" recursive template, it fails after about 1000 levels. A divide-and-conquer will not get that deep.
> 
> So in essence, it's a solution for a compiler restriction, not for performance.
> 
> -Steve

I see, thanks.
May 25, 2020
I ran a test to compare the two implementations. First was comparing 10,000 references to the same instance.

0.47user 0.11system 0:00.58elapsed 100%CPU (0avgtext+0avgdata 399688maxresident)k

No significant difference between implementations. This is expected since it caches based on arguments.

Second test compared Phobos' impl vs the simple impl but this time with 10,000 different random instantiations: http://arsdnet.net/dcode/ala.d

I'd comment one impl out, uncomment the other, rerun /usr/bin/test dmd ala.d as well as dmd -c ala.d.

               simple impl     phobos impl
compile time:         3.1s            3.7s
dmd peak RAM:         1.5G            1.9G
 binary size:         6.2M            6.2M

The generated files (bin and object) are the same between them, but the simple implementation beats the Phobos impl in both compile time and dmd memory.

Note that all the user-visible static maps had 9 arguments, but the two separate implementations do different things.

It is possible my test is flawed, I encourage you to create your own and see what happens.


I also tried a string mixin version inside the static map template and it took 4.0s and used 1.2G of memory. Win on RAM, but lost on speed. (It was not the CTFE that hit it - that got cached - it was the mixin being run on each unique argument list).

And a hand-written static map because the number of arguments is known ahead of time: 1.4s compile, 0.7G memory. Absolutely crushed it. It might be worth seeing what lengths of static map are most common in the wild and just having hand expansions. Adding static if has a cost though (upped my test to 1.5s/720MB). But not as high as template constraints (1.5s/740MB). But this is all getting increasingly artificial.

So I am willing to say number of template instantiations is the probable key metric.


All this was done on a stock DMD64 D Compiler v2.091.1 on linux. Several enhancements were merged to dmd master recently that may change the results btw.

quick test: simple impl: 2s, 1.57G. phobos impl: 2.3s, 1.97G

hand impl: 1.1s, 725M. w/ static if: 1.2s, 775M

Decent time improvement across the board... but the binary got bigger at 6.4M :( (prolly druntime's fault more than anything else but idk, im not focusing on that rn)
May 24, 2020
On 5/24/2020 8:19 PM, Adam D. Ruppe wrote:
> I ran a test to compare the two implementations. First was comparing 10,000 references to the same instance.

Thank you. This is good information. I'm curious how the new staticMap compares:

https://github.com/dlang/phobos/pull/7490
May 25, 2020
On Sunday, 24 May 2020 at 23:40:48 UTC, Andrei Alexandrescu wrote:
> Consider staticMap's current version (there are a few upcoming changes that don't change the point):
> 
>...
>
> In the instantiation staticMap!(pred, A, B, C, D, E, F, G, H), recursion will instantiate:
>
> staticMap!(pred, A, B, C, D)
> staticMap!(pred, A, B)
> staticMap!(pred, A)
> staticMap!(pred, B)
> staticMap!(pred, C, D)
> staticMap!(pred, C)
> staticMap!(pred, D)
> staticMap!(pred, E. F, G, H)
> staticMap!(pred, E, F)
> staticMap!(pred, E)
> staticMap!(pred, F)
> staticMap!(pred, G, H)
> staticMap!(pred, G)
> staticMap!(pred, H)
>
> Total: 15 instantiations including original. This is because a full tree with 8 leaves is being created. Generally, a tree with N leaves has about 2N - 1 nodes total (exactly so when N is a power of 2).
>
> The more naive version would simply use linear decomposition:
> ...
> There' no more need to handle the 1-element case, which is already a plus. But that's not the topic here. The instantiations created are:
>
> staticMap!(pred, B, C, D, E, F, G, H)
> staticMap!(pred, C, D, E, F, G, H)
> staticMap!(pred, D, E, F, G, H)
> staticMap!(pred, E, F, G, H)
> staticMap!(pred, F, G, H)
> staticMap!(pred, G, H)
> staticMap!(pred, H)
>
> Total: 8 instantiations including original. That's about half the current number of instantiations.
>
> So it seems divide et impera doesn't quite help with certain recursive templates.

To add to the good previous points: the divide-and-conquer strategy can take advantage of caching for the leaf instantiations with one parameter. That is, 8 template instantiations can trivially be reused, and the others can sometimes be reused if they hit a common subtree. The linear instantiation can only be reused once or twice at the very end.
« First   ‹ Prev
1 2 3 4 5 6