first numbers: type function vs template
Aug 21, 2020
Stefan Koch
August 21, 2020
Good Evening,

I have been talking about type functions for a while now, and have claimed theoretical performance improvements over templates, as well as nicer syntax.

Unfortunately, in the example I am about to present, the type function syntax is not going to look better than the template's.

---
class C0{}
class C1 : C0{}
// 497 class definitions omitted for brevity
alias CX = C498;

version (templ)
{
    template type_h_template(T)
    {
        static if (is(T S == super) && S.length)
        {
            enum type_h_template = ( T.stringof ~ " -> " ~ S[0].stringof ~ "\n" ~ .type_h_template!(S));
        }
        else
        {
            enum type_h_template = "";
        }
    }

    static assert(type_h_template!(CX));
}
else
{
    string type_hierachy(alias T)
    {
        string result;

        alias base_class;
        // for now this is a typefunction-only __trait
        base_class = __traits(getBaseClass, T);
        while(is(base_class))
        {
            result ~= T.stringof ~ " -> " ~ base_class.stringof ~ "\n";
            T = base_class;
            base_class = __traits(getBaseClass, T);
        }

        return result;
    }


    static assert(type_hierachy(CX));
}
---
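
For anyone who wants to reproduce the test file, the omitted class definitions can be generated rather than typed out. The following is only a sketch and not the script used for the numbers in this post; `generateHierarchy` is a hypothetical helper name, and it assumes the pattern visible above (each `Cn` derives from `Cn-1`, and `CX` aliases the most derived class).

```d
import std.conv : text;
import std.stdio : write;

/// Builds `count` class definitions C0 .. C(count-1), each deriving from
/// its predecessor, followed by an alias CX for the most derived class.
/// (Hypothetical helper, assumed for illustration.)
string generateHierarchy(size_t count)
{
    assert(count > 0, "need at least the root class C0");
    string src = "class C0{}\n";
    foreach (i; 1 .. count)
        src ~= text("class C", i, " : C", i - 1, "{}\n");
    src ~= text("alias CX = C", count - 1, ";\n");
    return src;
}

void main()
{
    // 499 classes in total: C0 through C498, as in the listing above.
    write(generateHierarchy(499));
}
```

Redirecting the output into a file and appending the `version (templ)` block from the listing should give a file equivalent to the one benchmarked.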

Note that the template recursion limit is 500, which is why we can only go up to C498. The type function can work with virtually unlimited hierarchy depth.

I now run the following command to benchmark:
hyperfine "generated/linux/release/64/dmd test_getBaseClass.d -sktf -o-" "generated/linux/release/64/dmd test_getBaseClass.d -sktf -version=templ -o-" -w 20 -r 500


And it results in:
Benchmark #1: generated/linux/release/64/dmd test_getBaseClass.d -sktf -o-
  Time (mean ± σ):      21.4 ms ±   2.8 ms    [User: 16.8 ms, System: 4.9 ms]
  Range (min … max):    14.0 ms …  29.3 ms    500 runs

Benchmark #2: generated/linux/release/64/dmd test_getBaseClass.d -sktf -version=templ -o-
  Time (mean ± σ):      33.0 ms ±   2.6 ms    [User: 26.9 ms, System: 6.3 ms]
  Range (min … max):    24.9 ms …  41.6 ms    500 runs

Summary
  'generated/linux/release/64/dmd test_getBaseClass.d -sktf -o-' ran
    1.54 ± 0.24 times faster than 'generated/linux/release/64/dmd test_getBaseClass.d -sktf -version=templ -o-'

This shows that the type function is about 1.5x faster for the chosen task, given naive implementations of both the type function and the template.

In terms of memory, a quick best-of-three reveals:
19080k for the template and
14328k for the type function.

That is a reduction of roughly 25%.

I am rather pleased by those numbers, given that type functions are still very much proof of concept and rather unoptimized.
August 21, 2020
On Friday, 21 August 2020 at 19:13:30 UTC, Stefan Koch wrote:
> In terms of memory a quick best out of three reveals:
> 19080k for the template  and
> 14328k for the type function.
>
> Which is a reduction by roughly 25%.
>
> I am rather pleased by those numbers, given that type functions are still very much proof of concept and rather unoptimized.

Nice.

BTW, is newCTFE's latency too high to give further speedups for this type function?
August 21, 2020
On Friday, 21 August 2020 at 19:13:30 UTC, Stefan Koch wrote:
> And it results in:
> Benchmark #1: generated/linux/release/64/dmd test_getBaseClass.d -sktf -o-
>   Time (mean ± σ):      21.4 ms ±   2.8 ms    [User: 16.8 ms, System: 4.9 ms]
>   Range (min … max):    14.0 ms …  29.3 ms    500 runs
>
> Benchmark #2: generated/linux/release/64/dmd test_getBaseClass.d -sktf -version=templ -o-
>   Time (mean ± σ):      33.0 ms ±   2.6 ms    [User: 26.9 ms, System: 6.3 ms]
>   Range (min … max):    24.9 ms …  41.6 ms    500 runs
>
> Summary
>   'generated/linux/release/64/dmd test_getBaseClass.d -sktf -o-' ran
>     1.54 ± 0.24 times faster than 'generated/linux/release/64/dmd test_getBaseClass.d -sktf -version=templ -o-'

Actually, I was misled by the results.

I forgot to subtract the baseline overhead of invoking the compiler.
A file with the content "class C0{}"
takes 8.5 milliseconds on the same machine under comparable load.
Let's be generous and round it down to 8 milliseconds.

That means the real speedup is (33.0 - 8) / (21.4 - 8) = 25.0 / 13.4,
which puts us at almost 1.9x on average.
If we compare the two min values, (24.9 - 8) / (14.0 - 8) = 16.9 / 6.0,
we are even at 2.8x.
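
The correction is simple enough to check mechanically. Here is a minimal sketch of the arithmetic; `correctedSpeedup` is a name made up for this illustration, and the 8 ms baseline is the rounded-down figure from this post.

```d
import std.stdio : writefln;

/// Ratio of two timings after subtracting the same fixed baseline from each.
/// (Hypothetical helper, assumed for illustration.)
double correctedSpeedup(double slowMs, double fastMs, double baselineMs)
{
    return (slowMs - baselineMs) / (fastMs - baselineMs);
}

void main()
{
    enum baselineMs = 8.0; // rounded-down cost of compiling "class C0{}"

    // Mean times: template 33.0 ms, type function 21.4 ms.
    writefln("mean: %.2fx", correctedSpeedup(33.0, 21.4, baselineMs)); // 1.87x
    // Min times: template 24.9 ms, type function 14.0 ms.
    writefln("min:  %.2fx", correctedSpeedup(24.9, 14.0, baselineMs)); // 2.82x
}
```

Rounding the baseline down rather than up keeps the estimate conservative: a larger baseline would shrink the denominator faster than the numerator and report an even higher ratio.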


August 21, 2020
On Friday, 21 August 2020 at 22:05:31 UTC, Per Nordlöw wrote:
> On Friday, 21 August 2020 at 19:13:30 UTC, Stefan Koch wrote:
>> In terms of memory a quick best out of three reveals:
>> 19080k for the template  and
>> 14328k for the type function.
>>
>> Which is a reduction by roughly 25%.
>>
>> I am rather pleased by those numbers, given that type functions are still very much proof of concept and rather unoptimized.
>
> Nice.
>
> BTW, is newCTFE's latency too high to give further speedups for this type function?

I am not sure.
It would depend on how much work your introspection has to do and how many types you have to serialize.

newCTFE is an independent virtual machine with its own ABI.
In order for it to work with types, the types would have to be serialized and given a binary representation suitable for the VM to work with.
Currently I am undecided on how that binary representation would look, since that depends on which properties I want type functions to be able to use
(type-size, type-members, data-layout, name, mangle, UDAs, baseClasses, vtbls, and so on).
I hope that Andrei's work on type info can inform my decision on that.
But all that is far in the future.
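
To make the serialization idea a bit more concrete, here is a purely hypothetical sketch of what a flat, index-based type table could look like. It is not the design of newCTFE or of type functions, since no representation has been decided; `TypeRecord` and its fields are invented for illustration and cover only two of the properties listed above (name and base class).

```d
import std.stdio : write;

/// Hypothetical flat record for one serialized type. A real representation
/// would need more fields (size, members, layout, UDAs, ...).
struct TypeRecord
{
    string name;   // e.g. "C1"
    int baseIndex; // index of the base class record in the table, -1 if none
}

/// Walks the base-class chain using only table indices, which is the kind
/// of access a VM without direct knowledge of compiler types could perform.
string typeHierarchy(TypeRecord[] table, int index)
{
    string result;
    while (table[index].baseIndex >= 0)
    {
        immutable base = table[index].baseIndex;
        result ~= table[index].name ~ " -> " ~ table[base].name ~ "\n";
        index = base;
    }
    return result;
}

void main()
{
    // Illustrative table for C2 : C1 : C0.
    auto table = [TypeRecord("C0", -1), TypeRecord("C1", 0), TypeRecord("C2", 1)];
    write(typeHierarchy(table, 2)); // prints "C2 -> C1" then "C1 -> C0"
}
```

The cost hinted at in the post shows up in building such a table: every type the function might touch has to be converted up front, which is why the number of types to serialize matters.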