October 02, 2020
On Friday, 2 October 2020 at 14:45:20 UTC, Stefan Koch wrote:
> now measure memory consumption during compilation :)

I did, it is there but small.

template:
14,096 KB

literal:
13,984 KB

your type function:
14,068 KB


In some individual runs there was enough variation where each one would come up on top and at bottom randomly but these are the average of 10 runs on my box.

Compile time too small to measure, all averaged 0.02s.

It is hard to say this would even matter if we did 1,000 of them.... let's try it.

added up top:

static foreach(i; 0 .. 1000)
mixin("struct i", i, " {}");


and in the function:

    static foreach(i; 0 .. 1000)
    mixin("static immutable convMatrix", i, " = makeConvMatrix(Byte, Ubyte, Short, Ushort, Int, Uint, Long, Ulong);");

    printf("%s\n", convMatrix0.ptr);



With type function:

0.37s, 71,756 KB

with template pattern:

0.06s, 34,944 KB

with string literal:

0.06s, 33,976 KB


The type function is a clear loser! But this is in part because the template is reusing its cached result. Let's break that by using those structs I generated

template pattern with 1,000 unique instances:

0.83s, 372,992 KB

type function with 1,000 unique instances:

 Error: function test.makeConvMatrix(alias[] types...) is not callable using argument types (i18, byte, ubyte, short, ushort, int, uint, long, ulong)
  cannot pass argument i18 of type i18 to parameter alias[] types...
[snip etc same thing for all 1000 cases]


Lovely. But I just grabbed your git branch so some in-progress work is to be expected to be incomplete...

What about a template?

test.d-mixin-42(42):        cannot pass argument test!18 of type test!18 to parameter alias[] types...


Is there a way I can pass it a list of unique types easily?


Probably fair to say it would perform about the same as before since the type function shouldn't be caching intermediate results (which is what makes the template fast before but bloat memory now), but would be nice to prove it.

PS for other readers, the template thing isn't *just* a cache, like it can't simply discard entries to it.... yet. With this specific pattern, the compiler actually *should* be able to free it  the helper function and even discard the whole cache in theory, but in practice that can only be done in specialized cases and it is very difficult to realize.
October 02, 2020
On Friday, 2 October 2020 at 15:11:18 UTC, Adam D. Ruppe wrote:
>
> The type function is a clear loser! But this is in part because the template is reusing its cached result.

Indeed.
You can fix that by wrapping the type function inside a template to make use of the instance caching, if you are expecting it to be called with the same arguments.

I would assume this to happen less in actual use-cases though.
Also keep in mind that CTFE can be sped up by 10x when switching to byte code interpreter.
You can't say the same for templates.
October 02, 2020
On Friday, 2 October 2020 at 15:11:18 UTC, Adam D. Ruppe wrote:
>
> type function with 1,000 unique instances:
>
>  Error: function test.makeConvMatrix(alias[] types...) is not callable using argument types (i18, byte, ubyte, short, ushort, int, uint, long, ulong)
>   cannot pass argument i18 of type i18 to parameter alias[] types...
> [snip etc same thing for all 1000 cases]
>
>
> Lovely. But I just grabbed your git branch so some in-progress work is to be expected to be incomplete...
>
> What about a template?
>
> test.d-mixin-42(42):        cannot pass argument test!18 of type test!18 to parameter alias[] types...

Indeed. A am currently fixing this bug.
The problem is that structs and classes are not considered types ... they are symbols.
Generally dmd doesn't like it if you pass types to functions and I have to circumvent the resistance ;)
October 02, 2020
On Friday, 2 October 2020 at 15:36:46 UTC, Stefan Koch wrote:
> On Friday, 2 October 2020 at 15:11:18 UTC, Adam D. Ruppe wrote:
>> [...]
>
> Indeed. A am currently fixing this bug.
> The problem is that structs and classes are not considered types ... they are symbols.
> Generally dmd doesn't like it if you pass types to functions and I have to circumvent the resistance ;)

Should be fixed now.
October 02, 2020
On Friday, 2 October 2020 at 15:11:18 UTC, Adam D. Ruppe wrote:
> template pattern with 1,000 unique instances:
>
> 0.83s, 372,992 KB
>
> type function with 1,000 unique instances:

With bug fixed so it builds got:

0.45s, 85,000 KB


So there is a win, and there's a few reasons for it:

1) in the template the function is saved in RAM despite never being needed again. The typefunction impl is actually reused. This could potentially be optimized in the current code (I tried and failed though, kept segfaulting, but someone who knows dmd better might be able to pull it off)

2) The template function is larger. All those tuple foreaches are unrolled, leading to a large function. The typefunction does not do this.

I wonder if the type function's foreach implementation could be used in the tuple case too. At least if there's no variable declaration based on the iterated variable it seems plausible.

3) This also is kinda the other extreme. All reused args is the template's extreme case (one instance reused) and zero reused is the TF's extreme case (it runs each time because it legitimately must). I suspect most real code will land somewhere in the middle.


But still 2x as fast and 1/4 the memory with today's implementations is a real victory in this case.
October 02, 2020
On Friday, 2 October 2020 at 17:38:17 UTC, Adam D. Ruppe wrote:
> On Friday, 2 October 2020 at 15:11:18 UTC, Adam D. Ruppe wrote:
>> template pattern with 1,000 unique instances:
>>
>> 0.83s, 372,992 KB
>>
>> type function with 1,000 unique instances:
>
> With bug fixed so it builds got:
>
> 0.45s, 85,000 KB
>
>
> So there is a win, and there's a few reasons for it:
>
> 1) in the template the function is saved in RAM despite never being needed again. The typefunction impl is actually reused. This could potentially be optimized in the current code (I tried and failed though, kept segfaulting, but someone who knows dmd better might be able to pull it off)
>
> 2) The template function is larger. All those tuple foreaches are unrolled, leading to a large function. The typefunction does not do this.
>
> I wonder if the type function's foreach implementation could be used in the tuple case too. At least if there's no variable declaration based on the iterated variable it seems plausible.
>
> 3) This also is kinda the other extreme. All reused args is the template's extreme case (one instance reused) and zero reused is the TF's extreme case (it runs each time because it legitimately must). I suspect most real code will land somewhere in the middle.
>
>
> But still 2x as fast and 1/4 the memory with today's implementations is a real victory in this case.

Thanks Adam and Stefan for getting us some concrete performance data.

As Adam notes, performance against actual source might differ.

I'm more excited about the ease of learning and better debugability of type functions than any compile time performance gains but I imagine we'll realize additional speedups if type functions are widely adopted.  Iteration will displace recursion in many places and we'll see a lot more "left leaning" code (early returns).

October 02, 2020
On Friday, 2 October 2020 at 19:13:43 UTC, Bruce Carneal wrote:
> I'm more excited about the ease of learning and better debugability of type functions than any compile time performance gains but I imagine we'll realize additional speedups if type functions are widely adopted.  Iteration will displace recursion in many places and we'll see a lot more "left leaning" code (early returns).

Eh, I'm still pretty skeptical. Notice that the code in this conversion matrix case is *identical* for type function and template, except for the header.

template version:

template makeConvMatrix(types...) {
  string helper() {
    /* same stuff */
  }
  enum makeConvMatrix = helper();
}

type function version:

string makeConvMatrix(alias[] types ...) {
   /* same stuff */
}


And then the call of course has the ! in the template version and doesn't for the type function version. But the actual function implementation is identical, the generated binary is identical. So very little actual difference here aside from the compile speed+ram mix in different cases.

Using the index pattern too (which actually compiles slower in most cases), you can get similar looking code:

template Filter(alias Pred, Args...) {
        private const(char)[] helper() {
                assert(__ctfe);
                size_t[Args.length] keep;
                size_t pos = 0;
                foreach(idx, alias arg; Args)
                        if(Pred!arg) keep[pos++] = idx;
                // generic helper function that expands to
                // AliasSeq!(Args[keep[0]], Args[keep[1]], ...)
                return makeResult(keep[0 .. pos]);
        }

        alias Filter = mixin(helper());
}


So again a normal loop building a normal array. Nothing too fancy here. But in my tests, Phobos' existing implementation is actually better in CT speed and memory.

The potential I see in the type functions are:

1) faster builds with less memory. I'm putting this under "maybe" since there's some cases I've tried where it wins, some where it loses. So I'm skeptical but if it works out it is worth it.

2) making compiler-tuples more natural to work with. Right now they are a bizarre beast you kinda have to trick into doing anything. The type function (I have been joking to stefan to call it Static Tuple Evaluation Function or STEF :P) might make them feel just like any other array which would be really cool and can make things nicer that right now must be string mixins at the usage point.

The current test implementation can't actually do this, but there's potential in making it work.

3) maybe just educate us on individual pieces we can borrow in other parts of the language. Its not-unrolled tuple foreach I think is a win, its CTFE-only nature is a clear win. Paul Backus' idea of static foreach declarations actually being ref might be an example of this too. These things might be extracted even if the overall concept doesn't work out.


We'll see where it goes.
1 2 3 4 5 6 7 8 9 10 11
Next ›   Last »