New blog post on the cost of compile time
January 15, 2023

In this post: https://forum.dlang.org/post/tm3p0p$2js2$1@digitalmars.com

I mentioned:

> I did a test on something I was working on for my talk, and I'm going to write a blog post about it, because I'm kind of stunned at the results.

Well, I finally got around to it:

https://www.schveiguy.com/blog/2023/01/the-cost-of-compile-time-in-d/

Let me know what you think.

-Steve

January 16, 2023

On Monday, 16 January 2023 at 04:30:25 UTC, Steven Schveighoffer wrote:

> https://www.schveiguy.com/blog/2023/01/the-cost-of-compile-time-in-d/
>
> Let me know what you think.

I think it is marvelous. I'm wondering, are there any downsides to using typeof?

January 16, 2023

On Monday, 16 January 2023 at 04:30:25 UTC, Steven Schveighoffer wrote:

> In this post: https://forum.dlang.org/post/tm3p0p$2js2$1@digitalmars.com
>
> I mentioned:
>
>> I did a test on something I was working on for my talk, and I'm going to write a blog post about it, because I'm kind of stunned at the results.
>
> Well, I finally got around to it:
>
> https://www.schveiguy.com/blog/2023/01/the-cost-of-compile-time-in-d/
>
> Let me know what you think.
>
> -Steve

Looks like :handwaves: given a 2.4GHz processor, that'd be 150k cycles per ReturnType instantiation? Not super much, but not nothing either. If that distributes over five templates, it'd be something like 30k cycles per template instantiation in general. For something that hits the allocator a few times, that seems... about right?
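
To make the arithmetic concrete, here is the same estimate spelled out (both inputs are just the hand-waved figures above, not fresh measurements):

```d
// Back-of-envelope arithmetic for the estimate above; the cycle count
// and clock speed are the hand-waved figures from this post.
enum cyclesPerInstantiation = 150_000;
enum clockHz = 2_400_000_000L;          // 2.4 GHz

// ~62.5 microseconds per ReturnType instantiation...
enum usPerInstantiation = cyclesPerInstantiation * 1_000_000.0 / clockHz;

// ...or roughly 0.6 s if a build instantiates it 10,000 times.
enum secondsFor10k = usPerInstantiation * 10_000 / 1_000_000.0;
```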

Indicating to me that if we want this to be fast, we have to find a way to make a template instantiation do less. I think that's gonna be a hard sell, given that the instantiation absolutely has to make a copy of the entire template body AST given the compiler design as it is.

Looking at a profiler, how would you say these cycles are distributed between "copy tree" and "walk tree for semantic"?

January 16, 2023

On Monday, 16 January 2023 at 09:12:59 UTC, FeepingCreature wrote:

> On Monday, 16 January 2023 at 04:30:25 UTC, Steven Schveighoffer wrote:
>
>> In this post: https://forum.dlang.org/post/tm3p0p$2js2$1@digitalmars.com
>>
>> I mentioned:
>>
>>> I did a test on something I was working on for my talk, and I'm going to write a blog post about it, because I'm kind of stunned at the results.
>>
>> Well, I finally got around to it:
>>
>> https://www.schveiguy.com/blog/2023/01/the-cost-of-compile-time-in-d/
>>
>> Let me know what you think.
>>
>> -Steve
>
> Looks like :handwaves: given a 2.4GHz processor, that'd be 150k cycles per ReturnType instantiation? Not super much, but not nothing either. If that distributes over five templates, it'd be something like 30k cycles per template instantiation in general. For something that hits the allocator a few times, that seems... about right?

Processors these days are both faster than that and have pretty good IPC when compiling D code, so it's actually worse than that estimate.

> Indicating to me that if we want this to be fast, we have to find a way to make a template instantiation do less. I think that's gonna be a hard sell, given that the instantiation absolutely has to make a copy of the entire template body AST given the compiler design as it is.

A lot of these copies are made defensively: some of them are actually required, or at least require a different theoretical model of compilation to be avoided, whereas others are basically just made to avoid mutating the original AST. Some of this just boils down to not having a const-first attitude; other things are harder.

Making more things const makes avoiding these copies easier. I was fiddling around with a dmd patch that automatically spits out a diff that adds const to things where it can.

> Looking at a profiler, how would you say these cycles are distributed between "copy tree" and "walk tree for semantic"?

A lot of dmd compilation is spent doing memcpy. However, I think it's actually mostly caused by CTFE arrays being shunted around.

The thing to do for those arrays is probably to refcount them so the copy-on-write can be done in place for the hopefully common case.
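
The general shape of that idea, sketched out (this is generic reference-counted copy-on-write storage, not anything dmd actually does):

```d
// A generic sketch of reference-counted copy-on-write storage -- not
// dmd's CTFE arrays, just the shape of the idea described above.
struct CowArray(T)
{
    private static struct Payload
    {
        size_t refs;
        T[] data;
    }
    private Payload* p;

    this(T[] data)
    {
        p = new Payload(1, data.dup);
    }

    this(this)                     // copying the handle just bumps the count
    {
        if (p) ++p.refs;
    }

    ~this()
    {
        if (p) --p.refs;           // the GC reclaims the payload later
    }

    T opIndex(size_t i) { return p.data[i]; }

    // A write copies the payload only when it is actually shared,
    // otherwise it mutates in place -- the hopefully common case.
    void opIndexAssign(T value, size_t i)
    {
        if (p.refs > 1)
        {
            --p.refs;
            p = new Payload(1, p.data.dup);
        }
        p.data[i] = value;
    }
}
```

Copying a `CowArray` is then just a pointer copy plus an increment; the `.dup` only happens when two owners exist and one of them writes.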

January 16, 2023
On 1/16/23 3:10 AM, Arjan wrote:
> On Monday, 16 January 2023 at 04:30:25 UTC, Steven Schveighoffer wrote:
>> https://www.schveiguy.com/blog/2023/01/the-cost-of-compile-time-in-d/
>>
>> Let me know what you think.
> 
> I think it is marvelous. I'm wondering, are there any downsides to using typeof?

In this instance, no. This was just a case of "oh, look, there's ReturnType, so I can just use that" instead of trying to actively avoid using the tools in std.traits.

But in general, we still want to be able to use the cool tools that Phobos gives us without too much penalty, so the larger problem still remains -- templates should just perform better.

In general, Phobos templates should try to avoid using simple wrappers for internal things. One thing I didn't discuss in the post is that the `ReturnType` instances here are only ever going to be instantiated *once*, and on something that is *never used* (the lambda function). Once the boolean for `isInputRange` is decided, there is no reason to keep that stuff around. Some way to cull those templates from the cache would be most welcome.
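
To make the swap concrete, it is roughly the difference between these two checks (a simplified sketch with made-up names, not the actual Phobos or blog code -- the real checks use lambdas to control how the range value is obtained):

```d
// Simplified sketch of the two styles -- not the actual Phobos source.
import std.traits : ReturnType;

// Helper style: every check instantiates ReturnType (and, through its
// constraint, isCallable and FunctionTypeOf) on a throwaway lambda that
// is never used again, yet stays in the template cache forever.
enum emptyIsBoolViaHelper(R) = is(ReturnType!((R r) => r.empty) == bool);

// Direct style: typeof answers the same question with no extra
// template instantiations at all.
enum emptyIsBoolViaTypeof(R) = is(typeof(R.init.empty) == bool);
```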

-Steve
January 16, 2023

On 1/16/23 4:12 AM, FeepingCreature wrote:

> On Monday, 16 January 2023 at 04:30:25 UTC, Steven Schveighoffer wrote:
>
>> In this post: https://forum.dlang.org/post/tm3p0p$2js2$1@digitalmars.com
>>
>> I mentioned:
>>
>>> I did a test on something I was working on for my talk, and I'm going to write a blog post about it, because I'm kind of stunned at the results.
>>
>> Well, I finally got around to it:
>>
>> https://www.schveiguy.com/blog/2023/01/the-cost-of-compile-time-in-d/
>>
>> Let me know what you think.
>
> Looks like :handwaves: given a 2.4GHz processor, that'd be 150k cycles per ReturnType instantiation? Not super much, but not nothing either. If that distributes over five templates, it'd be something like 30k cycles per template instantiation in general. For something that hits the allocator a few times, that seems... about right?
>
> Indicating to me that if we want this to be fast, we have to find a way to make a template instantiation do less. I think that's gonna be a hard sell, given that the instantiation absolutely has to make a copy of the entire template body AST given the compiler design as it is.

Absolutely, I welcome any improvements that bring the current phobos into line with the improved version. I would imagine some penalty for ReturnType, regardless of how much can be improved. And of course, there's the whole question of "is this the right abstraction to use?". Would there be a better way to write ReturnType that doesn't cost as much, maybe using CTFE?

I don't know enough about the actual implementation, so it's hard for me to have a productive discussion on it. All I can do is try things and measure.

-Steve

January 16, 2023

On Monday, 16 January 2023 at 04:30:25 UTC, Steven Schveighoffer wrote:

> In this post: https://forum.dlang.org/post/tm3p0p$2js2$1@digitalmars.com
>
> I mentioned:
>
>> I did a test on something I was working on for my talk, and I'm going to write a blog post about it, because I'm kind of stunned at the results.
>
> Well, I finally got around to it:
>
> https://www.schveiguy.com/blog/2023/01/the-cost-of-compile-time-in-d/
>
> Let me know what you think.
>
> -Steve

Pretty interesting post!
Although it really makes me sad, because it only shows that we can't really create helpers or use D effectively. I wish we could build something better for that.

January 17, 2023
On Sun, Jan 15, 2023 at 11:30:25PM -0500, Steven Schveighoffer via Digitalmars-d wrote: [...]
> https://www.schveiguy.com/blog/2023/01/the-cost-of-compile-time-in-d/
[...]

Honestly, I find templates like ReturnType in std.traits a bit of a code smell. Same thing as Parameters, and a whole bunch of others.  Yes, it has a pretty-sounding name, and yes, you get to avoid writing that squint-inducing __traits(...) syntax, but if you take a step back, it just begs the question, why can't we do this directly in the language itself?

You pointed out that there are various reasons for it -- no easy way of getting an instance of the type, need to handle different kinds of callables, etc., but to me, those are all merely circumstantial issues. It begs the question, why *isn't* there a construct to obtain an instance of some type T (even if hypothetical, for the purposes of introspection)?  After all, the compiler knows T inside-out, and ought to be able to cook up a (virtual) instance of it.

The crux of the problem is that, in spite of D's oft-promoted metaprogramming prowess, the language *itself* doesn't let you do certain common things easily.  It lets you use, e.g., typeof() in certain cases, but in other cases you need this or that workaround or paraphrasis, and so another std.traits wrapper template is born.  If the language had instead been extended so that you could, for example, extract the return type of some given callable directly, say typeof(return(myfunc)), then none of this would have been necessary in the first place.

Having wrappers in Phobos for doing certain things makes sense when a particular feature or introspective capability is still new / newly discovered: it wasn't anticipated so the language doesn't have a construct to express it directly, so a Phobos template helps to wrap it up in a nicer, more friendly and usable form for end users to use.  But once a particular construct has become recurrent and a standard part of D idiom, it deserves to be baked into the language directly. Especially when doing so eliminates a lot of the collateral costs.


T

-- 
What's an anagram of "BANACH-TARSKI"?  BANACH-TARSKI BANACH-TARSKI.
January 18, 2023
On 1/17/2023 1:58 PM, H. S. Teoh wrote:
> If the
> language had instead been extended so that you could, for example,
> extract the return type of some given callable directly, say
> typeof(return(myfunc)), then none of this would have been necessary in
> the first place.

https://dlang.org/spec/expression.html#is_expression

    int func();

    static if (is(typeof(func) R == return))
        pragma(msg, R);

prints:

    int

The implementation of std.traits.ReturnType is:

    template ReturnType(alias func)
    if (isCallable!func)
    {
        static if (is(FunctionTypeOf!func R == return))
            alias ReturnType = R;
        else
            static assert(0, "argument has no return type");
    }

ReturnType can do a little more than the raw IsExpression, as it can identify:

    struct G
    {
        int opCall (int i) { return 1;}
    }
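
A small sketch of that difference, exercising the same struct (the aliases and static asserts here are illustrative, not from the spec):

```d
// Sketch illustrating the difference on the opCall example above.
import std.traits : ReturnType, FunctionTypeOf;

struct G
{
    int opCall(int i) { return 1; }
}

// ReturnType goes through FunctionTypeOf (and isCallable), so it finds
// the opCall overload:
static assert(is(ReturnType!G == int));

// The raw IsExpression needs an actual function type on the left-hand
// side; G itself is not one, so FunctionTypeOf has to supply it:
static if (is(FunctionTypeOf!G R == return))
    pragma(msg, R);   // prints: int
```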

January 19, 2023
On 1/18/23 5:20 PM, Walter Bright wrote:
> On 1/17/2023 1:58 PM, H. S. Teoh wrote:
>> If the
>> language had instead been extended so that you could, for example,
>> extract the return type of some given callable directly, say
>> typeof(return(myfunc)), then none of this would have been necessary in
>> the first place.
> 
> https://dlang.org/spec/expression.html#is_expression
> 
>      int func();
> 
>      static if (is(typeof(func) R == return))
>          pragma(msg, R);
> 
> prints:
> 
>      int
> 
> The implementation of std.traits.ReturnType is:
> 
>      template ReturnType(alias func)
>      if (isCallable!func)
>      {
>          static if (is(FunctionTypeOf!func R == return))
>              alias ReturnType = R;
>          else
>              static assert(0, "argument has no return type");
>      }
> 
> ReturnType can do a little more than the raw IsExpression, as it can identify:
> 
>      struct G
>      {
>          int opCall (int i) { return 1;}
>      }
> 

I didn't think of making a simplified ReturnType (we know that in this case the thing is nothing but a normal lambda function). I did this now:

```d
template RT(alias sym) {
    static if(is(typeof(sym) R == return))
        alias RT = R;
    else
        static assert(false, "bad");
}

...

else version(useIsExpr)
{
    enum isInputRange(R) = is(typeof(R.init) == R)
    && is(RT!((R r) => r.empty) == bool)
    && (is(typeof((return ref R r) => r.front)) || is(typeof(ref (return ref R r) => r.front)))
    && !is(RT!((R r) => r.front) == void)
    && is(typeof((R r) => r.popFront));
}
```

The result is still not as good as just using typeof directly, but much much better. When compared to a direct typeof, it adds about 0.15s of total compile time for 10000 instances, and adds 100MB more memory usage.

My point still stands -- *inside* a constraint template, you should avoid using all kinds of convenience templates if you can help it. Nobody cares about the implementation of `isInputRange`, as long as it gives the right answer.

Now, Adam has a point (in his comment on my blog) that if you are *already* using such convenience templates elsewhere on the *same parameters*, then this can have a negative effect on overall performance, because the caching of the template answer can speed up the compilation. In this case, the template instantiation is guaranteed to be unique since these are lambda expressions, so that doesn't apply.
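
A tiny sketch of that distinction (hypothetical aliases, just to illustrate):

```d
// Hypothetical example of the caching distinction described above.
import std.traits : ReturnType;

int foo();

alias R1 = ReturnType!foo;      // instantiated once...
alias R2 = ReturnType!foo;      // ...same argument, so this is a cache hit.

// Each function literal is a distinct symbol, so these are two separate
// instantiations that nothing else can ever reuse:
alias L1 = ReturnType!(() => 1);
alias L2 = ReturnType!(() => 1);
```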

I think we can all agree though that it is less than ideal to have to worry about the internal details of how templates are implemented.

-Steve