I've had an extended Discord call taking a look at the codebase. Now, these are only my own thoughts, but I'll share them anyway:
- This is a fairly pedestrian codebase. No CTFE craziness, restrained "normal" use of templates. It's exactly the sort of code that D is supposed to be fast at.
- To be fair, his computer isn't the fastest. But it's an 8-core AMD, so DMD's lack of internal parallelization hurts it here. This will only get worse in the future.
- And sure, there's a bunch of somewhat quadratic templates that explode a bit. But!
- It's all "pedestrian" use. Containers with lots of members instantiated with lots of types.
- The compiler doesn't surface what is fast and what is slow, and doesn't give you a way to notice it. No, `-vtemplates` isn't enough: we need a way to tell the time taken, not just the number of instantiations.
- But also, if we're talking about number of instantiations, `hasUDA` and `getUDA` lead the pack. I think the way these work is just bad. Instantiating a unique copy for every combination of field and UDA is borderline quadratic, so I've rewritten all my own `hasUDA`/`getUDA` code to be of the form `udaIndex!(U, __traits(getAttributes, T))`. But that didn't help much, even though `-vtemplates` hinted that it should. `-vtemplates` needs compiler time attributed to templates recursively.
- LLVM is painful. Unavoidable, but painful. Probably twice the compile time of the ldc2 run was in the LLVM backend.
- There was no smoking gun. It's not like "ah yeah, this thing, just don't do it." It's a lot of code that instantiates a lot of genuine workhorse templates (99% "function with type" or "struct with type"), and it was okay for a long time and then it wasn't.
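For readers who haven't seen the `udaIndex!(U, __traits(getAttributes, T))` form mentioned above, here is a minimal sketch of what such a helper could look like. This is my own guess at the shape, not the code from the call: `udaIndex`, `Serialize`, `Skip` and `Config` are all hypothetical names. The idea, as far as I understand it, is that the template is keyed on the attribute list rather than on the field symbol, so fields carrying the same attributes can share one instantiation.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): index of the first attribute that
// is the type U or a value of type U, or -1 if no attribute matches.
template udaIndex(U, attributes...)
{
    enum ptrdiff_t udaIndex = () {
        ptrdiff_t found = -1;
        static foreach (i; 0 .. attributes.length)
        {
            // A UDA can be a bare type (@Skip) or a value (@Serialize("host")).
            static if (is(attributes[i] == U) || is(typeof(attributes[i]) == U))
            {
                if (found == -1)
                    found = cast(ptrdiff_t) i;
            }
        }
        return found;
    }();
}

// Hypothetical UDAs and a struct using them, for illustration only.
struct Serialize { string name; }
struct Skip {}

struct Config
{
    @Serialize("host") string hostname;
    @Skip int cache;
    int port;
}

// The attributes are queried with __traits at the site of instantiation.
enum hostIdx = udaIndex!(Serialize, __traits(getAttributes, Config.hostname));
static assert(hostIdx == 0);
static assert(__traits(getAttributes, Config.hostname)[hostIdx].name == "host");
static assert(udaIndex!(Skip, __traits(getAttributes, Config.cache)) == 0);
static assert(udaIndex!(Serialize, __traits(getAttributes, Config.port)) == -1);

void main()
{
    writeln("index of @Serialize on hostname: ", hostIdx);
}
```

Whether this actually saves compile time in a given codebase is something you'd have to measure; as noted above, it didn't help much here.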
I really think the primary issue here is just that D gives you a hundred tools to dig yourself into a hole and basically no tools to dig yourself out of it, and if you do dig yourself out, you have to go "against the grain" of how the language wants to be used. And like, as an experienced dev I know the tricks of how to optimize templates, and I've sunk probably a hundred hours into this for my two libs at work alone, but this is folk knowledge: it's not part of the stdlib, or the spec, or documented anywhere at all. Like `if (__ctfe) return;`. Like `udaIndex!(__traits)`. Like `is(T : U*, U)` instead of `isPointer`. Like making struct methods templates so they're only compiled when needed. Like moving recursive types out of templates to reduce the compilation time. Like keeping your unique instantiations as low as possible by querying information with traits at the site of instantiation. Like `-v` to see where time is spent. Like ... and so on. This goes for every part of the language, not just templates.
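To make two of those tricks concrete, here is a small sketch of the inline `is(T : U*, U)` check and of templated struct methods. `kindOf` and `Bag` are hypothetical names I made up for illustration, and how much either trick saves will depend on the codebase.

```d
import std.stdio : writeln;

// Inline is-expression: checks for a pointer and binds the pointee type U
// without instantiating a std.traits template for every T.
string kindOf(T)()
{
    static if (is(T : U*, U))
        return "pointer to " ~ U.stringof;
    else
        return "not a pointer";
}

// Hypothetical container, for illustration only.
struct Bag(T)
{
    T[] data;

    // Ordinary method: analysed whenever Bag!T is instantiated.
    size_t length() const { return data.length; }

    // Templated method (note the empty first parameter list): its body is
    // only analysed and compiled if something actually calls it.
    T sum()() const
    {
        T total = 0;
        foreach (x; data)
            total += x;
        return total;
    }
}

// Bag!string instantiates fine because sum() is never touched for it,
// even though its body would not compile for strings.
static assert(Bag!string.init.length == 0);
static assert(!__traits(compiles, Bag!string.init.sum()));

void main()
{
    writeln(kindOf!(int*)());
    writeln(kindOf!double());
    writeln(Bag!int([1, 2, 3]).sum());
}
```

The trade-off with templated methods is that errors in the body only show up once something calls it, so a method nobody uses can rot unnoticed.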
DMD is fast. DMD is even fast for what it does. But DMD is not as fast as it implicitly promises when templates are advertised, and DMD does not expose enough good ways to make your code fast again when you've fallen in a hole.