Jump to page: 1 2
Thread overview
Reducing variadic template combinatorics (C++ was onto something)
1 day ago
monkyyy
1 day ago
Daniel N
1 day ago
Dennis
23 hours ago
Dennis
23 hours ago
Paul Backus
20 hours ago
monkyyy
3 hours ago
Quirin Schroll
1 day ago

In a conversation on a pull request for one of my libraries, I came across an interesting revelation. Variadic template functions waste resources for the most part, because much of the time, you don't care about the relationship between the parameters. In many such functions, you just process them in a loop.

Let's start with the traditional variadic D pattern. I'll write a function which writes each individual parameter on its own line:

void writelines(T...)(T values)
{
    import std.stdio;
    static foreach(v; values)
        writeln(v);
}

Every combination of every type creates a new template instantiation. We only save on instantiations when 2 calls happen to match all their types.

But this function that is instantiated is all just calls to writeln! It's not an interesting function, nor is it really worthy of applying a combinatoric solution. We aren't getting any special optimization by having the entire list in view at once.

Let's take an example call:

writelines(1, 2, 3, 4);

Note how we had to type out each parameter individually, in the same order they would be processed in the loop. Well, we can write this ourselves!

writeln(1); writeln(2); writeln(3); writeln(4);

This accomplishes the same thing, but instead of a template instantiation per group of writes, we only get one instantiation of writeln for integers. We effectively have removed the combinatorics.

Now, this is quite the ugly solution! We have to repeat the call for each one.

But notice how we have written the same exact list! But instead of ", ", our separator is "); writeln(".

What if we could reduce that separator? Maybe we can use an opCall?

struct WriteLines
{
    ref opCall(T)(T val) {
        import std.stdio;
        writeln(val);
        return this;
    }
}

enum writelines2 = WriteLines.init;

Now, how does this look?

writelines2(1)(2)(3)(4);

Our separator has changed into ")(". A little nicer, but still looks weird.

But more importantly, we have one instantiation of the opCall for all integers. I can write any number of integers, or any combination of integers and strings, or anything else, and we only get one instantiation per type. The combinatorics are gone, and yet I'm mostly writing the same thing.

How about we try an operator? Wait, isn't there another language that does this?

struct WriteLinesCpp
{
    ref opBinary(string s: "<<", T)(T val) {
        import std.stdio;
        writeln(val);
        return this;
    }
}
enum coutlines = WriteLinesCpp.init;

And the usage is as you would expect:

coutlines << 1 << 2 << 3 << 4;

Again, the benefit here is less combinatorics -- one instantiation per type -- and less junk functions which are unrolling the unrolled loops that we typed in the first place.

But... I still want to write writelines(1, 2, 3, 4). The ergonomics there are nice! Is there some way we can capture this same reduction in complexity while still keeping the nice syntax?

I'll leave it up to the experts here to think about. I can probably think of ways, but I'm sure they would not stand up to scrutiny.

-Steve

1 day ago

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer wrote:

>

In a conversation on a pull request for one of my libraries, I came across an interesting revelation. Variadic template functions waste resources for the most part, because much of the time, you don't care about the relationship between the parameters. In many such functions, you just process them in a loop.

[...]

I have fun ideas but youd need a benchmark; while I think I know how to estimate the big O of dmd, theres always the fun edge cases

1 day ago

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer wrote:

>

Now, how does this look?

writelines2(1)(2)(3)(4);

-Steve

It's probably not what you want...
[1,2,3,4].writeln;
... but I think the syntax is sufficiently nice.

1 day ago

Very interesting thing you're bringing up, I'll share my thoughts.

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer wrote:

>
void writelines(T...)(T values)
{
    import std.stdio;
    static foreach(v; values)
        writeln(v);
}

Every combination of every type creates a new template instantiation. We only save on instantiations when 2 calls happen to match all their types.

True, but the instantiations are all very short, so they are cheap to process and get inlined in an optimized build. In theory, it's not much worse than writing the calls yourself like in your second example:

>
writeln(1); writeln(2); writeln(3); writeln(4);

Much more problematic is:

void writelines(T...)(T values)
{
    static foreach(v; values)
    {
        // *hundreds of lines of implementation for writeln(v)*
    }
}

Because then you duplicate the implementation multiple times, even though it's probably 99% the same for int, uint, const(int), immutable(uint) etc. + every variadic combination of those. So it's best practice to reduce the number of possible template parameter values as early as possible before getting to an implementation. Don't instantiate impl!T for every integral type when impl!(T.sizeof) also works.

Here's a real example of employing this technique: https://github.com/dlang/druntime/pull/3852

>

How about we try an operator? (...)
And the usage is as you would expect:

coutlines << 1 << 2 << 3 << 4;

The problem here is that << is meant to do arithmetic, not construct a list. In Java you can write System.out.println("" + 1 + 2 + 3 + 4);, which is better, but + is still also used for arithmetic so it's still confusing (as seen by the "" + required to string concatenate rather than add). Now D has its own operator that naturally builds a list: ~

However, that often needlessly GC allocates, and it doesn't convert other types to strings. A pattern I often write is:

void writeInTag(ref OutBuffer buf, int x)
{
    buf ~= "<";
    buf ~= x;
    buf ~= ">";
}

I'd rather write:

buf ~= "<" ~ x ~ ">";

That doesn't work because I don't want string ~ int, I want it to use my OutBuffer ~ int implementation, which requires a different operator precedence:

(buf ~= "<") ~ x ~ ">";

This looks bad, so I've considered swapping out the assignment operator for something with lower precedence:

buf ~ "<" ~ x ~ ">";

But now we're back in C++ operator overloading abuse territory. An unsuspecting programmer would see this as a typo, creating a concatenated value and discarding it with no side effect. The only unambiguous one-liners I found to work so far are format strings or interpolated strings:

buf.format("<%s>", x);
buf ~= i"<$(x)>";
>

But... I still want to write writelines(1, 2, 3, 4). The ergonomics there are nice! Is there some way we can capture this same reduction in complexity while still keeping the nice syntax?

I'd say: just write a template helper that does nothing but forward each argument to another call. If that's somehow significantly worse than manually writing 4 function calls, investigate why that is and see if that can be fixed.

1 day ago

On Tuesday, 14 October 2025 at 12:30:23 UTC, Dennis wrote:

>

Very interesting thing you're bringing up, I'll share my thoughts.

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer wrote:

>
void writelines(T...)(T values)
{
    import std.stdio;
    static foreach(v; values)
        writeln(v);
}

Every combination of every type creates a new template instantiation. We only save on instantiations when 2 calls happen to match all their types.

True, but the instantiations are all very short, so they are cheap to process and get inlined in an optimized build. In theory, it's not much worse than writing the calls yourself like in your second example:

>
writeln(1); writeln(2); writeln(3); writeln(4);

Yes, but it's not just runtime cost. There's cost to the compile time.

What I'm exploring is that we have an unrolled loop -- the parameter list. If somehow we can harness this into avoiding the different processing per call to reach the same conclusion, that's what I'm looking at. Whether this results in some kind of "best practice" or some special handling the compiler can use to save some compile time, I'm not sure. But I feel there is definitely some benefit that can be seen here.

>

Much more problematic is:

void writelines(T...)(T values)
{
    static foreach(v; values)
    {
        // *hundreds of lines of implementation for writeln(v)*
    }
}

Because then you duplicate the implementation multiple times, even though it's probably 99% the same for int, uint, const(int), immutable(uint) etc. + every variadic combination of those. So it's best practice to reduce the number of possible template parameter values as early as possible before getting to an implementation. Don't instantiate impl!T for every integral type when impl!(T.sizeof) also works.

It's also the same implementation for int, int, int, int!

the hundreds of lines of implementation is not only repeated for different types, it's also repeated for the same type with different parameter configuration.

In other words, writelines(1, 2) and writelines(1, 2, 3) creates 5 loop bodies, one for each parameter, whereas writelines2(1)(2) and writelines2(1)(2)(3) creates one "body" for int, and that's it.

The compiler compiles and processes all variations of function calls independently, and sees no benefit from realizing all the loop bodies are the same.

>

Here's a real example of employing this technique: https://github.com/dlang/druntime/pull/3852

Nice, yes this is part of it.

> >

How about we try an operator? (...)
And the usage is as you would expect:

coutlines << 1 << 2 << 3 << 4;

The problem here is that << is meant to do arithmetic, not construct a list. In Java you can write System.out.println("" + 1 + 2 + 3 + 4);, which is better, but + is still also used for arithmetic so it's still confusing (as seen by the "" + required to string concatenate rather than add).

I did not properly convey that I don't want this solution. It's just an exploration of what is possible with the current language.

I dislike C++ iostream operators, and prefer the parameter list style for writeln. But I had never considered that there is a compile time benefit to the way C++ is doing it.

> >

But... I still want to write writelines(1, 2, 3, 4). The ergonomics there are nice! Is there some way we can capture this same reduction in complexity while still keeping the nice syntax?

I'd say: just write a template helper that does nothing but forward each argument to another call. If that's somehow significantly worse than manually writing 4 function calls, investigate why that is and see if that can be fixed.

Right, if nothing else it is a "best practice" on how to avoid template bloat. But if there is a way we can convey to the compiler this pattern, it would be very nice!

-Steve

23 hours ago

On Tuesday, 14 October 2025 at 06:54:24 UTC, Daniel N wrote:

>

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer wrote:

>

Now, how does this look?

writelines2(1)(2)(3)(4);

It's probably not what you want...
[1,2,3,4].writeln;
... but I think the syntax is sufficiently nice.

So this won't work if the types are different.

writelines("value", 1);
writelines2("value")(1);
// ["value", 1].writeln; // does not work

-Steve

23 hours ago

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer wrote:

>

But... I still want to write writelines(1, 2, 3, 4). The ergonomics there are nice! Is there some way we can capture this same reduction in complexity while still keeping the nice syntax?

The answer to this, of course, is Lisp-style AST macros. You write some code that describes the AST transformation from writelines(1, 2, 3, 4); to writeln(1); writeln(2); writeln(3); writeln(4);, and the compiler executes this code at compile time and replaces the original AST with the result. Because the macro is totally ephemeral, it does not need to have space allocated for each expansion in the template cache, nor a unique name generated for the symbol table, nor time spent inlining the wrapper functions during optimization (or failing to), nor time spent during codegen and linking to process those wrapper functions.

Since D will never have AST macros, we are condemned for eternity to suffer from either template bloat or awkward syntax.

23 hours ago

On Tuesday, 14 October 2025 at 14:44:56 UTC, Steven Schveighoffer wrote:

>

Right, if nothing else it is a "best practice" on how to avoid template bloat. But if there is a way we can convey to the compiler this pattern, it would be very nice!

I don't see the difference between the two, isn't the following both a "best practice" as well as conveying the pattern to the compiler?

void f(T...)(T args) if (T.length > 1)
{
    foreach (arg; args)
        f(arg);
}

void f(T)(T arg)
{
    // Implementation
}
22 hours ago

On Tuesday, 14 October 2025 at 15:35:48 UTC, Dennis wrote:

>

On Tuesday, 14 October 2025 at 14:44:56 UTC, Steven Schveighoffer wrote:

>

Right, if nothing else it is a "best practice" on how to avoid template bloat. But if there is a way we can convey to the compiler this pattern, it would be very nice!

I don't see the difference between the two, isn't the following both a "best practice" as well as conveying the pattern to the compiler?

void f(T...)(T args) if (T.length > 1)
{
    foreach (arg; args)
        f(arg);
}

void f(T)(T arg)
{
    // Implementation
}

Like if there were some way for the function author to identify that the variadic args are going to be looped independently, the compiler can save the code generation per type.

I don't know, maybe it's not worth the effort.

The idea is to keep the normal parameter list style, but the compiler rewrites it to cacheable templates. Either by being clued in that the variadic list is going to be processed this way, or by detecting it itself.

For example, if it can tell that the loop body doesn't use anything except the element, then it could rewrite the function as a standalone template automatically. Sort of like a lambda can detect whether it can be a function or delegate based on whether it uses any data from the surrounding frame.

I just feel like when your function is just doing a foreach over the tuple, it's a waste for the function to even exist, since you created the tuple in the first place!

The whole impetus for this is someone in one of my projects wants to change a function from a typesafe variadic into a template variadic. I'm hesitant to bloat the code this way for the sake of a new feature.

I've written a lot of variadic functions where the code mostly just loops over the variadic and does the same thing to each parameter independently. The bloat is always annoying but it's the best user API available in D.

-Steve

20 hours ago

On Tuesday, 14 October 2025 at 15:29:04 UTC, Paul Backus wrote:

>

On Tuesday, 14 October 2025 at 04:30:49 UTC, Steven Schveighoffer wrote:

>

But... I still want to write writelines(1, 2, 3, 4). The ergonomics there are nice! Is there some way we can capture this same reduction in complexity while still keeping the nice syntax?

The answer to this, of course, is Lisp-style AST macros. You write some code that describes the AST transformation from writelines(1, 2, 3, 4); to writeln(1); writeln(2); writeln(3); writeln(4);, and the compiler executes this code at compile time and replaces the original AST with the result. Because the macro is totally ephemeral, it does not need to have space allocated for each expansion in the template cache, nor a unique name generated for the symbol table, nor time spent inlining the wrapper functions during optimization (or failing to), nor time spent during codegen and linking to process those wrapper functions.

Since D will never have AST macros, we are condemned for eternity to suffer from either template bloat or awkward syntax.

d has "ephemeral" compile time execution, I just dont know how much is deleted.

« First   ‹ Prev
1 2