January 10, 2024
On 1/10/2024 5:53 PM, Nickolay Bukreyev wrote:
> Exactly. Let me try to explain why DIP1036 is doing what it is doing. For illustrative purposes, I’ll be drastically simplifying code; please excuse me for that.

Thank you for the explanation. It was entirely missing from the spec, and I overlooked it in the code. (This is why reverse engineering a spec from code is not so easy.) It is indeed clever.

As for it being a required feature of string interpolation to do this processing at compile time, that's a nice feature, not a must have.

The enum proposal is to obviate the requirement for a header and footer template, which is a big improvement.

January 11, 2024

On Wednesday, 10 January 2024 at 19:53:48 UTC, Walter Bright wrote:

> > And you can get rid of the runtime overhead by adding a pragma(inline, true) writeln overload. (I guess with DMD that will still bloat the executable,
>
> I didn't mention the other kind of bloat - the rather massive number and size of template names being generated that go into the object file, as well as all the uncalled functions generated only to be removed by the linker.

Yes, DIP1036e has a lot of extra templates generated, and the mangled name is going to be large.

Let's skip for a moment the template that writeln will generate (which I agree isn't ideal, but also is somewhat par for the course).

This shouldn't be a huge problem for the interpolation types because the type doesn't get included in the binary. It is a big problem for the toString function, because that is included.

However, we can mitigate the ones that return null:

string __interpNull() => null;

struct InterpolatedExpression(string expr)
{
  alias toString = __interpNull;
}

... // and so on

I tested this and it does work. So this reduces all the toString member functions from InterpolatedExpression (and InterpolationPrologue and InterpolationEpilog, but those are not templated structs anyway) to one function in the binary.
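
A minimal sketch of the aliasing trick, assuming a plain name `interpNull` in place of `__interpNull` and a simplified stand-in for `InterpolatedExpression` (the real definition may differ). It only checks that every instantiation resolves `toString` to the same free function:

```d
// One ordinary function shared by every instantiation.
string interpNull() { return null; }

// Simplified stand-in for the DIP1036e type; an assumption for this sketch.
struct InterpolatedExpression(string expr)
{
    alias toString = interpNull;
}

void main()
{
    alias A = InterpolatedExpression!"a + b";
    alias B = InterpolatedExpression!"c";

    // Distinct types, but the alias points at the same function.
    static assert(!is(A == B));
    assert(&A.toString is &interpNull);
    assert(&B.toString is &interpNull);
    assert(A.toString() is null);
}
```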

But we can't do this for InterpolatedLiteral (which, by the way, is improperly described in Atila's DIP: the associated toString member function should return the literal).

We can possibly do a couple of things here to mitigate:

  1. We can modify how std.format works so it will accept the following as a toString hook:
struct S
{
   enum toString = "I am an S";
}

This means no function calls, no extra-long symbols in the binary (since it's an enum, it should not go in), and I think even the compilation will be faster.

  2. We modify it to be aware of InterpolatedLiteral types, and avoid depending on the toString API. After all, we own both Phobos and druntime, so we can coordinate the release.
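
To illustrate the first option, here is a rough sketch of how a formatter might accept an enum as a toString hook. The helper `render` is hypothetical and is not how std.format is written; it only shows that the enum case can be detected at compile time and needs no function call:

```d
import std.stdio : writeln;

struct S
{
    enum toString = "I am an S";   // manifest constant, no function emitted
}

struct T
{
    string toString() const { return "I am a T"; }
}

// Hypothetical helper, not a Phobos API: prefers the enum hook when present.
string render(V)(auto ref const V value)
{
    static if (__traits(compiles, { enum string s = V.toString; }))
        return V.toString;          // compile-time constant, no call
    else
        return value.toString();    // conventional member function
}

void main()
{
    writeln(render(S()));   // prints "I am an S"
    writeln(render(T()));   // prints "I am a T"
}
```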

And as a further suggestion, though this is kind of off-topic, we may look into ways to have templates that don't make it into the binary explicitly. Basically, they are marked as shims or forwarders by the library author, and just serve as a way to write nicer syntax. This could help in more than just the interpolation DIP.

> As far as I can tell, the only advantage of DIP1036 is the use of inserted templates to "key" the tuples to specific functions. Isn't that what the type system is supposed to do? Maybe the real issue is that a format string should be a different type than a conventional string.

No. While I agree that having a different type makes it more useful and easier to hook, there is a fundamental problem being solved with the compile-time literals being passed to the function. Namely, tremendous power is available to validate, parse, prepare, etc. string data at compile time, for use during runtime. This simply is not possible with 1027.

The runtime benefits are huge:

  • No need to allocate anything (@nogc, -betterC, etc. all available)
  • You get compiler errors instead of runtime errors (if you put in the work)
  • It's possible to generate "perfect forwarding" to another function that does use another form. For example, printf.
  • If you inline the call, it can be as if you called the forwarded function directly with the exactly correct parameters.
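
As a rough illustration of the compile-time side, the sketch below builds a parameterised query from the literal pieces alone. The marker types are simplified stand-ins for the DIP1036e ones, and `placeholderQuery`, `Query` and `prepare` are hypothetical names invented for this example (loosely modelled on an execi-style receiver); this version allocates, so it says nothing about @nogc:

```d
import std.conv : to;
import std.stdio : writeln;
import std.traits : isInstanceOf;

// Simplified stand-ins for the DIP1036e marker types (assumed shapes).
struct InterpolatedLiteral(string text) { enum value = text; }
struct InterpolatedExpression(string code) { }

enum isMarker(A) = isInstanceOf!(InterpolatedLiteral, A)
                || isInstanceOf!(InterpolatedExpression, A);

// Runs in CTFE: literal pieces are copied, every real argument becomes "?".
string placeholderQuery(Args...)()
{
    string sql;
    foreach (A; Args)
    {
        static if (isInstanceOf!(InterpolatedLiteral, A))
            sql ~= A.value;
        else static if (!isMarker!A)
            sql ~= "?";
    }
    return sql;
}

struct Query
{
    string sql;
    string[] params;
}

// Hypothetical receiver: the SQL text is fixed at compile time.
Query prepare(Args...)(Args args)
{
    enum sql = placeholderQuery!Args();   // forced compile-time evaluation
    static assert(sql.length != 0, "query text must not be empty");

    auto q = Query(sql);
    foreach (arg; args)
    {
        static if (!isMarker!(typeof(arg)))
            q.params ~= arg.to!string;    // values travel separately from the SQL text
    }
    return q;
}

void main()
{
    string name = "x'; DROP TABLE students; --";
    // Hand-written equivalent of what an interpolated literal might lower to
    // under this simplified model (header and footer omitted).
    auto q = prepare(
        InterpolatedLiteral!"SELECT * FROM students WHERE name = "(),
        InterpolatedExpression!"name"(), name,
        InterpolatedLiteral!" AND year = "(),
        InterpolatedExpression!"year"(), 2024);

    writeln(q.sql);
    writeln(q.params);
}
```

Because the query text comes only from the literal types, the hostile `name` value can end up in `params` but never in `sql`.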

And I want to continue to point out, that a constructed "format string" mechanism just is inferior, regardless if it is another type, as long as you don't need formatting specifiers (and arguably, it's just a difference in taste otherwise). The compiler parsed it out, it knows the separate pieces. Giving those pieces directly to the library is both the most efficient way, and also the most obvious way. The "format string" mechanism, while making sense for writef, must add an element of complexity to the receiving function, since it now has to know what "language" the translated string is. e.g. with DIP1027, one must know that %s is special and what it represents, and the user must know to escape %s to avoid miscommunication. With 1036e, there is no format string, so there is no complication there, or confusion. The value being passed is right where you would expect it, and you don't have to parse a separate thing to know.
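
A small sketch of the escaping difference, using existing Phobos functions rather than either DIP. The function names `viaFormatString` and `viaPieces` are made up for the example:

```d
import std.conv : text;
import std.format : format;

// Format-string style: a literal '%' has to be escaped as "%%".
string viaFormatString(int percent, string what)
{
    return format("%s%% of %s", percent, what);
}

// Piecewise style: literal text is passed through untouched.
string viaPieces(int percent, string what)
{
    return text(percent, "% of ", what);
}

void main()
{
    assert(viaFormatString(50, "tasks") == "50% of tasks");
    assert(viaPieces(50, "tasks") == "50% of tasks");
}
```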

Note in YAIDIP, this was done partly through an interpolation header, which had all the compile-time information, and then strings and interpolated data were interspersed. I find this also a workable solution, and could even do without the strings being passed interspersed (as I said, we have control over writeln and text), but I think the ordering of the tuple to match what the actual string literal looks like is so intuitive, and we would be losing that if we did some kind of "format header" mechanism.

-Steve

January 11, 2024
On 11/01/2024 2:53 PM, Nickolay Bukreyev wrote:
> I’d say DIP1036, as we see it now, relies on a clever workaround of a limitation imposed by the language. If that limitation is gone, the DIP will become simpler.

Another potential solution would be to allow passing metadata on the function call side, to the function.

Consider:

``i"prefix${expr:format}suffix"``

Could be:

```d
func("prefix", @format("format") expr, "suffix");

void func(T...)(T args) {
	pragma(msg, __traits(getAttributes, args[1])); // format("format")
}

```

This is so much simpler than what 1036e is.

But it does require another language feature.
January 10, 2024
On 1/10/2024 7:07 AM, Nickolay Bukreyev wrote:
> Zero-sized structs are never passed as arguments. Inlining is not necessary to get rid of them.

Structs with no fields have a size of 1 byte for D and C++ structs, and 0 or 4 for C structs (depending on the target). The rationale for a non-zero size is so that different struct instances will be at different addresses.

```d
struct S { }

void foo(S s);

void test(S s)
{
    foo(s);
}
```

```
                push    RBP
                mov     RBP,RSP
                sub     RSP,8
                push    dword ptr 010h[RBP]
                call      _D5test43fooFSQm1SZv@PC32
                add     RSP,010h
                pop     RBP
                ret
```
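
The size claim for D structs can be checked directly. A minimal sketch:

```d
struct S { }

// An empty D struct still occupies one byte.
static assert(S.sizeof == 1);

void main()
{
    S[2] pair;
    // Consecutive elements therefore get distinct addresses.
    assert(&pair[0] !is &pair[1]);
    assert(pair.sizeof == 2);
}
```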
January 11, 2024

On Thursday, 11 January 2024 at 02:21:17 UTC, Walter Bright wrote:

> As for it being a required feature of string interpolation to do this processing at compile time, that's a nice feature, not a must have.

Importance of the ability to do processing at compile time was stated by:

> The enum proposal is to obviate the requirement for a header and footer template, which is a big improvement.

Header and footer are not templates; InterpolatedLiteral and InterpolatedExpression are. Yes, the latter two can be replaced by enums iff it becomes possible to pass arbitrary expressions to alias parameters. And I agree it would be a big improvement.

> Structs with no fields have a size of 1 byte for D and C++ structs, and 0 or 4 for C structs (depending on the target).

Yes, I mistakenly wrote "zero-sized" when I meant "empty".

January 11, 2024

On Thursday, 11 January 2024 at 02:35:00 UTC, Richard (Rikki) Andrew Cattermole wrote:

> void func(T...)(T args) {
>     pragma(msg, __traits(getAttributes, args[1])); // format("format")
> }

Sorry, I don’t understand how this can possibly work. After func template is instantiated, its T is bound to, e.g., AliasSeq!(string, int, string). args is just a local variable of type AliasSeq!(string, int, string). How can __traits know what attributes were attached at call site?

If, on the other hand, attributes do affect the type, then IMHO

func("prefix", @format("format") expr, "suffix");

is not much different than

func("prefix", format!"format"(expr), "suffix");

I.e., we can do it already.
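
A minimal sketch of that wrapper approach, assuming a hypothetical `Formatted` struct and a `format` template that is unrelated to std.format; `describe` stands in for `func`:

```d
import std.conv : to;
import std.traits : isInstanceOf;

// Hypothetical wrapper: the format spec lives in the type, the value is copied in.
struct Formatted(string spec, T)
{
    enum formatSpec = spec;
    T value;
}

auto format(string spec, T)(T value)
{
    return Formatted!(spec, T)(value);
}

// Stand-in for func: inspects the wrapper type at compile time.
string describe(Args...)(Args args)
{
    string result;
    foreach (arg; args)
    {
        static if (isInstanceOf!(Formatted, typeof(arg)))
            result ~= "<" ~ arg.formatSpec ~ ":" ~ arg.value.to!string ~ ">";
        else
            result ~= arg.to!string;
    }
    return result;
}

void main()
{
    int expr = 42;
    assert(describe("prefix", format!"%04d"(expr), "suffix")
        == "prefix<%04d:42>suffix");

    // The wrapper holds a copy, so changes to it never reach `expr`.
    auto wrapped = format!"%04d"(expr);
    wrapped.value = 7;
    assert(expr == 42);
}
```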

January 11, 2024
On 11/01/2024 5:31 PM, Nickolay Bukreyev wrote:
> On Thursday, 11 January 2024 at 02:35:00 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> ```d
>> void func(T...)(T args) {
>>     pragma(msg, __traits(getAttributes, args[1])); // format("format")
>> }
>> ```
> 
> Sorry, I don’t understand how this can possibly work. After `func` template is instantiated, its `T` is bound to, e.g., `AliasSeq!(string, int, string)`. `args` is just a local variable of type `AliasSeq!(string, int, string)`. How can `__traits` know what attributes were attached at call site?
> 
> If, on the other hand, attributes do affect the type, then IMHO
> 
> ```d
> func("prefix", @format("format") expr, "suffix");
> ```
> 
> is not much different than
> 
> ```d
> func("prefix", format!"format"(expr), "suffix");
> ```
> 
> I.e., we can do it already.

This has side effects. It affects ``ref`` and ``out``. It also affects lifetime analysis.

So we can't do it currently.

But yes, it affects the type, without being in the type system explicitly as it is meta data.
January 11, 2024

On Thursday, 11 January 2024 at 04:34:33 UTC, Richard (Rikki) Andrew Cattermole wrote:

> This has side effects. It affects ref and out. It also affects lifetime analysis.
>
> So we can't do it currently.
>
> But yes, it affects the type, without being in the type system explicitly as it is meta data.

Thank you for the clarification. I see a downside: pretty much any generic code would have to strip the annotations off its arguments after inspecting them, to reduce template bloat. However, we are probably going off-topic.

January 11, 2024
On 1/10/24 20:53, Walter Bright wrote:
> On 1/9/2024 2:38 PM, Timon Gehr wrote:
>  > %s7 8 9
> 
> Yes, I used writeln instead of writefln. The similarity between the two names is a source of error, but if that was a festering problem we'd have seen a lot of complaints about it by now.
> ...

My point was that with DIP1036e it either works or does not compile, not that you called the wrong function.

> 
>> And you can get rid of the runtime overhead by adding a `pragma(inline, true)` `writeln` overload. (I guess with DMD that will still bloat the executable,
> 
> Try it and see.
> 
> I didn't mention the other kind of bloat - the rather massive number and size of template names being generated that go into the object file, as well as all the uncalled functions generated only to be removed by the linker.
> ...

I understand the drawbacks of DIP1036e which it shares with most non-trivial metaprogramming. D underdelivers in this department at the moment, but this still remains one of the key selling points of D.

The issue is that DIP1027 is worse than DIP1036e. DIP1027 is also worse than nothing. It has been rejected for good reason. For some reason, however, you keep insisting it is essentially as useful as DIP1036e. That's just not the case.

I think a much better answer to DIP1036e than a DIP1027 revival would have been to add a -preview=experimental-DIP1036e flag and do a call to action to resolve language issues and limitations that force DIP1036e to generate bloat. Maybe there would have been an even better way to handle this.

> As far as I can tell, the only advantage of DIP1036 is the use of inserted templates to "key" the tuples to specific functions.

Well, this is not the case, that is not the only advantage.

> Isn't that what the type system is supposed to do? Maybe the real issue is that a format string should be a different type than a conventional string. For example:
> 
> ```d
> extern (C) pragma(printf) int printf(const(char*), ...);
> 
> enum Format : string;
> 
> void foo(Format f) { printf("Format %s\n", f.ptr); }
> void foo(string s) { printf("string %s\n", s.ptr); }
> 
> void main()
> {
>      Format f = cast(Format)"f";
>      foo(f);
>      string s = "s";
>      foo(s);
> }
> ```
> which prints:
> 
> Format f
> string s
> 
> If we comment out `foo(string s)`:
> 
> test2.d(14): Error: function `test2.foo(Format f)` is not callable using argument types `(string)`
> test2.d(14):        cannot pass argument `s` of type `string` to parameter `Format f`
> 
> If we comment out `foo(Format f)`:
> 
> string f
> string s
> 
> This means that if execi()'s first parameter is of type `Format`, and the istring generates the format string with type `Format`, this key will fit the lock. A string generated by other means, such as `.text`, will not fit that lock.
> 

Well, this is a step in the right direction, but rest assured if this was the only advantage of DIP1036e, then Adam would have gone with this suggestion. I am almost sure this is one of the ideas he discarded.
January 11, 2024
On 1/11/24 03:21, Walter Bright wrote:
> 
> As for it being a required feature of string interpolation to do this processing at compile time, that's a nice feature, not a must have.

As far as I am concerned, it is a must-have. For example, this is what prevents SQL injection attacks; it's a safety guarantee.