January 27, 2021
#DIP1036

Full Disclosure: I am not favorably disposed to this, as it is fairly complicated and uses the GC.

> It can bind to a parameter list, but it does not have a type by itself.

Makes no sense. What is it doing by "binding" to a parameter list? The examples make no sense, either, because assert doesn't have a parameter list.

> idup

What does this function look like?

> requires the GC

D needs to move away from such constructs.

> interp and idup

Not clear when interp is called and when idup is called.

> With proper library definitions, if usage of a string interpolation is an error, this DIP does not specify the language of the error condition. It is our preference that the resulting error of the idup call is emitted instead of the failed sequence match.

Finish this rather than hand-wave.

> functions which accept interp literals

What are "interp literals"?

> Because the type interp!"..." is not implicitly convertible to any other type

Why wouldn't it be?

> This design is intentional to trigger the implicit idup call whenever it is used for conventional string-accepting functions.

I don't know how this might fit in with overload resolution.

> "Best effort" functions

I don't know what the definition of "best effort" is when applied to a function.

> What became clear as the prior version was reviewed was that the complexity of specifying format while transforming into a parameter sequence was not worth adding to the language.

I didn't think that was the conclusion. This DIP is much more complicated.

> Because the interp template type will provide a toString member, it will pass properly to functions such as writeln or text and work as expected without any changes to the existing functions.

It won't work generally, however:

    void foo(string);
    struct S { string toString(); }
    void test() { S s; foo(s); }

fails.

> To pass two sequential interpolation strings to a function that accepts interpolation strings, concatenation is not needed—separating the string literals by a comma will suffice.

This will have weird consequences for overloading, i.e. distinguishing one combined argument from two distinct arguments.

> The complete specification of these translations is left up to the eventual implementors and language maintainers.

In my experience, doing the detail design of things often reveals a fatal flaw.

> Compiler implementation

This section appears to confuse a definition of the feature with its implementation. It really should be labeled "Overload Resolution".

I am totally confused why it refers to InterpolationString for matching purposes, and yet says InterpolationSequence and InterpolationLiteral are used for function overloading. Can't have it both ways.

> with no further attempt to rewrite the sequence.

Does that mean there are multiple rewrites under other conditions?

> In the case where it does not match, the InterpolationString will be rewritten as a call to a druntime library function named idup.

Does this imply a two-pass approach to overload resolutions? Try and fail, then try again with rewrites?

> If multiple InterpolationString tokens are used in a parameter list, the call must match for the resulting expansion of all InterpolationString tokens, or the entire expression will fail to match.

Which expansion, as there are two different expansions?

What about variadic parameters? Lazy parameters?

No examples given of trivial and non-trivial overload matches illustrating each step of this process.

The reason I'm being pedantic on the overloading is we've done hand-wavy overload rules before (alias this, cough cough) and eventually found out it was unworkable.
January 28, 2021
On Wednesday, 27 January 2021 at 10:33:53 UTC, Mike Parker wrote:
> [snip]

The DIP states that foo(i"a:${a}, ${b}.") is rewritten as `foo(Interp!"a:", a, Interp!", ", b, Interp!".")`. I think it's better to rewrite it as `foo(Interp!"a:", Interp!(typeof(a))(a), Interp!", ", Interp!(typeof(b))(b), Interp!".")`. That way, `foo` has an easier time introspecting which came from the interpolated string.

The type of interpolated string literal is very special cased. The DIP states it is not an alias sequence, but that it behaves like one when passed to a function. And if that does not compile, it is treated as string instead. This is going to be full of all sorts of corner cases.

Let me suggest an alternative: the user manually chooses the type. For example, `i"hello ${world}"` would be rewritten as `idup(Interp!"hello ", Interp!(typeof(world))(world))`, and `I"hello ${world}"` would be `AliasSeq!(Interp!"hello ", Interp!(typeof(world))(world))`. By the latter I mean an honest alias sequence, not one with a special-cased `.length` or anything like that.

January 28, 2021
On Thursday, 28 January 2021 at 08:35:34 UTC, Dukc wrote:
> On Wednesday, 27 January 2021 at 10:33:53 UTC, Mike Parker wrote:
>> [snip]
>
> That way, `foo` has an easier time introspecting which came from the interpolated string.

I meant: that way, `foo` has an easier time introspecting which arguments came from the interpolated string, since the interpolated string might not be the only argument passed to `foo`.
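A minimal sketch of why wrapping the expressions helps, assuming two hypothetical types: `Interp!"..."` for literal text and `InterpValue!T` for each interpolated expression (the proposal spells both as `Interp`; they are split here for clarity, and neither exists in druntime). Because every piece carries a marker type, `foo` can tell them apart from ordinary arguments by type alone.

```d
// Hypothetical stand-ins for the compiler-generated types in this proposal.
struct Interp(string s) {}
struct InterpValue(T) { T value; }

// True for literal pieces and for wrapped expressions, respectively.
enum isInterpLiteral(T) = is(T == Interp!s, string s);
enum isInterpValue(T) = is(T == InterpValue!U, U);

// Counts how many arguments came from an interpolated string. This is only
// possible because the expressions are wrapped, not just the literals.
size_t countInterpolated(Args...)(Args args)
{
    size_t n;
    foreach (A; Args)
    {
        static if (isInterpLiteral!A || isInterpValue!A)
            ++n;
    }
    return n;
}
```

With the unwrapped rewrite, a plain `int` produced by `${a}` would be indistinguishable from an `int` the caller passed separately.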


January 28, 2021
On Thursday, 28 January 2021 at 08:35:34 UTC, Dukc wrote:
> `I"hello ${world}"` would be `AliasSeq!(Interp!"hello ", Interp!(typeof(world))(world))`. By the latter I mean an honest alias sequence, not one with a special-cased `.length` or anything like that.

Error again: this would not compile. Replace "alias sequence" with "expanded tuple", so that the rewritten snippet would be `tuple(Interp!"hello ", Interp!(typeof(world))(world)).expand`. I don't mean that the compiler would rewrite the string to use `std.typecons.Tuple`, but that the resulting expanded tuple would be implemented the same way.

January 28, 2021
On 1/28/21 1:00 AM, Walter Bright wrote:
> On 1/27/2021 4:09 AM, Atila Neves wrote:
>> auto msg1 = i"Hello, ${name}, this is your ${visits}${post(visits)} time visiting";
>> auto msg2 = ir"Hello, ${name}, this is your ${visits}${post(visits)} time visiting";
>> auto msg3 = i`Hello, ${name}, this is your ${visits}${post(visits)} time visiting`;
>> auto msg3 = q{Hello, ${name}, this is your ${visits}${post(visits)} time visiting};
>>
>> I'm guessing msg3 was supposed to be `iq{...}`?
> 
> And shouldn't it be msg4?

Yes it should. I clearly didn't proofread this part very well. :(

-Steve
January 28, 2021
On 1/28/21 3:35 AM, Dukc wrote:
> On Wednesday, 27 January 2021 at 10:33:53 UTC, Mike Parker wrote:
>> [snip]
> 
> The DIP states that foo(i"a:${a}, ${b}.") is rewritten as `foo(Interp!"a:", a, Interp!", ", b, Interp!".")`. I think it's better to rewrite it as `foo(Interp!"a:", Interp!(typeof(a))(a), Interp!", ", Interp!(typeof(b))(b), Interp!".")`. That way, `foo` has an easier time introspecting which came from the interpolated string.

First, I don't think it's critical for overloading, and will simply add to the template bloat. What are you going to do differently with `a` than you would with `Interp!(typeof(a))(a)`?

Second, this removes any ref possibilities for the parameters.

The parameters are guaranteed to start and end with an InterpolationLiteral, so one can assume that non-literal arguments are interspersed inside the literal.
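To illustrate how a receiving function can use this structure, here is a minimal sketch of a query builder. The `interp` struct, `Query`, and `toQuery` are hypothetical stand-ins, not the DIP's definitions. Literal pieces are recognized by type and copied into the SQL text; every other argument is an interpolated expression and becomes a `?` placeholder whose value is kept aside for a driver to bind. Whether this actually prevents injection depends on the driver binding those values separately, which this sketch does not model.

```d
import std.conv : to;

// Hypothetical InterpolationLiteral stand-in (not the real druntime type).
struct interp(string s) {}

// SQL text with `?` placeholders, plus the values to bind separately.
struct Query
{
    string sql;
    string[] params;
}

Query toQuery(Args...)(Args args)
    if (Args.length > 0 && is(Args[0] == interp!s, string s))
{
    Query q;
    foreach (arg; args)
    {
        static if (is(typeof(arg) == interp!lit, string lit))
            q.sql ~= lit;               // literal SQL text, copied verbatim
        else
        {
            q.sql ~= "?";               // expression becomes a placeholder
            q.params ~= to!string(arg); // value stays out of the SQL text
        }
    }
    return q;
}
```

The hand-written call `toQuery(interp!"... id = "(), id, interp!""())` stands in for what `toQuery(i"... id = ${id}")` would expand to.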

> The type of interpolated string literal is very special cased. The DIP states it is not an alias sequence, but that it behaves like one when passed to a function. And if that does not compile, it is treated as string instead. This is going to be full of all sorts of corner cases.

I was fully aware that this would be the most controversial part. I feel like it will not be full of corner cases, but I'm not sure. Can you specify any?

Consider that a normal string literal can be used as a string, immutable(char)*, wstring, or dstring. I find that very similar to this feature, and I don't feel there are a lot of corner cases there.

> Let me suggest an alternative: the user manually chooses the type. For example, `i"hello ${world}"` would be rewritten as `idup(Interp!"hello ", Interp!(typeof(world))(world))`, and `I"hello ${world}"` would be `AliasSeq!(Interp!"hello ", Interp!(typeof(world))(world))`. By the latter I mean an honest alias sequence, not one with a special-cased `.length` or anything like that.
> 

We have considered that. The problem is that people will use the string interpolation form without realizing the dangers or resulting bloat.

For instance, writeln(i"Hello, ${name}"), if made to proactively generate a string just to send it to writeln, is extremely wasteful when writeln(I"Hello, ${name}") is not. I feel the auto rewrite is a better option because it does the right thing in all cases. The beauty of it is that the library author gets to decide whether it makes sense to accept the expanded form; the user is just saying "here's something string-like I want you to handle". It puts the decision in the right hands, while not being intrusive in case the library author doesn't want to deal with it.

Consider also that code which uses a dual-literal system might have to use the string interpolation form because the library only allows that. Then at some point in the future, the library adds support for the expanded form. Now the user would have to go back and switch all usage to that new form, whereas an auto-rewrite would just work without changes.

-Steve
January 28, 2021
auto convoluted = i"${ir"`${"{"}`"}"; // nested string interpolations work.
assert(convoluted == "`{`");

+InterpolatedString:
+    InterpolatedDoubleQuotedString
+    InterpolatedWysiwygString
+    InterpolatedAlternateWysiwygString
+    InterpolatedTokenString

An interpolated string should obey all the escaping rules of the string literal it is derived from. The initial lexing of such a string should use the same logic, and interpolation sequences should be handled on the raw content of the lexed string, after all the usual unescaping has been done.

i`\${.}`
i"\\${.}"
These two should have the same meaning: an escaped interpolation dollar sign. In the double-quoted form, the escaped backslash becomes a plain backslash after ordinary string unescaping, and that backslash is then interpreted as the interpolation escape sequence.
January 28, 2021
On Wednesday, 27 January 2021 at 10:33:53 UTC, Mike Parker wrote:
> This is the feedback thread for the second round of Community Review of DIP 1036, "String Interpolation Tuple Literals".

DIP 1036 takes two different approaches to string interpolation and attempts to merge them together into a single proposal. In broad terms, those approaches can be characterized as follows:

1. The convenient approach: the language and runtime take care of string conversion and memory allocation for you, and you don't have to worry about any of the details.

2. The flexible approach: the language splits the string apart into interpolated and non-interpolated pieces, and it's up to you to decide what to do with them.

DIP 1036's proposal for #2 is very good, and its proposal for #1, while missing some important details, appears to be fundamentally on the right track. Either one of these proposals would make a fine DIP on its own. The problem with DIP 1036 is in the way it attempts to combine the two.

Fundamentally, the goal that DIP 1036 is aiming for is to give the programmers who want convenience the convenient version, and to give programmers who want flexibility the flexible version. While this is an admirable goal, fully achieving it requires reading the programmer's mind, which is infeasible given D's current level of compiler technology. So what DIP 1036 does is attempt to *guess* what the programmer wants, using a rather crude heuristic: if the code compiles with the flexible version, the compiler is to assume that's what the programmer wants; otherwise, it assumes they want the convenient version.

As with any heuristic or approximation, there are edge cases where this breaks down. One of them is called out in the DIP itself--type inference via `auto`--but it is not hard to imagine others. For example, a programmer who writes

    tuple(i"Good morning ${name}", i"Good evening ${name}")

...is probably not going to get what they intended, even though their code compiles.

Every D programmer who wants to make effective use of DIP 1036's interpolation literals will have to go through the process of learning when .idup is required, when it's optional, when it's allowed-but-unnecessary, and when it's forbidden--which means that, in practice, they will have to learn how it actually works, under the hood. This is not a desirable trait for a language feature that's intended to make programming *easier*.

Ultimately, I think attempting to guess the programmer's intent is the wrong way to go here. Either force them to spell it out explicitly (with a call to .idup, .text, etc.), or take away the choice and give up on one of the two approaches.
January 28, 2021
On 1/28/21 2:39 AM, Walter Bright wrote:
> #DIP1036
> 
> Full Disclosure: I am not favorably disposed to this, as it is fairly complicated and uses the GC.

I hope to alleviate your concerns. From the responses below, it seems I have poorly conveyed the intentions of the DIP in many places.

>  > It can bind to a parameter list, but it does not have a type by itself.
> 
> Makes no sense. What is it doing by "binding" to a parameter list? The examples make no sense, either, because assert doesn't have a parameter list.

Forgive my ignorance of the language spec and terminology. What I'm trying to say is that, if you write:

foo(i"Hello, ${name}")

It translates to:

foo(interp!"Hello, "(), name, interp!""())

unless that doesn't match a valid overload, in which case it translates to:

foo(idup(interp!"Hello, "(), name, interp!""()))

Clearly I don't know how to say that properly. I'm thinking of a new way to say this with overloads (see overload blurb below). Hopefully this is better.

On assert, the fact that it won't match the expanded form means it will use the idup rewrite. That is intentional. I will make it clear that the rewrite will only happen for function or template argument lists.

> 
>  > idup
> 
> What does this function look like?

The signature would look like:

string idup(Args...)(Args args) if (is(Args[0] : interp!S, string S))

And it would be roughly equivalent to std.conv.text, but without much of the cruft of Phobos (it would likely reuse some features already in druntime, such as miniFormat).
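As a rough illustration only, here is a sketch of such an `idup` that simply delegates to `std.conv.text`, the function named above as its near-equivalent. The `interp` struct below is a hypothetical stand-in whose `toString` returns its literal text; the real druntime definitions would differ, and would avoid the Phobos dependency.

```d
import std.conv : text;

// Hypothetical stand-in for the DIP's InterpolationLiteral type.
struct interp(string s)
{
    string toString() const { return s; }
}

// Sketch of idup: stringify every piece (literals via toString, expressions
// via std.conv.text's usual formatting) into one GC-allocated string.
string idup(Args...)(Args args)
    if (Args.length > 0 && is(Args[0] == interp!s, string s))
{
    return text(args);
}
```

Note that declaring a module-level `idup` hides `object.idup` inside that module, which a real druntime version would have to take into account.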

>  > requires the GC
> 
> D needs to move away from such constructs.

First, the DIP only requires the GC if idup is used. SOME form of allocation is needed.

Follow the logic: you need a string from a set of arguments. This set of arguments is only knowable at runtime. Therefore you need a runtime allocation to hold the resulting string. Where should that allocated space come from?

There is no possible string interpolation feature that results in an actual string that can be done without either adding a new allocation scheme to the language (e.g. reference counting), or using the GC.

And it was very clear from the previous review that a string interpolation feature that cannot simply be assigned to or used as a string is a failed feature.

> 
>  > interp and idup
> 
> Not clear when interp is called and when idup is called.

See overload blurb below.

> 
>  > With proper library definitions, if usage of a string interpolation is an error, this DIP does not specify the language of the error condition. It is our preference that the resulting error of the idup call is emitted instead of the failed sequence match.
> 
> Finish this rather than hand wave.

I can do this even though it's an implementation detail.

> 
>  > functions which accept interp literals
> 
> what are "interp literals" ?

That should say InterpolationLiterals as defined in the description. It's an instantiation of the `interp` struct.

>  > Because the type interp!"..." is not implicitly convertible to any other type
> 
> Why wouldn't it be?

I don't understand the question. D does not allow implicit conversion of library types without either alias this or inheritance.

> 
>  > This design is intentional to trigger the implicit idup call whenever it is used for conventional string-accepting functions.
> 
> I don't know how this might fit in with overload resolution.

See my blurb about overload resolution below.

> 
>  > "Best effort" functions
> 
> I don't know what the definition of "best effort" is when applied to a function.

Functions that accept any and all types of arguments, like writeln, and use a best effort to do something with them. These will never trigger the idup rewrite, which is why I talk about them in the DIP.

You may roughly define a best effort function as one that accepts a vararg template parameter, and has no template constraints related to that list.
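Under that definition, a sketch of the distinction might look like this. `describe` and `byteLength` are hypothetical functions, and `interp` is a stand-in type. The constraint-free variadic template accepts the expanded form as-is, so the idup rewrite would never be needed, while the `string`-only function cannot accept it.

```d
import std.conv : text;

// Hypothetical InterpolationLiteral stand-in.
struct interp(string s)
{
    string toString() const { return s; }
}

// "Best effort": variadic template, no constraints on the argument list.
string describe(Args...)(Args args)
{
    return text(args);
}

// Conventional string-only function: the expanded form cannot match it.
size_t byteLength(string s) { return s.length; }
```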

> 
>  > What became clear as the prior version was reviewed was that the complexity of specifying format while transforming into a parameter sequence was not worth adding to the language.
> 
> I didn't think that was the conclusion. This DIP is much more complicated.

I disagree. This DIP is much simpler to use. It may be more complicated to implement, but that doesn't matter to the user of the language.

The overload resolution is likely the only truly complex part to implement, since the rules are not easy to fit into the existing ones. The translation of the literal into InterpolationLiterals and expressions should actually be simpler than in the previous DIP, because no format specifiers are involved.

> 
>  > Because the interp template type will provide a toString member, it will pass properly to functions such as writeln or text and work as expected without any changes to the existing functions.
> 
> It won't work generally, however:
> 
>      void foo(string);
>      struct S { string toString(); }
>      void test() { S s; foo(s); }
> 
> fails.

I'm not sure if you understand the point of the statement. Functions such as writeln or text will work with interpolation literals. There is no attempt to say that it works with all functions, or that functions which accept strings will work with all types that define a toString member.

However, this will work with your foo and S above:

void test() { S s; foo(i"${s}"); }

> 
>  > To pass two sequential interpolation strings to a function that accepts interpolation strings, concatenation is not needed—separating the string literals by a comma will suffice.
> 
> This will have weird consequences for overloading, i.e. distinguishing one combined argument from two distinct arguments.

Identifying specific weird consequences would be most helpful.

>  > The complete specification of these translations is left up to the eventual implementors and language maintainers.
> 
> In my experience, doing the detail design of things often reveals a fatal flaw.

We are willing to write a library implementation for discussion. But the actual implementation does not affect the DIP. We are 100% confident an implementation of idup is possible (simply for the fact that std.conv.text exists).

> 
>  > Compiler implementation
> 
> This section appears to confuse a definition of of the feature with its implementation. It really should be labeled "Overload Resolution".

OK, thank you for giving me the correct term! And also, this is a better frame of view than what I originally wrote from. See my new suggestion below.

> 
> I am totally confused why it refers to InterpolationString for matching purposes, and yet says InterpolationSequence and InterpolationLiteral are used for function overloading. Can't have it both ways.

I'll make sure this is clearer.

> 
>  > with no further attempt to rewrite the sequence.
> 
> Does that mean there are multiple rewrites under other conditions?

No. The point of this clarification is because the idup rewrite itself still is a function call that goes through the overload rules. I do not want to get into a recursive situation in the compiler where it tries foo(<expanded form>) then foo(idup(<expanded form>)), which for some reason doesn't match, and then tries foo(idup(idup(<expanded form>))) etc.

The idup rewrite should contain no further possibility of rewriting.

> 
>  > In the case where it does not match, the InterpolationString will be rewritten as a call to a druntime library function named idup.
> 
> Does this imply a two-pass approach to overload resolutions? Try and fail, then try again with rewrites?

My intention was for this to happen. But it only fails and tries the rewrite if there is no match (for function and template argument lists).

> 
>  > If multiple InterpolationString tokens are used in a parameter list, the call must match for the resulting expansion of all InterpolationString tokens, or the entire expression will fail to match.
> 
> Which expansion, as there are two different expansions?

If you pass multiple interpolation string arguments to a function, then either all of them must be expanded or all of them must be rewritten to idup calls. A mix of expanded and rewritten forms cannot match.

> 
> What about variadic parameters? Lazy parameters?

Good point on variadic parameters. We think they should not match the expanded form. The point here is that the function is likely not equipped to handle these things, and so passing a string instead will be more compatible. If you want to match the expanded form, you must use a variadic template.

Are there different overload rules for lazy parameters? I would expect:

foo(lazy string s)
bar(Args...)(lazy Args args) if (is(Args[0] : interp!S, string S))

to both accept string interpolation literals the same as the non-lazy equivalents would.

> 
> No examples given of trivial and non-trivial overload matches illustrating each step of this process.

I will add this.

> 
> The reason I'm being pedantic on the overloading is we've done hand-wavy overload rules before (alias this, cough cough) and eventually found out it was unworkable.

I'm sorry for not being more detailed here. I am not experienced in the underlying details of overloading. In particular, I would like to know of cases that break this scheme, either by making something not match when it should, or by selecting a different mechanism than expected.

I can appreciate the point of view from the compiler side, and it's something we are lacking in experience. I am mostly focused on usability. I want to get it right, so that it's feasible to implement, whatever that takes.

-- Redo using Overloading instead of Compiler Implementation

I propose that instead of discussing the compiler implementation (that clearly was a mistake), the DIP should discuss the usage within the context of the existing overload rules.

Here is what I would propose:

1. If a StringInterpolation token appears anywhere other than an argument to a function call or template, the idup rewrite is always done. This includes for assert and mixin.
2. If a StringInterpolation token appears in an argument list to a function or template, the compiler shall try overloads with the StringInterpolation token expanded into InterpolationLiteral and Expression data. If there are any matches to the call, overload resolution proceeds as normal, and no rewrite is performed.
3. If no matches are found in step 2, the compiler retries the overload search, substituting for each StringInterpolation argument a call to idup on its expanded sequence.
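Steps 2 and 3 can be roughly approximated in library code, which may help when working out examples. This is only a sketch: `callInterp`, `countChars`, and `countParts` are hypothetical, `interp`/`idup` are stand-ins, and it handles a single interpolation argument. It does not model step 1 or the all-or-nothing rule for multiple interpolation arguments.

```d
import std.conv : text;

// Hypothetical stand-ins; the real interp/idup would live in druntime.
struct interp(string s)
{
    string toString() const { return s; }
}

string idup(Args...)(Args args)
    if (Args.length > 0 && is(Args[0] == interp!s, string s))
{
    return text(args);
}

// Prefer the expanded sequence; fall back to one idup'd string.
auto callInterp(alias fun, Args...)(Args args)
{
    static if (__traits(compiles, fun(args)))
        return fun(args);        // step 2: the expanded form matches
    else
        return fun(idup(args));  // step 3: retry with the idup rewrite
}

// Conventional string-accepting function.
size_t countChars(string s) { return s.length; }

// Function written to accept the expanded form.
size_t countParts(Args...)(Args args)
    if (Args.length > 0 && is(Args[0] == interp!s, string s))
{
    return Args.length;
}
```

One design caveat: `__traits(compiles, ...)` also treats unrelated errors inside the call as "no match". The real rule would need to be expressed in terms of overload matching proper, which is the precision asked for above.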

I will have to come up with a list of examples to clarify.

-Steve
January 29, 2021
> provides a call that is free of sql injection attacks

This is a strong claim that requires substantiation, especially since SQL injection attacks are a critical problem.