Jump to page: 1 2
Thread overview
DIP 1027--String Interpolation--Final Review Feedback Thread
Jan 30, 2020
Mike Parker
Jan 30, 2020
Andrea Fontana
Jan 31, 2020
Arine
Feb 03, 2020
Walter Bright
Feb 03, 2020
Arine
Feb 02, 2020
Jon Degenhardt
Feb 03, 2020
Walter Bright
Feb 03, 2020
Adam D. Ruppe
Feb 03, 2020
Walter Bright
Feb 03, 2020
Adam D. Ruppe
Feb 03, 2020
Dennis
Feb 04, 2020
Walter Bright
Feb 12, 2020
Juraj Mojzis
Feb 13, 2020
Mike Parker
January 30, 2020
This is the feedback thread for DIP 1027, "String Interpolation".

===================================
**THIS IS NOT A DISCUSSION THREAD**

Posts in this thread must adhere to the feedback thread rules outlined in the Reviewer Guidelines (and listed at the bottom of this post).

https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

That document also provides guidelines on contributing feedback to a DIP review. Please read it before posting here. If you would like to discuss this DIP, please do so in the discussion thread:

https://forum.dlang.org/post/pligvevbwavdbyiwgcrp@forum.dlang.org

==================================

You can find DIP 1027 here:

https://github.com/dlang/DIPs/blob/d8f2e769c3a8c711e7886ccecc93eac9795dae9c/DIPs/DIP1027.md

The review period will end at 11:59 PM ET on February 13, or when I make a post declaring it complete. Feedback posted to this thread after that point may be ignored.

==================================
Posts in this thread that do not adhere to the following rules will be deleted at the DIP author's discretion:

* All posts must be a direct reply to the DIP manager's initial post, with only two exceptions:
    - Any commenter may reply to their own posts to retract feedback contained in the original post
    - The DIP author may (and is encouraged to) reply to any feedback solely to acknowledge the feedback with agreement or disagreement (preferably with supporting reasons in the latter case)
* Feedback must be actionable, i.e., there must be some action the DIP author can choose to take in response to the feedback, such as changing details, adding new information, or even retracting the proposal.
* Feedback related to the merits of the proposal rather than to the contents of the DIP (e.g., "I'm against this DIP.") is allowed in Community Review (not Final Review), but must be backed by supporting arguments (e.g., "I'm against this DIP because..."). The supporting arguments must be reasonable. Obviously frivolous arguments waste everyone's time.
* Feedback should be clear and concise, preferably listed as bullet points (those who take the time to do an in-depth review and provide feedback in the form of answers to the questions in this document will receive much gratitude). Information irrelevant to the DIP or is not provided in service of clarifying the feedback is unwelcome.

January 30, 2020
On Thursday, 30 January 2020 at 09:47:43 UTC, Mike Parker wrote:
> This is the feedback thread for DIP 1027, "String Interpolation".

I think the author missed an interesting use case more than printf:

string myString = format(i"I eat $count apples");

or better:

string myString = i"I eat $count apples".format;


January 31, 2020
This is purely feedback (please state why it was removed):

1) Format string with args in a tuple is *not* easy to manipulate.
2) A false statement is made in the DIP, this needs to be fixed.
3) A change was made that allows error prone code that's difficult to detect, this should be reverted.
4) D should diagnose printf format errors, C++ does this and it is only a library implementation there. This DIP brings printf formats into a language feature.
5) The grammar of the language should be updated so we know exactly what printf format option corresponds to which type. Eg string -> %s, int -> %d, float -> %f, etc...

> The DIP author's position, established over several comments, was essentially:
>
> tuples can easily be manipulated to suit the user's needs

I know this is just paraphrasing, but this is not true. Tuples alone might be easy to manipulated, but a formatted string with args disguised as a tuple is *not*.

Write a custom function that'd take an interpolated string that would otherwise remove or somehow manipulate the first argument that wasn't a string.

If an interpolated string simple evaluated as a plain tuple or as a struct (as some others have proposed). You can easy use CTFE to find the first argument, and then remove it or manipulated how it is outputted.

Because this DIP proposes using format strings, your implementation would have to be able to properly parse a formatted string. Find the argument you are looking for, and then proceed to remove or manipulate the formatted string to suite your needs so that you would then have to pass it into a sprintf()-like function.

That is no easy task. Especially if the expectation is that it should support _all_ of printf's formatting capabilities. This creates the expectation that a user can just pass a printf-like format argument with all their optional settings. If the user's implementation *needs* to verify the format string to be valid. It becomes extremely complicated and would need to be able to parse and understand all of printf's format syntax.

This means interpolated strings won't likely be used for anything except for essentially format() and printf() like functions (or functions that simply forward the arguments to those functions) that implement the format specification and don't need to manipulate or verify the input.

A format string isn't easily convertible back to a tuple of arguments where the arguments are in place of where their values should be in the string.

A tuple or struct (as proposed by someone else) representation of a interpolated string is easily convertible to be used in a printf-like function.

    i"hammering $object with $tool"             // becomes
    tuple("hammering %s with %s", object, tool)

    // which is difficult to convert back to:
    // (especially since the expectation is to support all of printf's syntax)

    tuple("hammering ", object, " with ", tool)
    // but the above can easily be turned into the printf-formatted tuple above

A tuple/struct interpolated string (without printf-like formatting) has all the benefits as described but it is much much easier for the user to manipulate and work with. And doesn't create an expectation to support all of printf's formatting syntax when it shouldn't be necessary. And it can easily be converted to be used with printf.

> The meaning of the format specifications is unknown to the core language.

This statement is just false and should be removed from the DIP. If it doesn't know anything about the format specification then the DIP with its' current proposed implementation wouldn't be able to function.

> Mixing Conventional Format Arguments With Interpolated Strings
>
> Interpolated string formats cannot be mixed with conventional elements:
>
>     string tool = "hammer";
>     writefln(i"hammering %s with $tool", "nails");

Not sure why this is now allowed. This just makes it difficult to see that something went wrong. Especially so if you are using printf()/scanf() and you mix up two incompatible types, like a string. This just adds unnecessary risk for very little gain. (And it seems the intention is to not analyze the formatted string)

> No attempt is made to diagnose format specification errors, such as attempting to format an integer as a floating point value.

If you are going to make this feature part of the language, then it should diagnose it. Even C++ compilers analyze it to ensure you are using printf correctly. That language doesn't even have it as part of a language feature. It is just a library detail implemented as part of the C standard library. If D is going to embed printf formatting details as part of the core language specification, then it should at the very least be capable of analyzing the format it is going to be forcing onto everyone that wants to use interpolated strings.
February 02, 2020
On Thursday, 30 January 2020 at 09:47:43 UTC, Mike Parker wrote:
> This is the feedback thread for DIP 1027, "String Interpolation".

* Better separation of language level specification and the implementation.

It would be easier to read and evaluate the DIP if the language level constructs available to the programmer were better separated from the implementation. This is done to some degree in the DIP, but in a several places the writing mixes the language level proposal and the anticipated implementation.

One of the clearer examples is the first line of the Concatenations section. This describes the behavior when an interpolated string is concatenated with a literal string. However, it also describes which compiler parsing pass this occurs in. This places responsibility on the DIP reader to identify if the implementation specific called out has an impact on the language feature as seen by the programmer. (If it does, it is not called out in the DIP.)

The above example is rather minor. The larger one involves the notion of the rewriting interpolated string to a tuple. From the wording used in the DIP, this appears to be primarily an implementation approach to achieve the desired functional goal. But whether this is the case or not is not clear. Is the tuple a first class object visible to the programmer? Or, is the implementation free to throw away the tuple implementation if an alternate way to achieve the functionality is identified?

* Incorporate the rationale for specific decisions listed in the Review Round 1 feedback in the body of the DIP.

In particular, the rationale for "no assignment to string" should be incorporated into the body of the DIP, and the more general goals of BetterC compatibility and no GC allocations. These are important considerations and would be better described in the main body of DIP.

* The family of functions string interpolation can be used with should be clarified.

The DIP provides examples of using interpolated strings with 'writefln' and 'printf'. However, it is not clear from the writing whether these are examples or an exhaustive list.

A DIP reader assuming examples would likely assume interpolated strings can be used with a wider variety of string formatting functions, for example, 'formattedWrite', 'format', and 'sformat'. A reader assuming an exhaustive list would draw very different conclusions.

The correct interpretation is not clear from the written text. A correct interpretation is important to assessing the value of the DIP, and the writing would be better if this was explicit.

* Clarify whether interpolated strings can be declared/manipulated outside the context of a format function literal arguments.

For example, is the following valid?

void foo()
{
    auto numApples = 3;
    auto interpolated = i"$numApples apples";
    writefln(interpolated);
}

This type of usage would be expected by many programmers. However, there are no examples of this usage in the DIP, and several of the goals of the DIP imply this would not be allowed. With the current writing the DIP reader must infer whether this is allowed or not. The DIP writing would be improved if this topic was explicitly addressed.

If the above behavior is allowed then there are a number of additional language level questions that should be addressed in the writing. For example, what does the following produce?:

void foo()
{
    auto numApples = 3;
    auto interpolated = i"$numApples apples";
    numApples = 4;
    writefln(interpolated);
}

* Clarify and if and how new functions that take interpolated strings as arguments can be written.

It is not clear from the DIP if users can write their own functions that take interpolated strings as arguments. Similarly, it is not clear if/how Phobos can add new functions taking interpolated strings as arguments.

February 02, 2020
On 2/2/2020 1:47 PM, Jon Degenhardt wrote:
> * Better separation of language level specification and the implementation.
> 
> It would be easier to read and evaluate the DIP if the language level constructs available to the programmer were better separated from the implementation. This is done to some degree in the DIP, but in a several places the writing mixes the language level proposal and the anticipated implementation.
> 
> One of the clearer examples is the first line of the Concatenations section. This describes the behavior when an interpolated string is concatenated with a literal string. However, it also describes which compiler parsing pass this occurs in. This places responsibility on the DIP reader to identify if the implementation specific called out has an impact on the language feature as seen by the programmer. (If it does, it is not called out in the DIP.)
> 
> The above example is rather minor. The larger one involves the notion of the rewriting interpolated string to a tuple. From the wording used in the DIP, this appears to be primarily an implementation approach to achieve the desired functional goal. But whether this is the case or not is not clear. Is the tuple a first class object visible to the programmer? Or, is the implementation free to throw away the tuple implementation if an alternate way to achieve the functionality is identified?

The introduction to the D language specification starts with the "Phases of Compilation". Identifying how the string interpolation fits in with this is appropriate and aids in understanding the proposal.

https://dlang.org/spec/intro.html


> * Incorporate the rationale for specific decisions listed in the Review Round 1 feedback in the body of the DIP.
> 
> In particular, the rationale for "no assignment to string" should be incorporated into the body of the DIP, and the more general goals of BetterC compatibility and no GC allocations. These are important considerations and would be better described in the main body of DIP.

As long as the information is present, it doesn't really matter what part of the DIP it is in. It'll get redone for incorporation into the specification anyway.


> * The family of functions string interpolation can be used with should be clarified.
> 
> The DIP provides examples of using interpolated strings with 'writefln' and 'printf'. However, it is not clear from the writing whether these are examples or an exhaustive list.
> 
> A DIP reader assuming examples would likely assume interpolated strings can be used with a wider variety of string formatting functions, for example, 'formattedWrite', 'format', and 'sformat'. A reader assuming an exhaustive list would draw very different conclusions.
> 
> The correct interpretation is not clear from the written text. A correct interpretation is important to assessing the value of the DIP, and the writing would be better if this was explicit.

There is nothing in the discussion of how it works that implies it is tied to any particular function. 'printf' and 'writefln' were selected for examples simply because they were so well known that no further explanation of them was necessary. Also, 'printf' and 'writefln' are not magic functions specially known and handled by the compiler, and there's no hint in the DIP that this situation is being changed.

> * Clarify whether interpolated strings can be declared/manipulated outside the context of a format function literal arguments.
> 
> For example, is the following valid?
> 
> void foo()
> {
>      auto numApples = 3;
>      auto interpolated = i"$numApples apples";
>      writefln(interpolated);
> }
> 
> This type of usage would be expected by many programmers. However, there are no examples of this usage in the DIP, and several of the goals of the DIP imply this would not be allowed. With the current writing the DIP reader must infer whether this is allowed or not. The DIP writing would be improved if this topic was explicitly addressed.

The behavior is clear if the text is followed. `interpolated` is set to the tuple ("%s apples", 3).

> 
> If the above behavior is allowed then there are a number of additional language level questions that should be addressed in the writing. For example, what does the following produce?:
> 
> void foo()
> {
>      auto numApples = 3;
>      auto interpolated = i"$numApples apples";
>      numApples = 4;
>      writefln(interpolated);
> }

Tuples are expressions, not ASTs. Expressions are evaluated where they appear, which is true throughout D. Hence, the above `writefln` call is evaluated as:

    writefln("$s apples", 3);


> * Clarify and if and how new functions that take interpolated strings as arguments can be written.
> 
> It is not clear from the DIP if users can write their own functions that take interpolated strings as arguments. Similarly, it is not clear if/how Phobos can add new functions taking interpolated strings as arguments.

Tuple expressions as function arguments are a longtime existing feature of D, and do not require re-explanation in the DIP.

It may indeed be the case that tuple expressions are inadequately explained in the language specification, but that is a problem that should be addressed with a bugzilla documentation issue, not this DIP.

February 03, 2020
> Mixing Conventional Format Arguments With Interpolated Strings

This DIP proposes leaving all % unmodified in the string, yet it also injects % characters into the string. This is a mistake - it forces user code to be aware of implementation details and carefully encode all special characters. Web programmers learned the hard way the problems of sloppy encoding.

As it stands, consuming functions have no way to tell if the first %s from `i"%s $foo"` is meant to go with the subsequent argument `foo` or the latter; it will throw off all future processing.

The DIP must be amended to specify that ALL % characters in the i"" string are replaced with %% in the yielded string.

> W and D Interpolated Strings

should work, the DIP rationale is poor. This is an arbitrary limitation and inconsistency with the rest of the language.

> [not present]

The DIP fails to address the feedback from the previous review round on concerns for overloading functions to use the new feature.

I move that we amend it so the format string literal is not created directly, but instead wrap it in  `_d_interpolated_string!"..."` instantiation.

So

writefln(i"foo % $bar")

is rewritten to

writefln(_d_interpolated_string!"foo %% %s", bar);

This allows a library implementation to detect it for the purposes of function overloading with zero other loss relative to the DIP status quo (in fact, the template there could be defined to reduce to the literal exactly via an eponymous enum).

This is different than the function call mentioned at the end of the DIP by the review manager, which lowered it, format string and all in to a function call. Here (a proposal that came up in the original thread), only the format string gets the template call, avoiding the other difficulties brought up with the function call.
February 02, 2020
On 2/2/2020 7:06 PM, Adam D. Ruppe wrote:
>> Mixing Conventional Format Arguments With Interpolated Strings
> 
> This DIP proposes leaving all % unmodified in the string, yet it also injects % characters into the string. This is a mistake - it forces user code to be aware of implementation details and carefully encode all special characters. Web programmers learned the hard way the problems of sloppy encoding.

A % is injected only in the case of $Argument where %s is injected. This is the default format. If other formats are desired, ${FormatString} is the syntax. If the user wants a %% in the rest of the string, he can add it. The user is expected to know what the intended target of the format string is and cater to it.


> As it stands, consuming functions have no way to tell if the first %s from `i"%s $foo"` is meant to go with the subsequent argument `foo` or the latter; it will throw off all future processing.
>
> The DIP must be amended to specify that ALL % characters in the i"" string are replaced with %% in the yielded string.

This is entirely up to the user to use %% where appropriate. Your proposed change will inadvertently wed it to the printf format.


>> W and D Interpolated Strings
> 
> should work, the DIP rationale is poor. This is an arbitrary limitation and inconsistency with the rest of the language.

I doubt anyone would use it. We can always add it later if desired, but removing it would be painful. It's not optimal to add features unless there is a clear and present need for it.

The original specification carefully treated the various string encodings equally. But as 20 years have passed, it's become very clear that UTF-8 is the hands-down winner and the W and D formats are aberrations.


>> [not present]
> 
> The DIP fails to address the feedback from the previous review round on concerns for overloading functions to use the new feature.
> 
> I move that we amend it so the format string literal is not created directly, but instead wrap it in `_d_interpolated_string!"..."` instantiation.
> 
> So
> 
> writefln(i"foo % $bar")
> 
> is rewritten to
> 
> writefln(_d_interpolated_string!"foo %% %s", bar);
> 
> This allows a library implementation to detect it for the purposes of function overloading with zero other loss relative to the DIP status quo (in fact, the template there could be defined to reduce to the literal exactly via an eponymous enum).

I don't see a point to overloading beyond what the tuple elements provide. There's a lot of value in having straightforward semantics - having a hidden template do something unexpected with overloading is hard to justify.


> This is different than the function call mentioned at the end of the DIP by the review manager, which lowered it, format string and all in to a function call. Here (a proposal that came up in the original thread), only the format string gets the template call, avoiding the other difficulties brought up with the function call.

February 03, 2020
The DIP is carefully designed to not need to know anything about format strings.

It's true that many C/C++ compilers have an extension which checks the types of the arguments against the format string, but this is an extension and is not part of the core language.

D can (and probably should) add such an extension, but it would be orthogonal to this DIP and should be addressed separately.
February 03, 2020
On Thursday, 30 January 2020 at 09:47:43 UTC, Mike Parker wrote:
> This is the feedback thread for DIP 1027, "String Interpolation".

> is the proposed feature specified in sufficient detail?

Some things are not clear to me.
- Is InterpolatedString meant to be added to TemplateSingleArgument?
Since it is a PrimaryExpression, it is already allowed as a template argument in brackets.
- How does that work? Especially regarding this:

> InterpolatedExpresssions undergo semantic analysis similar to MixinExpression

The specification does not describe MixinExpressions in much detail:
https://dlang.org/spec/expression.html#mixin_expressions

It is unclear why interpolated strings use the same rules.
alias T = int; mixin(T, " a = 8", -3, ";"); would result in "int a = 8-3;" being mixed in since T becomes T.stringof and -3 becomes "-3".
Turning arguments into strings at compile time does not make sense for an interpolated string though.

- The 'Concatenation' section is not very specific.
It does not mention whether the interpolated string grammar is parsed on the raw string literal or the escaped string literal, e.g. can I do `i"""\x24apples"` instead of i"$apples"?
It also does not mention the type of the resulting string literal when wstrings or dstrings are appended to the interpolated string.

- The way Character, CharacterNoBraces and CharacterNoParen are used in the grammar is ambiguous.
It allows unlimited non-{} / non-() characters, meaning $ and " are allowed too.
The grammar in its current form can produce any nonsensical string (e.g. i"i$)("$i}{"""$$$}}}{}") with near-arbitrary choice for FormatString and Argument placement.
If ambiguity were resolved by picking the first option, then rules for FormatString and Argument are unreachable.

Suggested actions: Specify how interpolated strings work as template arguments, fix the grammar, and clarify the Concatenation section.

#############

> are edge cases, flaws, and risks identified and addressed?

It is claimed that:
> It also makes interpolated strings agnostic about what the format specifications are.
> The meaning of the format specifications is unknown to the core language.

This is simply false, because the question "why is %s inserted by default" can only be answered with "that is Phobos' format specifier convention". Any attempt at a format function that has no special meaning for % will be at a disadvantage.
Consider this example:
```D
int s = 3;
format(i" 8%s = 8%$s ");
// = format(" 8%s = 8%%s ", 3);
// = " 83 = 8%s ";
```
Here the variable s got formatted at completely the wrong place because a %s was already there and the %s that the interpolated string inserted got escaped. The DIP identifies "Mixing Conventional Format Arguments With Interpolated Strings" as a limitation but does not address the fact that the current design requires the programmer to take special note of $ and % and the format string convention, or errors might occur, some of which are undetectable at compile time or run time.

- Nested interpolated strings are not considered. Is i"$(i"$x")" an error because the second " ends the literal early? Is i""`$(i"$x")` equal to tuple("%s", "%s", x), or is this not allowed? Failure / succes cases of using nested interpolated strings should be explored to determine whether it is allowed or not.

- It is weird to me that interpolated strings are not compatible with the c, w and d postfixes.
It is inconsistent with every other string literal, and the claim that the postfixes are used rarely is not backed up by anything. I see ""w strings often being used with Windows API functions which use UTF-16.

Suggested actions: Change the design to be truly format specifier agnostic, specify how nested interpolated strings work, allow other encodings.

#############

> is there an implementation that proves the proposed feature works in practice?

No. There is only an implementation of a different proposal:
https://github.com/dlang/dmd/pull/7988
Morover, the DIP does not explore use cases apart from printf and writef.
Many contexts that can benefit from interpolated strings (Exception / assertion messages, mixin, code generation for domain specific languages such as SQL) are not considered.

Suggested actions: Explore expected use cases and how interpolated strings will work in them, or let people toy with a prototype before settling on a final design.

#############

> does the DIP consider prior work from other languages?

A prior work section contains a link to the Wikipedia page on interpolated strings, and a few links to previous proposals, but no attempt is made to compare the proposed design with others. In the review summary some design goals are listed based on the discussion of the previous review round:

> the implementation must be compatible with BetterC, meaning printf and similar C functions

Why should it work in BetterC when interpolated strings are not a feature in C?
Why is direct compatibility with printf needed, is printf actually used commonly in D code outside of dmd and BetterC code?
Why is a custom function that accepts an interpolated string and forwards it to printf not acceptable instead?

> the implementation must not trigger GC allocations
> the implementation must not depend on Phobos

I assume this is with reference to the idea "why no assignment to string".
It should be noted that nobody proposed interpolated strings always directly go to string, just that they may implicitly convert to string, meaning the counter arguments don't apply since interpolated strings can still be used in BetterC, just not the string conversion functionality.
The requirement 'must not depend on Phobos' should also be motivated, for example by links to bugzilla issues with problems that the ^^ operator (which depends on std.math: pow) has.

> the implementation must be performant

The proposed design is notably not optimal in performance. It does not work with the writef variant that takes that format string at compile time, meaning the format string must be parsed at runtime, and a FormatException might get thrown.

Suggested actions: enumerate and motivate design goals in Rationale section, explain why proposed design best fits those goals. Compare other proposed designs and motivate why the DIP's design wins.
February 03, 2020
I propose an incremental change to this DIP.

Instead of

    i"a, $b, ${%d}c"

lowering to the tuple

    ("a, %s, %d", b, c)

it lowers instead to the tuple

    (__d_interpString!("a, ", __d_formatItem.init, ", ", __d_formatItem("%d")), b, c)

The item returned from __d_interpString shall be defined by the library, but one that implicitly converts to a string or immutable(char)*, with the string contents being identical to the current DIP's format string. It also will provide access to the original sequence as parsed.

__d_formatItem should be essentially a typedef of string, with a default value of "%s".

The rationale for this change is twofold:

1. Providing an alternative type that decays to a simple string allows specialized handling of such an interpolation via overloading. Some current problems that can be alleviated by this (as identified in the DIP discussion):

   a. multiple interpolated strings as parameters to writefln. e.g.: writefln(i" ... $a ... ", i" ... $b ... ") can be handled.
   b. Escaping of "%" characters in formatted strings. e.g.: writefln(i"$percentage% complete") will assume the "% c" is a format specifier, but could be overloaded to ignore that '%' if it knows the format string came from an interpolated string.
   c. alternative default format specifiers. For example MySQL uses only the '?' specifier for parameters, so one must prefix EVERY string interpolation parameter with ${?}. If a function overload knows you are using string interpolation, it can generate a proper SQL query without needing those prefixes.

2. The compiler is already doing the work of splitting up the parameters and string data. If the format string is simply passed as a string, then determining the original parsed form (of string + format specifiers) is difficult, if not impossible, and definitely unnecessary if the compiler has already done it.

It should be noted that this shouldn't affect any other items in the DIP -- everything as written in the DIP for examples should work exactly the same. But if we don't add this mechanism first, it will be more difficult to add it later. Most obvious would be that templates that accept any type as its first parameter would change to a new type from string. This may not break anything, but it would potentially be a breaking change.

Also, the names of the identifiers __d_interpString and __d_formatItem are subject to debate.

-Steve
« First   ‹ Prev
1 2