January 09, 2024
On 1/9/24 00:06, Walter Bright wrote:
> Here's how SQL support is done for DIP1036:
> 
> https://github.com/adamdruppe/interpolation-examples/blob/master/lib/sql.d
> 
> ```
> auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, InterpolationFooter footer) {
>      import arsd.sqlite;
> 
>      // sqlite lets you do ?1, ?2, etc
> 
>      enum string query = () {
>          string sql;
>          int number;
>          import std.conv;
>          foreach(idx, arg; Args)
>              static if(is(arg == InterpolatedLiteral!str, string str))
>                  sql ~= str;
>              else static if(is(arg == InterpolationHeader) || is(arg == InterpolationFooter))
>                  throw new Exception("Nested interpolation not supported");
>              else static if(is(arg == InterpolatedExpression!code, string code))
>                  {   } // just skip it
>              else
>                  sql ~= "?" ~ to!string(++number);
>          return sql;
>      }();
> 
>      auto statement = Statement(db, query);
>      int number;
>      foreach(arg; args) {
>          static if(!isInterpolatedMetadata!(typeof(arg)))
>              statement.bind(++number, arg);
>      }
> 
>      return statement.execute();
> }
> ```
> This:
> 
> 1. The istring, after converted to a tuple of arguments, is passed to the `execi` template.
> 2. It loops over the arguments, essentially turing it (ironically!) back into a format

This is not ironic at all. The point is it _can_ do that, while DIP1027 _cannot_ do _either this or the opposite direction_. It is yourself who called the istring the building block instead of the end product, but now you are indeed failing to turn the sausage back into the cow.

> string. The formats, instead of %s, are ?1, ?2, ?3, etc.
> 3. It skips all the Interpolation arguments inserted by DIP1036.
> 4. The remaining argument are each bound to the indices 1, 2, 3, ...
> 5. Then it executes the sql statement.
> 
> Note that nested istrings are not supported.
> ...

But you get a useful error message that exactly pinpoints what the problem is. Also, they could be supported, which is the point.

> Let's see how this can work with DIP1027:
> 
> ```
> auto execi(Args...)(Sqlite db, Args args) {
>      import arsd.sqlite;
> 
>      // sqlite lets you do ?1, ?2, etc
> 
>      enum string query = () {
>          string sql;
>          int number;
>          import std.conv;
>          auto fmt = arg[0];
>          for (size_t i = 0; i < fmt.length, ++i)
>          {
>              char c = fmt[i];
>              if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
>              {
>                  sql ~= "?" ~ to!string(++number);
>                  ++i;
>              }
>              else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
>                  ++i;  // skip escaped %
>              else
>                  sql ~= c;
>          }
>          return sql;
>      }();
> 
>      auto statement = Statement(db, query);
>      int number;
>      foreach(arg; args[1 .. args.length]) {
>          statement.bind(++number, arg);
>      }
> 
>      return statement.execute();
> }
> ```
> This:
> ...

This does not work.

> 1. The istring, after converted to a tuple of arguments, is passed to the `execi` template.
> 2. The first tuple element is the format string.
> 3. A replacement format string is created by replacing all instances of "%s" with
> "?n", where `n` is the index of the corresponding arg.
> 4. The replacement format string is bound to `statement`, and the arguments are bound
> to their indices.
> 5. Then it executes the sql statement.
> 
> It is equivalent.

No. As Nickolay already explained, it is not equivalent.

- It does not even compile, even if we fix the typo arg -> args.

That is enough to dismiss DIP1027 for this example. However, let's for the sake of argument assume that, miraculously, `execi` can read the format string at compile time, then:

- With this signature, if you pass a manually-constructed string to it, it would just accept the SQL injection.
- It does not give a proper error message for nested istrings.
- It has to manually parse the format string. It iterates over each character of the original format string.
- It (ironically!) constructs a new format string, the original one was useless.
- If you pass a bad format string to it (for example, by specifying a manual format), it will just do nonsense, while DIP1036e avoids bad format strings by construction.
January 09, 2024
On 1/9/24 09:29, Walter Bright wrote:
> 
> Also, the reason I picked the SQL example is because that is the one most cited as being needed and in showing the power of DIP1036 and because I was told that DIP1027 couldn't do it :-)

And I stand by that.
January 09, 2024
On 1/9/24 10:59, Paolo Invernizzi wrote:
> 
> CTFE support is a must IMHO

Yes. Besides the usability benefits you allude to, it is simply a security feature. We absolutely do not want the constructed string to depend on dynamically entered runtime data. Constructing it at compile time ensures that this is the case.
January 09, 2024
On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
> If that's the case, then 1036 wins imho, by simple thing of not doing any parsing of format string.

Consider the overhead 1036 has by comparing it with plain writeln or writefln:

```
void test(int baz)
{
    writeln(i"$(baz + 4)");
    writeln(baz + 5);
    writefln("%d", baz + 6);
}
```

Generated code:

0000:   55                       push      RBP
0001:   48 8B EC                 mov       RBP,RSP
0004:   48 83 EC 20              sub       RSP,020h
0008:   48 89 5D E8              mov       -018h[RBP],RBX

000c:   89 7D F8                 mov       -8[RBP],EDI          // baz
000f:   48 83 EC 08              sub       RSP,8
0013:   31 C0                    xor       EAX,EAX
0015:   88 45 F0                 mov       -010h[RBP],AL
0018:   48 8D 75 F0              lea       RSI,-010h[RBP]
001c:   FF 36                    push      dword ptr [RSI]      // header
001e:   88 45 F1                 mov       -0Fh[RBP],AL
0021:   48 8D 5D F1              lea       RBX,-0Fh[RBP]
0025:   FF 33                    push      dword ptr [RBX]      // expression!"baz + 4"
0027:   8D 7F 04                 lea       EDI,4[RDI]           // baz + 4
002a:   88 45 F2                 mov       -0Eh[RBP],AL
002d:   48 8D 75 F2              lea       RSI,-0Eh[RBP]
0031:   FF 36                    push      dword ptr [RSI]      // footer
0033:   E8 00 00 00 00           call      writeln

0038:   48 83 C4 20              add       RSP,020h
003c:   8B 45 F8                 mov       EAX,-8[RBP]
003f:   8D 78 05                 lea       EDI,5[RAX]           // baz + 5
0042:   E8 00 00 00 00           call      writeln

0047:   BA 00 00 00 00           mov       EDX,0                // "%d".ptr
004c:   BE 02 00 00 00           mov       ESI,2                // "%d".length
0051:   8B 4D F8                 mov       ECX,-8[RBP]
0054:   8D 79 06                 lea       EDI,6[RCX]           // baz + 6
0057:   E8 00 00 00 00           call      writefln

005c:   48 8B 5D E8              mov       RBX,-018h[RBP]
0060:   C9                       leave
0061:   C3                       ret

With the istring, there are 4 calls to struct member functions that just return null.
This can't be good for performance or program size.
We can compute the number of arguments passed to the function:

istring: 1 + 3 * <number of arguments> + 1 + 1  (*)
writeln: <number of arguments>
writefln: 1 + <number of arguments>

(*) includes string literals before, between, and after arguments

January 09, 2024
On 1/9/2024 4:40 AM, Timon Gehr wrote:
> On 1/9/24 09:29, Walter Bright wrote:
>>
>> Also, the reason I picked the SQL example is because that is the one most cited as being needed and in showing the power of DIP1036 and because I was told that DIP1027 couldn't do it :-)
> 
> And I stand by that.

But I showed that DIP1027 could do the SQL example.
January 09, 2024
On 1/9/2024 4:35 AM, Timon Gehr wrote:
> This does not work.

How so? Consider this:

```
import std.stdio;

auto execi(Args...)(Args args)
{
    auto fmt = args[0].dup;
    fmt[0] = 'k';
    writefln(fmt, args[1 .. args.length]);
}

void main()
{
    string b = "betty";
    execi(i"hello $b");
}
```

which compiles and runs, printing:

kello betty
January 09, 2024
On 1/9/2024 4:35 AM, Timon Gehr wrote:
> However, let's for the sake of argument assume that, miraculously, `execi` can read the format string at compile time, then:

Adam's implementation of execi() also runs at run time, not compile time.


> - With this signature, if you pass a manually-constructed string to it, it would just accept the SQL injection.

It was just a proof of concept piece of code. execi could check for format strings that contain ?n sequences. It could also check the number of %s formats against the number of arguments.


> But you get a useful error message that exactly pinpoints what the problem is.
> Also, they could be supported, which is the point.
> - It does not give a proper error message for nested istrings.

execi could be extended to reject arguments that contain %s sequences. Or, if there was an embedded istring, the number of %s formats can be checked against the number of arguments. An embedded istring would show a mismatch.

I expect that use of nested istrings would be exceedingly rare. If they are used, wrapping them in text() will make work. Besides, would a nested istring in an sql call be intended as part of the sql format, or would a text string be the intended result?


> - It has to manually parse the format string. It iterates over each character of the original format string.

Correct. And it does not need to iterate over and remove all the Interpolation arguments. Nor does it need the extra two arguments, which aren't free of cost.


> - It (ironically!) constructs a new format string, the original one was useless.

Yes, it converts the format specifiers to the sql ones. Why is this a problem?


> - If you pass a bad format string to it (for example, by specifying a manual format), it will just do nonsense, while DIP1036e avoids bad format strings by construction.

What happens when ?3 is included in a DIP1036 istring? `i"string ?3 ($betty)" ? I didn't see any check for that. Of course, one could add such a check to the 1036 execi.

printf format strings are checked by the compiler, and writef format strings are checked by writef. execi is also capable of being extended to check the format string to ensure the format matches the args.
January 09, 2024
On 1/9/24 20:06, Walter Bright wrote:
> On 1/9/2024 4:40 AM, Timon Gehr wrote:
>> On 1/9/24 09:29, Walter Bright wrote:
>>>
>>> Also, the reason I picked the SQL example is because that is the one most cited as being needed and in showing the power of DIP1036 and because I was told that DIP1027 couldn't do it :-)
>>
>> And I stand by that.
> 
> But I showed that DIP1027 could do the SQL example.

You actually did not.
January 09, 2024

On Tuesday, 9 January 2024 at 20:01:34 UTC, Walter Bright wrote:

>

With the istring, there are 4 calls to struct member functions that just return null.
This can't be good for performance or program size.

A valid point, thanks. Could you test if that fixes the issue?

import core.interpolation;
import std.meta: AliasSeq, staticMap;
import std.stdio;

template filterOutEmpty(alias arg) {
    alias T = typeof(arg);

    static if (is(T == InterpolatedLiteral!s, string s))
        static if (s.length)
            alias filterOutEmpty = s;
        else
            alias filterOutEmpty = AliasSeq!();
    else static if (
        is(T == InterpolationHeader) ||
        is(T == InterpolatedExpression!code, string code) ||
        is(T == InterpolationFooter)
    )
        alias filterOutEmpty = AliasSeq!();
    else
        alias filterOutEmpty = arg;
}

pragma(inline, true) // This pragma is necessary unless you compile with `-inline`.
void log(Args...)(InterpolationHeader, Args args, InterpolationFooter) {
    writeln(staticMap!(filterOutEmpty, args));
}

void main() {
    int baz = 3;
    log(i"$(baz + 4)");
    writeln(baz + 5);
}
>

Adam's implementation of execi() also runs at run time, not compile time.

We are probably talking about different things. Adam’s implementation constructs a format string at compile time thanks to enum storage class in line 36. Constructing it at compile time is essential so that we can validate the generated SQL and abort compilation, as Paolo demonstrated.

>

 execi could be extended to reject arguments that contain %s sequences.

I disagree. Storing a string that contains %s in a database should be allowed (storing any string should obviously be allowed, regardless of its contents). But execi is unable to differentiate between a string that happens to contain %s and a nested format string:

// DIP1027
example(i"prefix1 $(i"prefix2 $(x) suffix2") suffix1");
// Gets rewritten as:
example("prefix1 %s suffix1", "prefix2 %s suffix2", x);

I might be wrong, but it appears to me that DIP1027 is not able to deal with nested format strings, in a general case. DIP1036 has no such limitation (demonstrated in point 2 here).

>

Nor does it need the extra two arguments, which aren't free of cost.

I explained here why these two arguments are valuable. Aren’t free of cost—correct unless you enable inlining. execi may require some changes (like filterOutEmpty I showed above) to make them free of cost, but it is doable.

>

What happens when ?3 is included in a DIP1036 istring? i"string ?3 ($betty)" ? I didn't see any check for that. Of course, one could add such a check to the 1036 execi.

You are right, it doesn’t. Timon’s point (expressed as “This does not work”) is that DIP1036 is able to do validation at compile time while DIP1027 is only able to do it at runtime, when this function actually gets invoked.

January 09, 2024
On Tuesday, 9 January 2024 at 19:05:40 UTC, Walter Bright wrote:
> On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
>> If that's the case, then 1036 wins imho, by simple thing of not doing any parsing of format string.
>
> Consider the overhead 1036 has by comparing it with plain writeln or writefln:

How is this related to original argument of not requiring any parsing to be done by user inside function that accepts istring, that you replied to?

I personally would be ok with any overhead 1036 adds as long as I don't need to do any extra work such as parsing.

Please take into consideration also code inside function that does accept interpolated string. I'm pretty sure that parsing of format string inside dip1027 function would result in bigger and more complex generated code, than overhead you've mentioned for 1036 version, for use cases similar to logging I've mentioned.