February 02, 2020
On 2/2/2020 10:46 AM, Steven Schveighoffer wrote:
> This works just fine:

That's a good point.

But it took me a moment to understand the `printf(5)` because what printf is is very deeply ingrained in me, hence I would not write code like that and wouldn't find it acceptable. It's jarring and misleading.

(I used to use clever printf macros in C to alter printf's behavior, but eventually removed all that for the same reason.)
February 02, 2020
On 2/2/20 3:18 PM, Walter Bright wrote:
> On 2/2/2020 10:46 AM, Steven Schveighoffer wrote:
>> This works just fine:
> 
> That's a good point.
> 
> But it took me a moment to understand the `printf(5)` because what printf is is very deeply ingrained in me, hence I would not write code like that and wouldn't find it acceptable. It's jarring and misleading.
> 
> (I used to use clever printf macros in C to alter printf's behavior, but eventually removed all that for the same reason.)

The printf(5) was just to show it can be overloaded. The real use case would be the interpolated string struct that everyone is discussing here (which you objected to), not adding integer printing ;). It's just that I can get that to work to demonstrate the overloadability of C functions.
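For readers following along, here is a minimal sketch of what overloading a C function from D can look like. The example referred to above is not reproduced in this thread, so the `printf(int)` overload below is an assumption made purely for illustration, not the original code.

```d
import core.stdc.stdio;

// Pull the C declaration into this module's overload set.
alias printf = core.stdc.stdio.printf;

// Hypothetical D overload living next to the C function.
int printf(int value)
{
    return core.stdc.stdio.printf("%d\n", value);
}

unittest
{
    printf("%s\n", "plain C call".ptr); // resolves to the C printf
    printf(5);                          // resolves to the D overload
}
```

The same mechanism is what would let a library add an overload that accepts an interpolated-string type while leaving existing calls untouched.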

In essence:

printf(i"I have $apples apples and $bananas bananas");

Works in both scenarios. With the DIP as-is (and no extra implementation in the library), and with the interpolated-string-to-struct idea (with a D-based shim as an overload).

Not arguing against the DIP, or for the struct mechanism, but just against that objection.

-Steve
February 02, 2020
I'm dropping other stuff just cuz I don't want to argue down rabbit holes. Let's see what common ground we can find and build upon that.

On Sunday, 2 February 2020 at 20:13:36 UTC, Steven Schveighoffer wrote:
> I think we could get the best of both worlds if the interpolated string itself was not just a string, but rather a library-defined type (well something slightly more special -- it should implicitly cast to a null-terminated immutable(char)* if needed, just like string literals).

Yes, indeed, I proposed this in the last round too. I think it is our best bet for a compromise to salvage this DIP.

You would lose the methods on it ... however, we have UFCS, so not a deal breaker.

Lots of the community is opposed to the requirement to import something for the UFCS flavor; that is a psychological barrier for many people. I'm not in love with that myself, but I can live with it.

And I remain legit concerned about implicit conversions though:

printf(i"$some_int");

That looks OK, but SILENTLY DOES THE WRONG THING. But if dmd warned on the format specifier like gcc does, we're cool. That could be an enhancement later. I warn about this, but I won't withhold support over it.


The DIP mentions this one: `printf("%s $something", foo);` This also does the wrong thing. But so does `printf(i"$something% complete");`, since `% c` is a valid format specifier... yet it doesn't look like one, especially since the interpolated string uses $ now.

We need to make % go ahead and get translated to %% by the compiler so the parsing function can still work sanely with it. That's the solution to this. If the compiler makes % magic in the generated string, it needs to encode % in the user input too.
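A rough sketch of that escaping step, assuming the compiler applies it only to the literal text the user typed and not to the `%s` it inserts itself. `escapePercent` is a made-up helper name, and the interpolation is written out by hand since `i"..."` does not exist yet.

```d
import std.array : replace;
import std.format : format;

// Hypothetical helper: what the compiler would do to the user's literal text.
string escapePercent(string text)
{
    return text.replace("%", "%%");
}

unittest
{
    // i"$done% complete" would become the inserted "%s" plus the escaped text.
    immutable fmt = "%s" ~ escapePercent("% complete");
    assert(fmt == "%s%% complete");

    // A format-aware consumer now prints the percent sign literally.
    assert(format(fmt, 50) == "50% complete");
}
```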

Then put a type on the format string; whether it implicitly converts is up to y'all. I just need some way to detect the format string's presence in the type system for overloading existing functions.

Do those two small changes and this DIP will have my support.

* specify that % in the user string gets translated to %% in the format literal
* put the format literal in a wrapper type.

i"$foo %"

is translated to the tuple:

__d_format_literal!("%s %%"), foo


struct __d_format_literal(string fmt) {
        enum f = fmt;
        alias f this;
}

That implicitly converts and just works in printf. That's the answer.
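To make the overloading point concrete, here is a small sketch under the assumption that the wrapper looks like the struct above. `FormatLiteral`, `legacyLength` and `isFormatLiteral` are illustrative names only, and the lowering is written by hand. It deliberately uses a plain `string` parameter rather than printf, since the pointer conversion is a separate question.

```d
// Hypothetical stand-in for the proposed wrapper type.
struct FormatLiteral(string fmt)
{
    enum f = fmt;
    alias f this;
}

// Existing API that only knows about plain strings.
size_t legacyLength(string fmt)
{
    return fmt.length;
}

// True only when the argument type is an instance of the wrapper.
enum isFormatLiteral(T) = is(T == FormatLiteral!S, string S);

unittest
{
    // What i"$foo %" might lower to, written out by hand.
    auto lit = FormatLiteral!"%s %%"();

    assert(legacyLength(lit) == 5);                  // decays via alias this
    static assert(isFormatLiteral!(typeof(lit)));    // but is still detectable
    static assert(!isFormatLiteral!string);
}
```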

> And we can actually add later the idea of making the interpolated string a new type after this DIP is implemented,

As soon as this DIP is implemented, we're frozen. Any future change will face additional barriers as a breaking change. It is going to be a lot better to just do it at least somewhat correctly now.


gah i've been trying to edit this post for like 90 mins now. im just gonna hit send; i g2g and i think you get the gist of what i mean.
February 03, 2020
On Sunday, 2 February 2020 at 23:54:35 UTC, Adam D. Ruppe wrote:
> Do those two small changes and this DIP will have my support.
>
> * specify that % in the user string gets translated to %% in the format literal
> * put the format literal in a wrapper type.
>
> i"$foo %"
>
> is translated to the tuple:
>
> __d_format_literal!("%s %%"), foo
>
>
> struct __d_format_literal(string fmt) {
>         enum f = fmt;
>         alias f this;
> }
>
> That implicitly converts and just works in printf. That's the answer.

I don't think you should compromise. Using printf as the basis just limits what can be done with an interpolated string. How do you handle custom types?

The DIP doesn't go over this, but it would have to either disallow it completely or convert it to a string first using `.toString()` or similar, since printf() wouldn't be able to take a custom type. This inhibits any sort of user implementation or customization and forces the type to be converted to a string first, losing any additional detail and preventing possible optimizations such as allocating only one buffer for the entire thing. For each object toString() would allocate its own buffer, when it could be constructed in place.

    SqlValue value;
    ExecuteSettings settings;

    execute(settings, i"SELECT * FROM table WHERE value=$value");

    void execute(InterpolatedString)(ref ExecuteSettings settings, InterpolatedString str) {
        string output;

        // walk the interpolated arguments at compile time
        static foreach(i; 0 .. str.args.length) {
            static if(is(typeof(str.args[i]) == SqlValue)) {
                // the original type is still available here
                if( str.args[i].someProperty && settings.someSetting ) {
                    // ...
                }

                str.args[i].emplaceToString(output); // or otherwise
            }
        }
    }

    printf(i"something $value".c); // ok, "c" would call toString() to make it compatible
    printf(i"something $value"); // or with overload for printf()

Compared to how you'd have to implement it with the current DIP:

    // this implementation can be used with printf
    void execute(Args...)(ref ExecuteSettings settings, string format, Args args) {
        // typeof(args[0]) == string, can't determine what type it was
        // more difficult to parse format and verify it is valid
        // can't access SqlValue for properties as it is now just a string
    }

    printf(i"something $value"); // works, calls toString first


Or to retain the type, it would then not work with printf().

    void execute(Args...)(ref ExecuteSettings settings, string format, Args args) {
        // typeof(args[0]) == SqlValue
        // what specifier was inserted into format? %s?
        // what if we know we can check now and instead use %d?
        // now we have to be able to parse printf-style formats properly to change
        // the formatting to be more efficient and make more sense
        //
        // also can't do in place optimizations without completely re-implementing
        // sprintf or otherwise support all of printf format capabilities
    }

    printf(i"something $value"); // error, passing SqlValue
    printf(i"something $value.toString()"); // ok, but kind of counter intuitive

Trying to cater to printf results in an implementation with the least amount of flexibility imo. It forces legacy debt onto the user to support. It will ultimately just be left to the handful of functions that use it now.




February 03, 2020
On Monday, 3 February 2020 at 02:57:41 UTC, Arine wrote:
> I don't think you should compromise. Using printf as the basis just limits what can be done with an interpolated string. How do you handle custom types?

Eh, that's a solved problem. See: https://wiki.dlang.org/Defining_custom_print_format_specifiers

The format string is incredibly flexible and the tuple proposal does a fairly good job maintaining all available information until the last minute.

With the % -> %% encoding, it is also possible to reliably* determine which

* with the exception of a user-defined format string that breaks this convention. e.g. `i"${not_percent}(foo)"` will not be able to tell for certain that foo is tied to not_percent, and this can throw off processing of all subsequent arguments as well. Maybe we should do something about this.

> Since printf() wouldn't be able to take a custom type.

Yeah, printf would likely fail with custom types, but that's printf's limitation - D's writef works well with them, including various custom format specifiers and allocation-free output.

And other custom functions, provided we identify the format string as such, can do any special parsing of it.

With my proposed addendum, even compile-time validation of the format string is possible in consumer functions:

void my_thing(Format, Args...)(Format fmt, Args args) {
    static if(is(Format == _d_interpolated_string!S, string S))
      static assert(string_is_ok(S));
    else
      static assert(0, "you didn't pass an interpolated string");

   // use args according to fmt
}

(I would actually recommend we make the name and interface a wee bit more friendly in the library for user consumption, including methods that can be tested without tying us to just one argument like that is expression does above, for future expansion potential... like I kinda want to also say how many arguments there are as CT params there, or even the slices of that string that represent format strings - that would IMO be the holy grail as we can not only get how many args but exactly where they are without reparsing the string - but I digress. Regardless, the compiler can output the _d_whatever ugly thing and I like to use ugly things in these examples in an effort to avoid people arguing over the name when I'd rather focus on the underlying concept.)

But anyway since the format string is actually part of the *type*, we can overload on it and extract the original string as a compile-time constant - including when passed as a runtime argument list - for further processing.
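Here is one way the compile-time check could be fleshed out. It is a sketch, not a proposal for the real interface: `InterpolatedFormat`, `countPlaceholders` and `checkedArgCount` are invented names, and this particular consumer assumes it only supports `%s` and the `%%` escape.

```d
// Hypothetical stand-in for the compiler-generated wrapper type.
struct InterpolatedFormat(string fmt)
{
    enum f = fmt;
    alias f this;
}

// Counts "%s" placeholders; a plain loop so it also runs at compile time.
size_t countPlaceholders(string fmt)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < fmt.length; ++i)
    {
        if (fmt[i] == '%' && fmt[i + 1] == '%')
            ++i; // skip an escaped percent
        else if (fmt[i] == '%' && fmt[i + 1] == 's')
            ++n;
    }
    return n;
}

size_t checkedArgCount(Format, Args...)(Format fmt, Args args)
{
    static if (is(Format == InterpolatedFormat!S, string S))
    {
        // S is the original format string, available as a constant here.
        static assert(countPlaceholders(S) == Args.length,
            "placeholder/argument count mismatch");
        return Args.length;
    }
    else
        static assert(0, "you didn't pass an interpolated string");
}

unittest
{
    assert(checkedArgCount(InterpolatedFormat!"%s and %s"(), 1, 2) == 2);
    // Plain strings and mismatched counts are rejected at compile time.
    static assert(!__traits(compiles, checkedArgCount("%s", 1)));
    static assert(!__traits(compiles,
        checkedArgCount(InterpolatedFormat!"%s"(), 1, 2)));
}
```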

> For each object toString() would allocate it's own buffer, when it could be constructed in place.

void toString(
     scope void delegate(const(char)[]) sink,
     FormatSpec!char fmt)

is already possible with D's existing format function on custom objects. No allocation and you can respond to arbitrary formatting details as specified there.
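As an illustration of that overload, a small sketch with a made-up `Money` type. The `%c` specifier meaning "raw cents" is purely an assumption for the example; the point is only that the type sees the specifier and writes straight into the sink.

```d
import std.format : format, formattedWrite, FormatSpec;

struct Money
{
    long cents;

    void toString(scope void delegate(const(char)[]) sink,
                  FormatSpec!char fmt) const
    {
        if (fmt.spec == 'c') // hypothetical custom specifier: raw cents
            sink.formattedWrite("%d", cents);
        else                 // default "%s": dollars and cents
            sink.formattedWrite("%d.%02d", cents / 100, cents % 100);
    }
}

unittest
{
    assert(format("%s", Money(1234)) == "12.34");
    assert(format("%c", Money(1234)) == "1234");
}
```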

So your example:

> Or to retain the type, it would then not work with printf().

A custom type wouldn't work with printf with the DIP as it stands. It does not attempt to convert anything, it just forwards the arguments.

Similarly your function could require that the format strings must just be %s and you don't support customization. The implementation there can be as simple as scanning forward for % and a compile-time template could break it up into an array for you with no runtime trouble. Or you could only support a custom format that the user needs to put in the ${here}(var) thingy.

I'm *very* close to supporting this DIP, it isn't actually that bad. As it is, I'd have to recommend we defeat it, but if we can all agree to amend it to put in the template thing we can fix everything. Even with just the one string arg, we can do a lot with it...

(my struct would make simple cases simpler - you can just .toString it without imports or even alias toString this - but the template does actually allow more custom flexibility.)
February 03, 2020
On Monday, 3 February 2020 at 03:57:09 UTC, Adam D. Ruppe wrote:
> On Monday, 3 February 2020 at 02:57:41 UTC, Arine wrote:
>> I don't think you should compromise. Using printf as the basis just limits what can be done with an interpolated string. How do you handle custom types?
>
> Eh, that's a solved problem. See: https://wiki.dlang.org/Defining_custom_print_format_specifiers
>
> The format string is incredibly flexible and the tuple proposal does a fairly good job maintaining all available information until the last minute.
>
> With the % -> %% encoding, it is also possible to reliably* determine which
>
> * with the exception of a user-defined format string that breaks this convention. e.g. `i"${not_percent}(foo)"` will not be able to tell for certain that foo is tied to not_percent, and this can throw off processing of all subsequent arguments as well. Maybe we should do something about this.
>
>> Since printf() wouldn't be able to take a custom type.
>
> Yeah, printf would likely fail with custom types, but that's printf's limitation - D's writef works well with them, including various custom format specifiers and allocation-free output.

Part of the whole reason is that this works with printf(), the examples in the DIP almost exclusive use the C functions. Why stop short of custom types?

> And other custom functions, provided we identify the format string as such, can do any special parsing of it.
>
> With my proposed addendum, even compile-time validation of the format string is possible in consumer functions:
>
> void my_thing(Format, Args...)(Format fmt, Args args) {
>     static if(is(Format == _d_interpolated_string!S, string S))
>       static assert(string_is_ok(S));
>     else
>       static assert(0, "you didn't pass an interpolated string");
>
>    // use args according to fmt
> }

Implement string_is_ok() for me. How easy is it to verify that the string is valid for your own purpose while also working around the printf formatting, ensuring that it correctly interprets all of printf's features?

> that would IMO be the holy grail as we can not only get how many args but exactly where they are without reparsing the string - but I digress.

You can have that without the formatting, or having to add extra metadata so you can link which argument goes with what index in the format array. You get all of this if you don't use a printf-style formatting string. It doesn't have to be added on top with a band-aid.


> But anyway since the format string is actually part of the *type*, we can overload on it and extract the original string as a compile-time constant - including when passed as a runtime argument list - for further processing.
>
>> For each object toString() would allocate it's own buffer, when it could be constructed in place.
>
> void toString(
>      scope void delegate(const(char)[]) sink,
>      FormatSpec!char fmt)
>
> is already possible with D's existing format function on custom objects. No allocation and you can respond to arbitrary formatting details as specified there.

The biggest problem here is that toString() then doesn't know or understand the context it is being used in. You'd have to write some sort of wrapper and then it starts to become convoluted. At that point you are just trying to imitate a solution that doesn't use printf formatting.

That then has to use a delegate, and if you have to do something like insert one character at a time, making so many calls like that won't be ideal, especially if you are concerned with performance. Calling sink() multiple times could also cause reallocations. It all depends on how format/writef or whatever it is you are using is implemented. That generic function isn't going to know about your use case as well as whatever you are implementing, such as if you want to implement a @nogc solution. There's no phobos implementation for that. You're on your own to implement your own @nogc solution and you're stuck having to implement this monolithic format spec.

You can respond to arbitrary formatting, but what does the compiler choose to insert? The DIP makes no mention of this. These are all missing details that shouldn't just be patched in as they come up.

> So your example:
>
>> Or to retain the type, it would then not work with printf().
>
> A custom type wouldn't work with printf with the DIP as it stands. It does not attempt to convert anything, it just forwards the arguments.

Where did you get that? The DIP doesn't mention that. So what is the format string in this case? That's the first point of failure, when it has to decide what format specifier it has to put into the format string.

Only the first example is what I'd deem good enough to implement. The other examples that use a printf-style format show the inadequacies that are created by using it.

> I'm *very* close to supporting this DIP, it isn't actually that bad. As it is, I'd have to recommend we defeat it, but if we can all agree to amend it to put in the template thing we can fix everything. Even with just the one string arg, we can do a lot with it...

You can't make your own @nogc function with custom types so easily. This DIP seems to be hellbent on supporting C functions, but at the end of the day it won't be able to support @nogc without having to completely create your own solution that implements printf-style formatting. That's not something anyone is likely to do. Supporting @nogc would be a better goal than supporting the C functions. Just calling toString() for the C functions isn't sufficient.

The reason people would want to use it with C functions is so that it is @nogc. You're fine with not having custom types work with C functions, but is that really fine? D doesn't have an easy alternative, and requiring the printf format spec be utilized makes it incredibly more difficult to roll your own solution.

It's ironic, this DIP is pushing for printf C functions to be usable so it can work with @nogc. But it makes it more difficult to support @nogc and your own custom implementations because phobos inadequately supports @nogc. Even if it did have some @nogc format, you'd still be impeded from optimizing your own implementation and would rely on whatever kind of optimizations are done in @nogc format. Which might not fit your use case.




February 03, 2020
On Monday, 3 February 2020 at 05:10:13 UTC, Arine wrote:
> Part of the whole reason is that this works with printf(), the examples in the DIP almost exclusive use the C functions. Why stop short of custom types?

This DIP doesn't format them at all. It is fundamentally just a syntax sugar rewrite of a string into an argument list.

> Implement string_is_ok() for me.

That varies based on what features you choose to support. It could be as simple as assert(string.count("%s") == args.length) and you ignore anything else. Or you could go crazy with all kinds of custom specifiers. The compiler doesn't do anything except

1) replace $xxxx with %s in the string
2) blindly copy/paste what the user said in ${xxxx} while moving the var to the argument list
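Those two steps can be sketched as an ordinary function. This is not the compiler's implementation, just a toy model of the rewrite; `Lowered` and `lower` are invented names, and the sketch only handles simple identifiers after `$` or `${...}`.

```d
import std.ascii : isAlphaNum, isDigit;

struct Lowered
{
    string fmt;      // generated format string
    string[] names;  // interpolated identifiers, in order
}

Lowered lower(string src)
{
    Lowered result;
    size_t i = 0;
    while (i < src.length)
    {
        if (src[i] != '$')
        {
            result.fmt ~= src[i++];
            continue;
        }
        size_t j = i + 1;
        string spec = "%s"; // step 1: the default specifier
        if (j < src.length && src[j] == '{')
        {
            size_t specEnd = j + 1;
            while (specEnd < src.length && src[specEnd] != '}') ++specEnd;
            if (specEnd < src.length)
            {
                spec = src[j + 1 .. specEnd]; // step 2: copied blindly
                j = specEnd + 1;
            }
        }
        size_t end = j;
        while (end < src.length && (isAlphaNum(src[end]) || src[end] == '_'))
            ++end;
        if (end == j || isDigit(src[j]))
        {
            result.fmt ~= src[i++]; // not an identifier: keep '$' as text
            continue;
        }
        result.fmt ~= spec;
        result.names ~= src[j .. end];
        i = end;
    }
    return result;
}

unittest
{
    auto r = lower("I have $apples apples and $bananas bananas");
    assert(r.fmt == "I have %s apples and %s bananas");
    assert(r.names == ["apples", "bananas"]);
}
```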

> Ensuring that it correctly interprets all of printf's features.

This is not necessary. You can make your version only do the bare minimum.

> You can respond to arbitrary formatting, but what does the compiler choose to insert? The DIP makes no mention of this.

The compiler ONLY uses %s unless the string's author specifies something else in the string itself. Anything else is up to the user - and the user is supposed to look up what the function they are calling supports. That's what this paragraph from the dip is about:

===
The {%d} syntax is for circumstances when the format specifier needs to be anything other than %s, which is the default. What goes between the { } is not specified, so this capability can work with future format specification improvements without needing to update the core language. It also makes interpolated strings agnostic about what the format specifications are.
===

Unless the person calling your function SPECIFICALLY writes something else - which you can say in the docs "do not attempt".


> Just calling toString() for the C functions isn't sufficient.

and the dip doesn't do that. That's why it passes the arguments unmodified - it is up to the function you are calling to do whatever it needs to do.

Really, the C compatibility thing is a red herring. This dip is NOT actually compatible with printf out of the box; it silently does the wrong thing in almost ALL cases.

What it does do is provide a hook for users that can be compatible with it. It does extremely little on its own.
February 03, 2020
On 2/2/20 6:54 PM, Adam D. Ruppe wrote:
> 
> I'm dropping other stuff just cuz I don't want to argue down rabbit holes. Let's see what common ground we can find and build upon that.
> 
> On Sunday, 2 February 2020 at 20:13:36 UTC, Steven Schveighoffer wrote:
>> I think we could get the best of both worlds if the interpolated string itself was not just a string, but rather a library-defined type (well something slightly more special -- it should implicitly cast to a null-terminated immutable(char)* if needed, just like string literals).
> 
> Yes, indeed, I proposed this in the last round too. I think it is our best bet for a compromise to salvage this DIP.
> 
> You would lose the methods on it ... however, we have UFCS, so not a deal breaker.
> 
> Lots of the community is opposed to the requirement to import something for the UFCS flavor; that is a psychological barrier for many people. I'm not in love with that myself, but I can live with it.

Not too concerned. If you want actual strings from this feature, then you need to pay the cost of importing stuff from phobos. It also allows for someone to define their own less-heavy version.

> And I remain legit concerned about implicit conversions though:
> 
> printf(i"$some_int");
> 
> that looks OK, but SILENTLY DOES THE WRONG THING.  But if dmd warned on the format specifier like gcc does, we're cool. Could be an enhancement later. I warn about this but do not withhold support about it.

This doesn't concern me AT ALL. If you want printf to work, you need to understand printf and the problems it can have. writef is there, and works. I don't think the compiler has any business complaining about printf usage.

> The DIP mentions this one: `printf("%s $something", foo);` This also does the wrong thing. But so does `printf(i"$something% complete");`, since `% c` is a valid format specifier.... yet it doesn't look like one, especially since the interpolated string use $ now.
> 
> We need to make % go ahead and get translated to %% by the compiler so the parsing function can still work sanely with it. That's the solution to this. If the compiler makes % magic in the generated string, it needs to encode % in the user input too.

I disagree. I don't want my SQL interpolated strings (which can use % for matching) to be tainted by the interpolation. Again, if you want to use string interpolation to call printf (or writef), you need to know what will happen, just like if you were calling it with string + args form.

> Then put a type on the format string, if it implicitly converts is up to y'all.  I just need some way to detect the format string's presence via in the type system for overloading existing functions.

I think this is the only thing we really agree on. Having a specialized type gives so many more options, and should decay into what is already proposed.

> 
> Do those two small changes and this DIP will have my support.
> 
> * specify that % in the user string gets translated to %% in the format literal
> * put the format literal in a wrapper type.
> 
> i"$foo %"
> 
> is translated to the tuple:
> 
> __d_format_literal!("%s %%"), foo

I'd do it a little different, so we don't throw away the work the compiler already did:

i"$apples and ${%d}bananas"

=>

(__d_format_literal!(Format.init, " and ", Format("%d")), apples, bananas)

If there is an overload that takes whatever this returns, then this is used as the lowering. Otherwise, a string literal as specified by the DIP is used (or we have an alias this in the result to the string version).
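A sketch of how a library could consume that shape, with the lowering written by hand. `FormatItem` stands in for `Format`, and `FormatLiteral`, `toFormatString` and `toSqlPlaceholders` are all invented for illustration. Since the pieces arrive already split, nothing has to be re-parsed.

```d
// Hypothetical stand-in for the `Format` item in the lowering above.
struct FormatItem
{
    string spec = "%s";
}

// Hypothetical wrapper carrying the already-split pieces as template arguments.
struct FormatLiteral(parts...)
{
    // Rebuild a printf/writef-style format string on request.
    static string toFormatString()
    {
        string result;
        foreach (part; parts)
        {
            static if (is(typeof(part) : FormatItem))
                result ~= part.spec;
            else
                result ~= part; // literal text, copied untouched
        }
        return result;
    }

    // A consumer with different needs, e.g. SQL placeholders.
    static string toSqlPlaceholders()
    {
        string result;
        foreach (part; parts)
        {
            static if (is(typeof(part) : FormatItem))
                result ~= "?";
            else
                result ~= part;
        }
        return result;
    }
}

unittest
{
    // i"$apples and ${%d}bananas", lowered by hand.
    alias Lit = FormatLiteral!(FormatItem.init, " and ", FormatItem("%d"));
    static assert(Lit.toFormatString() == "%s and %d");
    static assert(Lit.toSqlPlaceholders() == "? and ?");
}
```

A real printf-oriented version would also have to escape any `%` in the literal parts, which the library can do because it knows exactly which parts are literal.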

> struct __d_format_literal(string fmt) {
>          enum f = fmt;
>          alias f this;
> }
> 
> That implicitly converts and just works in printf. That's the answer.

This isn't much better than just passing the string, but still provides overload capability. However, this still means we have to parse things in the library if we want to do anything interesting.

>> And we can actually add later the idea of making the interpolated string a new type after this DIP is implemented,
> 
> As soon as this DIP is implemented, we're frozen. Any future change will face additional barriers as a breaking change. It is going to be a lot better to just do it at least somewhat correctly now.

I think we can do it in a way that's not a breaking change. Or at least doesn't break things that explicitly accept string format + args.

We shouldn't be frozen with this. And of course, string interpolation may prove to leave things wanting, so there may be an appetite to update to something like this.

-Steve
February 03, 2020
On Monday, 3 February 2020 at 14:37:22 UTC, Steven Schveighoffer wrote:
> I'd do it a little different, so we don't throw away the work the compiler already did:
>
> i"$apples and ${%d}bananas"
>
> =>
>
> (__d_format_literal!(Format.init, " and ", Format("%d")), apples, bananas)

Yes, that would be excellent. If you make a motion to amend the DIP, I'll withdraw my motion and second yours. Let's form a coalition and get Walter onboard!

With this, there's no more magic %s from the compiler (though Format.init can and prolly should just return "%s") and there's no more need for % => %% since the library can reliably detect everything.

I'd just note that `Format` here is more realistically `__d_format_item` or something, a new simple thingy rather than a complex phobos struct or whatever.

> If there is an overload that takes whatever this returns, then this is used as the lowering. Otherwise, a string literal as specified by the DIP is used (or we have an alias this in the result to the string version).

yeah just alias this it. Let's not put too much magic in there when we already have a library solution with lowering.

February 03, 2020
On 2/3/20 9:52 AM, Adam D. Ruppe wrote:
> On Monday, 3 February 2020 at 14:37:22 UTC, Steven Schveighoffer wrote:
>> I'd do it a little different, so we don't throw away the work the compiler already did:
>>
>> i"$apples and ${%d}bananas"
>>
>> =>
>>
>> (__d_format_literal!(Format.init, " and ", Format("%d")), apples, bananas)
> 
> Yes, that would be excellent. If you make a motion to amend the DIP, I'll withdraw my motion and second yours. Let's form a coalition and get Walter onboard!

I hope this can work, but I feel Walter might not be on board due to past comments from him. This is why I feel we can wait and get string interpolation in, and then later add this.

But I'll throw the idea out there, and see what he says.

> 
> With this, there's no more magic %s from the compiler (though Format.init can and prolly should just return "%s") and there's no more need for % => %% since the library can reliably detect everything.

Yeah, I like that too. For instance mysql can accept i"select * from sometable where id = $id" and not have to put the ${?} crap in front of it.

> I'd just note that `Format` here is more realistically `__d_format_item` or something, a new simple thingy rather than a complex phobos struct or whatever.

Right, the naming isn't important.

>> If there is an overload that takes whatever this returns, then this is used as the lowering. Otherwise, a string literal as specified by the DIP is used (or we have an alias this in the result to the string version).
> 
> yeah just alias this it. Let's not put too much magic in there when we already have a library solution with lowering.
> 

I have 2 concerns here. First, I don't want to eagerly construct the string if not needed/used. But I can solve this by making toString a template function, which enums the constructed string if asked for.

Second concern is that strings in general don't implicitly cast to immutable(char)*. Which means printf stops working. Of course, we can just enum the format string together, but then it's eagerly constructed.
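The distinction in question, shown with ordinary D and no interpolation involved. Only string literals get the implicit pointer conversion; a `string` variable needs an explicit step such as `toStringz`.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

unittest
{
    // A string literal converts implicitly: it is null-terminated.
    const(char)* fromLiteral = "apples";
    assert(strlen(fromLiteral) == 6);

    // A string variable does not convert; this would be a compile error.
    string runtime = "bananas";
    static assert(!__traits(compiles, { const(char)* p = runtime; }));

    // An explicit, possibly allocating, conversion is needed instead.
    assert(strlen(runtime.toStringz) == 7);
}
```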

The "compiler magic" would alleviate both of these concerns, which is why I suggested it. But if we did the enum with alias this, it would, I think, be fully compatible.

-Steve