Thread overview
"string interpolation"
Jun 08, 2019
Amex
Jun 09, 2019
Amex
Jun 09, 2019
Jonathan Marler
Jun 10, 2019
Amex
Jun 10, 2019
Adam D. Ruppe
Jun 10, 2019
Jonathan Marler
Jun 09, 2019
Nicholas Wilson
Jun 09, 2019
Adam D. Ruppe
June 08, 2019
I and many others write code that uses string mixins simply to define a symbol based on others:

mixin(T~" "~id~" = "~value~";");


This is very ugly and it is impossible to debug properly.

Why not alleviate this issue? It is so common that there could be a shorthand syntax that the compiler can decode naturally.

e.g.,

#T #id = #value;


Or whatever syntax one wants to devise.

¡T¡¡id¡ = ¡value¡;

↕T ↕id = ↕value;


Whatever...

The compiler then types the symbol and tries to resolve it. If it is a string, it inserts it directly as a symbol.

Basically, whatever the simplification, it is effectively lowered into a string mixin...

but because it is expected to be used in a statement or expression it can be parsed. Only the marked symbols are resolved and everything else must be a valid expression. Essentially the compiler resolves the symbols then parses the string as if it were typed in as direct code. Since the symbol substitution is well defined it can easily backtrack to pinpoint the error in the "mixin".

If id = "fdf", then

↕"int" ↕id = ↕"3";

is the same as

int fdf = 3;


Of course, if T is a type then

↕T ↕id = ↕value;

is the same as

T ↕id = ↕value;
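
For comparison, here is a minimal sketch of how the first example has to be spelled today with an ordinary string mixin. The names `id` and `fdf` come from the example above; `main` and the assertion are added purely for illustration.

```d
// The identifier is only known as a compile-time string.
enum id = "fdf";

void main()
{
    // Today's spelling of the proposed `↕"int" ↕id = ↕"3";`:
    // build the declaration as a string, then mix it in.
    mixin("int " ~ id ~ " = 3;");

    // The mixin declared an ordinary local variable named fdf.
    assert(fdf == 3);
}
```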






June 08, 2019
On 6/8/19 5:28 PM, Amex wrote:
> I and many others write code that uses string mixins simply to define a symbol based on others:
> 
> mixin(T~" "~id~" = "~value~";");
> 
> 
> This is very ugly and it is impossible to debug properly.
> 
> Why not alleviate this issue? It is so common that there could be a shorthand syntax that the compiler can decode naturally.
> 

There is work being done on the idea of interpolated strings:

https://github.com/dlang/dmd/pull/7988

June 09, 2019
On Saturday, 8 June 2019 at 21:28:13 UTC, Amex wrote:
> I and many others write code that uses string mixins simply to define a symbol based on others:
>
> mixin(T~" "~id~" = "~value~";");
>
>
> This is very ugly and it is impossible to debug properly.

Specifically for the case of mixins, they accept comma-separated stuff. Stuff that is not a string will be converted through its `.stringof` property (e.g. types).


mixin(T, " ", id, " = ", value, ";");
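
A minimal sketch of the comma-separated form, assuming a compiler recent enough to accept multiple mixin arguments. The identifier name and the value are illustrative and not from the original post.

```d
enum id = "counter"; // hypothetical identifier name

void main()
{
    // The arguments are converted to strings and concatenated at
    // compile time, so this mixes in: int counter = 3;
    mixin("int ", id, " = ", 3, ";");

    assert(counter == 3);
}
```
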
June 09, 2019
On Sunday, 9 June 2019 at 00:53:23 UTC, Nicholas Wilson wrote:
> Specifically for the case of mixins, they accept comma-separated stuff.

all that does is change ~ for , ...

> Stuff that is not a string will be converted through its `.stringof` property (e.g. types).

And this is bad! .stringof is the wrong thing to do in the vast majority of mixin cases, especially now that we have static foreach.

> mixin(T, " ", id, " = ", value, ";");

This will actually generate wrong code if there coincidentally happens to be another thing in scope with the same name as T, and if not, it is liable to cause a compile error.

The correct way to write these things is

mixin("T " ~ id ~ " = value;");
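
A minimal sketch of that pattern in context. `Holder` and its parameters are hypothetical names chosen for illustration. Only the declaration name is concatenated; `T` and `value` are written inside the mixin string and resolved by normal scope lookup where the mixin appears.

```d
// Hypothetical wrapper: declares one field whose name is given as a string.
struct Holder(T, string id, T value)
{
    // Expands to e.g. `T count = value;` with T and value looked up
    // in this scope, so no .stringof and no value-to-string conversion.
    mixin("T " ~ id ~ " = value;");
}

void main()
{
    Holder!(int, "count", 3) h;
    assert(h.count == 3);
}
```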


In well-written mixin code, concatenation is infrequent (operators (think opBinary) and declaration names are the only uses I wouldn't question; there are a few exceptions, but you probably aren't one of them) and stringof is extremely rare. Throughout all my code generation libraries, I have *zero* uses of .stringof inside mixin. In my code generation tests and samples, I have one use of it... and it predates `static foreach`. Let me show you this code.

The idea here was to do a pass-by-value lambda. Usage:

    auto foo(int x) @nogc
    {
        auto f = lambda!(x, q{ (int y) => x + y });
        return f;
    }

Note the @nogc part, which is why we aren't just using the built-in one and a string instead. You list the arguments you capture, then the lambda as a string.


Old version of the code:

---
    template lambda(Args...) {
        import std.conv;
        import std.range;
        import std.string;

        string evil() {
            // build the functor
            import std.meta;
            string code = "static struct anon {";
            foreach(i; aliasSeqOf!(iota(0, Args.length-1)))
                code ~= "typeof(Args[" ~ to!string(i) ~ "]) " ~ Args[i].stringof ~ ";";

            string func = Args[$-1];
            auto idx = func.indexOf("=>");
            if(idx == -1)
                throw new Exception("No => in lambda"); // or we could use one of the other styles

            auto args = func[0 .. idx];
            auto bod  = func[idx + 2 .. $];

            code ~= "auto opCall(T...)" ~ args ~ "{ return " ~ bod ~ "; }";

            code ~= "this(T...)(T t) {
                this.tupleof = t;
            };";

            code ~= "}";
            return code;
        }
        mixin(evil());

        anon lambda() {
            anon a;
            // copy the values in
            a.tupleof = Args[0 .. $-1];
            return a;
        }
    }
---


It worked, but as you can tell from the name evil(), I didn't even like this code back when I wrote it the first time.

But I use a lot of concats and some stringof there. Let's try to rewrite this to eliminate that stuff with D's new features.

---
    template lambda(Args...) {
        import std.string : indexOf; // still needed to find the "=>"

        static struct anon {
            static foreach(i; 0 .. Args.length - 1)
                mixin("typeof(Args[i]) " ~ __traits(identifier, Args[i]) ~ ";");

            mixin("auto opCall(T...)"~Args[$-1][0 .. Args[$-1].indexOf("=>")]~" {
                return " ~ Args[$-1][Args[$-1].indexOf("=>") + 2 .. $] ~ ";
            }");
        }

        anon lambda() {
            anon a;
            // copy the values in
            a.tupleof = Args[0 .. $-1];
            return a;
        }
    }
---

No more stringof! No more conversion of values to string, meaning no more need for two of those imports! Yay. And compile time cut down by 1/4.

But, that is still a lot of concatenation, and it isn't a declaration name or an operator, so this code is still suspect to me. Can we do even better?

Of course! Let's replace that mixin generated opCall with something simpler:

       auto opCall(T...)(T t) {
           return mixin(Args[$-1])(t);
       }

Heeey :) it still works, and it is a LOT simpler. Zero imports needed now!
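
For reference, a sketch of the complete template with the simplified opCall, assembled from the snippets in this post together with the `foo` usage example. The `main` function and its test values are added here for illustration and are not from the original post.

```d
template lambda(Args...)
{
    static struct anon
    {
        // One field per captured variable, named after that variable.
        static foreach (i; 0 .. Args.length - 1)
            mixin("typeof(Args[i]) " ~ __traits(identifier, Args[i]) ~ ";");

        // Mix in the whole lambda literal as one expression and call it.
        // Inside the literal, `x` resolves to the field declared above.
        auto opCall(T...)(T t)
        {
            return mixin(Args[$ - 1])(t);
        }
    }

    anon lambda()
    {
        anon a;
        // copy the values in
        a.tupleof = Args[0 .. $ - 1];
        return a;
    }
}

auto foo(int x) @nogc
{
    auto f = lambda!(x, q{ (int y) => x + y });
    return f;
}

void main()
{
    auto f = foo(5);      // x = 5 is copied into the functor
    assert(f(10) == 15);  // 5 + 10
}
```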

And you get better error messages and more compatibility with other D syntaxes. Let's take the type out.

With old code:

lll.d-mixin-53(53): Error: undefined identifier y
lll.d(15): Error: template lll.foo.lambda!(x, " (y) => x + y ").anon.opCall cannot deduce function from argument types !()(int), candidates are:


But you defined y! So that error message doesn't help.

With new code:

lll.d(58): Error: function literal __lambda3(y) is not callable using argument types (int)
lll.d(58):        cannot pass argument _param_0 of type int to parameter y
lll.d(15): Error: template instance `lll.foo.lambda!(x, " (y) => x + y ").anon.opCall!int` error instantiating


Oh, the literal types aren't matching up. This message helps.


Let's change the lambda to (int y) { return x + y; }. With the old code, you get a range error or a static assert error about the syntax, depending on how robust the implementation is.

But with the new code? No error message, it actually works. Instead of trying to slice up D code, I'm just compiling it as a complete chunk. And it still satisfies the @nogc constraint.


Writing mixins without stringof and with minimal concatenation is:

1) possible!

2) easier to write

3) easier to read

4) less bug-prone


Instead of changing concat/stringof syntax, we should be educating people on the techniques to get rid of it. It is rarely justified.
June 09, 2019
On Saturday, 8 June 2019 at 22:23:34 UTC, Nick Sabalausky (Abscissa) wrote:
> There is work being done on the idea of interpolated strings:
>
> https://github.com/dlang/dmd/pull/7988

This is not what I'm talking about. Similar but different. I'm specifically talking about usage to replace string mixins.

I'm not really talking about string interpolation at all. It's symbol substitution.

I've already given examples.


June 09, 2019
On Sunday, 9 June 2019 at 05:24:29 UTC, Amex wrote:
> This is not what I'm talking about. Similar but different. I'm specifically talking about usage to replace string mixins.
>
> I'm not really talking about string interpolation at all. It's symbol substitution.
>
> I've already given examples.

So it sounds like you want `#X` to lower to `mixin(X)`? So in your example

#T #id = #value;

would be

mixin(T) mixin(id) = mixin(value);

I think that using the `#` character as a stand-in for mixin has been proposed before, and I think the idea has merit.  However, as the language exists today there's no way to support the level of mixin you are suggesting.  Mixin only works at higher levels like expressions and statements, so you could do things like:

mixin("const a = 1 + 2 + 3;");

and even things like:

const a = 1 + mixin("2 + 3");

but not things like this:

mixin("const a") = 1 + 2 + 3;

Or things like

const mixin("a") = 1 + 2 + 3;

Mixin can only appear at certain places in the grammar because the compiler needs to be able to analyze the code surrounding it before evaluating what's inside the mixin. This way you can use your own code at compile-time to generate what's inside the mixin without depending on what the mixin does to your code.
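
A minimal sketch of the two positions described above as working: a mixin standing in for a whole declaration, and a mixin standing in for a complete expression. The variable `b` and the assertions are added for illustration.

```d
void main()
{
    // A mixin can stand in for a whole declaration or statement...
    mixin("const a = 1 + 2 + 3;");

    // ...or for a complete expression inside a larger one.
    const b = 1 + mixin("2 + 3");

    assert(a == 6);
    assert(b == 6);
}
```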

In any case, with Adam Ruppe's full string interpolation proposal, your example can be done with:

mixin("$T $id = $value;");

June 10, 2019
On Sunday, 9 June 2019 at 07:05:26 UTC, Jonathan Marler wrote:
> So it sounds like you want `#X` to lower to `mixin(X)`? So in your example
>
> #T #id = #value;
>
> would be
>
> mixin(T) mixin(id) = mixin(value);
>
> I think that using the `#` character as a stand-in for mixin has been proposed before, and I think the idea has merit.  However, as the language exists today there's no way to support the level of mixin you are suggesting.  Mixin only works at higher levels like expressions and statements, so you could do things like:
>
> mixin("const a = 1 + 2 + 3;");
>
> and even things like:
>
> const a = 1 + mixin("2 + 3");
>
> but not things like this:
>
> mixin("const a") = 1 + 2 + 3;
>
> Or things like
>
> const mixin("a") = 1 + 2 + 3;
>
> Mixin can only appear at certain places in the grammar because the compiler needs to be able to analyze the code surrounding it before evaluating what's inside the mixin. This way you can use your own code at compile-time to generate what's inside the mixin without depending on what the mixin does to your code.
>
> In any case, with Adam Ruppe's full string interpolation proposal, your example can be done with:
>
> mixin("$T $id = $value;");


There is nothing that would stop the language from doing what I'm suggesting...


You agree that every statement S could be rewritten to mixin("S"); and then evaluated by the compiler?

If so, then it can do what I suggest.

1. Scan the statement for any symbol starting with a special character (#, $, or whatever).

2. If the statement has such a symbol, then the statement is wrapped in a mixin, but all the symbols are expanded first.


E.g.,


$T $id = $value;

$ is found in front of a symbol.

simply expand symbols

mixin(T~" "~id~" = "~value~";");

with the appropriate to!string's when needed.

It is simply a rewrite rule that simplifies meta code and makes certain string mixins more readable by making them look like standard code. It makes them easier to debug. It is only for symbol substitution, so there is always a 1-to-1 relation. Any errors in the resolved string can be mapped back into the original statement.


Its purpose is to get rid of explicit mixin entirely for certain types of very common string mixin statements.



$T $id = $value;

can be lifted to mixin without any compiler knowledge. It is a direct map. The only issues may be in determining how to convert the symbols to strings... but in most cases to!string will work fine.
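
The expansion step described above can be approximated in today's D as plain compile-time string building. This is only a sketch under stated assumptions: `expandDecl` is a hypothetical helper, not an existing library function, and it covers just the single `$T $id = $value;` shape rather than a general rewrite rule.

```d
import std.conv : to;

// Hypothetical helper: performs the expansion of `$T $id = $value;`
// by joining the parts, converting the value with to!string.
string expandDecl(V)(string type, string id, V value)
{
    return type ~ " " ~ id ~ " = " ~ to!string(value) ~ ";";
}

// The expansion itself is ordinary compile-time string building.
static assert(expandDecl("int", "fdf", 3) == "int fdf = 3;");

void main()
{
    // The explicit mixin that the proposed syntax would hide.
    mixin(expandDecl("int", "fdf", 3));
    assert(fdf == 3);
}
```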



June 10, 2019
On Monday, 10 June 2019 at 08:53:56 UTC, Amex wrote:
> Its purpose is to get rid of explicit mixin entirely for certain types of very common string mixin statements.

Show me a real world example you've written where you would use this. I'm currently convinced it is a harmful antipattern...
June 10, 2019
On Monday, 10 June 2019 at 08:53:56 UTC, Amex wrote:
> There is nothing that would stop the language from doing what I'm suggesting...
>
>
> You agree that every statement S could be rewritten to mixin("S"); and then evaluated by the compiler?

Yes I believe that's true.

>
> If so, then it can do what I suggest.
>
> 1. Scan the statement for any symbol starting with a special character (#, $, or whatever).
>
> 2. If the statement has such a symbol, then the statement is wrapped in a mixin, but all the symbols are expanded first.
>
>
> E.g.,
>
>
> $T $id = $value;
>
> $ is found in front of a symbol.
>
> simply expand symbols
>
> mixin(T~" "~id~" = "~value~";");
>
> with the appropriate to!string's when needed.
>

Ok I see how this is possible now.

> It is simply a rewrite rule that simplifies meta code and makes certain string mixins more readable by making them look like standard code. It makes them easier to debug. It is only for symbol substitution, so there is always a 1-to-1 relation. Any errors in the resolved string can be mapped back into the original statement.
>

It seems like a pretty specific set of syntax/semantics to add to the language. I can't think of a lot of cases that would benefit from it. If this is a very common pattern it could be justified. Can you point to a few places in some real code that could benefit from this pattern?

>
> Its purpose is to get rid of explicit mixin entirely for certain types of very common string mixin statements.
>

Using mixin code shouldn't happen too often. It's powerful but also comes with issues. Usually mixins are hidden away inside templates/functions in a library, invisible to the user. Using a keyword like mixin makes them easy to find, and since they shouldn't happen too often, the extra characters don't create much of an issue.

However, your suggestion isn't as powerful or as dangerous as a full blown mixin, which makes them less concerning to have to keep track of, and it's still easy enough to search for `#`. So if it turns out to be useful enough then it could justify a language change. But I'm failing to think of useful cases for it. Can you share the cases you have thought this could be useful for? Pointing to existing code would be the best way to show them.