December 03, 2013
On Tue, Dec 03, 2013 at 09:19:34PM +0100, Brad Anderson wrote:
> On Tuesday, 3 December 2013 at 20:06:49 UTC, Walter Bright wrote:
> >On 12/3/2013 4:41 AM, Russel Winder wrote:
> >>Yes.
> >>
> >>	a + b
> >>
> >>could be set union, logic and, string concatenation. The + is just a message to the LHS object, it determines what to do. This is the whole basis for DSLs.

Ugh. Ugh, ugh, ugh. This harks back to that horrid decision in C++'s <iostream> to overload << to mean "output" and >> to mean "input". The only redeeming quality about this is that << and >> are relatively rarely used in their original sense (bitwise shifts), so it doesn't cause as much cognitive dissonance as it otherwise might. But still. Ugh. There are just so many things wrong with this choice, not the least of which is the fact that the operator precedence of << and >> makes no sense when used as I/O operators -- because said operators were never intended to be I/O in the first place!! This leads to such fun as:

	int a, b;
	cout << a < b;	// what does this do?
			// (hint: it does NOT output the value of a < b)

Ugh!
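To make the precedence point concrete: an overloaded << keeps the precedence of the shift operator, so it binds more tightly than < but less tightly than +. `cout << a < b` therefore parses as `(cout << a) < b`, which for plain `int`s a modern compiler will typically reject outright. The sketch below (function names are illustrative only) writes to a `std::ostringstream` so the result can be checked:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// << keeps its bit-shift precedence when overloaded for streams:
// tighter than <, looser than +.
std::string printSum(int a, int b) {
    std::ostringstream out;
    out << a + b;        // parses as out << (a + b): no parentheses needed
    return out.str();
}

std::string printLess(int a, int b) {
    std::ostringstream out;
    out << (a < b);      // parentheses required; `out << a < b` is (out << a) < b
    return out.str();    // a bool is written as "1" or "0" by default
}
```

Only the comparison needs the extra parentheses; the sum does not.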


> >Using operator overloading to create a DSL is just wrong. Part of the design of operator overloading in D is to deliberately frustrate such attempts.

+1.


> >+ should mean addition, not union, concatenation, etc. Overloading is there to support addition on user defined types, not to invent new meanings for it.

There's a C++ library that overloads the *comma operator* (!!) to allow
you to do things like this:

	// Creates a 3x4 matrix (!)
	A = 1, 2,  3, 4,
	    5, 6,  7, 8,
	    9, 10, 11, 12;

Now, this particular example looks rather cute, but let's say we want to compute matrix elements as we construct it:

	// Creates a 3x4 matrix (what, really?!)
	A = x++, y++, z++, f(x+y),
	    y+2*x-z, 4*y, 5*(z-y*x),
	    f(x)-f(y), f(z), g(x), 0;

Seriously?? Anyone who understands what a comma operator is (which is itself already a Bad Idea) might read this as a needlessly obscure way of setting A to x++ (built-in assignment binds more tightly than the comma) while performing a whole bunch of side effects, in a way fitting for an IOCCC entry.
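For reference, this is what the built-in comma operator does with the same shape of code: `=` has higher precedence than `,`, so only the first value is assigned and everything after it is evaluated for its side effects alone. A minimal sketch with a hypothetical `commaDemo` function:

```cpp
#include <cassert>

// With built-in ints, `A = x++, y++;` parses as `(A = x++), y++;`.
// A receives the old value of x; y++ runs only for its side effect.
int commaDemo(int& x, int& y) {
    int A = 0;
    A = x++, y++;
    return A;
}
```

With x = 5 and y = 7, `commaDemo` returns 5 and leaves x == 6 and y == 8.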

(And in case you're wondering: the dimensions of the matrix are determined beforehand. So technically, you *could* create a 3x4 matrix using this code:

	// Yes this is still a 3x4 matrix... and yes the first row
	// contains 1, 2, 3, 4, and the second row starts with 5.
	// Obvious, isn't it?
	A = 1,  2,  3,
	    4,  5,  6,
	    7,  8,  9,
	    10, 11, 12;

Or, indeed, this:

	// This is a 3x4 matrix too, even though it sure doesn't look
	// anything like it!!
	A = 1, 2, 3, 4,  5,  6,
	    7, 8, 9, 10, 11, 12;

Please, somebody tell me how this can even remotely be construed to be a good thing.)
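How can such code work at all? One common way to build a comma initializer (a sketch, not the API of any particular library; `Matrix` and `Filler` are hypothetical) is for `A = value` to return a proxy whose overloaded comma writes the next element in row-major order. Since the proxy only counts values, the line breaks in the source carry no meaning, which is why all three layouts above produce the same matrix:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical matrix whose dimensions are fixed at construction.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    double operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    // Proxy returned by `A = value`. Each overloaded comma stores the next
    // value in row-major order, regardless of how the source is laid out.
    class Filler {
    public:
        Filler(Matrix& m, std::size_t next) : m_(m), next_(next) {}
        Filler operator,(double value) {
            assert(next_ < m_.data_.size() && "too many initializers");
            m_.data_[next_] = value;
            return Filler(m_, next_ + 1);
        }
    private:
        Matrix& m_;
        std::size_t next_;
    };

    // `A = first` writes element (0, 0) and hands back the proxy.
    Filler operator=(double first) {
        data_[0] = first;
        return Filler(*this, 1);
    }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```

With `Matrix A(3, 4);`, writing `A = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12;` in any line layout leaves 5 at the start of the second row.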

Not to mention, the meaning of such code depends entirely upon the type of A. What if I have another custom type that also overloads the comma operator, in a slightly different way? Then the semantics of the above snippets would be *completely* different yet again.

Now tell me again, why is C++ code so hard to maintain? Hmmm...


> >Embedded DSLs should be visually distinct, and D provides the ability for that with string mixins and CTFE.

String mixins + CTFE = teh r0ckz when it comes to DSLs.

After having experienced C++ for a decade or two, I've come to decide that operator overloading is a Bad Idea(tm), except when it applies strictly to custom numerical types that are intended to behave like built-in numerical types. All other uses of operator overloading are, strictly speaking, abusive, and lead to unmaintainable code. Yes, it's cute and clever, and lets you write things not supported by the language "directly", but the next person to inherit your code will curse your name when they spend 5 hours trying to figure out exactly why x+y didn't do what they thought it did. And that's just with *one* library that overloads operators in an unusual way. Now add a second, third, fourth library, each of which overloads the operators in an unusual way, and you might as well be submitting your code as IOCCC entries (except that they don't take C++ entries).

OTOH, I completely understand the desire for infix notation for
operators on custom types. If you're writing a set library, it sucks to
have to write a.union(b.intersection(c)) when what you *really* want is
to write: a ∪ (b ∩ c). Here is where D does it right: use a compile-time
string argument to a CTFE function that transforms this string into
code. Then you can write:

	Set a, b, c;
	auto d = mixin(SetExpr!"a ∪ (b ∩ c)");
		// The above line gets turned into:
		// auto d = a.union(b.intersection(c));
		// at compile-time.

So you can write your set expressions the "natural" way, *and* a new reader of your code will know to look for SetExpr's documentation to understand what the string argument does (not to mention it being amply clear that a DSL is involved here, rather than code that looks like normal numerical expressions but actually does something else).
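C++ has no string mixins, but the string-to-code step that a `SetExpr`-style CTFE function performs can be sketched as an ordinary recursive-descent translator. This version runs at run time and simply returns the D source text that the mixin would then compile; `SetExprTranslator` is hypothetical, the method names are taken from the comment in the example, and giving ∩ higher precedence than ∪ is an assumption of the sketch:

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>

// UTF-8 encodings of the two operator symbols.
const std::string kUnion = "\xE2\x88\xAA";         // U+222A (union)
const std::string kIntersection = "\xE2\x88\xA9";  // U+2229 (intersection)

// Grammar (intersection binds tighter than union):
//   expr   := term   { union term }
//   term   := factor { intersection factor }
//   factor := identifier | '(' expr ')'
class SetExprTranslator {
public:
    explicit SetExprTranslator(std::string src) : s_(std::move(src)) {}

    std::string translate() {
        std::string out = expr();
        skipSpaces();
        if (pos_ != s_.size()) throw std::runtime_error("trailing input");
        return out;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    void skipSpaces() {
        while (pos_ < s_.size() && s_[pos_] == ' ') ++pos_;
    }
    // Consumes `tok` if it comes next (after spaces); reports whether it did.
    bool accept(const std::string& tok) {
        skipSpaces();
        if (s_.compare(pos_, tok.size(), tok) == 0) { pos_ += tok.size(); return true; }
        return false;
    }
    std::string expr() {  // left-associative chain of unions
        std::string lhs = term();
        while (accept(kUnion)) lhs += ".union(" + term() + ")";
        return lhs;
    }
    std::string term() {  // left-associative chain of intersections
        std::string lhs = factor();
        while (accept(kIntersection)) lhs += ".intersection(" + factor() + ")";
        return lhs;
    }
    std::string factor() {
        if (accept("(")) {
            std::string inner = expr();
            if (!accept(")")) throw std::runtime_error("expected ')'");
            return inner;
        }
        skipSpaces();
        std::size_t start = pos_;
        while (pos_ < s_.size() &&
               (std::isalnum(static_cast<unsigned char>(s_[pos_])) || s_[pos_] == '_'))
            ++pos_;
        if (start == pos_) throw std::runtime_error("expected identifier");
        return s_.substr(start, pos_ - start);
    }
};
```

Translating `"a ∪ (b ∩ c)"` yields `"a.union(b.intersection(c))"`, matching the comment in the example; a chain such as `a ∪ b ∪ c` comes out as `a.union(b).union(c)`.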

This has even more benefits than fixing C++'s wrong approach, though:

For one thing, overloaded operators can't easily generate optimal code, because they just get translated into nested function calls. In order to optimize, say, a ∪ a ∪ a into just a, in C++'s approach you'd have to resort to arcane black magic like expression templates to coax the compiler into doing what you want. In D, you are parsing the expression as a *string*, which means you get to define how the string is parsed, and how it is to be transformed into code, *directly*. You can run the expression tree through an expression simplifier, for example: factor out common subexpressions, reduce it using known identities, etc. All of which, granted, can be done with expression templates, but with many times the pain, bug-proneness, and unmaintainability.
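As an illustration of the kind of rewrite meant here, the sketch below builds a small expression tree and applies one known identity, idempotence (x ∪ x = x and x ∩ x = x), bottom-up before emitting code. It is written in C++ and runs at run time; in D the same logic could run in CTFE on the parsed string before the result is mixed in. `Expr`, `simplify`, and `emit` are hypothetical names:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression tree for set expressions.
struct Expr {
    enum class Kind { Var, Union, Intersection };
    Kind kind;
    std::string name;                     // only meaningful for Var
    std::shared_ptr<const Expr> lhs, rhs; // only meaningful for operators
};
using ExprPtr = std::shared_ptr<const Expr>;

ExprPtr var(const std::string& name) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Var, name, nullptr, nullptr});
}
ExprPtr setUnion(ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Union, "", std::move(l), std::move(r)});
}
ExprPtr setIntersection(ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Intersection, "", std::move(l), std::move(r)});
}

// Structural equality: same shape and same variable names.
bool sameExpr(const ExprPtr& a, const ExprPtr& b) {
    if (a->kind != b->kind) return false;
    if (a->kind == Expr::Kind::Var) return a->name == b->name;
    return sameExpr(a->lhs, b->lhs) && sameExpr(a->rhs, b->rhs);
}

// Simplify children first, then drop the operator if both sides are equal.
ExprPtr simplify(const ExprPtr& e) {
    if (e->kind == Expr::Kind::Var) return e;
    ExprPtr l = simplify(e->lhs), r = simplify(e->rhs);
    if (sameExpr(l, r)) return l;
    return std::make_shared<Expr>(Expr{e->kind, "", l, r});
}

// Emit the same method-call form used in the SetExpr example.
std::string emit(const ExprPtr& e) {
    switch (e->kind) {
    case Expr::Kind::Var:
        return e->name;
    case Expr::Kind::Union:
        return emit(e->lhs) + ".union(" + emit(e->rhs) + ")";
    case Expr::Kind::Intersection:
        return emit(e->lhs) + ".intersection(" + emit(e->rhs) + ")";
    }
    return {};
}
```

Here `(a ∪ a) ∪ a` simplifies to plain `a`, so no union call is emitted at all. A real simplifier would apply more identities, but the structure stays the same: rewrite the tree, then generate code.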

These string DSLs also let you define your own operators (like I did above) without needing to abuse existing operators like + and *, define your own operator precedence rules, and define custom syntax without needing to twist and warp it to conform to host-language syntax (like that C++ regex library, which honestly makes me cringe every time I look at its contorted syntax).


> >Part of my opinion for this comes from C++ regexes done using expression templates. It's cute and clever, but it's madness. For one, any sort of errors coming out of it if a mistake is made are awesomely incomprehensible. For another, there's no clue in the source code when one has slipped into DSL-land, and suddenly * doesn't mean pointer dereference, it means "0 or more".
> >
> >Utter madness.

Yeah, that library, while admittedly very clever, is total madness. It looks *nothing* like what regexen normally look like, does something completely unlike what its surface syntax might suggest, and is in pretty much every way very difficult to understand, and therefore hard to maintain and prone to bugs. In today's software development world, where there's too much code to comprehend and too little time to comprehend it, dissociating syntax from its usual meaning is just asking for maintenance nightmares.


> Indeed. I had a regex bottleneck in a C++ program, so I figured I'd just convert it to Boost Xpressive as an easy solution. It took me half a day to convert the regular expression into the convoluted single line of code with dozens of operators it became. It did run faster (phew!), so it was worth it, but the code is unrecognizable as a regular expression, and I have to keep a comment with the original regular expression in the code because nobody (myself included) should have to spend an ungodly amount of time trying to decipher the cryptic source code it became.
> 
> If my program were written in D I would have just replaced "regex(" with "ctRegex!(" and moved on with my day.

Yeah!! Props to std.regex!


T

-- 
Why can't you just be a nonconformist like everyone else? -- YHL
December 04, 2013
On Tuesday, 3 December 2013 at 22:28:26 UTC, H. S. Teoh wrote:
>[snip] Then you can write:
>
> 	Set a, b, c;
> 	auto d = mixin(SetExpr!"a ∪ (b ∩ c)");
> 		// The above line gets turned into:
> 		// auto d = a.union(b.intersection(c));
> 		// at compile-time.
>
> So you can write your set expressions the "natural" way, *and* [snip]

This would make for a good blog post/wiki article.  Does one already exist?


December 04, 2013
Joshua Niehus:

> This would make for a good blog post/wiki article.  Does one already exist?

If you had AST macros like those in the Julia language, I think you could write something like:

@setExpr(a ∪ (b ∩ c));

The main difference is that the compiler gives you a tree in the macro to work on, instead of a string to parse and munge.

Bye,
bearophile
December 04, 2013
On 2013-12-03 21:06, Walter Bright wrote:

> Embedded DSLs should be visually distinct, and D provides the ability
> for that with string mixins and CTFE.

The point of DSLs is to make languages that work optimally and look appropriate for the given domain, not necessarily to make them distinct from standard D.

-- 
/Jacob Carlborg
December 04, 2013
On 12/3/13 7:23 PM, monarch_dodra wrote:
> On Tuesday, 3 December 2013 at 20:09:52 UTC, Ary Borenszweig wrote:
>> On 12/3/13 4:53 PM, Andrei Alexandrescu wrote:
>>> On 12/3/13 4:41 AM, Russel Winder wrote:
>>>> On Tue, 2013-12-03 at 13:29 +0100, Tobias Pankrath wrote:
>>>> […]
> >>>>> Does Scala have arbitrary operators like Haskell? Looks useless
>>>>> in D. If you have an operator '+' that should not be pronounced
>>>>> 'plus' you are doing it wrong.
>>>>
>>>> Yes.
>>>>
>>>>    a + b
>>>>
>>>> could be set union, logic and, string concatenation. The + is just a
>>>> message to the LHS object
>>>
>>> or RHS :o).
>>
>> How come?
>
> "opBinaryRight":
> http://dlang.org/operatoroverloading.html
>
> It's a "neat" feature that allows operators to be member functions, yet
> still resolve to the right-hand side if needed. For example:
> auto result = 1 + complex(1, 1);
>
> Will compile, and be re-written as:
> auto result = complex(1, 1).opBinaryRight!"+"(1);
>
> In contrast, C++ has to resort to non-member friend operators to make
> this work.

That's nice.

Of course, it's not needed if you overload "+" for the int type to receive a complex.
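For comparison, the C++ route monarch_dodra mentions is a non-member operator with the built-in type on the left. `Complex` here is a hypothetical stand-in (`std::complex` already provides these overloads), and because its members are public the operators need not be friends:

```cpp
#include <cassert>

// Hypothetical minimal complex type.
struct Complex {
    double re, im;
};

// Non-member overloads, one for each operand order.
Complex operator+(const Complex& z, double x) { return {z.re + x, z.im}; }
Complex operator+(double x, const Complex& z) { return z + x; }  // makes `1 + z` compile
```

A member `operator+` alone would only cover `z + 1`, since a member operator requires the class type on the left.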
December 04, 2013
On Wednesday, 4 December 2013 at 13:39:32 UTC, Ary Borenszweig wrote:
> That's nice.
>
> Of course, it's not needed if you overload "+" for the int type to receive a complex.

The point is that D does not have operator overloading for built-in types. The unnecessary one is the global operator overload you suggest, as it is more intrusive than `opBinaryRight`.
December 04, 2013
On Wed, Dec 04, 2013 at 04:23:59AM +0100, bearophile wrote:
> Joshua Niehus:
> 
> >This would make for a good blog post/wiki article.  Does one already exist?
> 
> If you had AST macros like those in the Julia language, I think you could write something like:
> 
> @setExpr(a ∪ (b ∩ c));
> 
> The main difference is that the compiler gives you a tree in the macro to work on, instead of a string to parse and munge.
[...]

The problem with having the compiler parse it is that it has to be in a syntax understood by the compiler. If your DSL needs a radically different syntax, it won't work (e.g., regex: how is the compiler to know '+' is a postfix operator instead of an infix one?).

By having a compile-time string as input, you have maximum flexibility. It's essentially writing a mini-compiler embedded in D, because it runs in CTFE.


T

-- 
You are only young once, but you can stay immature indefinitely. -- azephrahel
December 04, 2013
On Wed, Dec 04, 2013 at 08:44:17AM +0100, Jacob Carlborg wrote:
> On 2013-12-03 21:06, Walter Bright wrote:
> 
> >Embedded DSLs should be visually distinct, and D provides the ability for that with string mixins and CTFE.
> 
> The point of DSLs is to make languages that work optimally and look appropriate for the given domain, not necessarily to make them distinct from standard D.
[...]

Of course, it's not the *point* of DSLs to be distinct from the host language, but it's a good idea for them to be. Operator overloading that turns + and * into something completely unlike their usual meanings violates the principle of least surprise. A CTFE string containing + and * interpreted differently is better, because the syntax itself reminds you that something unlike normal D syntax is happening.

	// (D) It's clear * and + mean something different:
	auto m = input.match(ctRegex!`^a+b*c`);

	// (C++) What on earth might this mean?!
	sregex r = (s1= +_w) >> ' ' >> (s2= +_w) >> '!';
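C++'s standard library also accepts regexes as plain strings through `std::regex`, so `+` and `*` keep their usual regex meanings there too. The difference from D's `ctRegex` is that the pattern is compiled at run time rather than at compile time. A small sketch (`matchesPattern` is an illustrative name) using the same pattern as the D line:

```cpp
#include <cassert>
#include <regex>
#include <string>

// The pattern is an ordinary regex string; std::regex parses it at run time.
bool matchesPattern(const std::string& input) {
    static const std::regex pattern("^a+b*c");
    return std::regex_search(input, pattern);
}
```

Here `"aaabbc"` and `"ac"` match, while `"bc"` does not, because `^a+` requires at least one leading `a`.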


T

-- 
Computers aren't intelligent; they only think they are.
December 04, 2013
On Tuesday, 3 December 2013 at 20:20:40 UTC, deadalnix wrote:
> On Tuesday, 3 December 2013 at 19:41:46 UTC, Andrei Alexandrescu wrote:

> Arguably, optional () and the mess around it fall into the category of opaque and unclear syntax.

yes, that was a trap
December 04, 2013
On Tuesday, 3 December 2013 at 20:42:01 UTC, Paulo Pinto wrote:
> Am 03.12.2013 16:49, schrieb eles:
>> On Tuesday, 3 December 2013 at 14:25:50 UTC, Paulo Pinto wrote:
>>> On Tuesday, 3 December 2013 at 12:41:40 UTC, Russel Winder wrote:
>>>> On Tue, 2013-12-03 at 13:29 +0100, Tobias Pankrath wrote:

> It is my daily German creeping into my English, uni => University.

it is my daily D rant, uni => unicode