December 03, 2019
On 12/3/19 3:51 PM, Dennis wrote:
> Please answer this: Do you think there were good reasons to deprecate hexstring literals, or do you consider that a mistake / unnecessary?

Deprecating them was a good decision, primarily because they were a built-in feature made unnecessary by improvements to the language.

It would be a mistake to presuppose that hex string literals are a good precedent, however. Heredocs have no library alternative. The DIP would not be helped by attempting to draw a parallel.

> Your first reply came across as "this is useless, please work on
> something else". That felt like a destructive comment. This reply
> actually has constructive feedback, which helps. Thanks for that.
> 
> I will be more specific when talking about 'the grammar', give some
> more focus on syntax highlighters and maybe dive more into the
> precedent of reducing language complexity by removing features.

The destructive comment was actually more useful than one that prompts improvements to this DIP: even if the DIP were executed to perfection, its impact would be null.

Let me ask this question: what would be a nice way to convey "this is useless, please work on something else"?
December 03, 2019
On 12/3/19 4:04 PM, H. S. Teoh wrote:
> I think you're imposing a needlessly literal(!) interpretation of
> context-free grammars.

I feared that would happen. When I drafted the initial answer, I had this text: "Subject to the way the grammar is defined across lexical tokens and higher-level constructs, yes, one could build a theoretical argument that heredocs are a context-dependent construct." Then I removed it to avoid divagating. Now, here we are.
December 03, 2019
On Tue, Dec 03, 2019 at 03:09:19PM +0000, Dennis via Digitalmars-d wrote: [...]
> I don't propose deprecating token strings, only the identifier delimited ones, which get highlighted as strings.
> 
> ```
> string s = q{
> this is fine
> };
> 
> string t = q"EOS
> this is not fine
> EOS";
> ```

The problem is that token strings require the contents to be *D tokens*. So if I need to emit snippets of another language, I'm out of luck, and have to resort to quoted strings and Leaning Toothpick Syndrome.
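To make the "Leaning Toothpick" point concrete, here is a minimal sketch comparing an identifier-delimited string with the equivalent double-quoted literal. The shell snippet and the names `heredoc` and `escaped` are illustrative only. Note that the newline before the closing `SH` line is part of the heredoc's contents.

```d
// A shell snippet: the apostrophe in "don't" would start an unterminated
// D character literal, so this text cannot go in a q{...} token string.

// Identifier-delimited (heredoc) form: the text is written verbatim.
enum heredoc = q"SH
echo "don't panic" | grep -o '\w\+'
SH";

// The same text as an ordinary double-quoted literal: every double quote
// and backslash must be escaped, and the trailing newline spelled out.
enum escaped = "echo \"don't panic\" | grep -o '\\w\\+'\n";

static assert(heredoc == escaped);
```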

I oppose this DIP.

1) It puts undue focus on a marginal, non-intrusive language feature and
   makes it seem as if it's a primary cause of tooling problems (it does
   add some complexity, no doubt, but let's not make mountains out of
   molehills here);

2) It places the blame of the syntax highlighting issue at the wrong
   place: syntax highlighters should be fixed, not the other way round.

3) It does not adequately strive to understand why heredoc syntax was
   introduced in the first place, where/when it might be useful, and how
   to mitigate the problems heredoc syntax solves if we were to remove
   it;

4) It breaks a pretty long list of existing D projects, yet does not
   provide strong enough benefits to justify this breakage (doubly so
   for me, because I don't use syntax highlighters to begin with, so for
   me this is all loss and no gain);

5) The breakage does not unquestionably improve code; in fact, I can
   already see many cases for which it makes code *less* readable;

6) The amount of work it will take to rewrite heredoc literals far
   outweighs any small benefits this DIP might bring (and in my case,
   it's work for *no* benefit).


T

-- 
Claiming that your operating system is the best in the world because more people use it is like saying McDonalds makes the best food in the world. -- Carl B. Constantine
December 03, 2019
On Tuesday, 3 December 2019 at 21:04:52 UTC, H. S. Teoh wrote:
> Treating string (or any other) literals as non-tokens makes no sense because they are not symmetric with non-string (or other) tokens, e.g., D tokens allow arbitrary whitespace between them, yet you cannot arbitrarily insert whitespace into a string literal without changing its semantics.

Just change the syntax to q"delimiter .... retimiled" and I believe it will be context free... IIRC.

So yeah, I agree. CFG is not the right argument. I never understood why people are so enamoured of them; parsers are far more powerful today than they used to be. The human should be the important factor when designing syntax, not the parser...

Also, not sure if it is context free if you include comments... But I could be wrong, and again I don't think it should matter...

December 03, 2019
On Tue, Dec 03, 2019 at 04:14:47PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> On 12/3/19 4:04 PM, H. S. Teoh wrote:
> > I think you're imposing a needlessly literal(!) interpretation of
> > context-free grammars.
> 
> I feared that would happen. When I drafted the initial answer, I had this text: "Subject to the way the grammar is defined across lexical tokens and higher-level constructs, yes, one could build a theoretical argument that heredocs are a context-dependent construct." Then I removed it to avoid divagating. Now, here we are.

Yes, sigh, I can see it already: this thread is going to be another of those interminably-long debates and nitpicking over technicalities, and at the end of it all, this DIP will fall by the wayside and we will have accomplished nothing.


T

-- 
Ph.D. = Permanent head Damage
December 03, 2019
On Tuesday, 3 December 2019 at 21:21:30 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 3 December 2019 at 21:04:52 UTC, H. S. Teoh wrote:
>> Treating string (or any other) literals as non-tokens makes no sense because they are not symmetric with non-string (or other) tokens, e.g., D tokens allow arbitrary whitespace between them, yet you cannot arbitrarily insert whitespace into a string literal without changing its semantics.
>
> Just change the syntax to q"delimiter .... retimiled" and I believe it will be context free... IIRC.

That was a joke! Don't argue it...

December 03, 2019
On Tuesday, 3 December 2019 at 14:45:31 UTC, Dennis wrote:
> On Tuesday, 3 December 2019 at 12:38:29 UTC, Andrei Alexandrescu wrote:
>> [...]
>
> I consider this low-hanging fruit: just deprecating a token takes little implementation effort, and reduction in language complexity is (as far as I know) always welcome for the usual reasons:
> - less code in dmd
> - less specification text
> - less didactic material / stuff to learn for new D programmers
> - fewer bug/enhancement reports
> - any tool that re-implements some part of the compiler is easier to make
>
> [...]

Actually, with TextMate-based grammars this is pretty easy to implement: https://github.com/Pure-D/code-d/blob/master/syntaxes/d.json#L2190-L2200
December 03, 2019
On Tuesday, 3 December 2019 at 18:34:22 UTC, H. S. Teoh wrote:
> So why the hate against heredoc strings?

I don't think you use the same terminology as the DIP, so I might misinterpret this, but I have nothing against here documents. I'm glad D provides plenty of useful string literals for including text in source code; it's just that some of them are rarely used and bump up the complexity class of D's lexical grammar.

D has 6 types of string literals ("double quote" `back tick` r"r string" q{tokens} q"<brackets>" q"EOS ident EOS") with 3 encoding options (char, wchar, dchar).
For comparison, Java has one. C# has two + interpolated strings.
There is a DIP for adding interpolated strings to D.
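For readers who haven't met all of them, a quick sketch of the six literal forms listed above, each spelling the same text, plus the encoding suffixes. The enum names are illustrative; the identifier-delimited form differs only in keeping the newline before its closing line.

```d
enum dq  = "hello";    // double-quoted: supports escape sequences
enum bt  = `hello`;    // back-tick WYSIWYG string
enum rs  = r"hello";   // r-prefixed WYSIWYG string
enum tok = q{hello};   // token string: contents must be valid D tokens
enum brk = q"<hello>"; // bracket-delimited: (), [], {} or <>

// Identifier-delimited ("heredoc"): the newline after the opening
// identifier is dropped, the one before the closing line is kept.
enum ident = q"EOS
hello
EOS";

static assert(dq == bt && bt == rs && rs == tok && tok == brk);
static assert(ident == "hello\n");

// Encoding suffixes select the element type.
static assert(is(typeof("hello"c) == string));  // UTF-8
static assert(is(typeof("hello"w) == wstring)); // UTF-16
static assert(is(typeof("hello"d) == dstring)); // UTF-32
```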

People are mentioning how D keeps adding features and is on a road towards C++ complexity. There is precedent for removing barely used features (see e.g. octal, escape, or hex string literals on https://dlang.org/deprecate.html).

And of course there are always users who mourn the removal of their favorite feature, but in the long run everyone benefits from a simpler language.

As for your use case of code generation, I'm having trouble relating to it. I happened to write some code generation algorithms myself recently, and could do fine with q{} strings for large templates and regular "" or `` strings for small token parts like "switch(".
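A minimal sketch of that style, using a hypothetical `genNameOf` generator: the fixed scaffolding is a q{} token string (the `%s` placeholder is itself valid D tokens), the small generated fragments are ordinary "" literals, and the result is mixed in at compile time.

```d
import std.format : format;

// Hypothetical generator: emits `string nameOf(int x)` mapping
// 0, 1, 2, ... to the given names, and anything else to "?".
string genNameOf(string[] names)
{
    // Small generated pieces: plain double-quoted literals.
    string cases;
    foreach (i, name; names)
        cases ~= format("case %s: return \"%s\";\n", i, name);

    // Large fixed template: a token string with one %s placeholder.
    return format(q{
        string nameOf(int x)
        {
            switch (x)
            {
                %s
                default: break;
            }
            return "?";
        }
    }, cases);
}

// Runs genNameOf at compile time and inserts the generated function.
mixin(genNameOf(["zero", "one", "two"]));
```

This only works while the template itself is valid D; emitting another language is exactly where the q{} form stops helping.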

- Do you truly have 50,000 character string literals in your code base?
- Can't you use bracket delimited strings instead, q"<like this?>"
- If accidental early termination in huge string literals is a concern, even an identifier-delimited string isn't always safe. Can't you use an `import()` expression to load an external text file?
- If those 50,000 characters are code and you value readability of it, isn't it a problem that there is no syntax highlighting in a q"EOS EOS" string?
- Can you maybe post an example of some of your q"EOS EOS" strings used for code generation?

> As for poor syntax highlighting as mentioned in the DIP, how is that even a problem with the language?! It's a strawman argument based on skewed data obtained from badly-written lexers that don't actually lex D code correctly. It should be the syntax highlighter that should be fixed, rather than deprecate an actually useful feature in the language.

The thing is, these string literals simply can't be expressed in e.g. a PEG grammar. D's grammar is one complexity class higher than needed just for this one relatively obscure string literal. Sure, you can say "not our problem, those tooling authors just need to account for D's complexity", but I don't think that is useful for D's tooling ecosystem.
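The extra power comes from the closing delimiter having to repeat whatever identifier opened the string, so the lexer has to remember it. A hand-written lexer does that easily; below is a minimal sketch with a hypothetical `lexHeredoc` function, simplified to ASCII identifiers and `\n` line endings (a real D lexer handles more cases).

```d
import std.ascii : isAlpha, isAlphaNum;
import std.string : indexOf;
import std.typecons : Nullable, nullable;

// Hypothetical helper: `src` starts at an identifier-delimited literal
// such as q"EOS\n...\nEOS". Returns its contents, or null if the opening
// is malformed or no closing line is found.
Nullable!string lexHeredoc(string src)
{
    if (src.length < 3 || src[0 .. 2] != `q"`)
        return typeof(return).init;

    // Read the opening identifier. This is the "context": it decides
    // which line will terminate the literal.
    size_t i = 2;
    if (!(isAlpha(src[i]) || src[i] == '_'))
        return typeof(return).init;
    while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_'))
        ++i;
    string ident = src[2 .. i];

    // The identifier must be followed directly by a newline, which is
    // not part of the contents.
    if (i >= src.length || src[i] != '\n')
        return typeof(return).init;
    size_t bodyStart = i + 1;

    // Find a line that begins with the same identifier followed by '"'.
    string terminator = ident ~ `"`;
    size_t lineStart = bodyStart;
    while (true)
    {
        string rest = src[lineStart .. $];
        if (rest.length >= terminator.length
            && rest[0 .. terminator.length] == terminator)
            return nullable(src[bodyStart .. lineStart]);
        auto nl = rest.indexOf('\n');
        if (nl < 0)
            return typeof(return).init; // unterminated
        lineStart += cast(size_t) nl + 1;
    }
}
```

Note that with `A` as the delimiter, a line reading `EOS"` is just content: which line ends the literal depends on what came before it.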

> Not to mention, the long list of projects at the end that will need to be updated, which includes dmd itself BTW, looks like strong evidence of good use of such string literals

dmd only uses them in the test-suite, same as libdparse.
I can spend some more time in the DIP exploring how other packages use them however.
December 03, 2019
On Tuesday, 3 December 2019 at 21:20:57 UTC, H. S. Teoh wrote:
> The problem is that token strings require the contents to be *D tokens*. So if I need to emit snippets of another language, I'm out of luck, and have to resort to quoted strings and Leaning Toothpick Syndrome.

Bracket-delimited strings (q"[text]", allowing <>, [], (), and {} as delimiters) are still allowed and do not need to contain valid tokens.
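A minimal sketch of what that looks like for non-D text (the regex and names are illustrative). Only the chosen bracket pair has to balance; when the text contains an unbalanced bracket of that kind, a different pair can be picked.

```d
// A regular expression: the backslashes are not D tokens, so this text
// could not go in a q{...} token string, but q"[...]" takes it verbatim.
enum re = q"[^\s*#(\w+) "[^"]*"$]";

// The nested [^"] balances, and embedded quotes need no escaping.
static assert(re == `^\s*#(\w+) "[^"]*"$`);

// Text with an unbalanced ']' just uses another bracket pair.
enum unbalanced = q"(a ] b)";
static assert(unbalanced == "a ] b");
```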
December 03, 2019
On Tuesday, 3 December 2019 at 20:53:07 UTC, H. S. Teoh wrote:
>> *snipped various arguments to do with simplicity*
> Agreed, but that can't be the only criterion for removing a feature. By the same argument, one could make the case for removing templates from D. Bingo, the language instantly becomes so much easier to parse! And it greatly simplifies the compiler -- we can delete large sections of it, in fact! The spec becomes simpler, D newbies don't need to learn this hard template stuff anymore, and we can close all template-related bugs, and tools become greatly simplified.

That's clearly not a fair comparison.  Heredocs can be reduced to a set of local transformations, while templates cannot.  This means: code using heredocs can be mechanically changed to not use them, and heredocs do not make the language more expressive.
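As an illustration of "local transformation", here is a minimal sketch of a hypothetical `toDoubleQuoted` helper that a migration tool might use: given the contents of an identifier-delimited string, it produces an equivalent double-quoted literal, without looking at anything else in the program. Escaping line breaks (rather than leaving them verbatim, which D's double-quoted strings also permit) is an assumption made for readability.

```d
import std.array : appender;

// Hypothetical rewrite step: render heredoc contents as an equivalent
// double-quoted D string literal. Only '"' and '\' are special inside
// "..."; line breaks are escaped too so the result fits on one line.
string toDoubleQuoted(string contents)
{
    auto app = appender!string();
    app.put('"');
    foreach (char c; contents) // code units, so UTF-8 is copied unchanged
    {
        switch (c)
        {
            case '"':  app.put(`\"`); break;
            case '\\': app.put(`\\`); break;
            case '\n': app.put(`\n`); break;
            case '\r': app.put(`\r`); break;
            default:   app.put(c);
        }
    }
    app.put('"');
    return app.data;
}
```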

>> If we don't want D support in syntax highlighters to be half-baked everywhere, keeping the lexical grammar simple is a good cause.
> IOW, implementors aren't competent enough to implement something up to spec, therefore we should dumb down the spec for their sake? Sounds like a backwards reason for doing something.

The easier the language is to implement, the more implementors there will be.  If there are compelling reasons to include a language feature, and it makes implementation more difficult, it should be included regardless.  But that doesn't mean that ease of implementation should be completely ignored when considering language features.