December 03, 2019
On Tuesday, 3 December 2019 at 15:05:14 UTC, Jonathan M Davis wrote:
> There are definitely people who use token strings in their code when writing string mixins

Token strings are q{ }; this is about the delimited strings like q"xxx .... xxx" and q"( lll )".
December 03, 2019
On Tuesday, 3 December 2019 at 15:05:14 UTC, Jonathan M Davis wrote:
> There are definitely people who use token strings in their code when writing string mixins, because it makes it so that the code in the strings actually gets syntax highlighting like normal code does instead of being displayed as a string.

I don't propose deprecating token strings, only the identifier delimited ones, which get highlighted as strings.

```
string s = q{
this is fine
};

string t = q"EOS
this is not fine
EOS";
```
December 03, 2019
On Tuesday, 3 December 2019 at 15:05:14 UTC, Jonathan M Davis wrote:
> [...]
>
> There are definitely people who use token strings in their code when writing string mixins, because it makes it so that the code in the strings actually gets syntax highlighting like normal code does instead of being displayed as a string. I expect that a number of people would be quite unhappy to not be able to do that anymore.

This DIP explicitly doesn't deprecate token strings, only
identifier-delimited strings and character-delimited strings.

December 03, 2019
On Tuesday, December 3, 2019 8:09:19 AM MST Dennis via Digitalmars-d wrote:
> On Tuesday, 3 December 2019 at 15:05:14 UTC, Jonathan M Davis wrote:
> > There are definitely people who use token strings in their code when writing string mixins, because it makes it so that the code in the strings actually gets syntax highlighting like normal code does instead of being displayed as a string.
>
> I don't propose deprecating token strings, only the identifier delimited ones, which get highlighted as strings.
>
> ```
> string s = q{
> this is fine
> };
>
> string t = q"EOS
> this is not fine
> EOS";
> ```

Ah. Clearly, I glanced over it all too quickly. I confess that that particular type of string literal seems useless to me. I don't think that I've ever seen anyone use them, and I'd be even less interested in using them than token strings. I don't feel particularly strongly about whether we remove them from the language, but if we were talking about adding them, I'd certainly be against it.

- Jonathan M Davis



December 03, 2019
On Tue, Dec 03, 2019 at 07:38:29AM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> On 12/3/19 4:03 AM, Mike Parker wrote:
> > This is the feedback thread for the first round of Community Review for DIP 1026, "Deprecate Context-Sensitive String Literals":
> > 
> > https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md
> 
> This DIP is a non-starter. Here documents are easily and effectively handled during lexing and have no impact on the language grammar.
[...]

When I read the title "context-sensitive string literals" I was wondering what part of D actually has strings whose interpretation changes depending on context.  I was shocked to discover that it was referring to heredoc strings.

Please don't get rid of heredoc strings. I use them quite a bit, because I work a lot with code generators. They are a refreshing change from C/C++ where trying to quote a piece of code as a string requires Leaning Toothpick Syndrome (i.e., \'s all over the place to escape quoted string metacharacters). I do *not* want to return to that nastiness, thank you very much.

As Andrei said, heredoc strings are trivial to parse because they are essentially a single big token.  This should not pose any problem for the parser at all.  The argument in the DIP is flawed because, at the level of a lexer/parser, a heredoc string is no different from a delimited string: it starts with a sequence of one or more characters (the opening delimiter), spans some arbitrary number of characters (the string content) until another sequence of one or more characters (the closing delimiter).  Nothing stops someone from writing a 50,000-character double-quoted string, for example, and the lexer/parser will handle it just fine.  So why the hate against heredoc strings? Arguably, heredoc strings are exactly what *solves* the problem of 50,000-character strings being essentially unreadable to a human reader because of poor formatting.
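
To illustrate the "single big token" point, here is a minimal sketch of how a hand-written lexer might measure an identifier-delimited string. This is not DMD's actual lexer: `heredocTokenLength` is a hypothetical helper, and it assumes `\n` line endings and does not check that the delimiter is a valid identifier.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical helper: if `src` begins with an identifier-delimited string
/// (q"IDENT, newline, content, newline, IDENT"), return the length of the
/// whole literal; otherwise return -1.
ptrdiff_t heredocTokenLength(string src)
{
    if (!src.startsWith(`q"`))
        return -1;

    // The delimiter is everything between `q"` and the first newline.
    auto nl = src.indexOf('\n');
    if (nl <= 2)            // no newline at all, or an empty delimiter
        return -1;

    // The literal ends at the first line that starts with IDENT followed by `"`.
    string closer = "\n" ~ src[2 .. nl] ~ `"`;
    auto rel = src[nl .. $].indexOf(closer);
    if (rel < 0)
        return -1;          // unterminated literal

    return nl + rel + cast(ptrdiff_t) closer.length;
}

void main()
{
    string src = "q\"EOS\nint x = \"hi\";\nEOS\" ~ rest";
    auto len = heredocTokenLength(src);
    assert(len > 0);
    assert(src[len .. $] == " ~ rest");   // everything after the literal
    writeln(src[0 .. len]);
}
```

The scan needs only the delimiter it read at the start of the literal; no parser state is involved, and the parser sees the result as one string token.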

As for poor syntax highlighting as mentioned in the DIP, how is that even a problem with the language?! It's a strawman argument based on skewed data obtained from badly-written lexers that don't actually lex D code correctly. It is the syntax highlighter that should be fixed, rather than deprecating an actually useful feature in the language.

Not to mention, the long list of projects at the end that will need to be updated, which includes dmd itself BTW, looks like strong evidence of good use of such string literals, rather than marginal use that might be construed to be a reason for deprecation.

And most importantly of all: string literals are *single tokens* in the language. They are lexical units, and therefore have nothing whatsoever to do with the grammar being context-free or not.  We're shooting at the wrong target here.


T

-- 
Famous last words: I wonder what will happen if I do *this*...
December 03, 2019
On 12/3/19 9:45 AM, Dennis wrote:
> On Tuesday, 3 December 2019 at 12:38:29 UTC, Andrei Alexandrescu wrote:
>> Waste of labor is sadly a common theme in our community.
> 
> I consider this low-hanging fruit: just deprecating a token takes little implementation effort, and reduction in language complexity is (as far as I know) always welcome

These can never be the primary reasons for removing a feature. One doesn't remove a feature because it's easy to remove. One removes a feature because there are good reasons to remove it, and as perks we get simplification of the language and maybe it's easy to remove.

> In this case, such tools would be syntax highlighters.

The entire narrative of the DIP puts CFG front and center. The reader's first thought is, "wait, the author is confused about what a CFG is."

FIRST sentence in the abstract: "D is intended to have a context-free grammar..."

FIRST paragraph in the rationale: "Regarding language design, Walter Bright has stated: [... CFG stuff ...]"

Even the "Grammar Changes" section should be a give-away: the diff proposed is in the LEXICAL definition (https://dlang.org/spec/lex.html), not in the GRAMMAR definition (https://dlang.org/spec/grammar.html).

If syntax highlighters are the primary reason for the DIP, they should be the primary reason in the DIP. The entire rationale needs to be redone. There should be an enumeration of syntax highlighters along with their success or failure at implementing heredocs. (I haven't tested them all, but I've never heard of difficulties with implementing heredocs for bash, perl and the like.)

> Maybe you don't care about syntax highlighting, but please judge this DIP by its own merits and not compared to potential other DIPs that you care more about.

A DIP ought to be judged by reading the DIP. This DIP is ill informed because it is built around the CFG argument, a non-existing issue. If the DIP requires a forum post explaining how it needs to be judged, that's a problem with the DIP, not the reader.
December 03, 2019
On Tuesday, 3 December 2019 at 18:34:22 UTC, H. S. Teoh wrote:
> On Tue, Dec 03, 2019 at 07:38:29AM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
>> On 12/3/19 4:03 AM, Mike Parker wrote:
>> > This is the feedback thread for the first round of Community Review for DIP 1026, "Deprecate Context-Sensitive String Literals":
>> > 
>> > https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md
>> 
>> This DIP is a non-starter. Here documents are easily and effectively handled during lexing and have no impact on the language grammar.
>
[...]
>
> As Andrei said, heredoc strings are trivial to parse because they are essentially a single big token.  This should not pose any problem for the parser at all.

By definition, a context-free grammar is defined in terms of a finite set of terminal symbols (i.e., tokens). [1] The set of all string literals is infinite. Therefore, either string literals are not tokens, or D's grammar is not context-free.

[1] https://en.wikipedia.org/wiki/Context-free_grammar#Formal_definitions
December 03, 2019
On Tuesday, 3 December 2019 at 19:42:12 UTC, Andrei Alexandrescu wrote:
> These can never be the primary reasons for removing a feature. One doesn't remove a feature because it's easy to remove. One removes a feature because there are good reasons to remove it, and as perks we get simplification of the language and maybe it's easy to remove.

The DIP mentions:
- D's flagship parser generator Pegged can't express the D grammar (without user defined parser functions)
- Syntax highlighters such as the one on Rosetta code have trouble with it
- there is precedent of deprecating hexstring literals

I'll admit that the rationale section is not clear about the "primary reasons" to remove it, but I considered reducing language complexity an obvious win.

Every feature is a trade-off between what it brings to the table and what it costs, and when it turns out the benefit of a feature is low it gets removed, even when it's not inherently problematic. That's what happened with .sort, .reverse, floating point NCEG operators, octal literals, hexstring literals, escape string literals.

Please answer this: Do you think there were good reasons to deprecate hexstring literals, or do you consider that a mistake / unnecessary?

> FIRST paragraph in the rationale: "Regarding language design, Walter Bright has stated: [... CFG stuff ...]"
>
> Even the "Grammar Changes" section should be a give-away: the diff proposed is in the LEXICAL definition (https://dlang.org/spec/lex.html), not in the GRAMMAR definition (https://dlang.org/spec/grammar.html).

And the very first thing on the grammar page is:

> 3.1 Lexical Syntax

With a link to the lexical grammar page. I consider lexical grammar part of "the grammar of D", even when the lexer and parser are separate stages in the compiler. You might say Walter was exclusively talking about parsing grammar and not lexing grammar, but considering this part of the quote:

> A context free grammar, besides making things a lot simpler, means that IDEs can do syntax highlighting without integrating in most of a compiler front end

It mentions syntax highlighting which does not require parsing.

> If syntax highlighters are the primary reason for the DIP, it should be the primary reason in the DIP.

I don't want to commit to it as 'the primary reason', but I will put more emphasis on it in the next iteration.

> If the DIP requires a forum post explaining how it needs to be judged, that's a problem with the DIP, not the reader.

Your first reply came across as "this is useless, please work on something else".
That felt like a destructive comment. This reply actually has constructive feedback, which helps. Thanks for that.

I will be more specific when talking about 'the grammar', give some more focus on syntax highlighters and maybe dive more into the precedent of reducing language complexity by removing features.
December 03, 2019
On Tue, Dec 03, 2019 at 02:45:31PM +0000, Dennis via Digitalmars-d wrote:
> On Tuesday, 3 December 2019 at 12:38:29 UTC, Andrei Alexandrescu wrote:
> > Waste of labor is sadly a common theme in our community.

That's a bit uncalled for.


> I consider this low-hanging fruit: just deprecating a token takes little
> implementation effort, and reduction in language complexity is (as far as I
> know) always welcome for the usual reasons:
> - less code in dmd
> - less specification text
> - less didactic material / stuff to learn for new D programmers
> - less bug/enhancement reports
> - any tool that re-implements some part of the compiler is easier to make

Agreed, but that can't be the only criterion for removing a feature. By the same argument, one could make the case for removing templates from D. Bingo, the language instantly becomes so much easier to parse! And it greatly simplifies the compiler -- we can delete large sections of it, in fact! The spec becomes simpler, D newbies don't need to learn this hard template stuff anymore, and we can close all template-related bugs, and tools become greatly simplified.


> In this case, such tools would be syntax highlighters. There are lots
> of syntax highlighting implementations for D, just a few off the top
> of my head:
> - GitHub
> - Code-d
> - Kate
> - Atom
> - Sublime
> - Chroma
> - Vim
> - Emacs
> - Notepad++
> - ...
> 
> They all tend to use their own domain specific language, and I'm pretty sure most of them are not powerful enough to express identifier-delimited strings.

Are you sure? Adam just gave an example of correct heredoc highlighting in vim.  It may not be *trivial*, but it's possible.  And users don't have to worry about it: somebody writes the snippet once and for all, and everyone else can just reuse it.


[...]
> If we don't want D support in syntax highlighters to be half-baked everywhere, keeping the lexical grammar simple is a good cause.

IOW, implementers aren't competent enough to implement something up to spec, therefore we should dumb down the spec for their sake? Sounds like a backwards reason for doing something.


> I can improve the rationale for this DIP with examples like in this post, though if you're absolutely adamant that this is a waste of effort then that won't help obviously.
> 
> Maybe you don't care about syntax highlighting, but please judge this DIP by its own merits and not compared to potential other DIPs that you care more about.

The problem with this DIP is that it removes a marginal feature without a good rationale, breaking a pretty long list of existing D code projects that depend on said feature, while offering very little in return (nothing that can't be fixed another way, e.g., fix broken syntax highlighters so that they work properly(!)). And it does so without considering why this feature might have been added in the first place, what kind of problems it solves, and how said problems can be mitigated if the feature were removed.

As I've already said, I work a lot with code generators and other code that embed long-ish text passages in code.  Heredoc syntax is ideal for this sort of code, allowing you to temporarily "escape" from D syntax and write code snippets as-is, rather than require onerous escaping which makes said text less readable. E.g., if I want to embed a mini Perl script inside a function, I couldn't write it as a token string (some Perl tokens are not D tokens), and writing it as a quoted string induces Leaning Toothpick Syndrome, making it hard to edit the script. The script itself is short enough it doesn't seem worth creating it as a separate file (and then needing to fight with paths to find it in the right place).  Heredoc syntax lets me just write the danged script in situ and move on already, instead of fighting with Leaning Toothpick Syndrome or heaping on yet another layer of pathname resolution code just to find a miserable 5-line script file.
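
A small sketch of the difference, for concreteness. The one-line Perl script and the names `perlHeredoc` and `perlQuoted` are made up for illustration; the point is only that both literals hold the same text.

```d
import std.stdio : write;

// Identifier-delimited (heredoc) string: the script is written as-is.
// The newline before the closing PERL is part of the string.
enum string perlHeredoc = q"PERL
while (<>) { print "$1\n" if /^(\w+)\s*=/; }
PERL";

// The same text as an ordinary double-quoted string: every quote and
// backslash in the script has to be escaped by hand.
enum string perlQuoted =
    "while (<>) { print \"$1\\n\" if /^(\\w+)\\s*=/; }\n";

void main()
{
    assert(perlHeredoc == perlQuoted);
    write(perlHeredoc);
}
```

As far as I know, a token string (`q{ ... }`) would not accept this script, because the `\w` and `\s` in the regex sit outside any string literal and a bare backslash is not a D token.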

Same goes for embedded long-ish text (don't have to type ""~ all over the place), etc.

It's marginal, yes, but heredocs are quite useful for the use cases they were intended for, and I really don't see why they should be singled out among so many other things that D could stand to improve in.


T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher
December 03, 2019
On Tue, Dec 03, 2019 at 08:40:14PM +0000, Paul Backus via Digitalmars-d wrote:
> On Tuesday, 3 December 2019 at 18:34:22 UTC, H. S. Teoh wrote:
[...]
> > As Andrei said, heredoc strings are trivial to parse because they are essentially a single big token.  This should not pose any problem for the parser at all.
> 
> By definition, a context-free grammar is defined in terms of a finite set of terminal symbols (i.e., tokens). [1] The set of all string literals is infinite. Therefore, either string literals are not tokens, or D's grammar is not context-free.
[...]

I think you're imposing a needlessly literal(!) interpretation of context-free grammars.  For example, integer literals are also unbounded (there is no largest integer, therefore the set of integer literals is infinite). Does that mean that a calculator program that includes integer literals in its grammar is not context-free?  I think that's a preposterous application of the definitions.

As far as the grammar is concerned, all integer literals are the same terminal symbol, because the grammar does not (need to) distinguish between them.
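
A toy sketch of that idea, assuming a tiny calculator language; `Tok` and `tokenKinds` are hypothetical names, not anything from DMD. The lexer reduces every integer literal, whatever its value or length, to the one token kind that the grammar would see.

```d
import std.ascii : isDigit, isWhite;
import std.stdio : writeln;

/// The finite set of terminal symbols of the toy calculator grammar.
enum Tok { intLiteral, plus, star, lparen, rparen }

/// Hypothetical lexer: maps source text to the sequence of token kinds.
Tok[] tokenKinds(string src)
{
    Tok[] kinds;
    size_t i = 0;
    while (i < src.length)
    {
        immutable c = src[i];
        if (isWhite(c)) { ++i; continue; }
        if (isDigit(c))
        {
            // Consume the whole run of digits; the value itself is
            // irrelevant to the grammar.
            while (i < src.length && isDigit(src[i])) ++i;
            kinds ~= Tok.intLiteral;
            continue;
        }
        switch (c)
        {
            case '+': kinds ~= Tok.plus;   break;
            case '*': kinds ~= Tok.star;   break;
            case '(': kinds ~= Tok.lparen; break;
            case ')': kinds ~= Tok.rparen; break;
            default: throw new Exception("unexpected character");
        }
        ++i;
    }
    return kinds;
}

void main()
{
    // Different literals, identical token sequences.
    assert(tokenKinds("1 + 2") == tokenKinds("123456789012345678901234567890 + 7"));
    writeln(tokenKinds("(1 + 2) * 3"));
}
```

The set of possible literals is unbounded, but the set of token kinds the grammar works with stays finite.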

Treating string (or any other) literals as non-tokens makes no sense because they are not symmetric with non-string (or other) tokens, e.g., D tokens allow arbitrary whitespace between them, yet you cannot arbitrarily insert whitespace into a string literal without changing its semantics.


T

-- 
Time flies like an arrow. Fruit flies like a banana.