December 03, 2019
On Tuesday, 3 December 2019 at 21:34:26 UTC, Dennis wrote:
> The thing is, these string literals simply can't be expressed in e.g. a PEG grammar.

Can't you use a lexer with a PEG parser?

December 03, 2019
On Tuesday, 3 December 2019 at 21:11:49 UTC, Andrei Alexandrescu wrote:
> Let me ask this question: what would be a nice way to convey "this is useless, please work on something else"?

If you truly wanted to convey that, you did a good job. But I do wonder how you expected me to take that. I would not reply "Got it, be right back, I'll e-mail Mike immediately and cancel this DIP and terminate all my effort so far right here." Not after three comments in review round 1.

Even if this DIP is a failure, we could at least try to salvage some lessons from it. Why is it a bad DIP? What criteria should a language feature have to be a candidate for removal, and why don't context-sensitive string literals fit those criteria? What sources of language complexity can be removed instead?

December 03, 2019
On Tuesday, 3 December 2019 at 09:03:44 UTC, Mike Parker wrote:
> This is the feedback thread for the first round of Community Review for DIP 1026, "Deprecate Context-Sensitive String Literals":
>
> https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md
>
> All review-related feedback on and discussion of the DIP should occur in this thread. The review period will end at 11:59 PM ET on December 17, or when I make a post declaring it complete.
>
> At the end of Round 1, if further review is deemed necessary, the DIP will be scheduled for another round of Community Review. Otherwise, it will be queued for the Final Review and Formal Assessment.
>
> Anyone intending to post feedback in this thread is expected to be familiar with the reviewer guidelines:
>
> https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md
>
> *Please stay on topic!*
>
> Thanks in advance to all who participate.

1) Are there any examples of strings that don't have an in-source code workaround if this DIP is accepted?

2) The Rosetta Code link shows a lot of languages with funky parsing, so I'm not sure that proves anything.

3) How much less complex does the parser actually get? Is it trivial?

December 03, 2019
On Tuesday, 3 December 2019 at 22:11:22 UTC, Dennis wrote:
> Even if this DIP is a failure, we could at least try to salvage
> some lessons from it. Why is it a bad DIP?

Bad motivation and bad construction. The bad construction is
apparently that HERE docs do not actually conflict with
context-free grammars, so the entire point of the DIP is moot.
That wasn't obvious to me; I was mainly thinking "I guess it's
assumed that dmd will compile faster with this?"

I think the bad motivation is more interesting, even though a
lot of this is how I received your DIP rather than how you
necessarily meant it:

1. "Less is always better." Not stated in the DIP, but in your
defense of it here:

  reduction in language complexity is (as far as I know) always welcome for the usual reasons:
  - less code in dmd
  - less specification text
  - less didactic material / stuff to learn for new D programmers
  - less bug/enhancement reports
  - any tool that re-implements some part of the compiler is easier to make

Less should have a *point*, though. Much code, specification,
and most importantly didactic material is already written. I
have physical bound books within arm's reach of me that discuss
these features. Removing the feature doesn't make these books
easier to write; it just makes them more annoying for people to
read, as readers get introduced to deprecated features. It
makes "other people's code" slightly more annoying to deal
with, as you may have to update that code to remove
since-deprecated features.

Removing HERE docs doesn't create a python2/python3 or a
perl5/perl6 situation, but it still forks the language and the
old language still does not simply or automatically disappear.
I really dislike this about C++: that no matter how modern it
gets, there will be these huge carbon-dated layers of code out
there that are pre-modern and that can hardly be understood
without also learning the stuff that the modern features are
supposed to have replaced.

If a feature were to be judged a mistake, it can still be a
mistake to remove the feature later on. Less is not always
better.

2. D's problem is "too many features" -> let's remove any
feature that we can -> this DIP as step #1, remove something
that looks relatively easy to remove.

How much agreement do you think there is on the first point?

Consider the "remove ~= from arrays" DIP. It removed a
feature, and removing the feature arguably materially improved
D's options to evolve as a language, and it got a really
incensed negative response.

A human engineer can improve a machine by shutting it down,
tearing it apart, making an improvement, and putting it back
together again. This interruptability of the engineered system
is one of the characteristics of human engineering, along with
"use dry materials" and "use stiff materials", that
distinguishes it from what you might call engineering by Mother
Nature, who uses wet materials, and flexible materials, and
whose works (even if they pull some tricks like molting or
entering a cocoon) must continue to stay alive even as they
undergo radical changes in form.

A DIP can't kill D, take it apart, make an improvement, and
then put it back together again, because then all the users
will be gone. Language design is more like natural engineering
in this way.

Even if part of D's problem is that it has a lot of features,
removing them may still not be the best way forward.

3. "Walter said a thing about D, but a StackOverflow comment
refuted that, so the language should change so that this
criticism is no longer true."

https://stackoverflow.com/a/7083615

Geez. Someone who thinks D has "an obnoxious amount of ambiguity"
is definitely still going to think that after HERE docs are gone.

December 03, 2019
On Tuesday, 3 December 2019 at 23:13:16 UTC, aliak wrote:
> 1) Are there any examples of strings that don't have an in-source code workaround if this dip is accepted?

Considering escape sequences such as "\x0B" and string concatenation with ~, any string literal can still be expressed. The most generic, least intrusive transformation I can think of would be:
Given an identifier-delimited string, check which of < ( { [ has the fewest mismatched brackets. Then convert the string literal to a bracket-delimited string with every unmatched bracket concatenated in:
```
q"EOS
((["`[<< { ((["`[<<
EOS"

// only one mismatching {, so it becomes

q"{((["`[<< }" ~ "{" ~ q"{ ((["`[<<}"
```

(This is a worst-case example; in practice I don't expect many mismatched brackets and quotes/backticks in a string literal.)

> 3) how much less complex does the parser actually get? Is it trivial?

In dmd, not so much; it would just make this function a bit smaller:
https://github.com/dlang/dmd/blob/073b6861b1d1a9859a90e25c8d7f079b54280aca/src/dmd/lexer.d#L1477

For implementations of a D lexer in lexer/parser generators (e.g. http://dinosaur.compilertools.net/lex/index.html), it means only context-free constructs are needed to express everything.

December 03, 2019
On Tue, Dec 03, 2019 at 09:38:28PM +0000, Elronnd via Digitalmars-d wrote:
> On Tuesday, 3 December 2019 at 20:53:07 UTC, H. S. Teoh wrote:
[...]
> > IOW, implementors aren't competent enough to implement something up to spec, therefore we should dumb down the spec for their sake? Sounds like a backwards reason for doing something.
> 
> The easier the language is to implement, the more implementors there will be.  If there are compelling reasons to include a language feature, and it makes implementation more difficult, it should be included regardless.  But that doesn't mean that ease of implementation should be completely ignored when considering language features.

This is a valid consideration *before* the language is implemented. The current situation is:

1) Heredocs are *already* implemented, have been for a long time, and work very well, except for the wrinkle that some poor syntax highlighter implementations fail to parse them correctly.

2) Parsing heredocs is actually not *that* hard, as proven by the (at least) two examples already given in this very thread of syntax highlighting code that parses them correctly. We aren't talking about solving NP-complete problems here, which might be considered reasonable cause for simplifying something.

It does not even take a day's work to write a parser that understands heredocs, and we're debating implementation *difficulty*? Whoa.


T

-- 
My program has no bugs! Only undocumented features...

December 03, 2019
On Tue, Dec 03, 2019 at 09:35:42PM +0000, Elronnd via Digitalmars-d wrote:
> On Tuesday, 3 December 2019 at 21:20:57 UTC, H. S. Teoh wrote:
> > The problem is that token strings require the contents to be *D tokens*.  So if I need to emit snippets of another language, I'm out of luck, and have to resort to quoted strings and Leaning Toothpick Syndrome.
> 
> Bracket-delimited string (q"[text]", allowing <>, [], (), and {} as
> delimiters) are still allowed and do not need to contain valid tokens.

They still need to nest properly, though.  Generating BF snippets, for example, wouldn't work.
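
For instance (a made-up minimal example): a BF fragment with an unmatched ] can't survive as a bracket-delimited literal, while a heredoc doesn't care what's inside:

```
// doesn't even lex: the first ] is taken as the closing delimiter
//     enum bf = q"[>]<]";
// the heredoc form is oblivious to the contents:
enum bf = q"EOS
>]<
EOS";
```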


T

-- 
English has the lovely word "defenestrate", meaning "to execute by throwing someone out a window", or more recently "to remove Windows from a computer and replace it with something useful". :-) -- John Cowan

December 03, 2019
On Tue, Dec 03, 2019 at 09:34:26PM +0000, Dennis via Digitalmars-d wrote:
> On Tuesday, 3 December 2019 at 18:34:22 UTC, H. S. Teoh wrote:
[...]
> D has 6 types of string literals ("double quote" `back tick` r"r
> string" q{tokens} q"<brackets>" q"EOS ident EOS") with 3 encoding
> options (char, wchar, dchar).

Walter has admitted that having 3 encodings, with the corresponding 3 string types, was a "miss" in D's design, and that he should have just stuck with UTF-8. UTF-16 is occasionally useful for interfacing with Windows APIs, but that's pretty narrow and contained, and nobody uses UTF-32 strings in practice. I've not seen many examples of non-UTF-8 strings in D code.

I admit D having 6 types of string literals is excessive, but as somebody has already said, even if something was a mistake in retrospect, that doesn't necessarily mean removing it isn't also a mistake. Because now the weight of existing code counts against removing it.

And just for a bit more perspective, Python also has heredoc syntax, so does Perl, PHP, bash, and probably many others. If heredocs were really such a bad idea, why are people putting them into so many languages, over and over again? Perhaps, just perhaps, there are use cases for them that this DIP has overlooked / underrepresented?  I don't hear people clamoring to remove heredocs from Python, for example, so I'm really having a hard time understanding why we're having this debacle right now.


> For comparison, Java has one. C# has two + interpolated strings. There is a DIP for adding interpolated strings to D.

That DIP seems dead in the water though. The author has vanished and nobody has taken up the reins.


> People are mentioning how D keeps adding adding features and is on a road towards C++ complexity. There is precedent for removing barely used features (see e.g. octal, escape or hexstring literals  on https://dlang.org/deprecate.html).

Actually, I was a bit disappointed with the removal of hexstring literals, but the issue is somewhat more complex. The problem with hexstring literals was that they were a kind of half-hearted attempt at supporting literal hexadecimal data, because they coerced the result into string rather than ubyte[]. The hexstring *syntax* was ideal for entering hex data, but then having the result coerced into string seemed to me like a backwards misfit. If it had produced a ubyte[] then there would have been much more reason to keep it in the language, since occasionally it's very useful to be able to enter blocks of binary data in hex.  As to why the original design produced a string rather than a ubyte[], I can only speculate. Perhaps it was meant as a poor man's way of writing a Unicode string without a Unicode-aware keyboard / input method?  Who knows.  In any case, *that* use case is rendered completely moot by the \u.... and \U........ escape sequences in your regular double-quoted string.  The ubyte[] use case is arguably implementable in a CTFE parser the same way octal literals are, and so hexstrings went the way of the dodo.
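
Something along these lines would do (just a CTFE sketch; `hexBytes` is a made-up name, not an existing Phobos function):

```
// Turn "DE AD BE EF" style text into binary data at compile time via CTFE.
ubyte[] hexBytes(string s)
{
    ubyte[] result;
    bool haveHigh = false;
    ubyte high;
    foreach (c; s)
    {
        int v;
        if (c >= '0' && c <= '9') v = c - '0';
        else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
        else continue; // allow spaces/newlines between bytes
        if (!haveHigh) { high = cast(ubyte) v; haveHigh = true; }
        else { result ~= cast(ubyte)((high << 4) | v); haveHigh = false; }
    }
    return result;
}

enum data = hexBytes("DE AD BE EF"); // enum forces evaluation at compile time
static assert(data.length == 4 && data[0] == 0xDE && data[3] == 0xEF);
```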


> And of course there are always users that remorse the removal of their favorite feature, but in the long run everyone benefits from a simpler language.
> 
> As for your use case of code generation, I'm having trouble relating to it.  I happened to write some code generation algorithms myself recently, and could do fine with q{} strings for large templates and regular "" or `` string for small token parts like "switch(".

q{} works well for emitting *D code*.  Not so well for non-D code.
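
A trivial illustration (the snippets are made up):

```
// token strings must contain valid D tokens, so this is fine:
enum dSnippet = q{ foreach (i; 0 .. 10) writeln(i); };

// but this doesn't even lex, because # and the apostrophe aren't D tokens:
//     enum sh = q{ echo "$HOME"  # user's home };
// so for non-D output you're back to "..." with escaping, or heredocs.
```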


> - Do you truly have 50,000 character string literals in your code base?

No, but I do have a number of large multi-line string literals that simply look best / are most maintainable in heredoc format.


> - Can't you use bracket delimited strings instead, q"<like this?>"

Heredoc syntax is better because the ending delimiter is obvious. When the string literal spans multiple lines, single-character terminating delimiters just aren't the best way to do it.


> - If accidental early termination in huge string literals is a concern, even an identifier-delimited string isn't always safe. Can't you use an `import()` statement on an external text file?

An identifier-delimited string is safe because the literal is typed directly into the code, so you already know beforehand what words appear in it and what will *never* appear in it.  It isn't as though I'm copy-n-pasting arbitrary text from arbitrary input files into my code just for fun.

String imports require creating an extra file to contain the string, and require running the compiler with -J plus the right path(s), all of which are extra hurdles to jump through. It's the same thing as with external unittests vs. unittest blocks that you can just write inline: it's *possible*, but inconvenient and liable to go out of sync as you modify the code.
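
To spell it out (the file names and paths here are made up for illustration):

```
// the text has to live in its own file, and the build needs the flag:
//     dmd -Jstrings app.d
enum usage = import("usage.txt");

// versus keeping it right next to the code that uses it:
enum usage2 = q"EOS
Usage: myprog [options] FILE
  -h    show this help
EOS";
```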


> - If those 50,000 characters are code and you value readability of it, isn't it a problem that there is no syntax highlighting in a q"EOS EOS" string?

As I said, I don't use a syntax highlighter.  Also, any attempt to highlight is moot if the string contains code of a different language (see below for my use cases).


> - Can you maybe post an example of some of your q"EOS EOS" strings used for code generation?

I feel a single example will not adequately convey my point. Here's a list of use cases I use heredocs for (in no particular order):

1) Generating HTML snippets
2) Generating PovRay scene description snippets
3) Generating D code snippets
4) Generating snippets of a DSL I use for generating geometric models
5) Generating boilerplate for input data to an external convex hull
   solver (has its own peculiar syntax)
6) Generating GLSL shader code snippets
7) Generating Java code snippets
8) Command line usage descriptions

Some of this code is somewhat old but is actively used as infrastructure for my current projects, and having to go back to rewrite heredocs just because of some ivory tower ideal of "cleaning up useless literals in D" is rather distasteful to me, you understand, esp. since I don't even use syntax highlighting in the first place, so this is just pure work for zero benefit.  If we were still in the early stages of D development, then sure, go ahead and nuke heredocs if you have very good reasons for it, but I'm not about to go rewriting code for (1) to (8) now, not when there's basically zero benefit in doing so.


> > As for poor syntax highlighting as mentioned in the DIP, how is that even a problem with the language?! It's a strawman argument based on skewed data obtained from badly-written lexers that don't actually lex D code correctly. It should be the syntax highlighter that should be fixed, rather than deprecate an actually useful feature in the language.
> 
> The thing is, these string literals simply can't be expressed in e.g. a PEG grammar.

?!  Can't you just use a custom lexer with your PEG grammar?


> D's grammar is one complexity class higher than needed just for this one relatively obscure string literal. Sure you can say "not our problem, those tooling authors just need to account for D's complexity", but I don't think that is useful for D's tooling ecosystem.
[...]

Then isn't the solution simply to write a self-contained heredoc parsing function, put it in a dub package, and let everyone reuse it? Nobody would have to write it for themselves again. Problem solved.
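
Something along these lines, perhaps (a quick sketch off the top of my head, not dmd's actual lexer code):

```
import std.ascii : isAlphaNum;

// Scan a q"IDENT ... IDENT" heredoc. `pos` points just past the opening q"
// and is advanced past the closing quote; the body of the literal is returned.
string scanHeredoc(string src, ref size_t pos)
{
    // read the delimiter identifier
    size_t idStart = pos;
    while (pos < src.length && (isAlphaNum(src[pos]) || src[pos] == '_'))
        pos++;
    string delim = src[idStart .. pos];
    assert(delim.length && pos < src.length && src[pos] == '\n',
        "expected identifier followed by a newline");
    pos++;

    // the body ends at a line that starts with the delimiter followed by "
    size_t bodyStart = pos, lineStart = pos;
    while (lineStart < src.length)
    {
        size_t lineEnd = lineStart;
        while (lineEnd < src.length && src[lineEnd] != '\n')
            lineEnd++;
        if (lineEnd - lineStart > delim.length
            && src[lineStart .. lineStart + delim.length] == delim
            && src[lineStart + delim.length] == '"')
        {
            pos = lineStart + delim.length + 1;
            return src[bodyStart .. lineStart];
        }
        lineStart = lineEnd + 1;
    }
    assert(0, "unterminated heredoc");
}

unittest
{
    string code = "q\"EOS\nhello\nEOS\";";
    size_t p = 2; // just past q"
    assert(scanHeredoc(code, p) == "hello\n");
    assert(code[p .. $] == ";");
}
```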

(If it's even that complex to begin with. As I said, we already have 2 working examples of syntax highlighter code that work fine with heredocs. It's not as though D invented heredocs; they have been around since the early days of the Unix shell, and people have been writing parsing code for it for a long time. Its supposed "complexity" is really blown out of proportion here.)

This whole debacle feels like heredocs are being singled out as a scapegoat in a misguided quest to "simplify the language".  Like we're grasping at straws because we're unable to tackle the bigger issues, so here's a convenient simple target we can shoot and kill and feel good about ourselves that we're finally making progress.  Talk about straining out the gnat and swallowing the camel.


T

-- 
"I'm not childish; I'm just in touch with the child within!" - RL
December 04, 2019
On Wednesday, 4 December 2019 at 01:26:24 UTC, H. S. Teoh wrote:
> And just for a bit more perspective, Python also has heredoc syntax, so does Perl, PHP, bash, and probably many others.

Python actually doesn't have HERE docs. When it's included in
lists of "languages with HERE docs", it's just to show what a
Python programmer would use in their stead.

Please accept Ruby as a replacement example.

December 04, 2019
On Wednesday, 4 December 2019 at 01:26:24 UTC, H. S. Teoh wrote:
> UTF-16 is  <snip>

VERY useful and helps make D on Windows feel first class, so it is easy to do things right.

UTF-32 doesn't matter, but "string"w is very, very nice for working with Windows, .NET, Java, etc. easily, efficiently, and correctly.
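
For example (off the top of my head; the exact module layout is from memory):

```
// "..."w is a wstring (UTF-16) literal, and D string literals are implicitly
// NUL-terminated, so .ptr lines up directly with the wide Windows API.
version (Windows)
{
    import core.sys.windows.windows;

    void greet()
    {
        MessageBoxW(null, "Hello from D"w.ptr, "Example"w.ptr, MB_OK);
    }
}
```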


> That DIP seems dead in the water though. The author has vanished and nobody has taken up the reins.

The string interpolation thing is cool; I wrote up my proposal, but I'm just not likely to bother with the burden of DIP bureaucracy. Even JavaScript has some stuff that beats us now.

> As I said, I don't use a syntax highlighter.  Also, any attempt to highlight is moot if the string contains code of a different language (see below for my use cases).

And I use the heredoc strings BECAUSE of how well they can be highlighted - again, my vim happens to treat q"html and q"sql and q"css and others specially, knowing they are embedded languages.

I could do that with something like css!" " too - a template instead - and the type information could even be improved, but still, the heredoc is kinda cool for syntax highlighting.
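
i.e. something like this (just a sketch):

```
// a no-op wrapper template whose name tells the editor (and the reader)
// which language is embedded in the string
enum css(string s) = s;

enum style = css!`
    body { margin: 0; }
    .warning { color: red; }
`;
```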




BTW, if heredoc strings were to be removed... tbh I can live with it. It bugs me that they must end at the beginning of a line; I wish it would let you indent the closing delimiter. That seriously bugs me and is a reason why I don't use them more.

But still, since they are there, I use them.