October 15

According to [1] U+2028 and U+2029 are considered end-of-line characters. Does this make sense?

```
$ cat lsps.d
void main ()
{
   enum b = 8;
   mixin ("enum a1 =\u2028b; pragma (msg, a1);");
   mixin ("enum a2\u2028= b; pragma (msg, a2);");
   mixin ("enum\u2028a3 = b; pragma (msg, a3);");
}
$ dmd lsps.d
8
lsps.d-mixin-5(5): Error: char 0x2028 not allowed in identifier
lsps.d-mixin-6(6): Error: char 0x2028 not allowed in identifier
```

[1] https://dlang.org/spec/lex.html#end_of_line

October 16
Based upon how the identifier tokenization occurs greedily, yes it makes sense.
October 16
On Monday, 16 October 2023 at 02:45:17 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Based upon how the identifier tokenization occurs greedily, yes it makes sense.

The error message is confusing. Compare it with this code, which uses carriage returns instead:

```
$ cat ret.d
void main ()
{
   enum b = 8;
   mixin ("enum a1 =\rb; pragma (msg, a1);");
   mixin ("enum a2\r= b; pragma (msg, a2);");
   mixin ("enum\ra3 = b; pragma (msg, a3);");
}
$ dmd ret.d
8
8
8
```

Why are U+2028 and U+2029 handled differently from \r and \n?
October 17
https://github.com/dlang/dmd/blob/master/compiler/src/dmd/lexer.d#L578

Basically, if it's in a multi-byte UTF-8 character, it checks whether it's in the non-ASCII character ranges. No special handling of newlines is provided, but there probably should be.
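
To make that concrete, here is a rough sketch of a greedy identifier scanner with the same gap. It is not the dmd source: `ScanResult` and `scanIdentifier` are hypothetical names, and Phobos' `std.uni.isAlpha` is only a stand-in for whatever table the real lexer consults.

```d
import std.ascii : isAlphaNum;
import std.format : format;
import std.uni : isAlpha;
import std.utf : decode;

/// Outcome of scanning one identifier (hypothetical type for this sketch).
struct ScanResult
{
    string ident; // identifier text consumed so far
    string error; // null unless a character was rejected
}

/// Greedily consume an identifier starting at src[0].
ScanResult scanIdentifier(string src)
{
    size_t i = 0;
    while (i < src.length)
    {
        immutable char c = src[i];
        if (c < 0x80)
        {
            // ASCII: identifier characters continue, anything else
            // (including \r and \n) simply ends the identifier.
            if (isAlphaNum(c) || c == '_')
            {
                ++i;
                continue;
            }
            break;
        }

        // Non-ASCII: decode the whole code point and test it.
        size_t next = i;
        immutable dchar u = decode(src, next);
        if (isAlpha(u))
        {
            i = next;
            continue;
        }

        // Assumption: no end-of-line check on this path, so U+2028 and
        // U+2029 are reported as errors instead of ending the identifier.
        return ScanResult(src[0 .. i],
            format("char 0x%04x not allowed in identifier", cast(uint) u));
    }
    return ScanResult(src[0 .. i], null);
}
```

With this sketch, `scanIdentifier("a2\r= b")` stops quietly after `a2`, while `scanIdentifier("a2\u2028= b")` comes back with an error string, which mirrors the difference between the two test programs above.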
October 17
On 10/16/2023 5:37 PM, Richard (Rikki) Andrew Cattermole wrote:
> https://github.com/dlang/dmd/blob/master/compiler/src/dmd/lexer.d#L578
> 
> Basically, if it's in a multi-byte UTF-8 character, it checks whether it's in the non-ASCII character ranges. No special handling of newlines is provided, but there probably should be.

Yes, please file a bugzilla!
October 17
On Tuesday, 17 October 2023 at 00:37:41 UTC, Richard (Rikki) Andrew Cattermole wrote:
> https://github.com/dlang/dmd/blob/master/compiler/src/dmd/lexer.d#L578
>
> Basically, if it's in a multi-byte UTF-8 character, it checks whether it's in the non-ASCII character ranges. No special handling of newlines is provided, but there probably should be.

I've noticed that in the past, but this is clearly wrong. It's not just whitespace; it's also punctuation, emoji, a ton of stuff that just doesn't belong in identifiers.

The lexer should only accept characters from the proper character set as the start of an identifier.
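
One possible shape for that, as a sketch only: the identifier scanner stops at the first character that is not an identifier character and leaves it for the main loop to classify, for example as an end of line. The helper names below are hypothetical, and `std.uni.isAlpha` is an assumed approximation of the identifier character set, not the set the D spec actually defines.

```d
import std.uni : isAlpha;
import std.utf : decode;

enum dchar LS = '\u2028'; // LINE SEPARATOR
enum dchar PS = '\u2029'; // PARAGRAPH SEPARATOR

/// End-of-line characters discussed in this thread.
bool isEndOfLineChar(dchar u)
{
    return u == '\n' || u == '\r' || u == LS || u == PS;
}

/// Assumption: letters and '_' may start an identifier.
bool isIdentStart(dchar u)
{
    return u == '_' || isAlpha(u);
}

/// Assumption: ASCII digits are also allowed after the first character.
bool isIdentContinue(dchar u)
{
    return isIdentStart(u) || (u >= '0' && u <= '9');
}

/// Length in bytes of the identifier at the start of src, or 0 if none.
/// Anything that is not an identifier character ends the identifier
/// without an error; the caller decides what that character means.
size_t identifierLength(string src)
{
    size_t i = 0;
    while (i < src.length)
    {
        size_t next = i;
        immutable dchar u = decode(src, next); // throws on invalid UTF-8
        immutable bool ok = (i == 0) ? isIdentStart(u) : isIdentContinue(u);
        if (!ok)
            break;
        i = next;
    }
    return i;
}
```

Here `identifierLength("a2\u2028= b")` is 2, and the main loop would then see U+2028, find that `isEndOfLineChar` holds, and treat it like `\r` or `\n`. Punctuation or emoji would likewise end the identifier and be diagnosed as an unexpected character rather than as part of an identifier.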
October 18
https://issues.dlang.org/show_bug.cgi?id=24190
October 17
On 10/17/2023 4:18 PM, deadalnix wrote:
> this is clearly wrong.

I blame my parents.