October 18, 2023
https://issues.dlang.org/show_bug.cgi?id=24190

          Issue ID: 24190
           Summary: Identifier tokenizer is greedy steals new line
                    characters
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: dmd
          Assignee: nobody@puremagic.com
          Reporter: alphaglosined@gmail.com

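Currently, the tokenizer for identifiers is quite greedy. When a non-ASCII new line character immediately follows an identifier, the identifier tokenizer consumes it and reports it as an invalid identifier character, when it should probably stop and defer to the outer loop to handle it (or to error).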

```d
$ cat lsps.d
void main ()
{
    enum b = 8;
    mixin ("enum a1 =\u2028b; pragma (msg, a1);");
    mixin ("enum a2\u2028= b; pragma (msg, a2);");
    mixin ("enum\u2028a3 = b; pragma (msg, a3);");
}
$ dmd lsps.d
8
lsps.d-mixin-5(5): Error: char 0x2028 not allowed in identifier
lsps.d-mixin-6(6): Error: char 0x2028 not allowed in identifier
```

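The character 0x2028 (U+2028, LINE SEPARATOR) is a valid new line character in D source. The first mixin, where it follows `=`, compiles and prints 8; the other two, where it directly follows an identifier or keyword (`a2`, `enum`), fail with the error above.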
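
To illustrate the suggested behaviour, here is a minimal sketch of an identifier scanner that stops at a Unicode line terminator instead of consuming it. This is not dmd's lexer code: `isUnicodeEOL` and `scanIdentifier` are hypothetical names, and the identifier rule is simplified to Phobos' `std.uni.isAlpha` plus `_` and ASCII digits.

```d
import std.uni : isAlpha;
import std.utf : decode;

/// Hypothetical helper: true for the two non-ASCII end-of-line characters.
bool isUnicodeEOL(dchar c)
{
    return c == '\u2028' || c == '\u2029';
}

/// Hypothetical sketch: returns the length in bytes of the identifier
/// at the start of `src`, or 0 if `src` does not start with one.
size_t scanIdentifier(string src)
{
    size_t i = 0;
    while (i < src.length)
    {
        size_t next = i;
        const dchar c = decode(src, next); // advances `next` past one code point

        // Stop without error: the line terminator is left for the caller
        // (the outer loop) to treat as a new line.
        if (isUnicodeEOL(c))
            break;

        // Simplified identifier rule (an assumption, not dmd's exact tables).
        const bool isDigit = c >= '0' && c <= '9';
        if (!(c == '_' || isAlpha(c) || (i > 0 && isDigit)))
            break;

        i = next;
    }
    return i;
}

unittest
{
    // The two failing cases from the report: the scan ends before U+2028.
    assert(scanIdentifier("a2\u2028= b") == 2);
    assert(scanIdentifier("enum\u2028a3 = b") == 4);
}
```

Compiling with `dmd -unittest -main` runs the checks. The point of the sketch is only that the identifier scan ends cleanly before U+2028, so whatever code called it can then see the character and handle it as an end of line.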
