Attributes (lexical)
November 25, 2021

Just playing around with attributes.

This is valid D code:


@


nogc: //yes, this is in fact @nogc, even with blank lines in between


@

/* i can put some comments
*/

/** even some documentation
*/

// single line comments also

(12)

// yes, comments and newlines are allowed between attribute and declaration


int x; //@(12) is attached to declaration

Is that OK, or is it a lexer bug?

This also works for #line, even though the specification says that all tokens must be on the same line:


#

//this works

line

/* this too */

12

//this is #line 12


November 25, 2021

On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
> Is that OK, or is it a lexer bug?

Yes. The lexer just eats whitespace and the parser accepts way too much.

November 25, 2021
On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
> Is that OK, or is it a lexer bug?

@ (12) does exactly what I would expect.  @nogc I always assumed was a single token, but the spec says otherwise.  I suppose that makes sense.

#line is dicier as it is not part of the grammar proper; however the spec describes it as a 'special token sequence', and comments are not tokens, so I think the current behaviour is correct.
November 25, 2021

On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
> This also works for #line, even though the specification says that all tokens must be on the same line

Where does it say that?

November 25, 2021

On Thursday, 25 November 2021 at 10:10:25 UTC, Dennis wrote:
> On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
>> This also works for #line, even though the specification says that all tokens must be on the same line
>
> Where does it say that?

Well:

#line IntegerLiteral Filespec? EndOfLine

To me, having EndOfLine at the end means that no other EOLs are allowed in between; otherwise this syntax should be accepted, but it is not (latest DMD):

#line 12
"source.d"

I am not asking these questions out of thin air; I am trying to write a conforming lexer, and this is one of the ambiguities.

November 25, 2021

On Thursday, 25 November 2021 at 10:41:05 UTC, Rumbu wrote:
> I am not asking these questions out of thin air; I am trying to write a conforming lexer, and this is one of the ambiguities.

I think it is easier to just look at the lexer in the dmd source. The D language does not really have a proper spec, it is more like an effort to document the implementation.

November 25, 2021

On Thursday, 25 November 2021 at 10:41:05 UTC, Rumbu wrote:
> Well:
>
> #line IntegerLiteral Filespec? EndOfLine
>
> To me, having EndOfLine at the end means that no other EOLs are allowed in between; otherwise this syntax should be accepted, but it is not (latest DMD):
>
> #line 12
> "source.d"

The lexical grammar section starts with:

> The source text is decoded from its source representation into Unicode Characters. The Characters are further divided into: WhiteSpace, EndOfLine, Comments, SpecialTokenSequences, and Tokens, with the source terminated by an EndOfFile.

What it fails to mention is that, in the lexical grammar rules, a space denotes 'immediate concatenation' of the characters/rules before and after it, e.g.:

DecimalDigits:
    DecimalDigit
    DecimalDigit DecimalDigits

3 1 4 is not a single IntegerLiteral; it needs to be 314.
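This can be checked at compile time. The sketch below is only an illustration and is not taken from the spec: it assumes that `__traits(compiles, ...)` yields false for a string mixin that does not parse as an expression, and the literals are arbitrary.

```d
// Digits written together form a single IntegerLiteral.
static assert( __traits(compiles, mixin("314")));
static assert(mixin("314") == 314);

// Separated by spaces they become three IntegerLiteral tokens in a row,
// which is not a valid expression.
static assert(!__traits(compiles, mixin("3 1 4")));
```

There is no runtime code here; compiling the file (for example with `dmd -c`) is the whole check.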

For the parsing grammar, the spec should mention that spaces denote immediate concatenation of Tokens, with arbitrary Comments and WhiteSpace allowed in between. So the rule:

AtAttribute:
    @ nogc

Means: an @ token, followed by arbitrary comments and whitespace, followed by an identifier token that equals "nogc". That explains your first example.
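A small compile-time check of that reading, as a sketch: the names `f` and `x` are made up for the illustration, and it assumes Phobos' `std.traits.hasFunctionAttributes` and `__traits(getAttributes, ...)` are available.

```d
import std.traits : hasFunctionAttributes;

// An @ token, then a comment and a newline, then the identifier nogc.
@ /* comment between the two tokens */
nogc
void f() {}

// Same idea for a user-defined attribute: @ and (12) on separate lines.
@
// a line comment
(12)
int x;

// Both attributes ended up attached to the declarations that follow them.
static assert(hasFunctionAttributes!(f, "@nogc"));
static assert(__traits(getAttributes, x)[0] == 12);
```

Again, compiling the file (for example with `dmd -c`) is the whole check.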

Regarding this lexical rule:

#line IntegerLiteral Filespec? EndOfLine

This is already wrong from a lexical standpoint: read as a lexical rule, it would suggest that a SpecialTokenSequence looks like this:

#line10"file"

The implementation actually looks for a # token, skips WhiteSpace and Comments, and looks for an identifier token ("line"). It then enters a custom loop that allows separation by WhiteSpace but not by Comments, and it assumes that the first '\n' is the final EndOfLine, which is why this fails:

#line 12
"source.d"

It thinks it's done after "12".
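For anyone writing their own lexer, that behaviour can be modelled roughly as follows. This is a sketch built from the description above, not dmd's code and not checked against it: `LineDirective` and `parseLineTail` are hypothetical names, the input is the text that follows the `line` identifier, and only '\n' is treated as a line end.

```d
import std.ascii : isDigit;
import std.conv : to;

/// Hypothetical result type for this sketch.
struct LineDirective
{
    uint line;
    string file; // empty when no Filespec was given
}

/// Parses the text after `#line`: blanks may separate the parts,
/// comments are not skipped, and the first '\n' ends the directive.
/// Returns false on anything unexpected.
bool parseLineTail(string s, out LineDirective result)
{
    size_t i = 0;

    void skipBlanks()
    {
        while (i < s.length &&
               (s[i] == ' ' || s[i] == '\t' || s[i] == '\v' || s[i] == '\f'))
            ++i;
    }

    skipBlanks();
    const start = i;
    while (i < s.length && isDigit(s[i]))
        ++i;
    if (i == start)
        return false; // no IntegerLiteral
    result.line = s[start .. i].to!uint;

    skipBlanks();
    if (i < s.length && s[i] == '"')
    {
        const fstart = ++i;
        while (i < s.length && s[i] != '"' && s[i] != '\n')
            ++i;
        if (i == s.length || s[i] != '"')
            return false; // unterminated Filespec
        result.file = s[fstart .. i];
        ++i;
        skipBlanks();
    }

    // The directive must end here; a comment or any other character is rejected.
    return i == s.length || s[i] == '\n';
}

unittest
{
    LineDirective d;

    // Everything on one line: accepted, with a file name.
    assert(parseLineTail(" 12 \"source.d\"\n", d));
    assert(d.line == 12 && d.file == "source.d");

    // File name on the next line: the directive already ended after 12.
    assert(parseLineTail(" 12\n\"source.d\"\n", d));
    assert(d.line == 12 && d.file == "");

    // A comment inside the directive is rejected in this model.
    assert(!parseLineTail(" 12 /* c */ \"source.d\"\n", d));
}
```

In the second case the string literal is left for the ordinary lexer, which matches the observation that "source.d" shows up as an unexpected token. The unittest block can be run with `dmd -unittest -main -run` on the file.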

In conclusion the specification should:

  • define the notation used in lexical / parsing grammar blocks
  • clearly distinguish lexical / parsing blocks
  • fix up the SpecialTokenSequence definition (and maybe change dmd as well)

By the way, the parsing grammar defines:

LinkageType:
    C
    C++
    D
    Windows
    System
    Objective-C

C++ and Objective-C cannot currently be single tokens, so they are actually two and three tokens respectively (C followed by ++; Objective, -, C), which is why these are allowed:

extern(C
       ++)
void f() {}

extern(Objective
       -
       C)
void g() {}

This should also be fixed in the spec.

> I am not asking these questions out of thin air; I am trying to write a conforming lexer, and this is one of the ambiguities.

That's cool! Are you writing an editor plugin?

November 25, 2021

On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
> //this works
>
> line

I hate #.

November 25, 2021

On Thursday, 25 November 2021 at 11:25:49 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 25 November 2021 at 10:41:05 UTC, Rumbu wrote:
>> I am not asking these questions out of thin air; I am trying to write a conforming lexer, and this is one of the ambiguities.
>
> I think it is easier to just look at the lexer in the dmd source. The D language does not really have a proper spec, it is more like an effort to document the implementation.

I try to base my reasoning on the specification. dmd is not always a good source of information: its lexer is polluted by old features and, right now, by the ImportC feature, which tries to lex D and C at the same time.

DMD skips the newline if the file was not specified, which is why the "filename" is unexpected on a new line:
https://github.com/dlang/dmd/blob/d374003a572fe0c64da4aa4dcc55d894c648514b/src/dmd/lexer.d#L2838

libdparse completely ignores the contents after #line, skipping everything until EOL, even an EOF/NUL marker, which should end the lexing:
https://github.com/dlang-community/libdparse/blob/7112880dae3f25553d96dae53a445c16261de7f9/src/dparse/lexer.d#L1100

November 25, 2021

On Thursday, 25 November 2021 at 12:09:55 UTC, Dennis wrote:
> This should also be fixed in the spec.

Filed as:

Issue 22543 - [spec] grammar blocks use unspecified notation:
https://issues.dlang.org/show_bug.cgi?id=22543

Issue 22544 - [spec] C++ and Objective-C are not single tokens
https://issues.dlang.org/show_bug.cgi?id=22544
