January 13, 2006 Lexer related questions | ||||
---|---|---|---|---|
| ||||
Hey,
I'm using JFlex (http://jflex.de/) to implement a lexical analyser for the D language. I've already got quite a lot done, but there are some issues here and there that I need to work on. Also, there are a couple of things I need feedback on.
For example, I can't seem to understand why it's allowed to have several succeeding _'s in a decimal/integer value. The grammar says
Decimal:
0
NonZeroDigit
NonZeroDigit Decimal
NonZeroDigit _ Decimal
which means that 0, 1, 12, 1_2 and 1_2_3 are allowed, but in my opinion, 1__2__3 is not allowed. The DMD compiler, however, accepts that value as 123.
Also, the specification (http://www.digitalmars.com/d/lex.html) seems to lack information on some parts of the grammar. For example, it says
Float:
DecimalFloat
HexFloat
Float _
but it doesn't describe the grammar of DecimalFloat or HexFloat.
I'll post more questions once I find other issues.
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
|
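The mismatch described in this post can be made concrete with a small sketch. The first pattern is a direct regular-expression transcription of the Decimal rule as quoted; the second is only a guess at a more lenient rule that would explain why DMD accepts 1__2__3. The names `SPEC_DECIMAL`, `LENIENT_DECIMAL` and `decimal_value` are hypothetical, and the lenient pattern is an assumption, not DMD's actual implementation. Python is used only to keep the sketch short.

```python
import re

# Literal transcription of the quoted rule:
#   Decimal: 0 | NonZeroDigit | NonZeroDigit Decimal | NonZeroDigit _ Decimal
# That is: any number of "NonZeroDigit, optional single underscore" steps,
# followed by a final 0 or NonZeroDigit.
SPEC_DECIMAL = re.compile(r"(?:[1-9]_?)*[0-9]")

# Assumed lenient rule (not taken from DMD): after the first digit,
# digits and underscores may appear in any order.
LENIENT_DECIMAL = re.compile(r"0|[1-9][0-9_]*")


def decimal_value(text):
    """Return the integer value under the lenient rule, ignoring underscores."""
    if LENIENT_DECIMAL.fullmatch(text) is None:
        raise ValueError(f"not a decimal literal: {text!r}")
    return int(text.replace("_", ""))


for literal in ["0", "12", "1_2_3", "1__2__3"]:
    by_the_book = SPEC_DECIMAL.fullmatch(literal) is not None
    print(literal, by_the_book, decimal_value(literal))
```

Read literally, the quoted rule also rejects 100, because 0 can only ever be the last digit of a derivation. That is one more hint that the rule is looser in practice than it is on paper.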
January 13, 2006 Re: Lexer related questions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Casper Ellingsen | Casper Ellingsen wrote:
> Hey,
>
> I'm using JFlex (http://jflex.de/) to implement a lexical analyser for the D language. I've already got quite a lot done, but there are some issues here and there that I need to work on. Also, there are a couple of things I need feedback on.
>
> For example, I can't seem to understand why it's allowed to have several succeeding _'s in a decimal/integer value. The grammar says
>
> Decimal:
> 0
> NonZeroDigit
> NonZeroDigit Decimal
> NonZeroDigit _ Decimal
>
> which means that 0, 1, 12, 1_2 and 1_2_3 are allowed, but in my opinion, 1__2__3 is not allowed. The DMD compiler, however, accepts that value as 123.
The D regexp and BNF information is woefully inaccurate in places, largely because Walter wrote DMD entirely by hand. You're best off verifying it against the written documentation:
http://digitalmars.com/d/lex.html#integerliteral
"Integers can have embedded '_' characters, which are ignored."
> Also, the specification (http://www.digitalmars.com/d/lex.html) seems to lack information on some parts of the grammar. For example, it says
>
> Float:
> DecimalFloat
> HexFloat
> Float _
>
> but it doesn't describe the grammar of DecimalFloat or HexFloat.
Same thing here. Check this link:
http://digitalmars.com/d/lex.html#floatliteral
Though I suspect that aside from the embedded underscores, the syntax is identical to what it is in C/C++. Here's the pertinent bit of the C++ standard:
floating-literal:
fractional-constant exponent-part(opt) floating-suffix(opt)
digit-sequence exponent-part floating-suffix(opt)
fractional-constant:
digit-sequence(opt) . digit-sequence
digit-sequence .
exponent-part:
e sign(opt) digit-sequence
E sign(opt) digit-sequence
sign: one of
+ -
digit-sequence:
digit
digit-sequence digit
floating-suffix: one of
f l F L
|
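As a rough illustration, the C++ floating-literal grammar quoted in this reply translates almost rule for rule into a regular expression. The constant names and the helper `is_floating_literal` are hypothetical, and the sketch covers only the C++ rules as quoted, without D's embedded underscores or hex floats.

```python
import re

# One pattern fragment per grammar rule quoted above.
DIGIT_SEQUENCE = r"[0-9]+"
# fractional-constant: digit-sequence(opt) . digit-sequence | digit-sequence .
FRACTIONAL_CONSTANT = r"(?:[0-9]*\.[0-9]+|[0-9]+\.)"
# exponent-part: e or E, optional sign, digit-sequence
EXPONENT_PART = r"[eE][+-]?[0-9]+"
# floating-suffix: one of f l F L
FLOATING_SUFFIX = r"[flFL]"

FLOATING_LITERAL = re.compile(
    rf"(?:{FRACTIONAL_CONSTANT}(?:{EXPONENT_PART})?{FLOATING_SUFFIX}?"
    rf"|{DIGIT_SEQUENCE}{EXPONENT_PART}{FLOATING_SUFFIX}?)"
)


def is_floating_literal(text):
    """True if the whole string is a floating-literal under the quoted rules."""
    return FLOATING_LITERAL.fullmatch(text) is not None


for candidate in ["1.5", ".5", "1.", "1e10", "1.5e-3f", "1", "e10"]:
    print(candidate, is_floating_literal(candidate))
```

Note that a bare digit sequence such as 1 is deliberately not matched: without a decimal point or an exponent it is an integer literal.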
January 13, 2006 Re: Lexer related questions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | On Fri, 13 Jan 2006 23:02:31 +0100, Sean Kelly <sean@f4.ca> wrote:
> Though I suspect that aside from the embedded underscores, the syntax is identical to what it is in C/C++. Here's the pertinent bit of the C++ standard:
>
> floating-literal:
> fractional-constant exponent-part(opt) floating-suffix(opt)
> digit-sequence exponent-part floating-suffix(opt)
> fractional-constant:
> digit-sequence(opt) . digit-sequence
> digit-sequence .
> exponent-part:
> e sign(opt) digit-sequence
> E sign(opt) digit-sequence
> sign: one of
> + -
> digit-sequence:
> digit
> digit-sequence digit
> floating-suffix: one of
> f l F L
Thanks. As far as I can tell, this syntax is the same as for D, except for the floating-suffix, which has no imaginary part in C/C++. That's an easy fix though. I already added it to the jflex file, and it seems to work perfectly. Now I'll move on to hex floats.
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
|
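The suffix adjustment mentioned in this post might look roughly like the sketch below. It keeps the C++ literal shape and only widens the suffix so that an optional trailing `i` marks an imaginary literal. The exact suffix letters used here (`f`, `F` or `L`, optionally followed by `i`) are an assumption about how D spells these literals, not something stated in the thread, and `D_FLOAT` and `classify` are hypothetical names. Embedded underscores and hex floats are left out.

```python
import re

# Same body as the C++ floating-literal: a fractional constant with an
# optional exponent, or a digit sequence with a mandatory exponent.
BODY = (
    r"(?:(?:[0-9]*\.[0-9]+|[0-9]+\.)(?:[eE][+-]?[0-9]+)?"
    r"|[0-9]+[eE][+-]?[0-9]+)"
)
# Assumed D suffix: optional size letter, then optional imaginary marker.
SUFFIX = r"(?P<size>[fFL])?(?P<imaginary>i)?"

D_FLOAT = re.compile(BODY + SUFFIX)


def classify(text):
    """Return (size_suffix, is_imaginary), or None if the text does not match."""
    match = D_FLOAT.fullmatch(text)
    if match is None:
        return None
    return (match.group("size"), match.group("imaginary") is not None)


for candidate in ["1.5", "1.5f", "1.5i", "2e3Li", "1.5if"]:
    print(candidate, classify(candidate))
```

Keeping the size letter and the imaginary marker in separate named groups lets the lexer pick the token type from the match alone, which is the kind of decision a JFlex action would otherwise make by hand.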
January 15, 2006 Re: Lexer related questions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Casper Ellingsen | On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:
This is more a parser related question, but still, here goes: What visibility will the following function have, and why is it even legal to use more than one visibility keyword in combination like that? I mean, is it anything but confusing?
public package private foo(int i) {
    writefln(i);
}
Also, how accurate is the BNF in http://www.digitalmars.com/d/declaration.html?
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
|
January 15, 2006 Re: Lexer related questions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Casper Ellingsen | Casper Ellingsen wrote:
> On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:
>
> This is more a parser related question, but still, here goes: What visibility will the following function have, and why is it even legal to use more than one visibility keyword in combination like that? I mean, is it anything but confusing?
>
> public package private foo(int i) {
>     writefln(i);
> }
I'd guess it would be private, and equivalent to the following:
public:
package:
private:
void foo(int i);
> Also, how accurate is the BNF in http://www.digitalmars.com/d/declaration.html?
It looks pretty close, at a glance. But perhaps someone who's spent more time with the D parser could offer a more informed opinion.
Sean
|
January 15, 2006 Re: Lexer related questions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | On Sun, 15 Jan 2006 06:12:13 +0100, Sean Kelly <sean@f4.ca> wrote:
> Casper Ellingsen wrote:
>> On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:
>> This is more a parser related question, but still, here goes: What visibility will the following function have, and why is it even legal to use more than one visibility keyword in combination like that? I mean, is it anything but confusing?
>> public package private foo(int i) {
>>     writefln(i);
>> }
>
> I'd guess it would be private, and equivalent to the following:
>
> public:
> package:
> private:
> void foo(int i);
Yes, that could make sense. I haven't had the time to confirm this yet though.
>> Also, how accurate is the BNF in http://www.digitalmars.com/d/declaration.html?
>
> It looks pretty close, at a glance. But perhaps someone who's spent more time with the D parser could offer a more informed opinion.
Some of it looks correct, but other parts confuse me. Like the '() Declarator' part of the Declarator rule. Can someone please provide me with an example of usage of this rule? Also, isn't the last declarator rule redundant?
Declarator:
BasicType2 Declarator
Identifier
() Declarator
Identifier DeclaratorSuffixes
() Declarator DeclaratorSuffixes
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
|
January 15, 2006 Re: Lexer related questions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Casper Ellingsen | Casper Ellingsen wrote:
> On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:
>
> Also, how accurate is the BNF in http://www.digitalmars.com/d/declaration.html?
I don't really know.
I'm toying with making a parser .. I couldn't use exactly the grammar that was there .. too confusing.
I tried to come up with my own description of the grammar .. it's not complete, mind you. I introduced some new rules to resolve some ambiguities (actually, work around them).
It's very experimental (and incomplete) at the moment. Use with care (if you ever use it anyway).
Note that I didn't include any keyword (e.g. int, float, etc) in the Type, because I don't lex them as keywords, but as Identifiers.
I'm not even sure how accurate it is, but here it is anyway:
Declaration:
Type Declarator ;
Type Declarator , DeclIdentifierList ;
Type Declarator Parameters ;
Type Declarator Parameters FunctionBody
Type:
IdentifierSequence
IdentifierSequence TypeSuffixes
TypeSuffixes:
TypeSuffix
TypeSuffix TypeSuffixes
TypeSuffix:
Pointer
Array
FunctionPointer
Delegate
Pointer:
*
Array:
[]
[ ExprType ]
ExprType:
AssignExpression
AssignExpression TypeSuffixes
FunctionPointer:
function Parameters
Delegate:
delegate Parameters
Declarator:
Identifier
Declarator CTypeSuffixes
Declarator = Initializer
( Declarator )
( TypeSuffixes Declarator )
CTypeSuffixes:
Array
Array CTypeSuffixes
DeclIdentifierList:
DeclIdentifier
DeclIdentifier, DeclIdentifierList
DeclIdentifier:
Identifier
Identifier = Initializer
IdentifierSequence:
IdentifierList
.IdentifierList
IdentifierSequence ! TemplateArguments
IdentifierList:
Identifier
Identifier.IdentifierList
TemplateArguments:
( TemplateArgumentList )
TemplateArgumentList:
TemplateArgument
TemplateArgument, TemplateArgumentList
TemplateArgument:
ExprType
Initializer:
void
AssignExpression
ArrayInitializer
StructInitializer
ArrayInitializer:
[ ArrayMemberInitializations ]
[ ]
ArrayMemberInitializations:
ArrayMemberInitialization
ArrayMemberInitialization ,
ArrayMemberInitialization , ArrayMemberInitializations
ArrayMemberInitialization:
AssignExpression
AssignExpression : AssignExpression
StructInitializer:
{ }
{ StructMemberInitializers }
StructMemberInitializers:
StructMemberInitializer
StructMemberInitializer ,
StructMemberInitializer , StructMemberInitializers
StructMemberInitializer:
AssignExpression
Identifier : AssignExpression
Parameters:
( )
( ParameterList )
ParameterList:
Parameter
Parameter, ParameterList
Parameter:
Type
Type Declarator
Type Declarator = Initializer
InOut Parameter
InOut:
in
out
inout
FunctionBody:
StatementBlock
FunctionContracts body StatementBlock
FunctionContracts:
InContract
OutContract
InContract OutContract
OutContract InContract
InContract:
in StatementBlock
OutContract:
out StatementBlock
out ( Identifier ) StatementBlock
|
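To show how rules of this shape turn into code, here is a minimal hand-written parser for a small slice of the grammar above: Type, IdentifierSequence, IdentifierList, and the Pointer and empty Array suffixes. It is only a sketch under simplifying assumptions: template arguments, `[ ExprType ]`, function pointers and delegates are omitted, and `tokenize` and `parse_type` are hypothetical names. As in the post, `int` is treated as an ordinary identifier.

```python
import re

# An identifier, or one of the punctuation tokens used by this subset.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[.*\[\]])")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_type(text):
    """Parse 'Type: IdentifierSequence TypeSuffixes(opt)' for the subset above.

    Returns (has_leading_dot, [identifiers], [suffix names]).
    """
    tokens = tokenize(text)
    i = 0

    # IdentifierSequence: IdentifierList | . IdentifierList
    leading_dot = i < len(tokens) and tokens[i] == "."
    if leading_dot:
        i += 1

    # IdentifierList: Identifier | Identifier . IdentifierList
    names = []
    while True:
        if i >= len(tokens) or IDENTIFIER.fullmatch(tokens[i]) is None:
            raise SyntaxError("identifier expected")
        names.append(tokens[i])
        i += 1
        if i < len(tokens) and tokens[i] == ".":
            i += 1
            continue
        break

    # TypeSuffixes: any number of Pointer ('*') or empty Array ('[]') suffixes
    suffixes = []
    while i < len(tokens):
        if tokens[i] == "*":
            suffixes.append("pointer")
            i += 1
        elif tokens[i] == "[" and i + 1 < len(tokens) and tokens[i + 1] == "]":
            suffixes.append("array")
            i += 2
        else:
            raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return (leading_dot, names, suffixes)


print(parse_type("int*[]"))
print(parse_type(".std.string.Foo[]*"))
```

The right-recursive list rules (IdentifierList, TypeSuffixes) become plain loops here, which is the usual way to handle them in a hand-written parser.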
January 16, 2006 Re: Lexer related questions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Casper Ellingsen | On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:
A version condition is defined in http://www.digitalmars.com/d/version.html as
VersionCondition:
version () Integer
version () Identifier
One valid version condition is
version(X86)
so why aren't the BNF rules defined as
VersionCondition:
version ( Integer )
version ( Identifier )
instead? It just seems odd to me, and really confused me for a while.
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
|
January 16, 2006 Re: Lexer related questions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Casper Ellingsen | Casper Ellingsen wrote:
> On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:
>
> A version condition is defined in http://www.digitalmars.com/d/version.html as
>
> VersionCondition:
> version () Integer
> version () Identifier
>
> One valid version condition is
>
> version(X86)
>
> so why aren't the BNF rules defined as
>
> VersionCondition:
> version ( Integer )
> version ( Identifier )
>
> instead? It just seems odd to me, and really confused me for a while.
The parentheses are in the wrong place all through the docs. I think it's a Ddoc problem (the docs weren't updated properly when they were converted to Ddoc).
|
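For reference, the corrected form of the rule quoted above, `version ( Integer )` or `version ( Identifier )`, can be recognised with a single pattern. This is a sketch only: `VERSION_CONDITION` and `parse_version_condition` are hypothetical names, and a real parser would work on tokens rather than raw text.

```python
import re

# version ( Integer ) | version ( Identifier ), with optional whitespace
VERSION_CONDITION = re.compile(
    r"version\s*\(\s*"
    r"(?:(?P<integer>[0-9]+)|(?P<identifier>[A-Za-z_][A-Za-z0-9_]*))"
    r"\s*\)"
)


def parse_version_condition(text):
    """Return ('integer', n) or ('identifier', name) for a version condition."""
    match = VERSION_CONDITION.fullmatch(text.strip())
    if match is None:
        raise SyntaxError(f"not a version condition: {text!r}")
    if match.group("integer") is not None:
        return ("integer", int(match.group("integer")))
    return ("identifier", match.group("identifier"))


print(parse_version_condition("version(X86)"))
print(parse_version_condition("version ( 3 )"))
```

The form printed in the documentation, `version () X86`, is rejected by this pattern, which matches the observation that the parentheses in the published rule are misplaced.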
January 16, 2006 Re: Lexer related questions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Casper Ellingsen | There are two conflicting definitions of postfix expressions in http://www.digitalmars.com/d/expression.html. In the BNF at the top a postfix expression is defined as
PostfixExpression:
PrimaryExpression
PostfixExpression . Identifier
PostfixExpression ++
PostfixExpression --
PostfixExpression ( )
PostfixExpression ( ArgumentList )
IndexExpression
SliceExpression
IndexExpression:
PostfixExpression [ ArgumentList ]
SliceExpression:
PostfixExpression [ ]
PostfixExpression [ AssignExpression .. AssignExpression ]
On the other hand, in the textual description further down, a postfix expression is defined as
PostfixExpression:
PostfixExpression . Identifier
PostfixExpression -> Identifier
PostfixExpression ++
PostfixExpression --
PostfixExpression ( ArgumentList )
PostfixExpression [ ArgumentList ]
PostfixExpression [ AssignExpression .. AssignExpression ]
The first one has
PostfixExpression ( )
PostfixExpression [ ]
which the second one doesn't have, whereas the second one has
PostfixExpression -> Identifier
which the first one doesn't have. What's the correct definition? Oh, if only the BNF grammar was correct. :/
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
|
Copyright © 1999-2021 by the D Language Foundation