Lexer related questions - D Programming Language Discussion Forum

January 13, 2006

Lexer related questions

Posted by Casper Ellingsen

Hey,

I'm using JFlex (http://jflex.de/) to implement a lexical analyser for the D language. I've already got quite a lot done, but there are some issues here and there that I still need to work on, and a couple of things I need feedback on.

For example, I can't understand why several consecutive underscores are allowed in a decimal integer literal. The grammar says

Decimal:
	0
	NonZeroDigit
	NonZeroDigit Decimal
	NonZeroDigit _ Decimal

which means that 0, 1, 12, 1_2 and 1_2_3 are allowed but, as I read it, 1__2__3 is not. The DMD compiler, however, accepts that literal as 123.

Also, the specification (http://www.digitalmars.com/d/lex.html) seems to lack information on some parts of the grammar. For example, it says

Float:
	DecimalFloat
	HexFloat
	Float _

but it doesn't describe the grammar of either DecimalFloat or HexFloat.

I'll post more questions once I find other issues.

January 13, 2006

Re: Lexer related questions

Posted by Sean Kelly
in reply to Casper Ellingsen

Casper Ellingsen wrote:
> Hey,
> 
> I'm using JFlex (http://jflex.de/) to implement a lexical analyser for the D language. I've already got quite a lot done, but there are some issues here and there that I still need to work on, and a couple of things I need feedback on.
> 
> For example, I can't understand why several consecutive underscores are allowed in a decimal integer literal. The grammar says
> 
> Decimal:
>     0
>     NonZeroDigit
>     NonZeroDigit Decimal
>     NonZeroDigit _ Decimal
> 
> which means that 0, 1, 12, 1_2 and 1_2_3 are allowed but, as I read it, 1__2__3 is not. The DMD compiler, however, accepts that literal as 123.

The D regexp and BNF information is woefully inaccurate in places, largely because Walter wrote DMD entirely by hand.  You're best off verifying it against the written documentation:

http://digitalmars.com/d/lex.html#integerliteral

"Integers can have embedded '_' characters, which are ignored."
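
To make the mismatch concrete, here is a small Python sketch (not JFlex, and not taken from DMD). STRICT_DECIMAL is a literal transcription of the Decimal rule quoted above. LENIENT_DECIMAL and decimal_value are hypothetical names that model the prose rule, on the assumption that any number of embedded underscores is simply dropped. Only plain decimal literals are covered; suffixes and hex, octal or binary forms are out of scope.

```python
import re

# Literal transcription of the quoted rule:
#   Decimal: 0 | NonZeroDigit | NonZeroDigit Decimal | NonZeroDigit _ Decimal
# i.e. non-zero digits, each optionally followed by ONE '_', then a final digit.
STRICT_DECIMAL = re.compile(r"(?:[1-9]_?)*[0-9]")

# Assumption (from the prose rule and the observed DMD behaviour in this thread):
# a leading non-zero digit followed by digits and any number of underscores.
LENIENT_DECIMAL = re.compile(r"0|[1-9][0-9_]*")


def decimal_value(text):
    """Return the integer value of a decimal literal, ignoring underscores."""
    if LENIENT_DECIMAL.fullmatch(text) is None:
        raise ValueError(f"not a decimal literal: {text!r}")
    return int(text.replace("_", ""))


if __name__ == "__main__":
    for literal in ["1_2_3", "1__2__3", "100"]:
        strict_ok = STRICT_DECIMAL.fullmatch(literal) is not None
        print(literal, "strict:", strict_ok, "value:", decimal_value(literal))
```

Taken literally, the BNF rejects even 100, because a Decimal can only contain a 0 as its last digit. That is another sign the BNF should not be taken at face value.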

> Also, the specification (http://www.digitalmars.com/d/lex.html) seems to lack information on some parts of the grammar. For example, it says
> 
> Float:
>     DecimalFloat
>     HexFloat
>     Float _
> 
> but it doesn't describe the grammar of either DecimalFloat or HexFloat.

Same thing here.  Check this link:

http://digitalmars.com/d/lex.html#floatliteral

Though I suspect that aside from the embedded underscores, the syntax is identical to what it is in C/C++.  Here's the pertinent bit of the C++ standard:

floating-literal:
	fractional-constant exponent-part(opt) floating-suffix(opt)
	digit-sequence exponent-part floating-suffix(opt)
fractional-constant:
	digit-sequence(opt) . digit-sequence
	digit-sequence .
exponent-part:
	e sign(opt) digit-sequence
	E sign(opt) digit-sequence
sign: one of
	+ -
digit-sequence:
	digit
	digit-sequence digit
floating-suffix: one of
	f l F L
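
The C++ rules above translate almost mechanically into a regular expression. A sketch in Python follows; the constant names are made up for illustration, and in a JFlex file the same pieces would presumably become macros. It covers the C++ grammar as quoted, so D's embedded underscores are not handled.

```python
import re

DIGITS = r"[0-9]+"                      # digit-sequence
EXPONENT = rf"[eE][+-]?{DIGITS}"        # exponent-part
# fractional-constant: digit-sequence(opt) . digit-sequence | digit-sequence .
FRACTIONAL = rf"(?:(?:{DIGITS})?\.{DIGITS}|{DIGITS}\.)"
SUFFIX = r"[flFL]"                      # floating-suffix

# floating-literal:
#   fractional-constant exponent-part(opt) floating-suffix(opt)
#   digit-sequence exponent-part floating-suffix(opt)
FLOATING_LITERAL = re.compile(
    rf"(?:{FRACTIONAL}(?:{EXPONENT})?|{DIGITS}{EXPONENT})(?:{SUFFIX})?"
)


def is_floating_literal(text):
    """True if the whole string is a floating literal under the quoted rules."""
    return FLOATING_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["1.5", ".5e-3f", "1.", "1e10L", "1", "1.5e"]:
        print(repr(text), is_floating_literal(text))
```

Note that a bare digit sequence such as 1 is deliberately rejected: without a dot or an exponent it is an integer literal.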

January 13, 2006

Re: Lexer related questions

Posted by Casper Ellingsen
in reply to Sean Kelly

On Fri, 13 Jan 2006 23:02:31 +0100, Sean Kelly <sean@f4.ca> wrote:

> Though I suspect that aside from the embedded underscores, the syntax is identical to what it is in C/C++.  Here's the pertinent bit of the C++ standard:
>
> floating-literal:
> 	fractional-constant exponent-part(opt) floating-suffix(opt)
> 	digit-sequence exponent-part floating-suffix(opt)
> fractional-constant:
> 	digit-sequence(opt) . digit-sequence
> 	digit-sequence .
> exponent-part:
> 	e sign(opt) digit-sequence
> 	E sign(opt) digit-sequence
> sign: one of
> 	+ -
> digit-sequence:
> 	digit
> 	digit-sequence digit
> floating-suffix: one of
> 	f l F L

Thanks. As far as I can tell, this syntax is the same as D's except for the floating-suffix: C/C++ has no imaginary suffix. That's an easy fix though. I've already added it to the JFlex file, and it seems to work perfectly. Now I'll move on to hex floats.
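
For reference, the change described here might look roughly like this in Python. The suffix rule is an assumption based on my reading of the D lexical spec (an optional f, F or L, optionally followed by the imaginary suffix i); it is not the actual JFlex rule, and D_FLOAT_LITERAL is a made-up name. Embedded underscores and hex floats are left out.

```python
import re

DIGITS = r"[0-9]+"
EXPONENT = rf"[eE][+-]?{DIGITS}"
FRACTIONAL = rf"(?:(?:{DIGITS})?\.{DIGITS}|{DIGITS}\.)"

# Assumed D suffix: optional f/F/L, then an optional imaginary 'i'.
D_SUFFIX = r"[fFL]?i?"

D_FLOAT_LITERAL = re.compile(
    rf"(?:{FRACTIONAL}(?:{EXPONENT})?|{DIGITS}{EXPONENT}){D_SUFFIX}"
)


def is_d_float_literal(text):
    """True if the whole string matches the assumed D decimal float syntax."""
    return D_FLOAT_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["1.5i", "1.5fi", "2e3Li", "1.5if"]:
        print(repr(text), is_d_float_literal(text))
```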

January 15, 2006

Re: Lexer related questions

Posted by Casper Ellingsen
in reply to Casper Ellingsen

On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:

This is more of a parser-related question, but here goes: what visibility will the following function have, and why is it even legal to combine more than one visibility keyword like that? Is it anything but confusing?

public package private void foo(int i) {
	writefln(i);
}

Also, how accurate is the BNF in http://www.digitalmars.com/d/declaration.html?

January 15, 2006

Re: Lexer related questions

Posted by Sean Kelly
in reply to Casper Ellingsen

Casper Ellingsen wrote:
> On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:
> 
> This is more of a parser-related question, but here goes: what visibility will the following function have, and why is it even legal to combine more than one visibility keyword like that? Is it anything but confusing?
> 
> public package private void foo(int i) {
>     writefln(i);
> }

I'd guess it would be private, and equivalent to the following:

public:
package:
private:
    void foo(int i);
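
If that guess is right, a parser could resolve the effective protection with a "last keyword wins" rule. The Python sketch below models only that guess; it has not been checked against DMD. effective_protection is a hypothetical helper, and the default of "public" is an assumption.

```python
# D's protection attributes.
PROTECTION_KEYWORDS = {"private", "package", "protected", "public", "export"}


def effective_protection(attributes, default="public"):
    """Return the last protection keyword in the attribute list.

    Models the guess that 'public package private' behaves like three
    successive 'attr:' labels, each overriding the previous one.
    """
    result = default
    for attribute in attributes:
        if attribute in PROTECTION_KEYWORDS:
            result = attribute
    return result


if __name__ == "__main__":
    print(effective_protection(["public", "package", "private"]))
```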

> Also, how accurate is the BNF in http://www.digitalmars.com/d/declaration.html?

It looks pretty close, at a glance.  But perhaps someone who's spent more time with the D parser could offer a more informed opinion.


Sean

January 15, 2006

Re: Lexer related questions

Posted by Casper Ellingsen
in reply to Sean Kelly

On Sun, 15 Jan 2006 06:12:13 +0100, Sean Kelly <sean@f4.ca> wrote:

> Casper Ellingsen wrote:
>> On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:
>> This is more of a parser-related question, but here goes: what visibility will the following function have, and why is it even legal to combine more than one visibility keyword like that? Is it anything but confusing?
>>
>> public package private void foo(int i) {
>>     writefln(i);
>> }
>
> I'd guess it would be private, and equivalent to the following:
>
> public:
> package:
> private:
>      void foo(int i);

Yes, that would make sense, though I haven't had time to confirm it yet.

>> Also, how accurate is the BNF in http://www.digitalmars.com/d/declaration.html?
>
> It looks pretty close, at a glance.  But perhaps someone who's spent more time with the D parser could offer a more informed opinion.

Some of it looks correct, but other parts confuse me, such as the '() Declarator' alternative of the Declarator rule. Can someone please give me an example that uses this rule? Also, isn't the last Declarator alternative redundant?

Declarator:
        BasicType2 Declarator
        Identifier
        () Declarator
        Identifier DeclaratorSuffixes
        () Declarator  DeclaratorSuffixes

January 15, 2006

Re: Lexer related questions

Posted by Hasan Aljudy
in reply to Casper Ellingsen

Casper Ellingsen wrote:
> On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:
> 
> Also, how accurate is the BNF in  http://www.digitalmars.com/d/declaration.html?

I don't really know.
I'm toying with making a parser. I couldn't use the grammar exactly as it is written there; it was too confusing.
I tried to come up with my own description of the grammar. It's not complete, mind you. I introduced some new rules to resolve some ambiguities (or rather, to work around them).

It's very experimental (and incomplete) at the moment. Use with care (if you ever use it anyway).
Note that I didn't include any keyword (i.e. int, float, etc) in the Type, because I don't lex them as keywords, but as Identifiers.

I'm not even sure how accurate it is, but here it is anyway:
	
Declaration:
	Type Declarator ;
	Type Declarator , DeclIdentifierList ;
	Type Declarator Parameters ;
	Type Declarator Parameters FunctionBody
		
Type:
	IdentifierSequence
	IdentifierSequence TypeSuffixes
	
TypeSuffixes:
	TypeSuffix
	TypeSuffix TypeSuffixes
	
TypeSuffix:
	Pointer
	Array
	FunctionPointer
	Delegate	

Pointer:
	*
	
Array:
	[]
	[ ExprType ]
	
ExprType:	
	AssignExpression
	AssignExpression TypeSuffixes
	
FunctionPointer:
	function Parameters

Delegate:
	delegate Parameters
	
Declarator:
	Identifier
	Declarator CTypeSuffixes
	Declarator = Initializer
	( Declarator )
	( TypeSuffixes Declarator )
	
CTypeSuffixes:
	Array
	Array CTypeSuffixes	
	
DeclIdentifierList:
	DeclIdentifier
	DeclIdentifier, DeclIdentifierList
	
DeclIdentifier:
	Identifier
	Identifier = Initializer
	
IdentifierSequence:
	IdentifierList
	.IdentifierList
	IdentifierSequence ! TemplateArguments
		
IdentifierList:
	Identifier
	Identifier.IdentifierList
	
TemplateArguments:
	( TemplateArgumentList )

TemplateArgumentList:
	TemplateArgument
	TemplateArgument, TemplateArgumentList
	
TemplateArgument:
	ExprType
	
Initializer:
	void
	AssignExpression
	ArrayInitializer
	StructInitializer
		
ArrayInitializer:
	[ ArrayMemberInitializations ]
	[ ]		

ArrayMemberInitializations:
	ArrayMemberInitialization
	ArrayMemberInitialization ,
	ArrayMemberInitialization , ArrayMemberInitializations

ArrayMemberInitialization:
	AssignExpression
	AssignExpression : AssignExpression
	
StructInitializer:
	{  }
	{ StructMemberInitializers }

StructMemberInitializers:
	StructMemberInitializer
	StructMemberInitializer ,
	StructMemberInitializer , StructMemberInitializers

StructMemberInitializer:
	AssignExpression
	Identifier : AssignExpression	
		
Parameters:
	( )
	( ParameterList )
	
ParameterList:
	Parameter
	Parameter, ParameterList
	
Parameter:
	Type
	Type Declarator
	Type Declarator = Initializer
	InOut Parameter
	
InOut:
	in
	out
	inout
	
FunctionBody:
	StatementBlock
	FunctionContracts body StatementBlock

FunctionContracts:
	InContract
	OutContract
	InContract OutContract
	OutContract InContract

InContract:
	in StatementBlock

OutContract:
	out StatementBlock
	out ( Identifier ) StatementBlock
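
As an illustration of how such rules turn into code, here is a minimal hand-written recognizer in Python for a small subset of the Type rule above: an optionally dot-prefixed IdentifierList followed by Pointer and empty Array suffixes. Template arguments, sized arrays, function pointers and delegates are left out, and tokenize and parse_type are hypothetical names. As in the grammar, built-in types such as int are treated as ordinary identifiers.

```python
import re

# A token is an identifier or one of the punctuation characters . * [ ]
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[.*\[\]])")
PUNCTUATION = {".", "*", "[", "]"}


def tokenize(text):
    """Split the input into identifier and punctuation tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_type(text):
    """Parse '[.]Identifier(.Identifier)*' followed by '*' and '[]' suffixes.

    Returns (qualified_name, suffixes), where suffixes is a list of
    'pointer' and 'array' in source order.
    """
    tokens = tokenize(text)
    i = 0
    # IdentifierSequence: optional leading dot, then IdentifierList.
    prefix = ""
    if i < len(tokens) and tokens[i] == ".":
        prefix = "."
        i += 1
    names = []
    while True:
        if i >= len(tokens) or tokens[i] in PUNCTUATION:
            raise ValueError("expected an identifier")
        names.append(tokens[i])
        i += 1
        if i < len(tokens) and tokens[i] == ".":
            i += 1
            continue
        break
    # TypeSuffixes: any mix of Pointer and empty Array.
    suffixes = []
    while i < len(tokens):
        if tokens[i] == "*":
            suffixes.append("pointer")
            i += 1
        elif tokens[i] == "[" and i + 1 < len(tokens) and tokens[i + 1] == "]":
            suffixes.append("array")
            i += 2
        else:
            raise ValueError(f"unexpected token {tokens[i]!r}")
    return prefix + ".".join(names), suffixes


if __name__ == "__main__":
    print(parse_type("int*[]"))
    print(parse_type(".std.string.Foo[]"))
```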

January 16, 2006

Re: Lexer related questions

Posted by Casper Ellingsen
in reply to Casper Ellingsen

On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:

A version condition is defined in http://www.digitalmars.com/d/version.html as

	VersionCondition:
		version () Integer
		version () Identifier

One valid version condition is

	version(X86)

so why aren't the BNF rules defined as

	VersionCondition:
		version ( Integer )
		version ( Identifier )

instead? It just seems odd to me, and really confused me for a while.

January 16, 2006

Re: Lexer related questions

Posted by Don Clugston
in reply to Casper Ellingsen

Casper Ellingsen wrote:
> On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no@reply.com> wrote:
> 
> A version condition is defined in  http://www.digitalmars.com/d/version.html as
> 
>     VersionCondition:
>         version () Integer
>         version () Identifier
> 
> One valid version condition is
> 
>     version(X86)
> 
> so why aren't the BNF rules defined as
> 
>     VersionCondition:
>         version ( Integer )
>         version ( Identifier )
> 
> instead? It just seems odd to me, and really confused me for a while.

The parentheses are in the wrong place all through the docs. I think it's a ddoc problem (the docs weren't updated properly when they were converted to Ddoc).
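
With the parentheses moved to where they evidently belong, the rule reads "version ( Integer )" or "version ( Identifier )". A Python sketch of a recognizer for that corrected form follows; parse_version_condition is a hypothetical name, and the Integer and Identifier patterns are simplified assumptions (plain decimal digits and ASCII identifiers).

```python
import re

# version ( Integer )  |  version ( Identifier )
VERSION_CONDITION = re.compile(
    r"version\s*\(\s*(?:(?P<integer>[0-9]+)|(?P<identifier>[A-Za-z_]\w*))\s*\)"
)


def parse_version_condition(text):
    """Return ('integer', n) or ('identifier', name), or None if no match."""
    match = VERSION_CONDITION.fullmatch(text.strip())
    if match is None:
        return None
    if match.group("integer") is not None:
        return ("integer", int(match.group("integer")))
    return ("identifier", match.group("identifier"))


if __name__ == "__main__":
    for text in ["version(X86)", "version ( 3 )", "version () X86"]:
        print(repr(text), parse_version_condition(text))
```

The form as printed in the docs, "version () X86", is rejected, which matches what the original poster observed about real code.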

January 16, 2006

Re: Lexer related questions

Posted by Casper Ellingsen
in reply to Casper Ellingsen

There are two conflicting definitions of postfix expressions in http://www.digitalmars.com/d/expression.html. In the BNF at the top, a postfix expression is defined as

	PostfixExpression:
		PrimaryExpression
		PostfixExpression . Identifier
		PostfixExpression ++
		PostfixExpression --
		PostfixExpression ( )
		PostfixExpression ( ArgumentList )
		IndexExpression
		SliceExpression

	IndexExpression:
		PostfixExpression [ ArgumentList ]

	SliceExpression:
		PostfixExpression [ ]
		PostfixExpression [ AssignExpression .. AssignExpression ]

On the other hand, in the textual description further down, a postfix expression is defined as

	PostfixExpression:
		PostfixExpression . Identifier
		PostfixExpression -> Identifier
		PostfixExpression ++
		PostfixExpression --
		PostfixExpression ( ArgumentList )
		PostfixExpression [ ArgumentList ]
		PostfixExpression [ AssignExpression .. AssignExpression ]

The first one has

		PostfixExpression ( )
		PostfixExpression [ ]

which the second one doesn't have, whereas the second one has

		PostfixExpression -> Identifier

which the first one doesn't have. Which definition is correct? Oh, if only the BNF grammar were correct. :/

Copyright © 1999-2021 by the D Language Foundation