Thread overview
[BUG] dmd does not implement LR analysis
Mar 13, 2004  Manfred Nowak
Mar 13, 2004  Walter
Mar 14, 2004  Manfred Nowak
Mar 16, 2004  Stewart Gordon
Mar 16, 2004  Matthew
Mar 17, 2004  J C Calvarese
Mar 17, 2004  Matthew
Mar 17, 2004  Manfred Nowak
Mar 17, 2004  Stewart Gordon
Mar 17, 2004  Manfred Nowak
Mar 14, 2004  Ben Hinkle
Mar 14, 2004  C. Sauls
Mar 15, 2004  Stewart Gordon
Mar 15, 2004  Manfred Nowak
Mar 15, 2004  larry cowan
March 13, 2004
Although it is not explicitly specified, the usual left-to-right lexical analysis and parsing of the grammar of D is currently not implemented in dmd.

Currently `2.' and `.4' are legal real numbers. Therefore `[cast(int)2..4]', which looks like a range, is not a range but should be analysed as two consecutive real numbers, as if it were written `[cast(int)2. .4]', and should therefore yield something like:

| found '0.4' when expecting ']'

The lexical analysis phase of dmd uses some trickery to prevent this, i.e. looking ahead and backing up.

On the other hand, this trickery now prevents dmd from correctly identifying the legal range expression `[cast(int)2...4]', which could be written as `[cast(int)2. .. 4]'. dmd yields:

| found '...' when expecting ']'
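
A minimal sketch of the argument above, in Python rather than D, assuming a strict longest-match lexer that never backs up. The token patterns and the name `munch' are hypothetical and cover only what these two inputs need; this is not dmd's lexer.

```python
import re

# Hypothetical token classes, tried at every position; the longest match wins.
TOKEN_PATTERNS = [
    ("FLOAT", re.compile(r"\d+\.\d*|\.\d+")),  # accepts `2.' and `.4'
    ("INT", re.compile(r"\d+")),
    ("IDENT", re.compile(r"[A-Za-z_]\w*")),
    ("OP", re.compile(r"\.\.\.|\.\.|\.|[\[\]()]")),
]


def munch(source):
    """Split source into token texts, left to right, longest match first."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        best = ""
        for _kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(source, pos)
            if match and len(match.group()) > len(best):
                best = match.group()
        if not best:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append(best)
        pos += len(best)
    return tokens


print(munch("[cast(int)2..4]"))   # the two dots end up in two separate reals
print(munch("[cast(int)2...4]"))  # `2.', then `..', then `4'
```

Under these assumptions the first input comes out as `2.' followed by `.4', and the second as `2.', `..', `4'.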

So long.



March 13, 2004
"Manfred Nowak" <svv1999@hotmail.com> wrote in message news:c2uekl$1995$1@digitaldaemon.com...
> Although it is not explicitly specified, the usual left-to-right lexical analysis and parsing of the grammar of D is currently not implemented in dmd.
>
> Currently `2.' and `.4' are legal real numbers. Therefore `[cast(int)2..4]', which looks like a range, is not a range but should be analysed as two consecutive real numbers, as if it were written `[cast(int)2. .4]', and should therefore yield something like:
>
> | found '0.4' when expecting ']'
>
> The lexical analysis phase of dmd uses some trickery to prevent this, i.e. looking ahead and backing up.
>
> On the other hand, this trickery now prevents dmd from correctly identifying the legal range expression `[cast(int)2...4]', which could be written as `[cast(int)2. .. 4]'. dmd yields:
>
> | found '...' when expecting ']'
>
> So long.

... is a valid token. You'll need to put the space after the first . to get the meaning you wish. True, the lexer does a bit of lookahead, but why not?
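
A rough sketch of the kind of lookahead being discussed, in Python rather than D. It is an assumption based on the behaviour described in this thread, not dmd's source; `scan_number' is a hypothetical helper and expects `pos' to point at a digit.

```python
def scan_number(source, pos):
    """Scan a numeric literal starting at pos; return (kind, text)."""
    end = pos
    while end < len(source) and source[end].isdigit():
        end += 1
    if end < len(source) and source[end] == ".":
        # One character of lookahead: a '.' followed by another '.' begins
        # a `..' or `...' token, so it is left for the next token.
        if end + 1 < len(source) and source[end + 1] == ".":
            return ("INT", source[pos:end])
        end += 1  # a single '.' belongs to the number
        while end < len(source) and source[end].isdigit():
            end += 1
        return ("FLOAT", source[pos:end])
    return ("INT", source[pos:end])


print(scan_number("2..4]", 0))   # integer `2', the dots are left alone
print(scan_number("2...4]", 0))  # integer `2', `...' follows as one token
print(scan_number("2. .4]", 0))  # real `2.'
```

With the space after the first `.', the number scanner sees only a single dot and takes it as part of the real.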


March 14, 2004
Walter wrote:

> ... is a valid token. You'll need to put the space after the first . to get the meaning you wish.

I am not talking about meanings I wish. I noticed this departure from the norm because the publicly available syntax highlighting extension for D for vim showed me `[2..4]' as two consecutive reals, thereby pointing out to me that my own syntax highlighting extension was wrong: I had thought that it is illegal to have an empty integer or fractional part in a real.

Furthermore, following the usual left-to-right analysis, it is correct to analyze the construct in question as two consecutive reals, and there is no way to build an LR highlighter that is able to highlight it as two integer numbers separated by the range operator `..'.

Even the `d2html' example highlights the construct in question as the real `2.', followed by a `.', followed by the integer `4'.

I do not believe that any syntax highlighter currently out there is able to highlight the construct in question correctly.


> True, the lexer does a bit of lookahead, but why not?

That depends on what DigitalMars has in mind with the language D and the de facto reference compiler dmd.

If the intention of DigitalMars is to lure a certain number of computer nerds to the language D by promising an open standard and at the same time bind them to a proprietary implementation not fully consistent with the proposed standard and its somehow natural interpretation, then it is quite okay to make even more departures than the two I have detected:

- the one which is the matter of this thread, and
- the `cast' operator being optional in dmd.

If the intention of DigitalMars is to keep the language D and the de facto
reference compiler dmd in a homogeneous state, then the existence of
both exposed deviations is not okay.

There might be more intentions of DigitalMars, which I am unable to recognize.

So long.

March 14, 2004
On Sat, 13 Mar 2004 14:28:35 -0800, "Walter" <walter@digitalmars.com> wrote:

>
>"Manfred Nowak" <svv1999@hotmail.com> wrote in message news:c2uekl$1995$1@digitaldaemon.com...
>> Although it is not explicitly specified, the usual left-to-right lexical analysis and parsing of the grammar of D is currently not implemented in dmd.
>>
>> Currently `2.' and `.4' are legal real numbers. Therefore `[cast(int)2..4]', which looks like a range, is not a range but should be analysed as two consecutive real numbers, as if it were written `[cast(int)2. .4]', and should therefore yield something like:
>>
>> | found '0.4' when expecting ']'
>>
>> The lexical analysis phase of dmd uses some trickery to prevent this, i.e. looking ahead and backing up.
>>
>> On the other hand, this trickery now prevents dmd from correctly identifying the legal range expression `[cast(int)2...4]', which could be written as `[cast(int)2. .. 4]'. dmd yields:
>>
>> | found '...' when expecting ']'
>>
>> So long.
>
>... is a valid token. You'll need to put the space after the first . to get the meaning you wish. True, the lexer does a bit of lookahead, but why not?

Fortran, MATLAB and Python use : for slicing instead of ..
I don't know the history of why but maybe this parsing issue factored
into it. The .. reminds me more of Pascal.

-Ben


March 14, 2004
MOO uses '..' as well, and having recently written a MOO parser/compiler/driver I can say it's doable.  Of course, MOO requires that floating-point numbers contain both integer and fraction, even if one is equal to 0, so maybe that makes all the difference.

-C. Sauls
-Invironz

Ben Hinkle wrote:
> Fortran, MATLAB and Python use : for slicing instead of ..
> I don't know the history of why but maybe this parsing issue factored
> into it. The .. reminds me more of Pascal.
March 15, 2004
Manfred Nowak wrote:
<snip>
> Currently `2.' and `.4' are legal real numbers. Therefore
> `[cast(int)2..4]', which looks like a range, is not a range but should be
> analysed as two consecutive real numbers, as if it were written
> `[cast(int)2. .4]', and should therefore yield something like:
<snip>

That's news to me.  I'd imagined the tokenisation of D was supposed to be context-free.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment.  Please keep replies on the 'group where everyone may benefit.
March 15, 2004
Stewart Gordon wrote:

[...]
> I'd imagined the tokenisation of D was supposed to be context-free.

Context-free is an attribute that belongs to grammars. If you will, dmd does not have a context-free lexical analysis, because the case "natural number followed by a point" is treated in a special way.

Lexical analysis is usually carried out left to right, by finding the next _longest_ part of the remaining source that belongs to a token. This is what I call LR analysis here; it is also known as longest-match or maximal-munch scanning.

I.e. `return2;' is the identifier `return2', not the keyword `return' followed by the integer number `2', followed by a `;'.
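
The `return2' case can be sketched as follows (Python, not D; `scan_word' and the keyword subset are made up for illustration). The whole run of identifier characters is consumed first, and only then compared against the keyword table.

```python
KEYWORDS = {"return", "cast", "int"}  # illustrative subset, not D's full list


def scan_word(source, pos):
    """Consume the longest run of identifier characters, then classify it."""
    end = pos
    while end < len(source) and (source[end].isalnum() or source[end] == "_"):
        end += 1
    text = source[pos:end]
    kind = "KEYWORD" if text in KEYWORDS else "IDENT"
    return (kind, text)


print(scan_word("return2;", 0))   # one identifier, `return2'
print(scan_word("return 2;", 0))  # the keyword `return'
```

Checking for a keyword prefix before the run is complete would split `return2' into `return' and `2', which is the deviation described above.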

Not having an LR lexical analysis does not change the attribute context-free for the grammar, although it is a convention to have LR lexical analysis with a context-free grammar.

If D breaks this convention, it should be explicitly mentioned in the specification.

If the non-LR analysis stays, then the door is open for more implicit deviations from the conventions, like the one I mentioned with the `return2'.

Even the suggestion of an operator that overrides the usual LR
lexical analysis may arise. I would like `§$°@' to be supported then :-)

So long!
March 15, 2004
In article <c348v1$1o1i$1@digitaldaemon.com>, Stewart Gordon says...
>
>Manfred Nowak wrote:
><snip>
>> Currently `2.' and `.4' are legal real numbers. Therefore `[cast(int)2..4]', which looks like a range, is not a range but should be analysed as two consecutive real numbers, as if it were written `[cast(int)2. .4]', and should therefore yield something like:
><snip>
>
>That's news to me.  I'd imagined the tokenisation of D was supposed to be context-free.
>
>Stewart.
>
For what it's worth, .5+4. and 4.+.5 both work as expected, equaling 4.5,
but I would rather have leading and trailing 0's required for literal
floats, doubles, and reals. -(.5-4.), 4.-.5 , 4.*-8. , 4./.2 , .1/16. , and
04*20. all look pretty strange at first glance.  I think FP literals should be
more obviously differentiated from integer literals.



March 16, 2004
Manfred Nowak wrote:
<snip>
> Even the `d2html' example highlights the construct in question as the
> real `2.', followed by a `.', followed by the integer `4'.
> 
> I do not believe that any syntax highlighter currently out there is able
> to highlight the construct in question correctly.

You're right that syntax highlighters that are strictly LR have trouble with syntaxes that aren't strictly LR.  But see below....

>> True, the lexer does a bit of lookahead, but why not?

Depends on whether the lexical analysis is supposed to be strictly LR.  But I did just notice this in the spec:

"There are no digraphs or trigraphs in D. The source text is split into tokens using the maximal munch technique, i.e., the lexical analyzer tries to make the longest token it can. For example >> is a right shift token, not two greater than tokens."

But if that's exactly true, then from the way string literals are specified, surely in

	qwert("yuiop", "asdfg")

a single, 14-character string is being passed?
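
To illustrate the point with Python regular expressions (an illustration only; escape sequences are ignored and neither pattern is taken from the D spec): a pattern that simply asks for the longest quote-to-quote run swallows both arguments, while one that excludes the double quote from the body stops at the first literal.

```python
import re

source = 'qwert("yuiop", "asdfg")'

# "Longest token possible" read literally: everything up to the last quote.
longest = re.search(r'".*"', source).group()

# Body may not contain a double quote, so the literal ends at the first one.
bounded = re.search(r'"[^"]*"', source).group()

print(longest)           # "yuiop", "asdfg"
print(len(longest) - 2)  # 14
print(bounded)           # "yuiop"
```

So maximal munch only gives the intended result if the string literal rule itself rules out an unescaped quote inside the body.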

> That depends on what DigitalMars has in mind with the language D and the
> de facto reference compiler dmd.

I think what it should have in mind is making the spec clearer.  You're right, there's nothing suggesting that 2..4 should be 2 .. 4 and not 2. .4 or even any of the three other possibilities.

Of course it isn't difficult to write a lexer that looks ahead two or three characters.  The only trouble is that it's doing so for something that isn't clearly specified.

> If the intention of DigitalMars is to lure a certain number of computer
> nerds to the language D by promising an open standard and at the same time
> bind them to a proprietary implementation not fully consistent with the
> proposed standard and its somehow natural interpretation, then it is quite
> okay to make even more departures than the two I have detected:
> 
> - the one which is the matter of this thread, and
> - the `cast' operator being optional in dmd.
<snip>

You're right, that's just what I've been thinking for a while.  There does seem to be both an inconsistency and a deviation from CFG with casts.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment.  Please keep replies on the 'group where everyone may benefit.
March 16, 2004
> > If the intention of DigitalMars is to lure a certain number of computer nerds to the language D by promising an open standard and at the same time
> > bind them to a proprietary implementation not fully consistent with the proposed standard and its somehow natural interpretation, then it is quite
> > okay to make even more departures than the two I have detected:
> >
> > - the one which is the matter of this thread, and
> > - the `cast' operator being optional in dmd.
> <snip>
>
> You're right, that's just what I've been thinking for a while.  There does seem to be both an inconsistency and a deviation from CFG with casts.

I think the cast operator should be mandatory.

