Thread overview
[Issue 1466] New: Spec claims maximal munch technique always works: not for "1..3"
  Sep 01, 2007  d-bugmail
  Sep 01, 2007  BCS
  Sep 01, 2007  Jascha Wetzel
  Sep 02, 2007  BCS
  Sep 02, 2007  BCS
  Sep 03, 2007  Jascha Wetzel
  Sep 03, 2007  BCS
  Sep 03, 2007  d-bugmail
  Sep 09, 2007  d-bugmail
  Sep 09, 2007  Jascha Wetzel
  Sep 09, 2007  BCS
  Sep 09, 2007  Jascha Wetzel
  Sep 09, 2007  Jascha Wetzel
  Sep 10, 2007  Matti Niemenmaa
  Sep 10, 2007  Jascha Wetzel
  Sep 10, 2007  Matti Niemenmaa
  Nov 10, 2010  Walter Bright
September 01, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1466

           Summary: Spec claims maximal munch technique always works: not
                    for "1..3"
           Product: D
           Version: 1.020
          Platform: All
               URL: http://digitalmars.com/d/1.0/lex.html
        OS/Version: All
            Status: NEW
          Keywords: spec
          Severity: minor
          Priority: P3
         Component: www.digitalmars.com
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: deewiant@gmail.com


A snippet from http://digitalmars.com/d/1.0/lex.html:

"The source text is split into tokens using the maximal munch technique, i.e., the lexical analyzer tries to make the longest token it can."

Relevant parts of the grammar:

Token:
        FloatLiteral
        ..

FloatLiteral:
        Float

Float:
        DecimalFloat

DecimalFloat:
        DecimalDigits .
        . Decimal

DecimalDigits:
        DecimalDigit

DecimalDigit:
        NonZeroDigit

Decimal:
        NonZeroDigit

Based on the above, if a lexer encounters "1..3", for instance in a slice: "foo[1..3]", it should, using the maximal munch technique, make the longest possible token from "1..3": this is the Float "1.". Next, it should come up with the Float ".3".

Of course, this isn't currently happening, and would be problematic if it did. But, according to the grammar, that's what should happen, unless I'm missing something.

Either some exception needs to be made, or the "DecimalDigits ." possibility needs to be removed from both the grammar and the compiler.
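
To make the report's reading concrete, here is a minimal sketch of a strictly greedy ("maximal munch") lexer over just the productions quoted above. It is plain D with invented names (naiveLex, Tok, Kind) and is not DMD's lexer; it simply takes the longest token it can at each position, including the "DecimalDigits ." alternative.

import std.ascii : isDigit;
import std.stdio : writeln;

enum Kind { integer, floating, dotdot, other }

struct Tok { Kind kind; string text; }

// A deliberately literal reading of the grammar above: the float
// alternatives include "DecimalDigits .", and every token is made
// as long as possible.
Tok[] naiveLex(string src)
{
    Tok[] toks;
    size_t i = 0;
    while (i < src.length)
    {
        size_t start = i;
        if (isDigit(src[i]))
        {
            while (i < src.length && isDigit(src[i])) ++i;
            if (i < src.length && src[i] == '.')
            {
                // Greedily take the '.', per "DecimalDigits ."
                ++i;
                while (i < src.length && isDigit(src[i])) ++i;
                toks ~= Tok(Kind.floating, src[start .. i]);
            }
            else
                toks ~= Tok(Kind.integer, src[start .. i]);
        }
        else if (src[i] == '.')
        {
            ++i;
            if (i < src.length && src[i] == '.')
            {
                ++i;
                toks ~= Tok(Kind.dotdot, src[start .. i]);     // ".."
            }
            else
            {
                while (i < src.length && isDigit(src[i])) ++i;
                toks ~= Tok(Kind.floating, src[start .. i]);   // ". Decimal"
            }
        }
        else
        {
            ++i;
            toks ~= Tok(Kind.other, src[start .. i]);
        }
    }
    return toks;
}

void main()
{
    foreach (t; naiveLex("1..3"))
        writeln(t.kind, ` "`, t.text, `"`);
    // Prints:
    //   floating "1."
    //   floating ".3"
}

On "1..3" this prints the Float "1." followed by the Float ".3", and the ".." token is never produced, which is exactly the conflict described above.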


-- 

September 01, 2007
Reply to d-bugmail@puremagic.com,

> http://d.puremagic.com/issues/show_bug.cgi?id=1466
> 
> A snippet from http://digitalmars.com/d/1.0/lex.html:
> 
> "The source text is split into tokens using the maximal munch
> technique, i.e., the lexical analyzer tries to make the longest token
> it can."
> 
> Relevant parts of the grammar:
> 
> Token:
>         FloatLiteral
>         ..
> 
> FloatLiteral:
>         Float
> 
> Float:
>         DecimalFloat
> 
> DecimalFloat:
>         DecimalDigits .
>         . Decimal
> 
> DecimalDigits:
>         DecimalDigit
> 
> DecimalDigit:
>         NonZeroDigit
> 
> Decimal:
>         NonZeroDigit
> Based on the above, if a lexer encounters "1..3", for instance in a
> slice: "foo[1..3]", it should, using the maximal munch technique, make
> the longest possible token from "1..3": this is the Float "1.". Next,
> it should come up with the Float ".3".
> 
> Of course, this isn't currently happening, and would be problematic if
> it did. But, according to the grammar, that's what should happen,
> unless I'm missing something.
> 
> Either some exception needs to be made, or the "DecimalDigits ." possibility
> needs to be removed from both the grammar and the compiler.
> 

Or make it "DecimalDigits . [^.]", where the [^.] production is non-consuming.
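
A one-character lookahead along those lines can be very small. The following is an illustrative D sketch with an invented helper name (dotBelongsToFloat), not taken from any real lexer: the '.' after DecimalDigits is folded into the float only when it does not start a "..".

// Decide whether the '.' that follows DecimalDigits belongs to a float,
// by peeking one character ahead without consuming it.
bool dotBelongsToFloat(string src, size_t i)
{
    assert(i < src.length && src[i] == '.');
    return !(i + 1 < src.length && src[i + 1] == '.');
}

void main()
{
    assert(dotBelongsToFloat("1.5", 1));    // "1.5": '.' is part of the float
    assert(!dotBelongsToFloat("1..3", 1));  // "1..3": '.' starts the ".." token
}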


September 01, 2007
BCS wrote:
> Reply to d-bugmail@puremagic.com,
> 
>> http://d.puremagic.com/issues/show_bug.cgi?id=1466
>>
>> "The source text is split into tokens using the maximal munch
>> technique, i.e., the lexical analyzer tries to make the longest token
>> it can."
>>
>> Either some exception needs to be made, or the "DecimalDigits ." possibility
>> needs to be removed from both the grammar and the compiler.
>>
> 
> Or make it "DecimalDigits . [^.]", where the [^.] production is non-consuming.

It is possible to parse D using a maximal munch lexer - see the seatd grammar for an example. It's a matter of exactly what lexemes you choose. In this particular case, the float lexemes need to be split so that floats with a trailing dot are not matched by a single lexeme.
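
For example, with the trailing-dot alternative removed from the float lexemes, a longest-match lexer gets "1..3" right. The patterns below are illustrative only and are not seatd's actual lexeme definitions.

import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // Float lexemes that never end in a bare '.'; a trailing-dot float such
    // as "1." would have to be assembled from two lexemes at a later stage.
    auto lexemes = [
        regex(`^[0-9]+\.[0-9]+`),  // Float with digits on both sides
        regex(`^\.[0-9]+`),        // Float with a leading dot
        regex(`^\.\.`),            // the ".." token
        regex(`^[0-9]+`),          // Integer
        regex(`^\.`),              // lone '.'
    ];

    string src = "1..3";
    while (src.length)
    {
        // Maximal munch: take the longest match over all lexemes.
        string best;
        foreach (lx; lexemes)
        {
            auto m = matchFirst(src, lx);
            if (!m.empty && m[0].length > best.length)
                best = m[0];
        }
        assert(best.length);      // every character of "1..3" matches something
        writeln(`"`, best, `"`);  // prints "1", then "..", then "3"
        src = src[best.length .. $];
    }
}
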
September 02, 2007
Reply to d-bugmail@puremagic.com,


> "The source text is split into tokens using the maximal munch
> technique, i.e., the lexical analyzer tries to make the longest token
> it can."
> 

another case:

actual
!isGood -> ! isGood 

MaxMunch
!isGood -> !is Good
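
Both readings occur in valid D; the following self-contained snippet compiles and shows the two meanings a lexer has to keep apart, i.e. the negated identity comparison versus '!' applied to an identifier that merely starts with "is":

void main()
{
    Object a = new Object;
    Object b = a;
    bool isGood = false;

    assert(a !is null);   // negated identity comparison
    assert(!(a !is b));   // i.e. a is b
    assert(!isGood);      // '!' applied to the identifier "isGood"
}

A greedy lexer that treated "!is" as one token would misread the last line as "!is Good".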


September 02, 2007
BCS wrote:
> Reply to d-bugmail@puremagic.com,
> 
> 
>> "The source text is split into tokens using the maximal munch
>> technique, i.e., the lexical analyzer tries to make the longest token
>> it can."
>>
> 
> another case:
> 
> actual
> !isGood -> ! isGood
> MaxMunch
> !isGood -> !is Good
> 
> 

I might be wrong, but my guess is that 'is' is always treated as its own entity, so that '!is' is really ('!' 'is').  It's not a bad practice when one has keyword-operators to do this, to avoid MM screwing up users' identifiers.  But, as I haven't taken any trips through the DMD frontend source, I might be completely off.
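
A toy sketch of that practice (invented code, not DMD's lexer): '!' is always emitted as its own token and "is" comes out as an ordinary identifier-shaped token, so "!isGood" cannot be bitten into "!is" + "Good"; pairing '!' with 'is' is left to a later stage.

import std.ascii : isAlphaNum;
import std.stdio : writeln;

string[] toyLex(string src)
{
    string[] toks;
    size_t i = 0;
    while (i < src.length)
    {
        if (src[i] == ' ') { ++i; continue; }
        if (src[i] == '!') { toks ~= "!"; ++i; continue; }
        size_t start = i;
        while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_')) ++i;
        if (i == start) ++i;   // unknown character: pass it through as-is
        toks ~= src[start .. i];
    }
    return toks;
}

void main()
{
    writeln(toyLex("!isGood"));  // ["!", "isGood"]
    writeln(toyLex("a !is b"));  // ["a", "!", "is", "b"]
}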

-- Chris Nicholson-Sauls
September 02, 2007
Reply to Chris Nicholson-Sauls,

> BCS wrote:
> 
>> Reply to d-bugmail@puremagic.com,
>> 
>>> "The source text is split into tokens using the maximal munch
>>> technique, i.e., the lexical analyzer tries to make the longest
>>> token it can."
>>> 
>> another case:
>> 
>> actual
>> !isGood -> ! isGood
>> MaxMunch
>> !isGood -> !is Good
> I might be wrong, but my guess is that 'is' is always treated as its
> own entity, so that '!is' is really ('!' 'is').  It's not a bad

That's how I spotted it in the first place.

> practice when one has keyword-operators to do this, to avoid MM
> screwing up users' identifiers.  But, as I haven't taken any trips
> through the DMD frontend source, I might be completely off.
> 

For that to work the lexer has to keep track of whitespace. :-b


September 03, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1466


jascha@mainia.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jascha@mainia.de




------- Comment #5 from jascha@mainia.de  2007-09-03 06:08 -------
(In reply to comment #0)
> "The source text is split into tokens using the maximal munch technique,
> i.e., the lexical analyzer tries to make the longest token it can."
> 
> Either some exception needs to be made, or the "DecimalDigits ." possibility
> needs to be removed from both the grammar and the compiler.

(In reply to comment #1)
> Or make it "DecimalDigits . [^.]", where the [^.] production is non-consuming.

It is possible to parse D using a maximal munch lexer - see the seatd grammar for an example. It's a matter of exactly what lexemes you choose. In this particular case, the float lexemes need to be split so that floats with a trailing dot are not matched by a single lexeme.


-- 

September 03, 2007
BCS wrote:
> For that to work the lexer has to keep track of whitespace. :-b

You can also match "(!is)[^_a-zA-Z0-9]", advancing the input only for the submatch, or use a single-character lookahead.
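
A sketch of the first idea using std.regex (illustrative only; seatd may well implement it differently): match "(!is)[^_a-zA-Z0-9]", then advance the input by the captured "!is" submatch only, leaving the context character for the next token.

import std.regex : matchFirst, regex;

void main()
{
    auto notIs = regex(`^(!is)[^_a-zA-Z0-9]`);

    string a = "!is null";
    auto m = matchFirst(a, notIs);
    assert(!m.empty && m[1] == "!is");
    a = a[m[1].length .. $];   // consume only the submatch
    assert(a == " null");      // the space is left for the next token

    // "!isGood": the context class rejects 'G', so no "!is" token here;
    // it is lexed as '!' followed by the identifier "isGood" instead.
    assert(matchFirst("!isGood", notIs).empty);
}
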
September 03, 2007
Reply to Jascha,

> BCS wrote:
> 
>> For that to work the lexer has to keep track of whitespace. :-b
>> 
> you can also match "(!is)[^_a-zA-Z0-9]", advancing the input only for
> the submatch. or use a single-character lookahead.
> 

That's what I'm hoping to do sooner or later. I already do something like that for ".." vs ".".


September 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1466





------- Comment #7 from matti.niemenmaa+dbugzilla@iki.fi  2007-09-09 12:26 -------
Here's some example code underlining the issue:

class Foo {
        static int opSlice(double a, double b) {
                return 0;
        }
}

void main() {
        // works
        assert (Foo[0. .. 1] == 0);
        // thinks it's [0 ... 1], no maximal munch taking place
        assert (Foo[0... 1] == 0);
}


-- 
