A lexical change (a breaking change, but trivial to fix)
July 07, 2012
This might sound silly, but how about if D stopped allowing 0..2 as a range, and instead just said "invalid floating-point number"?

Fixing it en masse would be pretty trivial... just run a regex to replace
	"\b(\d+)\.\."
with
	"\1 .. "
and you're good to go.
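
For instance, a throwaway D script along these lines would do the whole rewrite (untested sketch; assumes a reasonably recent std.regex):

import std.file, std.regex;

void main(string[] args)
{
    // Rewrite every file named on the command line in place,
    // inserting spaces around ".." wherever it directly follows digits.
    foreach (name; args[1 .. $])
    {
        auto src = readText(name);
        auto fixed = src.replaceAll(regex(`\b(\d+)\.\.`), `$1 .. `);
        std.file.write(name, fixed);
    }
}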

(Or if you want more accuracy, just take the compiler output and feed it back with a fix -- that would work too.)

The benefit, though, is that now you can do maximal munch without worrying about this edge case... which sure makes it easier to make a lexer.

Thoughts?
July 07, 2012
On 07-07-2012 23:39, Mehrdad wrote:
> This might sound silly, but how about if D stopped allowing 0..2  as a
> range, and instead just said "invalid floating-point number"?
>
> Fixing it en masse would be pretty trivial... just run a regex to replace
>      "\b(\d+)\.\."
> with
>      "\1 .. "
> and you're good to go.
>
> (Or if you want more accuracy, just take the compiler output and feed it
> back with a fix -- that would work too.)
>
> The benefit, though, is that now you can do maximal munch without
> worrying about this edge case... which sure makes it easier to make a
> lexer.
>
> Thoughts?

... why is this even done at the lexical stage? It should be done at the parsing stage if anything.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org


July 07, 2012
On Saturday, 7 July 2012 at 21:41:44 UTC, Alex Rønne Petersen wrote:
>
> ... why is this even done at the lexical stage? It should be done at the parsing stage if anything.

Well, even better then -- it makes it easier to make a parser.

That said, what's wrong with doing it in the lexical stage?
July 07, 2012
On Sat, Jul 07, 2012 at 11:39:59PM +0200, Mehrdad wrote:
> This might sound silly, but how about if D stopped allowing   0..2 as a range, and instead just said "invalid floating-point number"?
[...]

I like writing 0..2 as a range. It's especially nice in array slice notation, where you _want_ to have it as concise as possible.

OTOH, having implemented a D lexer before (just for practice, not production quality), I do see how ambiguities with floating-point numbers can cause a lot of code convolutions.

But I'm gonna have to say no to this one; *I* think a better solution would be to prohibit things like 0. or 1. in a float literal. Either follow it with a digit, or don't write the dot. This will also save us a lot of pain in the UFCS department, where 4.sqrt is currently a pain to lex. Once this is done, 0..2 is no longer ambiguous, and any respectable DFA lexer should be able to handle it with ease.


T

-- 
If a person can't communicate, the very least he could do is to shut up. -- Tom Lehrer, on people who bemoan their communication woes with their loved ones.
July 07, 2012
On Sat, Jul 07, 2012 at 11:41:43PM +0200, Alex Rønne Petersen wrote:
> On 07-07-2012 23:39, Mehrdad wrote:
> >This might sound silly, but how about if D stopped allowing 0..2  as a range, and instead just said "invalid floating-point number"?
[...]
> ... why is this even done at the lexical stage? It should be done at the parsing stage if anything.
[...]

This is because the lexer can mistakenly identify it as "0." followed by ".2" instead of "0" followed by ".." followed by "2".

IMAO, this problem is caused by floating-point notational stupidities like 0. and .1, especially the former. Getting rid of the former (and optionally the latter) will fix a whole bunch of lexer pain in D.
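
To illustrate: with the rule "a '.' continues a numeric literal only if a digit follows it", the core of the number lexer needs just one character of lookahead past the dot. Rough sketch only (exponents, suffixes and hex floats omitted; lexNumber is a made-up helper):

import std.ascii : isDigit;

// Returns the index just past the numeric literal starting at i.
size_t lexNumber(string src, size_t i)
{
    while (i < src.length && src[i].isDigit)
        ++i;                                   // integer part
    // "0..2" stops here: the '.' is only taken if a digit follows,
    // so ".." is left for the operator lexer.
    if (i + 1 < src.length && src[i] == '.' && src[i + 1].isDigit)
    {
        ++i;                                   // the '.'
        while (i < src.length && src[i].isDigit)
            ++i;                               // fractional part
    }
    return i;
}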


T

-- 
They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
July 07, 2012
On 07/07/2012 11:39 PM, Mehrdad wrote:
> This might sound silly,

+1.

> but how about if D stopped allowing 0..2 as a
> range, and instead just said "invalid floating-point number"?
>
> Fixing it en masse would be pretty trivial... just run a regex to replace
> "\b(\d+)\.\."
> with
> "\1 .. "
> and you're good to go.
>
> (Or if you want more accuracy, just take the compiler output and feed it
> back with a fix -- that would work too.)
>
> The benefit, though, is that now you can do maximal munch without
> worrying about this edge case... which sure makes it easier to make a
> lexer.
>
> Thoughts?

It does not make it easier to create a lexer, because this is not
actually an edge case worth explicitly testing for.

switch(input.front){
    case '0'..'9': ...
    case 'a'..'f', 'A'..'F': ...
    case '.': if('0'>input[1]||input[1]>'9') break;
        ...
}
July 07, 2012
On 07/08/2012 12:12 AM, Timon Gehr wrote:
> On 07/07/2012 11:39 PM, Mehrdad wrote:
>> This might sound silly,
>
> +1.
>
>> but how about if D stopped allowing 0..2 as a
>> range, and instead just said "invalid floating-point number"?
>>
>> Fixing it en masse would be pretty trivial... just run a regex to replace
>> "\b(\d+)\.\."
>> with
>> "\1 .. "
>> and you're good to go.
>>
>> (Or if you want more accuracy, just take the compiler output and feed it
>> back with a fix -- that would work too.)
>>
>> The benefit, though, is that now you can do maximal munch without
>> worrying about this edge case... which sure makes it easier to make a
>> lexer.
>>
>> Thoughts?
>
> It does not make it easier to create a lexer, because this is not
> actually an edge case worth explicitly testing for.
>
> switch(input.front){

I meant input[0]. No need for decoding.

>      case '0'..'9': ...
>      case 'a'..'f', 'A'..'F': ...
>      case '.': if('0'>input[1]||input[1]>'9') break;
>          ...
> }
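
Spelled out with D's actual case range syntax, and with the bounds check added, the dispatch is still tiny. Sketch only -- classify is a made-up stand-in for the real token code:

// Classify the token starting at the front of `input` (assumed non-empty).
string classify(string input)
{
    switch (input[0])
    {
        case '0': .. case '9':
            return "numeric literal";
        case '.':
            if (input.length > 1 && '0' <= input[1] && input[1] <= '9')
                return "numeric literal";          // ".5"-style float
            return "'.' / '..' / '...' operator";
        default:
            return "something else";
    }
}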

July 07, 2012
On Saturday, July 07, 2012 15:01:50 H. S. Teoh wrote:
> On Sat, Jul 07, 2012 at 11:39:59PM +0200, Mehrdad wrote:
> > This might sound silly, but how about if D stopped allowing   0..2 as a range, and instead just said "invalid floating-point number"?
> 
> [...]
> 
> I like writing 0..2 as a range. It's especially nice in array slice notation, where you _want_ to have it as concise as possible.
> 
> OTOH, having implemented a D lexer before (just for practice, not production quality), I do see how ambiguities with floating-point numbers can cause a lot of code convolutions.
> 
> But I'm gonna have to say no to this one; *I* think a better solution would be to prohibit things like 0. or 1. in a float literal. Either follow it with a digit, or don't write the dot. This will also save us a lot of pain in the UFCS department, where 4.sqrt is currently a pain to lex. Once this is done, 0..2 is no longer ambiguous, and any respectable DFA lexer should be able to handle it with ease.

+1

I think that it's ridiculous that 1. and .1 are legal. 1.f was fixed, so I was shocked to find out recently that 1. and .1 weren't.

- Jonathan M Davis
July 07, 2012
On Saturday, 7 July 2012 at 22:00:43 UTC, H. S. Teoh wrote:
> On Sat, Jul 07, 2012 at 11:39:59PM +0200, Mehrdad wrote:
>> This might sound silly, but how about if D stopped allowing   0..2
>> as a range, and instead just said "invalid floating-point number"?
> [...]
>
> I like writing 0..2 as a range. It's especially nice in array slice notation, where you _want_ to have it as concise as possible.

Hmm... true..

> OTOH, having implemented a D lexer before (just for practice, not production quality), I do see how ambiguities with floating-point numbers can cause a lot of code convolutions.

Yeah that's exactly what happened to me lol.
(Mainly the problem I ran into was that I was REALLY trying to avoid extra lookaheads if possible, since I was sticking to the range interface of front/popFront, and trying not to consume more than I can handle... and this was the edge case that broke it.)
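
(For the record, a one-element buffer on top of front/popFront is enough to get that single character of lookahead -- something like this, just a sketch, "Peekable" being a made-up name:)

import std.range.primitives;

// Wraps any input range and exposes one element of lookahead past front.
struct Peekable(R) if (isInputRange!R)
{
    private R rest;
    private ElementType!R cur;
    private bool haveCur;

    this(R r)
    {
        rest = r;
        popFront0();
    }

    @property bool empty() { return !haveCur; }
    @property auto front() { return cur; }
    void popFront() { popFront0(); }

    // One element beyond front, if it exists.
    bool peek(out ElementType!R next)
    {
        if (rest.empty) return false;
        next = rest.front;
        return true;
    }

    private void popFront0()
    {
        if (rest.empty) { haveCur = false; return; }
        cur = rest.front;
        rest.popFront();
        haveCur = true;
    }
}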

> But I'm gonna have to say no to this one; *I* think a better solution would be to prohibit things like 0. or 1. in a float literal. Either follow it with a digit, or don't write the dot. This will also save us a lot of pain in the UFCS department, where 4.sqrt is currently a pain to lex. Once this is done, 0..2 is no longer ambiguous, and any respectable DFA lexer should be able to handle it with ease.

Good idea, I like it too. How about just disallowing trailing decimal points then?

July 07, 2012
On Saturday, July 07, 2012 15:20:28 Jonathan M Davis wrote:
> On Saturday, July 07, 2012 15:01:50 H. S. Teoh wrote:
> > On Sat, Jul 07, 2012 at 11:39:59PM +0200, Mehrdad wrote:
> > > This might sound silly, but how about if D stopped allowing   0..2 as a range, and instead just said "invalid floating-point number"?
> > 
> > [...]
> > 
> > I like writing 0..2 as a range. It's especially nice in array slice notation, where you _want_ to have it as concise as possible.
> > 
> > OTOH, having implemented a D lexer before (just for practice, not production quality), I do see how ambiguities with floating-point numbers can cause a lot of code convolutions.
> > 
> > But I'm gonna have to say no to this one; *I* think a better solution would be to prohibit things like 0. or 1. in a float literal. Either follow it with a digit, or don't write the dot. This will also save us a lot of pain in the UFCS department, where 4.sqrt is currently a pain to lex. Once this is done, 0..2 is no longer ambiguous, and any respectable DFA lexer should be able to handle it with ease.
> 
> +1
> 
> I think that it's ridiculous that 1. and .1 are legal. 1.f was fixed, so I was shocked to find out recently that 1. and .1 weren't.

There's an existing enhancement request for it:

http://d.puremagic.com/issues/show_bug.cgi?id=6277

- Jonathan M Davis