July 07, 2012
On 07/08/2012 12:23 AM, Mehrdad wrote:
> On Saturday, 7 July 2012 at 22:00:43 UTC, H. S. Teoh wrote:
>> On Sat, Jul 07, 2012 at 11:39:59PM +0200, Mehrdad wrote:
>>> This might sound silly, but how about if D stopped allowing 0..2
>>> as a range, and instead just said "invalid floating-point number"?
>> [...]
>>
>> I like writing 0..2 as a range. It's especially nice in array slice
>> notation, where you _want_ to have it as concise as possible.
>
> Hmm... true..
>
>> OTOH, having implemented a D lexer before (just for practice, not
>> production quality), I do see how ambiguities with floating-point
>> numbers can cause a lot of code convolutions.
>
> Yeah that's exactly what happened to me lol.
> (Mainly the problem I ran into was that I was REALLY trying to avoid
> extra lookaheads if possible, since I was sticking to the range
> interface of front/popFront, and trying not to consume more than I can
> handle... and this was the edge case that broke it.)
>

You could go like this:

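// (inside the lexer's main loop, so `continue` moves on to the next token)
switch(input.front) {
    case '0': .. case '9': {
        // parseNumber (not shown) is assumed to also eat a '.' that is not
        // followed by a digit, and to report that via consumedtrailingdot.
        bool consumedtrailingdot;
        output.put(parseNumber(input, consumedtrailingdot));
        if(!consumedtrailingdot) continue;
        if(input.empty || input.front != '.') {
            output.put(Token("."));
            continue;
        }
        input.popFront();
        if(input.empty || input.front != '.') {
            output.put(Token(".."));
            continue;
        }
        input.popFront(); // consume the third '.' as well
        output.put(Token("..."));
        continue;
    }
    default:
        // ...other token kinds...
        break;
}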
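A self-contained version of the sketch above, written as a minimal D sketch rather than Timon's actual code. `parseNumber` here is a hypothetical helper that follows the proposed "a digit must follow the dot" rule (it ignores exponents, hex, underscores and suffixes), and `lex` returns plain strings instead of a `Token` type. A `string` is used as the input range through `std.range.primitives`, so the lexer only ever calls `empty`, `front` and `popFront`:

```d
import std.ascii : isDigit;
import std.conv : to;
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: reads decimal digits and, if a '.' follows, consumes it.
// A digit after the dot starts the fraction; otherwise the dot is already gone
// (an input range cannot un-read it), so it is reported via consumedTrailingDot.
string parseNumber(ref string input, out bool consumedTrailingDot)
{
    string text;
    while (!input.empty && isDigit(input.front))
    {
        text ~= cast(char) input.front; // digits are ASCII, so the cast is safe
        input.popFront();
    }
    if (input.empty || input.front != '.')
        return text;
    input.popFront();
    if (input.empty || !isDigit(input.front))
    {
        consumedTrailingDot = true;
        return text;
    }
    text ~= '.';
    while (!input.empty && isDigit(input.front))
    {
        text ~= cast(char) input.front;
        input.popFront();
    }
    return text;
}

// Numbers and the dots after them become tokens; spaces are skipped;
// every other character becomes a one-character token in this sketch.
string[] lex(string src)
{
    string[] output;
    while (!src.empty)
    {
        switch (src.front)
        {
            case '0': .. case '9':
            {
                bool consumedTrailingDot;
                output ~= parseNumber(src, consumedTrailingDot);
                if (!consumedTrailingDot) continue;
                if (src.empty || src.front != '.') { output ~= "."; continue; }
                src.popFront();
                if (src.empty || src.front != '.') { output ~= ".."; continue; }
                src.popFront(); // the third dot
                output ~= "...";
                continue;
            }
            case ' ':
                src.popFront();
                continue;
            default:
                output ~= src.front.to!string;
                src.popFront();
                continue;
        }
    }
    return output;
}
```

With this helper, "2..3" lexes as "2", "..", "3": the first dot is swallowed inside `parseNumber`, but the flag tells the caller that it happened.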

>> But I'm gonna have to say no to this one; *I* think a better solution
>> would be to prohibit things like 0. or 1. in a float literal. Either
>> follow it with a digit, or don't write the dot. This will also save us
>> a lot of pain in the UFCS department, where 4.sqrt is currently a pain
>> to lex. Once this is done, 0..2 is no longer ambiguous, and any
>> respectable DFA lexer should be able to handle it with ease.
>
> Good idea, I like it too. How about just disallowing trailing decimal
> points then?
>

+1.
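Under the proposed rule, a decimal literal is digits, optionally followed by '.' and at least one more digit. A minimal maximal-munch sketch (the state names and `longestNumber` are made up for illustration) runs a small DFA and remembers the last accepting position, so "0..2" yields a one-character literal and leaves ".." for the next token. Note that it backs up over a rejected '.' by slicing the string, which is exactly the extra lookahead a pure input range does not give you:

```d
import std.ascii : isDigit;

// DFA states for a decimal literal under the proposed rule.
enum State { start, intPart, dot, fracPart, dead }

State step(State s, char c)
{
    switch (s)
    {
        case State.start:    return isDigit(c) ? State.intPart : State.dead;
        case State.intPart:  return isDigit(c) ? State.intPart
                                  : c == '.' ? State.dot : State.dead;
        case State.dot:      return isDigit(c) ? State.fracPart : State.dead; // "0." alone is not accepting
        case State.fracPart: return isDigit(c) ? State.fracPart : State.dead;
        default:             return State.dead;
    }
}

bool accepting(State s)
{
    return s == State.intPart || s == State.fracPart;
}

// Length of the longest numeric literal at the start of src (0 if none).
size_t longestNumber(string src)
{
    State s = State.start;
    size_t lastAccept = 0;
    foreach (i, char c; src)
    {
        s = step(s, c);
        if (s == State.dead) break;
        if (accepting(s)) lastAccept = i + 1;
    }
    return lastAccept;
}
```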
July 08, 2012
On 08/07/2012 00:04, H. S. Teoh wrote:
> On Sat, Jul 07, 2012 at 11:41:43PM +0200, Alex Rønne Petersen wrote:
>> On 07-07-2012 23:39, Mehrdad wrote:
>>> This might sound silly, but how about if D stopped allowing 0..2  as a
>>> range, and instead just said "invalid floating-point number"?
> [...]
>> ... why is this even done at the lexical stage? It should be done at
>> the parsing stage if anything.
> [...]
>
> This is because the lexer can mistakenly identify it as "0." followed by
> ".2" instead of "0" followed by ".." followed by "2".
>
> IMAO, this problem is caused by floating point notational stupidities
> like 0. and .1, especially the former. Getting rid of the former (and
> optionally the latter) would fix a whole bunch of lexer pain in D.
>
>
> T
>

0. should be banned because of UFCS anyway.
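To make the mis-split concrete, here is a deliberately naive greedy scanner (hypothetical, for illustration only, not any real D lexer) that accepts both "0." and ".2" as complete float literals:

```d
import std.ascii : isDigit;

// Deliberately naive: treats both "0." and ".2" as complete float literals
// and always grabs the longest one it can see.
string[] naiveSplit(string src)
{
    string[] tokens;
    size_t i = 0;
    while (i < src.length)
    {
        immutable start = i;
        immutable startsNumber = isDigit(src[i])
            || (src[i] == '.' && i + 1 < src.length && isDigit(src[i + 1]));
        if (startsNumber)
        {
            while (i < src.length && isDigit(src[i])) ++i;     // integer part (may be empty)
            if (i < src.length && src[i] == '.')
            {
                ++i;                                            // accept the dot, digit or not
                while (i < src.length && isDigit(src[i])) ++i;  // fraction (may be empty)
            }
        }
        else
        {
            ++i;                                                // any other char: its own token
        }
        tokens ~= src[start .. i];
    }
    return tokens;
}
```

It splits "0..2" into "0." and ".2", and the first token of "4.sqrt" comes out as "4.", which is the UFCS problem raised earlier in the thread.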
July 08, 2012
On Sunday, July 08, 2012 03:29:37 deadalnix wrote:
> On 08/07/2012 00:04, H. S. Teoh wrote:
> > On Sat, Jul 07, 2012 at 11:41:43PM +0200, Alex Rønne Petersen wrote:
> >> On 07-07-2012 23:39, Mehrdad wrote:
> >>> This might sound silly, but how about if D stopped allowing 0..2  as a range, and instead just said "invalid floating-point number"?
> > 
> > [...]
> > 
> >> ... why is this even done at the lexical stage? It should be done at the parsing stage if anything.
> > 
> > [...]
> > 
> > This is because the lexer can mistakenly identify it as "0." followed by ".2" instead of "0" followed by ".." followed by "2".
> > 
> > IMAO, this problem is caused by floating point notational stupidities like 0. and .1, especially the former. Getting rid of the former (and optionally the latter) would fix a whole bunch of lexer pain in D.
> > 
> > 
> > T
> 
> 0. should be banned because of UFCS anyway.

If you do 0.func(), UFCS works. If you do 0.f(), UFCS works (so 0.f as a literal is illegal), but there's nothing about UFCS preventing 0. from working as long as there's nothing immediately after it which could be considered a function name (i.e. any letter or _). So, UFCS really isn't an argument for banning 0. as a literal. It should be banned because it's ludicrous to accept a partial literal and because it causes parsing problems.

- Jonathan M Davis
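The distinction described here can be written as a decision on the single character that follows "<digits>.". This is a rough approximation of that rule, not dmd's actual lexer logic: it ignores exponents and non-ASCII identifier characters, and the enum names are hypothetical:

```d
import std.ascii : isAlpha, isDigit;

enum AfterDot { fraction, memberCall, range, trailingDotFloat }

// Decides what "<digits>." means by looking only at the character after the dot.
AfterDot classify(char next)
{
    if (isDigit(next)) return AfterDot.fraction;                   // 0.5
    if (isAlpha(next) || next == '_') return AfterDot.memberCall;  // 0.func(), UFCS
    if (next == '.') return AfterDot.range;                        // 0..2
    return AfterDot.trailingDotFloat;                              // "0." then ')', ' ', ';', ...
}
```

Only the last branch depends on "0." being a legal literal, which is why UFCS alone does not force the ban.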
July 08, 2012
On Saturday, 7 July 2012 at 22:54:15 UTC, Timon Gehr wrote:
> On 07/08/2012 12:23 AM, Mehrdad wrote:
>
> You could go like this:
>
> switch(input.front) {
>     case '0'..'9':
>         bool consumedtrailingdot;
>         output.put(parseNumber(input, consumedtrailingdot));
>         if(!consumedtrailingdot) continue;
>         if(input.front != '.') {
>             output.put(Token("."));
>             continue;
>         }
>         input.popFront();
>         if(input.front != '.') {
>             output.put(Token(".."));
>             continue;
>         }
>         output.put(Token("..."));
>         continue;
> }

You kinda glossed over the crucial detail in parseNumber().  ;)

What happens if it sees "2..3"?

Then it /must/ have eaten the first period (it can't see the second period otherwise)... in which case now you have no idea that happened.

Of course, it's trivial to fix with an extra lookahead, but that would require using a forward range instead of an input range. (Which, again, is easy to do with an adapter -- what I ended up doing -- but the point is, it makes it harder to lex the code with just an input range.)
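The adapter approach can be sketched as a wrapper that keeps one extra element buffered, so the element after `front` can be inspected without consuming `front`. This is not Mehrdad's adapter, and `Peekable`, `peek` and `hasPeek` are hypothetical names, not Phobos APIs. Note that the wrapper pulls one element from the source ahead of time, which is why lookahead interacts badly with interactive (console) input:

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

// Wraps any input range and buffers one extra element.
struct Peekable(R) if (isInputRange!R)
{
    private R source;
    private ElementType!R cur, next;
    private bool hasCur, hasNext;

    this(R r)
    {
        source = r;
        fill();
    }

    // Pull elements from the source until both slots are filled or it runs dry.
    private void fill()
    {
        if (!hasCur && !source.empty) { cur = source.front; source.popFront(); hasCur = true; }
        if (!hasNext && !source.empty) { next = source.front; source.popFront(); hasNext = true; }
    }

    @property bool empty() const { return !hasCur; }
    @property ElementType!R front() { return cur; }

    void popFront()
    {
        hasCur = hasNext;
        cur = next;
        hasNext = false;
        fill();
    }

    // True if there is an element after front.
    bool hasPeek() const { return hasNext; }

    ElementType!R peek()
    {
        assert(hasNext, "no element after front");
        return next;
    }
}

auto peekable(R)(R r) { return Peekable!R(r); }
```

With it, a number lexer can check `peek() == '.'` before deciding whether to consume the first dot of "2..3".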
July 08, 2012
On Saturday, 7 July 2012 at 22:54:15 UTC, Timon Gehr wrote:
> On 07/08/2012 12:23 AM, Mehrdad wrote:
>> On Saturday, 7 July 2012 at 22:00:43 UTC, H. S. Teoh wrote:
>>> On Sat, Jul 07, 2012 at 11:39:59PM +0200, Mehrdad wrote:
>>>> This might sound silly, but how about if D stopped allowing 0..2
>>>> as a range, and instead just said "invalid floating-point number"?
>>> [...]
>>>
>>> I like writing 0..2 as a range. It's especially nice in array slice
>>> notation, where you _want_ to have it as concise as possible.
>>
>> Hmm... true..
>>
>>> OTOH, having implemented a D lexer before (just for practice, not
>>> production quality), I do see how ambiguities with floating-point
>>> numbers can cause a lot of code convolutions.
>>
>> Yeah that's exactly what happened to me lol.
>> (Mainly the problem I ran into was that I was REALLY trying to avoid
>> extra lookaheads if possible, since I was sticking to the range
>> interface of front/popFront, and trying not to consume more than I can
>> handle... and this was the edge case that broke it.)
>>
>
> You could go like this:
>
> switch(input.front) {
>     case '0'..'9':
>         bool consumedtrailingdot;
>         output.put(parseNumber(input, consumedtrailingdot));
>         if(!consumedtrailingdot) continue;
>         if(input.front != '.') {
>             output.put(Token("."));
>             continue;
>         }
>         input.popFront();
>         if(input.front != '.') {
>             output.put(Token(".."));
>             continue;
>         }
>         output.put(Token("..."));
>         continue;
> }


Right, it's trivial to fix with an extra state variable like 'consumedtrailingdot'.

The point was, it requires an extra lookahead character, which I was trying to avoid (mainly for fun).

In this case, it doesn't really make a difference in practice -- but in general I don't like lookaheads, because depending on future data makes it hard for e.g. the user to enter data via the console.
July 09, 2012
On Sun, Jul 08, 2012 at 09:59:38AM +0200, Mehrdad wrote: [...]
> Right, it's trivial to fix with an extra state variable like 'consumedtrailingdot'.

This is eventually what I did in my own D lexer.

Well, actually, I kinda blasted an ant with an M16... I had a queue of backlogged tokens that getNext returns from first whenever it is non-empty, and when I recognized things like 1..4, I would push 2 or 3 tokens onto the backlog queue, so no extra state is needed (although the backlog queue itself is really just another form of extra state).
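A minimal sketch of the backlog-queue idea, assuming a string input and only integers, ".." and single-character tokens. `Lexer`, `scan` and `digitRun` are hypothetical names; only `getNext` comes from the description above, and this is not the actual lexer being described:

```d
import std.ascii : isDigit;

struct Lexer
{
    string input;
    string[] backlog; // tokens recognized early, returned first-in first-out

    bool empty() const { return backlog.length == 0 && input.length == 0; }

    // Drain the backlog first; only scan more input when it is empty.
    string getNext()
    {
        assert(!empty, "no more tokens");
        if (backlog.length == 0)
            scan();
        auto t = backlog[0];
        backlog = backlog[1 .. $];
        return t;
    }

    // Recognize the next chunk of input and append its tokens to the backlog.
    private void scan()
    {
        immutable n = digitRun(input);
        if (n > 0 && input.length >= n + 2 && input[n .. n + 2] == "..")
        {
            backlog ~= input[0 .. n];      // "1"
            backlog ~= "..";               // ".."
            input = input[n + 2 .. $];
            immutable m = digitRun(input);
            if (m > 0)                     // "4", if it follows directly
            {
                backlog ~= input[0 .. m];
                input = input[m .. $];
            }
            return;
        }
        immutable len = n > 0 ? n : 1;     // a plain integer, or one other character
        backlog ~= input[0 .. len];
        input = input[len .. $];
    }

    static size_t digitRun(string s)
    {
        size_t i = 0;
        while (i < s.length && isDigit(s[i])) ++i;
        return i;
    }
}
```

Lexing "1..4+2" this way yields "1", "..", "4", "+", "2": the first call queues three tokens at once and the next two calls just drain the queue.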


> The point was, it requires an extra lookahead character, which I was trying to avoid (mainly for fun).
> 
> In this case, it doesn't really make a difference in practice -- but in general I don't like lookaheads, because depending on future data makes it hard for e.g. the user to enter data via the console.

In my case, the extra lookahead only happens when the lexer sees an input prefix like "4..", which doesn't usually happen at the end of a line. In all other cases, no lookahead is actually necessary, so except for that very rare case, entering data via the console works just fine.


T

-- 
I am a consultant. My job is to make your job redundant. -- Mr Tom
July 09, 2012
On Monday, 9 July 2012 at 17:06:44 UTC, H. S. Teoh wrote:
>> In this case, it doesn't really make a difference in practice
> In my case, the extra lookahead only happens when the lexer sees string prefixes like "4..", which doesn't usually happen at the end of a line. In all other cases, no lookahead is actually necessary, so except for the very rare case, entering data via console actually works just fine.

Yup, hence my line above. :P