September 12, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On Wednesday, 11 September 2013 at 20:28:06 UTC, H. S. Teoh wrote:
> On Wed, Sep 11, 2013 at 10:18:12PM +0200, Dicebot wrote:
>> On Wednesday, 11 September 2013 at 20:08:44 UTC, H. S. Teoh wrote:
>> >On Wed, Sep 11, 2013 at 10:04:20PM +0200, Dicebot wrote:
>> >>On Wednesday, 11 September 2013 at 19:58:36 UTC, H. S. Teoh wrote:
>> >>>I disagree. I think it's more readable to use a consistent prefix,
>> >>>like kw... or kw_... (e.g. kw_int, kw_return, etc.), so that it's
>> >>>clear you're referring to token types, not the actual keyword.
>> >>
>> >>Not unless you want to change the style guide and break existing
>> >>Phobos code ;)
>> >
>> >How would that break Phobos code? Phobos code doesn't even use
>> >std.d.lexer right now.
>>
>> Phobos code must conform to its style guide. You can't change it
>> without changing existing Phobos code that relies on it.
>> Inconsistent style is the worst of all options.
>
> This doesn't violate Phobos style guidelines:
>
> enum TokenType {
> kwInt,
> kwFloat,
> kwDouble,
> ...
> kwFunction,
> kwScope,
> ... // etc.
> }
>
Int, Function, Scope, Import are all valid identifiers.
Random minimization like kw is really bad. It is even worse when it doesn't make anything shorter.
|
September 12, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to deadalnix | On 9/11/2013 6:30 PM, deadalnix wrote:
> Indeed. What solution do you have in mind ?
The solution dmd uses is to put in an intermediary layer that saves the lookahead tokens in a linked list.
|
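The intermediary layer described above could be sketched roughly as follows. This is only an illustration of the idea, not dmd's actual code: the `Lookahead` wrapper, its `peek` method, and the `Node` type are hypothetical names, and string tokens stand in for real token structs. Tokens pulled from the underlying range during a peek are parked in a singly linked list and handed out again when the parser advances.

```d
import std.range.primitives; // isInputRange, ElementType, empty/front/popFront for arrays

/// Hypothetical wrapper that adds peek(n) on top of any input range of tokens.
struct Lookahead(R) if (isInputRange!R)
{
    alias T = ElementType!R;

    private static struct Node { T value; Node* next; }

    private R source;
    private Node* head;      // oldest buffered token
    private Node* tail;      // newest buffered token
    private size_t buffered; // number of nodes in the list

    this(R source) { this.source = source; }

    /// Pulls tokens from the source until `count` of them are buffered.
    private void fill(size_t count)
    {
        while (buffered < count)
        {
            assert(!source.empty, "lookahead past end of input");
            auto node = new Node(source.front, null);
            source.popFront();
            if (tail is null) head = node; else tail.next = node;
            tail = node;
            ++buffered;
        }
    }

    /// Returns the token n positions ahead (0 = current) without consuming it.
    T peek(size_t n = 0)
    {
        fill(n + 1);
        auto node = head;
        foreach (_; 0 .. n) node = node.next;
        return node.value;
    }

    @property bool empty() { return head is null && source.empty; }
    @property T front() { return peek(0); }

    void popFront()
    {
        fill(1);
        head = head.next;
        if (head is null) tail = null;
        --buffered;
    }
}

unittest
{
    auto tokens = Lookahead!(string[])(["a", ".", "b", ";"]);
    assert(tokens.peek(2) == "b"); // look two tokens ahead
    assert(tokens.front == "a");   // nothing was consumed
    tokens.popFront();
    assert(tokens.front == ".");
    assert(tokens.peek(2) == ";");
}
```

Compiling with `dmd -unittest -main` runs the checks. Each peeked token costs one allocation in this sketch, which is the kind of overhead the replies below ask about.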
September 12, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Thursday, 12 September 2013 at 01:39:52 UTC, Walter Bright wrote:
> On 9/11/2013 6:30 PM, deadalnix wrote:
>> Indeed. What solution do you have in mind ?
>
> The solution dmd uses is to put in an intermediary layer that saves the lookahead tokens in a linked list.
But then you have an extra step when looking up every token, plus memory management overhead. How big is the performance improvement?
|
September 12, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to deadalnix | On Thursday, September 12, 2013 03:37:06 deadalnix wrote:
> Int, Function, Scope, Import are all valid identifiers.
All of which violate Phobos' naming conventions for enum values (they must start with a lowercase letter), which is why we went with adding an _ on the end. And it's pretty much as close as you can get to the keyword without actually using the keyword, which is a plus IMHO (though from the sounds of it, H.S. Teoh would consider that a negative due to possible confusion with the keyword).
- Jonathan M Davis
|
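For illustration, the trailing-underscore convention described above might look like this. The member list and the `classify` helper are hypothetical and are not taken from std.d.lexer; the sketch only assumes that members colliding with a keyword get the `_` suffix while others do not.

```d
/// Hypothetical token kinds; names are illustrative only.
enum TokenType
{
    identifier,  // not a keyword, so no suffix is needed
    intLiteral,
    int_,        // `int` itself is a keyword and cannot name an enum member
    default_,
    return_,
}

/// Maps source text to a token kind, falling back to `identifier`.
TokenType classify(string text)
{
    switch (text)
    {
        case "int":     return TokenType.int_;
        case "default": return TokenType.default_;
        case "return":  return TokenType.return_;
        default:        return TokenType.identifier;
    }
}

unittest
{
    assert(classify("default") == TokenType.default_);
    assert(classify("foo") == TokenType.identifier);
}
```

All members start with a lowercase letter, and `TokenType.default_` stays as close to the keyword's spelling as the language allows.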
September 12, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to deadalnix | On 09/12/2013 03:30 AM, deadalnix wrote:
>>
>> That's correct, but that implies re-lexing the tokens, which has
>> negative performance implications.
>
> Indeed. What solution do you have in mind ?
Buffering the tokens would work. There are some ways to promote input ranges to forward ranges. But there are also some pitfalls like the implicit save on copy.
I have two prototypes for a generic input range buffer.
https://gist.github.com/dawgfoto/2187220 - uses growing ring buffer
https://gist.github.com/dawgfoto/1257196 - uses ref counted lookahead buffers in a singly linked list
The lexer itself has a ringbuffer for input ranges.
https://github.com/Hackerpilot/phobos/blob/master/std/d/lexer.d#L2278
|
September 12, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 09/12/2013 03:39 AM, Walter Bright wrote:
> On 9/11/2013 6:30 PM, deadalnix wrote:
>> Indeed. What solution do you have in mind ?
>
> The solution dmd uses is to put in an intermediary layer that saves the
> lookahead tokens in a linked list.
Linked list sounds bad.
Do you have a rough idea how often lookahead is needed, i.e. is it performance relevant? If so it might be worth tuning.
|
September 12, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | Jonathan M Davis wrote:
> You have to look ahead to figure out whether it's .. or a floating point literal.
This lookahead is introduced by using a petty grammar.
Please consider that lexing searches for the leftmost longest pattern
of the rest of the input. This means that introducing a pattern like
`<int>\.\.' return TokenType.INTDOTDOT;
would eliminate the lookahead in the lexer.
In the parser an additional rule then has to be added:
<range> ::= INT DOTDOT INT
| INTDOTDOT INT
-manfred
|
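The INTDOTDOT idea above can be sketched as a small scanner that only ever moves forward. This is a simplified illustration under stated assumptions: `lexNumber`, `Token`, and the `TokenType` members are hypothetical names, the input is assumed to start with a digit, and suffixes, exponents, hex literals and `1.foo`-style member access are ignored.

```d
import std.ascii : isDigit;

/// Hypothetical token kinds for this sketch.
enum TokenType { intLiteral, floatLiteral, intDotDot }

struct Token { TokenType type; string text; }

/// Lexes one numeric token from the front of `input` and removes it.
/// The index `i` only advances; the scanner never backs up.
Token lexNumber(ref string input)
{
    size_t i = 0;
    auto type = TokenType.intLiteral;

    while (i < input.length && isDigit(input[i])) ++i;         // <int>

    if (i < input.length && input[i] == '.')
    {
        ++i;                                                   // <int>.
        if (i < input.length && input[i] == '.')
        {
            ++i;                                               // <int>..
            type = TokenType.intDotDot;
        }
        else
        {
            while (i < input.length && isDigit(input[i])) ++i; // <int>.<digits>
            type = TokenType.floatLiteral;
        }
    }

    auto token = Token(type, input[0 .. i]);
    input = input[i .. $];
    return token;
}

unittest
{
    string src = "1..2";
    assert(lexNumber(src) == Token(TokenType.intDotDot, "1.."));
    assert(lexNumber(src) == Token(TokenType.intLiteral, "2"));
}
```

With this tokenization the parser sees `INTDOTDOT INT` for `1..2`, which is what the extra `<range>` alternative above is meant to accept.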
September 12, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Brian Schott | Brian Schott wrote:
>>> > Parsing D requires arbitrary lookahead.
> Yeah. D requires lookahead in both lexing and parsing.
Walter wrote about _arbitrary_ lookahead, i.e. unlimited lookahead.
-manfred
|
September 12, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
On Wed, Sep 11, 2013 at 10:06:11PM -0400, Jonathan M Davis wrote:
> On Thursday, September 12, 2013 03:37:06 deadalnix wrote:
> > Int, Function, Scope, Import are all valid identifiers.
>
> All of which violate Phobos' naming conventions for enum values (they
> must start with a lowercase letter), which is why we went with adding
> an _ on the end. And it's pretty much as close as you can get to the
> keyword without actually using the keyword, which is a plus IMHO
> (though from the sounds of it, H.S. Teoh would consider that a
> negative due to possible confusion with the keyword).
[...]
Actually, the main issue I have is that some of the enum values end with _ while others don't. This is inconsistent. I'd rather have consistency than superficial resemblance to the keywords as typed. Either *all* of the enum values should end with _, or *none* of them should. Having a mixture of both is an eyesore, and leads to people wondering, should I add a _ at the end or not?
If people insist that the 'default' keyword absolutely must be represented as TokenType.default_ (I really don't see why), then *all* TokenType values should end with _. But honestly, I find that really ugly. Writing something like kwDefault, or tokenTypeDefault, would be far better.
Sigh, Andrei was right. Once the bikeshed is up for painting, even the rainbow won't suffice. :-P
T
-- MASM = Mana Ada Sistem, Man!
|
September 12, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Brian Schott | On Thu, Sep 12, 2013 at 03:17:06AM +0200, Brian Schott wrote:
> On Thursday, 12 September 2013 at 00:13:36 UTC, H. S. Teoh wrote:
> >But then the code example proceeds to pass byLine() to it. Is that
> >correct? If it is, then the docs need to be updated, because last
> >time I checked, byLine() isn't a range of char, but a range of char
> >*arrays*.
> >
> >T
>
> The example doesn't pass the result of byLine to the byToken function
> directly.
*facepalm* You're right, it's calling join() on it. Nevermind what I said then. :-P
Sorry for all the unnecessary noise, I don't know what got into me that I didn't see the join().
T
-- May you live all the days of your life. -- Jonathan Swift
|