std.d.lexer: pre-voting review / discussion (page 5)

On Wednesday, 11 September 2013 at 20:28:06 UTC, H. S. Teoh wrote:
> On Wed, Sep 11, 2013 at 10:18:12PM +0200, Dicebot wrote:
>> On Wednesday, 11 September 2013 at 20:08:44 UTC, H. S. Teoh wrote:
>> >On Wed, Sep 11, 2013 at 10:04:20PM +0200, Dicebot wrote:
>> >>On Wednesday, 11 September 2013 at 19:58:36 UTC, H. S. Teoh wrote:
>> >>>I disagree. I think it's more readable to use a consistent prefix,
>> >>>like kw... or kw_... (e.g. kw_int, kw_return, etc.), so that it's
>> >>>clear you're referring to token types, not the actual keyword.
>> >>
>> >>Not unless you want to change the style guide and break existing
>> >>Phobos code ;)
>> >
>> >How would that break Phobos code? Phobos code doesn't even use
>> >std.d.lexer right now.
>>
>> Phobos code must conform to its style guide. You can't change it
>> without changing existing Phobos code that relies on it.
>> Inconsistent style is the worst of all options.
>
> This doesn't violate Phobos style guidelines:
>
> enum TokenType {
>     kwInt,
>     kwFloat,
>     kwDouble,
>     ...
>     kwFunction,
>     kwScope,
>     ... // etc.
> }

Int, Function, Scope, Import are all valid identifiers.

Random abbreviation like kw is really bad. It is even worse when it doesn't make anything shorter.

On Thursday, 12 September 2013 at 01:39:52 UTC, Walter Bright wrote:
> On 9/11/2013 6:30 PM, deadalnix wrote:
>> Indeed. What solution do you have in mind ?
>
> The solution dmd uses is to put in an intermediary layer that saves the lookahead tokens in a linked list.

But then you have an extra step when looking up every token, plus memory management overhead. How big is the performance improvement?

On Thursday, September 12, 2013 03:37:06 deadalnix wrote:
> Int, Function, Scope, Import are all valid identifiers.

All of which violate Phobos' naming conventions for enum values (they must start with a lowercase letter), which is why we went with adding an _ on the end. And it's pretty much as close as you can get to the keyword without actually using the keyword, which is a plus IMHO (though from the sounds of it, H.S. Teoh would consider that a negative due to possible confusion with the keyword).

- Jonathan M Davis

On 09/12/2013 03:30 AM, deadalnix wrote:
>>
>> That's correct, but that implies re-lexing the tokens, which has
>> negative performance implications.
>
> Indeed. What solution do you have in mind ?

Buffering the tokens would work. There are some ways to promote input ranges to forward ranges, but there are also some pitfalls, like the implicit save on copy.

I have two prototypes for a generic input range buffer.

https://gist.github.com/dawgfoto/2187220 - uses a growing ring buffer
https://gist.github.com/dawgfoto/1257196 - uses ref-counted lookahead buffers in a singly linked list

The lexer itself has a ring buffer for input ranges.
https://github.com/Hackerpilot/phobos/blob/master/std/d/lexer.d#L2278

On 09/12/2013 03:39 AM, Walter Bright wrote:
> On 9/11/2013 6:30 PM, deadalnix wrote:
>> Indeed. What solution do you have in mind ?
>
> The solution dmd uses is to put in an intermediary layer that saves the
> lookahead tokens in a linked list.

Linked list sounds bad. Do you have a rough idea how often lookahead is needed, i.e. is it performance relevant? If so, it might be worth tuning.
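To make the buffering idea concrete, here is a minimal sketch of a lookahead buffer that wraps any input range and lets a parser peek n elements ahead without re-lexing. It is a hypothetical illustration: `LookaheadBuffer`, `fill` and `peek` are made-up names, and it uses a plain growable array, which is neither dmd's linked list nor the ring buffers in the gists or the lexer linked above.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

// Hypothetical sketch: buffered elements live in a growable array and
// `head` marks the current front. Not dmd's or std.d.lexer's implementation.
struct LookaheadBuffer(R) if (isInputRange!R)
{
    private R source;
    private ElementType!(R)[] buf; // pulled from source, not yet consumed
    private size_t head;           // index of the current front inside buf

    this(R source) { this.source = source; }

    // Ensure at least n + 1 elements are buffered starting at head.
    // Returns false if the source runs out first.
    private bool fill(size_t n)
    {
        while (buf.length - head <= n)
        {
            if (source.empty)
                return false;
            buf ~= source.front;
            source.popFront();
        }
        return true;
    }

    bool empty() { return !fill(0); }

    ElementType!R front()
    {
        immutable ok = fill(0);
        assert(ok, "front on empty LookaheadBuffer");
        return buf[head];
    }

    void popFront()
    {
        immutable ok = fill(0);
        assert(ok, "popFront on empty LookaheadBuffer");
        ++head;
        // Once everything buffered has been consumed, start over with an
        // empty slice so consumed elements are no longer referenced.
        if (head == buf.length)
        {
            buf.length = 0;
            head = 0;
        }
    }

    // Look n elements past front (peek(0) is front) without consuming.
    ElementType!R peek(size_t n)
    {
        immutable ok = fill(n);
        assert(ok, "peek past end of input");
        return buf[head + n];
    }
}
```

One design caveat: storage is only released when the buffer fully drains, so a parser that always keeps something buffered makes the array grow; a ring buffer avoids that. Copying the struct also copies `source`, which is exactly the implicit-save-on-copy pitfall mentioned above.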

Jonathan M Davis wrote:
> You have to look ahead to figure out whether it's .. or a floating point literal.

This lookahead is introduced by using a petty grammar. Please consider that lexing searches for the leftmost longest match in the rest of the input. This means that introducing a pattern like

`<int>\.\.` return TokenType.INTDOTDOT;

would eliminate the lookahead in the lexer. In the parser, an additional rule then has to be added:

<range> ::= INT DOTDOT INT | INTDOTDOT INT

-manfred
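The lookahead Jonathan describes amounts to one extra character of peeking after a '.' in a number. The sketch below is a hypothetical simplification (`lexNumber` and `NumberLex` are made-up names, not std.d.lexer code): it handles only decimal digits with an optional fractional part and ignores exponents, suffixes, hex literals, underscores, and the real D rule that a '.' followed by an identifier start (e.g. `1.max`) is also not part of the literal.

```d
import std.ascii : isDigit;

// Hypothetical result type: how many characters the literal spans and
// whether it was lexed as a floating point literal.
struct NumberLex
{
    size_t length;
    bool isFloat;
}

// Simplified sketch: decimal digits with an optional fractional part only.
NumberLex lexNumber(string src)
{
    size_t i = 0;
    while (i < src.length && isDigit(src[i]))
        ++i;

    // On a '.', peek one character further: if it is another '.', the
    // integer ends here and ".." is lexed as a separate token (as in 1..2).
    if (i < src.length && src[i] == '.'
        && !(i + 1 < src.length && src[i + 1] == '.'))
    {
        ++i; // the '.' belongs to the literal
        while (i < src.length && isDigit(src[i]))
            ++i;
        return NumberLex(i, true);
    }
    return NumberLex(i, false);
}
```

For `1..2` this returns a length of 1 (an integer), for `1.5` a length of 3 (a float). The lookahead is bounded to a single character here, which is different from the arbitrary lookahead the parser needs.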

Brian Schott wrote:
>>> Parsing D requires arbitrary lookahead.
>
> Yeah. D requires lookahead in both lexing and parsing.

Walter wrote about _arbitrary_ lookahead, i.e. unlimited lookahead.

-manfred

September 12, 2013


Posted by H. S. Teoh


On Wed, Sep 11, 2013 at 10:06:11PM -0400, Jonathan M Davis wrote:
> On Thursday, September 12, 2013 03:37:06 deadalnix wrote:
> > Int, Function, Scope, Import are all valid identifiers.
> 
> All of which violate Phobos' naming conventions for enum values (they must start with a lowercase letter), which is why we went with adding an _ on the end. And it's pretty much as close as you can get to the keyword without actually using the keyword, which is a plus IMHO (though from the sounds of it, H.S. Teoh would consider that a negative due to possible confusion with the keyword).
[...]

Actually, the main issue I have is that some of the enum values end with _ while others don't. This is inconsistent. I'd rather have consistency than superficial resemblance to the keywords as typed. Either *all* of the enum values should end with _, or *none* of them should.  Having a mixture of both is an eyesore, and leads to people wondering, should I add a _ at the end or not?

If people insist that the 'default' keyword absolutely must be represented as TokenType.default_ (I really don't see why), then *all* TokenType values should end with _. But honestly, I find that really ugly. Writing something like kwDefault, or tokenTypeDefault, would be far better.

Sigh, Andrei was right. Once the bikeshed is up for painting, even the rainbow won't suffice. :-P

T

-- 
MASM = Mana Ada Sistem, Man!

On Thu, Sep 12, 2013 at 03:17:06AM +0200, Brian Schott wrote:
> On Thursday, 12 September 2013 at 00:13:36 UTC, H. S. Teoh wrote:
> >But then the code example proceeds to pass byLine() to it. Is that correct? If it is, then the docs need to be updated, because last time I checked, byLine() isn't a range of char, but a range of char *arrays*.
> >
> >
> >T
>
> The example doesn't pass the result of byLine to the byToken function directly.

*facepalm* You're right, it's calling join() on it. Never mind what I said then. :-P

Sorry for all the unnecessary noise. I don't know what got into me that I didn't see the join().

T

-- 
May you live all the days of your life. -- Jonathan Swift
