September 11, 2013 std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
std.d.lexer is a standard module for lexing D code, written by Brian Schott.

---- Input ----

Code: https://github.com/Hackerpilot/phobos/tree/master/std/d
Documentation: http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html
Initial discussion: http://forum.dlang.org/thread/dpdgcycrgfspcxenzrjf@forum.dlang.org
Usage example in a real project: https://github.com/Hackerpilot/Dscanner (as stdx.d.lexer; Brian, please correct me if the versions do not match)

---- Information for reviewers ----

(yes, I am mostly copy-pasting this :P)

The goal of this thread is to detect whether there are any outstanding issues that need to be fixed before formal "yes"/"no" voting happens. If no critical objections arise, voting will begin next week. Otherwise it depends on how much time the module author needs to implement suggestions.

Please take this part seriously: "If you identify problems along the way, please note if they are minor, serious, or showstoppers." (http://wiki.dlang.org/Review/Process). This information will later be used to determine whether the library is ready for voting.

If you are a frequent Phobos contributor or core developer, please pay extra attention to the code style of the submission and to how well it fits the overall Phobos guidelines and structure.

The most important goal of this review is to find any API / design problems. Internal implementation tweaks may happen after inclusion in Phobos, but it is important to ensure that no breaking changes will be required soon after the module gets wider usage.

---- Information request from module author ----

Performance was a major topic of discussion in the previous thread. Could you please provide benchmarking data for the version currently under review?
|
September 11, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dicebot | On Wednesday, 11 September 2013 at 15:02:00 UTC, Dicebot wrote:
> std.d.lexer is a standard module for lexing D code, written by Brian Schott
I remember reading that there were some interesting hashing advances in dmd recently:
http://forum.dlang.org/thread/kq7ov0$2o8n$1@digitalmars.com?page=1
Maybe it's worth benchmarking those hashes for std.d.lexer as well.
|
September 11, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dicebot | On Wed, 11 Sep 2013 17:01:58 +0200, "Dicebot" <public@dicebot.lv> wrote:
> std.d.lexer is a standard module for lexing D code, written by Brian Schott
>
Question / Minor issue:
As we already have a range-based interface, I'd love to have partial lexing / parsing, especially for IDEs.
Say I have this source code:
--------------------------------------------
1: module a;
2:
3: void test(int a)
4: {
5: [...]
6: }
7:
8: void test2()
9: [...]
--------------------------------------------
Then I first do a full parse pass over the source. Now line 5 is being edited. I know from the full parse that line 5 is part of a FunctionDeclaration which starts at line 3 and ends at line 6. Now I'd like to re-parse only that part:
--------------------------------------------
FunctionDeclaration decl = document.getDeclByLine(5);
decl.reparse(/*start_line=*/ 3, docBuffer);
--------------------------------------------
I think these are the two critical points related to this for the proposed std.lexer:
* How can I tell the lexer to start lexing at line/character n? Of course the input could be sliced, but then the line number and position information in the Token struct is wrong.
* I guess std.lexer slices the original input? This could make things difficult if the file buffer is edited in place. But this can probably be dealt with outside of std.lexer (by reallocating all .value members).
(And once this is working, an example in the docs would be great.)
But to be honest I'm not sure how important this really is. I think it should help for more responsive IDEs, but maybe parsing is not a bottleneck at all?
|
September 11, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dicebot | On Wednesday, 11 September 2013 at 15:02:00 UTC, Dicebot wrote:
> std.d.lexer is a standard module for lexing D code, written by Brian Schott
>
> Documentation:
> http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html
The documentation for Token twice says "measured in ASCII characters or UTF-8 code units", which sounds confusing to me.
Is it UTF-8, which includes ASCII? Then it should not be "or".
This is nitpicking. Overall, I like the proposal. Great work!
|
September 11, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dicebot | On 9/11/2013 8:01 AM, Dicebot wrote:
> std.d.lexer is a standard module for lexing D code, written by Brian Schott
Thank you, Brian! This is important work.
Not a thorough review, just some notes from reading the doc file:
1. I don't like the _ suffix for keywords. Just call it kwimport or something like that.
2. The example uses an if-then sequence of isBuiltType, isKeyword, etc. Should be an enum so a switch can be done for speed.
3. I assumed TokenType is a type. But it's not, it's an enum. Even the document says it's a 'type', but it's not a type.
4. When naming tokens like .. 'slice', it is giving it a syntactic/semantic name rather than a token name. This would be awkward if .. took on new meanings in D. Calling it 'dotdot' would be clearer. Ditto for the rest. For an example that is done better, '*' is called 'star', rather than 'dereference'.
5. The LexerConfig initialization should be a constructor rather than a sequence of assignments. LexerConfig documentation is awfully thin. For example, 'tokenStyle' is explained as being 'Token style', whatever that is.
6. No clue how lookahead works with this. Parsing D requires arbitrary lookahead.
7. uint line; Should indicate that lines start with '1', not '0'. Ditto for columns.
8. 'default_' Again with the awful practice of appending _.
9. Need to insert intra-page navigation links, such as when 'byToken()' appears in the text, it should be a link to where byToken is described.
> The goal of this thread is to detect whether there are any outstanding
> issues that need to be fixed before formal "yes"/"no" voting
> happens. If no critical objections arise, voting will begin
> next week. Otherwise it depends on how much time the module author needs to implement suggestions.
I believe the state of the documentation is a showstopper, and needs to be extensively fleshed out before it can be considered ready for voting.
|
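Point 2 above (one enum plus a switch instead of a chain of predicate calls) can be sketched roughly as follows. This is only an illustration: `TokenKind`, its members, and `category` are hypothetical names chosen for the example and are not the actual std.d.lexer API.

```d
import std.stdio : writeln;

// Hypothetical token kinds; NOT the real std.d.lexer TokenType members.
enum TokenKind { identifier, intLiteral, kwImport, kwModule, dotdot, star }

// Classify a token with a single `final switch` rather than a sequence of
// isKeyword / isLiteral style checks. `final switch` makes the compiler
// reject the code if an enum member is left unhandled.
string category(TokenKind kind)
{
    final switch (kind)
    {
        case TokenKind.identifier: return "identifier";
        case TokenKind.intLiteral: return "literal";
        case TokenKind.kwImport:
        case TokenKind.kwModule:   return "keyword";
        case TokenKind.dotdot:
        case TokenKind.star:       return "operator";
    }
}

void main()
{
    writeln(category(TokenKind.kwImport));
    writeln(category(TokenKind.dotdot));
}
```

The member names follow the token-name style suggested in point 4 (`dotdot`, `star`) purely for illustration.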
September 11, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | The choice of ending token names with underscores was made according to the Phobos style guide. http://dlang.org/dstyle.html |
September 11, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Brian Schott | On 9/11/2013 12:10 PM, Brian Schott wrote:
> The choice of ending token names with underscores was made according to the
> Phobos style guide.
>
> http://dlang.org/dstyle.html
I didn't realize that was in the style guide. I guess I can't complain about it, then :-)
|
September 11, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to qznc | On 9/11/2013 11:45 AM, qznc wrote:
> On Wednesday, 11 September 2013 at 15:02:00 UTC, Dicebot wrote:
>> std.d.lexer is a standard module for lexing D code, written by Brian Schott
>>
>> Documentation:
>> http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html
>
> The documentation for Token twice says "measured in ASCII characters or UTF-8
> code units", which sounds confusing to me.
>
> Is it UTF-8, which includes ASCII? Then it should not be "or".
Pedantically, it is just UTF-8 code units, which are a superset of ASCII.
|
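To make the distinction concrete, here is a small sketch of how a length measured in UTF-8 code units differs from a length measured in code points. It uses only built-in string properties and `std.range.walkLength`; it says nothing about how std.d.lexer itself computes positions.

```d
import std.range : walkLength;

void main()
{
    string ascii = "int x;";
    string accented = "\u00E9 = 1;"; // U+00E9 takes two UTF-8 code units

    // For pure ASCII, code units and code points coincide.
    assert(ascii.length == 6);
    assert(ascii.walkLength == 6);

    // With a non-ASCII character the two counts diverge:
    // .length counts code units, walkLength counts decoded code points.
    assert(accented.length == 7);
    assert(accented.walkLength == 6);
}
```

So "measured in UTF-8 code units" already covers the ASCII case, and a column reported in code units can be larger than the column a user counts on screen.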
September 11, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Johannes Pfau | On 9/11/2013 11:43 AM, Johannes Pfau wrote:
> But to be honest I'm not sure how important this really is. I think it
> should help for more responsive IDEs but maybe parsing is not a
> bottleneck at all?
It is important, and I'm glad you brought it up. The LexerConfig can provide a spot to put a starting line/column value.
|
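A rough sketch of what a starting line/column value could look like when lexing a slice of a larger buffer. Everything here is an assumption made for illustration: `StartConfig`, `Tok`, and `lexWords` are hypothetical, and the "lexer" merely splits on whitespace. The reviewed LexerConfig is not known to have such fields.

```d
import std.ascii : isWhite;
import std.string : indexOf;

struct Tok { string text; uint line; uint column; }

// Hypothetical configuration: where the slice begins in the full document.
struct StartConfig { uint startLine = 1; uint startColumn = 1; }

// Toy lexer: splits on whitespace, tracking 1-based line and column
// (columns counted in UTF-8 code units).
Tok[] lexWords(string src, StartConfig cfg = StartConfig.init)
{
    Tok[] toks;
    uint line = cfg.startLine, col = cfg.startColumn;
    size_t i = 0;
    while (i < src.length)
    {
        if (src[i] == '\n') { ++line; col = 1; ++i; }
        else if (isWhite(src[i])) { ++col; ++i; }
        else
        {
            immutable start = i;
            immutable startCol = col;
            while (i < src.length && !isWhite(src[i])) { ++i; ++col; }
            toks ~= Tok(src[start .. i], line, startCol); // slice, no copy
        }
    }
    return toks;
}

void main()
{
    string doc = "module a;\n\nvoid test(int a)\n{\n}\n";
    // An IDE would remember where line 3 starts; here we just search for it.
    auto offset = doc.indexOf("void");
    auto toks = lexWords(doc[offset .. $], StartConfig(3, 1));
    assert(toks[0] == Tok("void", 3, 1));
    assert(toks[3] == Tok("{", 4, 1));
}
```

Without the `StartConfig(3, 1)` argument the same slice would report `void` at line 1, which is the problem described in the quoted post.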
September 11, 2013 Re: std.d.lexer: pre-voting review / discussion | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright wrote:
> Parsing D requires arbitrary lookahead.
Why---and since which version?
-manfred
|
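For readers wondering what lookahead over a token range might look like in practice, here is a minimal sketch. It assumes the token range is a forward range (supports `save`), which the thread does not confirm for std.d.lexer. `peek` is a hypothetical helper, and plain strings stand in for tokens.

```d
import std.range;

// Look `n` elements ahead without consuming the caller's range.
// Works on any forward range, because `save` yields an independent copy.
auto peek(R)(R tokens, size_t n) if (isForwardRange!R)
{
    auto ahead = tokens.save.drop(n);
    assert(!ahead.empty, "lookahead past end of input");
    return ahead.front;
}

void main()
{
    string[] tokens = ["a", "*", "b", ";"];
    assert(peek(tokens, 0) == "a");
    assert(peek(tokens, 2) == "b");
    assert(tokens.length == 4); // nothing was consumed
}
```

A lexer exposed only as an input range would not allow this directly; a parser would then have to buffer tokens itself.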