Thread overview
std.d.lexer: pre-voting review / discussion
Sep 11, 2013
Dicebot
Sep 11, 2013
Tove
Sep 11, 2013
Johannes Pfau
Sep 11, 2013
Walter Bright
Sep 11, 2013
qznc
Sep 11, 2013
Walter Bright
Sep 11, 2013
Walter Bright
Sep 11, 2013
Brian Schott
Sep 11, 2013
Walter Bright
Sep 11, 2013
H. S. Teoh
Sep 11, 2013
Dicebot
Sep 11, 2013
H. S. Teoh
Sep 11, 2013
Dicebot
Sep 11, 2013
H. S. Teoh
Sep 12, 2013
deadalnix
Sep 12, 2013
Jonathan M Davis
Sep 12, 2013
H. S. Teoh
Sep 12, 2013
sclytrack
Sep 11, 2013
Manfred Nowak
Sep 11, 2013
Walter Bright
Sep 11, 2013
Manfred Nowak
Sep 12, 2013
Robert Schadek
Sep 12, 2013
Manfred Nowak
Sep 11, 2013
Jonathan M Davis
Sep 11, 2013
Brian Schott
Sep 12, 2013
deadalnix
Sep 12, 2013
Manfred Nowak
Sep 12, 2013
Manfred Nowak
Sep 11, 2013
Jonathan M Davis
Sep 11, 2013
Jonathan M Davis
Sep 11, 2013
H. S. Teoh
Sep 11, 2013
Walter Bright
Sep 11, 2013
H. S. Teoh
Sep 11, 2013
Walter Bright
Sep 12, 2013
H. S. Teoh
Sep 12, 2013
Brian Schott
Sep 12, 2013
H. S. Teoh
Sep 12, 2013
Jacob Carlborg
Sep 11, 2013
Brian Schott
Sep 11, 2013
H. S. Teoh
Sep 12, 2013
Jacob Carlborg
Sep 11, 2013
Piotr Szturmaj
Sep 11, 2013
Kapps
Sep 11, 2013
Piotr Szturmaj
Sep 12, 2013
Jacob Carlborg
Sep 11, 2013
Michel Fortin
Sep 11, 2013
Walter Bright
Sep 12, 2013
deadalnix
Sep 12, 2013
Walter Bright
Sep 12, 2013
deadalnix
Sep 12, 2013
Walter Bright
Sep 12, 2013
deadalnix
Sep 12, 2013
Walter Bright
Sep 12, 2013
deadalnix
Sep 12, 2013
Martin Nowak
Sep 12, 2013
Martin Nowak
Sep 12, 2013
H. S. Teoh
Sep 12, 2013
deadalnix
Sep 12, 2013
H. S. Teoh
Sep 12, 2013
deadalnix
Sep 12, 2013
H. S. Teoh
Sep 13, 2013
deadalnix
Sep 12, 2013
Manfred Nowak
Sep 28, 2013
Mehrdad
Sep 28, 2013
Mehrdad
Sep 12, 2013
Walter Bright
Sep 12, 2013
Robert Schadek
Sep 12, 2013
Dmitry Olshansky
Sep 12, 2013
Robert Schadek
Sep 12, 2013
Jonathan M Davis
Sep 12, 2013
qznc
Sep 12, 2013
Martin Nowak
Sep 12, 2013
Timon Gehr
Sep 12, 2013
Dmitry Olshansky
Sep 12, 2013
H. S. Teoh
Sep 12, 2013
Timon Gehr
Sep 12, 2013
Jacob Carlborg
Sep 12, 2013
Martin Nowak
Sep 12, 2013
Timon Gehr
Sep 11, 2013
Walter Bright
Sep 11, 2013
Brian Schott
Sep 11, 2013
Walter Bright
Sep 11, 2013
Martin Nowak
Sep 12, 2013
Jacob Carlborg
Sep 12, 2013
dennis luehring
Sep 12, 2013
Jacob Carlborg
Sep 12, 2013
Dmitry Olshansky
Sep 12, 2013
Jacob Carlborg
Sep 12, 2013
Brian Schott
Sep 12, 2013
dennis luehring
Sep 12, 2013
dennis luehring
Sep 12, 2013
Walter Bright
Sep 12, 2013
Brian Schott
Sep 13, 2013
Walter Bright
Sep 12, 2013
deadalnix
Sep 12, 2013
Timon Gehr
Sep 17, 2013
Dicebot
Sep 17, 2013
deadalnix
Sep 17, 2013
Dicebot
Sep 17, 2013
Brian Schott
Sep 17, 2013
Dicebot
Sep 25, 2013
Brian Schott
Sep 25, 2013
Jacob Carlborg
Sep 25, 2013
Brian Schott
Sep 25, 2013
Jacob Carlborg
Sep 26, 2013
Jos van Uden
Sep 25, 2013
Brian Schott
Sep 25, 2013
deadalnix
Sep 17, 2013
ilya-stromberg
September 11, 2013
std.d.lexer is a standard module for lexing D code, written by Brian Schott.

---- Input ----

Code: https://github.com/Hackerpilot/phobos/tree/master/std/d

Documentation:
http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html

Initial discussion:
http://forum.dlang.org/thread/dpdgcycrgfspcxenzrjf@forum.dlang.org

Usage example in real project:
https://github.com/Hackerpilot/Dscanner
(as stdx.d.lexer; Brian, please correct me if the versions do not match)

---- Information for reviewers ----

(yes, I am mostly copy-pasting this :P)

The goal of this thread is to detect whether there are any outstanding
issues that need to be fixed before formal "yes"/"no" voting
happens. If no critical objections arise, voting will begin
next week. Otherwise, it depends on how much time the module author needs to implement suggestions.

Please take this part seriously: "If you identify problems along the
way, please note if they are minor, serious, or showstoppers."
(http://wiki.dlang.org/Review/Process). This information will later
be used to determine whether the library is ready for voting.

Frequent Phobos contributors and core developers: please pay extra
attention to the submission's code style and how well it fits
into the overall Phobos guidelines and structure.

The most important goal of this review is to identify any API / design problems. Internal implementation tweaks can happen after inclusion into Phobos, but it is important to ensure that no breaking changes will be required soon after the module gets wider usage.

---- Information request from module author ----

Performance was a major topic of discussion in the previous thread. Could you please provide benchmarking data for the version currently under review?
September 11, 2013
On Wednesday, 11 September 2013 at 15:02:00 UTC, Dicebot wrote:
> std.d.lexer is a standard module for lexing D code, written by Brian Schott

I remember reading about some interesting hashing improvements in dmd recently.

http://forum.dlang.org/thread/kq7ov0$2o8n$1@digitalmars.com?page=1

Maybe it's worth benchmarking those hashes for std.d.lexer as well.
September 11, 2013
On Wed, 11 Sep 2013 17:01:58 +0200,
"Dicebot" <public@dicebot.lv> wrote:

> std.d.lexer is a standard module for lexing D code, written by Brian Schott
> 

Question / Minor issue:

As we already have a range-based interface, I'd love to have partial lexing / parsing, especially for IDEs.

Say I have this source code:
--------------------------------------------
1: module a;
2:
3: void test(int a)
4: {
5:     [...]
6: }
7:
8: void test2()
9: [...]
--------------------------------------------

First, I do a full parse pass over the source. Now line 5 is being edited. From the full parse, I know that line 5 is part of a FunctionDeclaration that starts at line 3 and ends at line 6. Now I'd like to re-parse only that part:

--------------------------------------------
FunctionDeclaration decl = document.getDeclByLine(5);
decl.reparse(/*start_line=*/ 3, docBuffer);
--------------------------------------------

I think these are the two critical points for the proposed std.d.lexer:

* How can I tell the lexer to start lexing at line/character n? Of
  course the input could be sliced, but then the line number and
  position information in the Token struct would be wrong.
* I guess std.d.lexer slices the original input? This could make things
  difficult if the file buffer is edited in place. This can probably
  be dealt with outside of std.d.lexer, though (by reallocating all
  .value members).
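A minimal sketch of the first point, assuming a toy lexer: the `Token` struct and `lexWords` function below are hypothetical (not std.d.lexer's API) and split on whitespace instead of applying real D lexing rules. The idea is that the caller slices the buffer and also passes the slice's starting line and column, so reported positions stay relative to the whole file. Lines and columns are 1-based; columns count UTF-8 code units.

```d
import std.ascii : isWhite;
import std.stdio : writefln;
import std.string : indexOf;

/// Hypothetical token: the source slice plus its 1-based position.
struct Token
{
    string text;
    size_t line;
    size_t column; // counted in UTF-8 code units
}

/// Hypothetical toy lexer: splits `src` into whitespace-separated words.
/// startLine/startColumn give the file position of src[0], so lexing a
/// slice of a file still reports file-relative positions.
Token[] lexWords(string src, size_t startLine = 1, size_t startColumn = 1)
{
    Token[] tokens;
    size_t line = startLine;
    size_t column = startColumn;
    size_t i = 0;
    while (i < src.length)
    {
        if (src[i] == '\n')
        {
            ++line;     // a newline starts a new line at column 1
            column = 1;
            ++i;
        }
        else if (isWhite(src[i]))
        {
            ++column;   // other whitespace just advances the column
            ++i;
        }
        else
        {
            immutable begin = i;
            while (i < src.length && !isWhite(src[i]))
                ++i;
            tokens ~= Token(src[begin .. i], line, column);
            column += i - begin;
        }
    }
    return tokens;
}

void main()
{
    string file = "module a;\n\nvoid test(int a)\n{\n    x;\n}\n";
    // Pretend lines 3-6 were edited: slice from the start of line 3 ...
    immutable offset = cast(size_t) file.indexOf("void");
    // ... and tell the lexer where that slice begins in the file.
    foreach (t; lexWords(file[offset .. $], 3, 1))
        writefln("%s:%s %s", t.line, t.column, t.text);
}
```

Keeping the slice (rather than copying) is what makes the second point matter: if the buffer is edited in place, every `text` slice taken earlier may change underneath the caller.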


(And once this is working, an example in the docs would be great.)

But to be honest I'm not sure how important this really is. I think it should help for more responsive IDEs but maybe parsing is not a bottleneck at all?
September 11, 2013
On Wednesday, 11 September 2013 at 15:02:00 UTC, Dicebot wrote:
> std.d.lexer is a standard module for lexing D code, written by Brian Schott
>
> Documentation:
> http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html

The documentation for Token twice says "measured in ASCII characters or UTF-8 code units", which sounds confusing to me.

Is it UTF-8, which includes ASCII? Then it should not be "or".

This is nitpicking. Overall, I like the proposal. Great work!
September 11, 2013
On 9/11/2013 8:01 AM, Dicebot wrote:
> std.d.lexer is a standard module for lexing D code, written by Brian Schott

Thank you, Brian! This is important work.

Not a thorough review, just some notes from reading the doc file:

1. I don't like the _ suffix for keywords. Just call it kwimport or something like that.

2. The example uses an if-then sequence of isBuiltType, isKeyword, etc. This should be an enum so that a switch can be used for speed.

3. I assumed TokenType is a type, but it's not; it's an enum. Even the documentation says it's a 'type', but it's not a type.

4. Naming tokens like .. 'slice' gives them a syntactic/semantic name rather than a token name. This would be awkward if .. took on new meanings in D. Calling it 'dotdot' would be clearer. Ditto for the rest. For an example of this done better, '*' is called 'star' rather than 'dereference'.

5. The LexerConfig initialization should be a constructor rather than a sequence of assignments. LexerConfig documentation is awfully thin. For example, 'tokenStyle' is explained as being 'Token style', whatever that is.

6. No clue how lookahead works with this. Parsing D requires arbitrary lookahead.

7. uint line: the documentation should indicate that lines start at 1, not 0. Ditto for columns.

8. 'default_': again with the awful practice of appending _.

9. Intra-page navigation links need to be inserted; for example, when 'byToken()' appears in the text, it should link to where byToken is described.
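A sketch of points 2 and 4 together, using hypothetical names (`TokenKind`, `category`) rather than the module's actual TokenType: kinds are named after their spelling, keywords get a kw prefix, and classification is a single switch over the enum instead of a chain of predicate calls.

```d
import std.stdio : writeln;

/// Hypothetical token kinds, named by spelling ("dotdot", "star")
/// rather than by meaning ("slice", "dereference").
enum TokenKind
{
    identifier,
    kwImport,   // keyword, kw prefix instead of a trailing underscore
    dotdot,     // ..
    star,       // *
    intLiteral,
}

/// Classifies a kind with one switch instead of isKeyword/isOperator/...
/// calls; `final switch` rejects the code at compile time if a new
/// enum member is left unhandled.
string category(TokenKind kind)
{
    final switch (kind)
    {
    case TokenKind.identifier:
        return "identifier";
    case TokenKind.kwImport:
        return "keyword";
    case TokenKind.dotdot:
    case TokenKind.star:
        return "operator";
    case TokenKind.intLiteral:
        return "literal";
    }
}

void main()
{
    foreach (kind; [TokenKind.kwImport, TokenKind.dotdot, TokenKind.identifier])
        writeln(kind, " -> ", category(kind));
}
```

The `final switch` is a design choice here: the compiler can lower it to a jump table, and a forgotten case becomes a compile error rather than a silent fall-through to some default.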


> The goal of this thread is to detect whether there are any outstanding
> issues that need to be fixed before formal "yes"/"no" voting
> happens. If no critical objections arise, voting will begin
> next week. Otherwise, it depends on how much time the module author needs to implement suggestions.

I believe the state of the documentation is a showstopper, and needs to be extensively fleshed out before it can be considered ready for voting.


September 11, 2013
The choice of ending token names with underscores was made according to the Phobos style guide.

http://dlang.org/dstyle.html
September 11, 2013
On 9/11/2013 12:10 PM, Brian Schott wrote:
> The choice of ending token names with underscores was made according to the
> Phobos style guide.
>
> http://dlang.org/dstyle.html

I didn't realize that was in the style guide. I guess I can't complain about it, then :-)
September 11, 2013
On 9/11/2013 11:45 AM, qznc wrote:
> On Wednesday, 11 September 2013 at 15:02:00 UTC, Dicebot wrote:
>> std.d.lexer is a standard module for lexing D code, written by Brian Schott
>>
>> Documentation:
>> http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html
>
> The documentation for Token twice says "measured in ASCII characters or UTF-8
> code units", which sounds confusing to me.
>
> Is it UTF-8, which includes ASCII? Then it should not be "or".

Pedantically, it is just UTF-8 code units, which are a superset of ASCII.
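A small illustration of the difference, using only standard D and Phobos: `.length` on a `string` counts UTF-8 code units, while `walkLength` decodes and counts code points. For pure ASCII input the two agree; a column measured in code units drifts from the character count after the first non-ASCII character.

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    string ascii = "abc";
    string accented = "café";

    // .length on a D string counts UTF-8 code units (bytes).
    writeln(ascii.length);        // 3
    writeln(accented.length);     // 5: 'é' takes two code units

    // walkLength decodes the string and counts code points.
    writeln(accented.walkLength); // 4
}
```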

September 11, 2013
On 9/11/2013 11:43 AM, Johannes Pfau wrote:
> But to be honest I'm not sure how important this really is. I think it
> should help for more responsive IDEs but maybe parsing is not a
> bottleneck at all?


It is important, and I'm glad you brought it up. The LexerConfig can provide a spot to put a starting line/column value.
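One possible shape for that, sketched under assumptions: the struct below is hypothetical and is not the LexerConfig under review. It also reflects point 5 from the earlier notes by taking its values through a constructor instead of a sequence of assignments.

```d
import std.stdio : writeln;

/// Hypothetical config (not std.d.lexer's LexerConfig): startLine and
/// startColumn tell the lexer the 1-based file position of the first
/// character it is given, so a slice can be re-lexed in place.
struct LexerConfig
{
    string fileName;
    size_t startLine = 1;
    size_t startColumn = 1;

    this(string fileName, size_t startLine = 1, size_t startColumn = 1)
    {
        assert(startLine >= 1 && startColumn >= 1, "positions are 1-based");
        this.fileName = fileName;
        this.startLine = startLine;
        this.startColumn = startColumn;
    }
}

void main()
{
    auto whole = LexerConfig("a.d");      // lex from the top of the file
    auto partial = LexerConfig("a.d", 3); // re-lex starting at line 3
    writeln(whole.startLine, " ", partial.startLine, " ", partial.startColumn);
}
```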
September 11, 2013
Walter Bright wrote:
> Parsing D requires arbitrary lookahead.

Why, and since which version?

-manfred
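Regarding point 6 of the earlier notes (how lookahead works with a range-based lexer): one common approach, sketched here under assumptions, is to wrap the token range in a buffering adapter so a parser can peek any number of tokens ahead while the lexer itself stays a plain input range. `Lookahead`, `lookahead`, `peek` and `has` are hypothetical names, not part of std.d.lexer; the demo uses `splitter` on a string as a stand-in token stream.

```d
import std.algorithm.iteration : splitter;
import std.range.primitives;
import std.stdio : writeln;

/// Hypothetical adapter: adds arbitrary lookahead to any input range
/// (for example a token range) by buffering elements on demand.
struct Lookahead(R) if (isInputRange!R)
{
    private R source;
    private ElementType!(R)[] buffer;

    this(R source) { this.source = source; }

    /// Pulls elements from the source until peek(n) is valid or the
    /// source is exhausted; returns whether peek(n) is valid.
    bool has(size_t n)
    {
        while (buffer.length <= n && !source.empty)
        {
            buffer ~= source.front;
            source.popFront();
        }
        return buffer.length > n;
    }

    /// The element n positions ahead; peek(0) is the current front.
    ElementType!R peek(size_t n)
    {
        immutable ok = has(n); // called outside assert so -release keeps it
        assert(ok, "lookahead past end of input");
        return buffer[n];
    }

    // Input range primitives, so the adapter is itself a range.
    bool empty() { return !has(0); }
    ElementType!R front() { return peek(0); }
    void popFront()
    {
        immutable ok = has(0);
        assert(ok, "popFront on empty range");
        buffer = buffer[1 .. $];
    }
}

/// Convenience constructor that infers R.
auto lookahead(R)(R r) { return Lookahead!R(r); }

void main()
{
    // Space-separated words stand in for a real token range.
    auto tokens = lookahead("foo ( a , b ) ;".splitter(' '));
    writeln(tokens.peek(0), " ", tokens.peek(1), " ", tokens.peek(2));
    tokens.popFront();
    writeln(tokens.front);
}
```

With this split, the lexer never needs to know how far ahead the parser looks; the buffer only grows as deep as the deepest peek.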