Jump to page: 1 25  
Page
Thread overview
Formal review of std.lexer
Feb 21, 2014
Dicebot
Feb 21, 2014
Vladimir Panteleev
Feb 21, 2014
Joseph Cassman
Feb 21, 2014
Andrej Mitrovic
Feb 22, 2014
Marc Schütz
Mar 16, 2014
Martin Nowak
Apr 14, 2014
Brian Schott
May 05, 2014
Dmitry Olshansky
Feb 21, 2014
Andrej Mitrovic
Feb 21, 2014
Brian Schott
Feb 21, 2014
Jacob Carlborg
Feb 21, 2014
Dicebot
Feb 21, 2014
Brian Schott
Feb 24, 2014
Dicebot
Feb 26, 2014
Brian Schott
Feb 26, 2014
Dicebot
Feb 24, 2014
Dejan Lekic
Feb 24, 2014
Meta
Feb 24, 2014
Adam Wilson
Feb 24, 2014
Meta
Feb 25, 2014
Adam Wilson
Feb 25, 2014
Meta
Feb 25, 2014
Adam Wilson
Feb 25, 2014
H. S. Teoh
Feb 25, 2014
Dicebot
Feb 26, 2014
Brian Schott
Feb 25, 2014
Jacob Carlborg
Feb 25, 2014
Meta
Feb 26, 2014
Brian Schott
Feb 26, 2014
Jacob Carlborg
Feb 26, 2014
Jacob Carlborg
Feb 25, 2014
Dicebot
Feb 26, 2014
Jacob Carlborg
Mar 10, 2014
Dicebot
Mar 03, 2014
Dicebot
Mar 10, 2014
Dicebot
Mar 16, 2014
Dicebot
Apr 14, 2014
Alix Pexton
Apr 14, 2014
Brian Schott
Apr 15, 2014
Alix Pexton
February 21, 2014
http://wiki.dlang.org/Review/std.lexer

This is follow-up by Brian to his earlier proposal (http://wiki.dlang.org/Review/std.d.lexer). This time proposed module focuses instead on generic lexer generation as discussed in matching voting thread.

Docs: http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html
Code: https://github.com/Hackerpilot/Dscanner/blob/master/stdx/lexer.d

Example topics to evaluate during review:
 - Feasibility of overall design and concept
 - Quality of documentation
 - Consistency with existing Phobos modules
 - Overall quality of implementation

Initial review will end on March the 8th.
February 21, 2014
On Friday, 21 February 2014 at 12:12:17 UTC, Dicebot wrote:
> http://wiki.dlang.org/Review/std.lexer

First of all, thank you for the great work. This is a very important project.

I'll begin with reviewing the documentation.

> Summary

Some simple explanation of the terminology and concepts would be nice. At least a link to Wikipedia.

> Create the string array costants for your language.

Typo ("costants")

> Examples:

An inline complete example of a very simple language would be nice.

> A lexer for D is available here.

Although good to have, this is too much to take in all at once, for documentation purposes.

> A lexer for Lua is available here.

Nary a comment in sight. This might serve as the example lexer if only it was better commented. The comments can be copy-pasted from the module documentation, even that would make the code much easier to grok.

> Template Parameter Definitions

What does this mean? Parameters to what template?

Can this section be moved to inside the documentation of Lexer, and Lexer be moved to the first documented symbol in the file?

> A function that serves as the default token lexing function. For most languages this will be the identifier lexing function.

Should the function signature and contracts be explained here?

> This function must return bool and take a single size_t argument representing the number of bytes to skip over before looking for a separating character.

I think it's better to describe the signature in D syntax rather than English.

> A listing of tokens whose value is variable, such as whitespace, identifiers, number literals, and string literals.

No mention of how the list is represented (is it an array? what type of elements should the array have? how are the array values used?), the reader is left to figure that out from the example below.

> Template for determining the type used for a token type. Selects the smallest unsigned integral type that is able to hold the value staticTokens.length + dynamicTokens.length + possibleDefaultTokens.length. For example if there are 20 static tokens, 30 dynamic tokens, and 10 possible default tokens, this template will alias itself to ubyte, as 20 + 30 + 10 < ubyte.max.

Should this be documented? I understand that this will be instantiated only once, by std.lexer.

Utility declarations should preferably be at the end of the module, so that they appear last in the documentation.

> Generates the token type identifier for the given symbol. There are two special cases:

Are these magic constants necessary? Why not declare them as enums?

>  In all cases this template will alias itself to a constant of type IdType. This template will fail at compile time if symbol is not one of the staticTokens, dynamicTokens, or possibleDefaultTokens.

Style nit: D symbols should be wrapped in the $D(...) macro.

> == overload for the the token type.

Is it really necessary to document opEquals?

But since it's here: how does it interact with extraFields?

> The Column number at which this token occurs.

There was a lot of bikeshedding regarding the correct terminology to use when adding -vcolumns to DMD ( https://github.com/D-Programming-Language/dmd/pull/3077 ). I think the documentation should at least mention what exactly it is counting.

> A function that serves as the default token lexing function. For most languages this will be the identifier lexing function.

What should the function's name be? How will it interact with Lexer? (It's not clear that this refers to the defaultTokenFunction parameter, especially after the previous list item, popFront, is a different piece of the puzzle.)

The documentation for Lexer's arguments seem to be thrown all around the module. I suggest to document them only once, all in Lexer's DDoc, add example signatures, and move Lexer to the top of the module.

> Examples:
> struct CalculatorLexer

I think this should be expanded into a full, well-documented example featured in the module DDoc.

> _popFront();

Where did that come from? Perhaps you meant Lexer's DDoc to say "which should call this mixin's _popFront()", and the DDoc escaped the _ character? If so, why not use a named mixin to disambiguate instead of _?

> struct LexerRange;
> struct StringCache;

These are thoroughly documented, but will they be used by anything other than std.lexer?
February 21, 2014
On Friday, 21 February 2014 at 12:12:17 UTC, Dicebot wrote:
> http://wiki.dlang.org/Review/std.lexer
>
> This is follow-up by Brian to his earlier proposal (http://wiki.dlang.org/Review/std.d.lexer). This time proposed module focuses instead on generic lexer generation as discussed in matching voting thread.
>
> Docs: http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html
> Code: https://github.com/Hackerpilot/Dscanner/blob/master/stdx/lexer.d

Thanks for all the work Brian. Read through the previous threads about the development of this code (links at the bottom) and I can see a lot of effort has gone into it. So the following comments may come across as uninformed, but hopefully they will be helpful.

1. StringCache is a custom hash table. It looks like it's primary role is to reduce some sort of duplication. Hash tables, though, are difficult to get right. So perhaps could a benchmark comparison be made against the built-in HT to show what savings it brings? Since it is in the public interface should its payload also be public? Although it is built using GC.malloc how about the in-the-works std.allocator module? Perhaps a version 1 could use GC.malloc but if a later PR could make it possible to use a custom allocator that would be nice.

2. I like the fact that a range interface is provided. I realize that the previous discussions stipulated the use of ubyte to avoid encoding work during scanning. The reasoning about performance makes sense to me. That being the case, could a code example be provided showing how to use this module to scan a UTF-8 encoded string? Even if this is going to focus only on scanning code files, the D language spec allows for arbitrary Unicode in a code file. How is this possible? (I have a general idea, just looking for some explicit code sample help).

3. I tried to understand the reason for and usage of the "extraFields" parameter in "TokenStructure" but couldn't figure it out. Could some more explanation of its intent and usage be provided?

4. Do you want the commented-out pragma statement left over on line 601?

5. Should the template "TokenId" perhaps be something like "generateTokenId" instead? I am not sure what an "Id" for a token means. Is it an integral hash value? Had difficulty seeing how it ties in with the concept of "value" in the header documentation. If this is a numerical hash of a string token, why is the string still stored and used in "tokenStringRepresentation"? I probably am missing something big but couldn't the number be used to represent the string everywhere, saving on time and space?

6. I tried but had difficulty understanding the difference between the four token types -- "staticTokens", "dynamicTokens", "possibleDefaultTokens", "tokenHandlers" -- provided as arguments to "Lexer". What is a token that has a value that changes versus a token that does not change? I am not sure where to put my token definitions.

7. Just thinking about using the module and I would like to use it to make a scanner for xml, json, csv, c/c++, etc. I wasn't able to figure out how to do so, however. The initial code example is nice. But could some additional guidance be provided? Also, I wasn't sure how to make use of a lexer once created. The documentation focuses well on how to initialize a "Lexer" but could some guidance also be provided on how to use one past initialization?

8. Andrei's trie search (http://forum.dlang.org/thread/eeenynxifropasqcufdg@forum.dlang.org?page=4#post-l2nm7m:2416e1:241:40digitalmars.com) seemed like a really interesting idea. And I saw in that thread you continued with his ideas. Does this module incorporate that work? Or was it less performant in the end?

9. I ran "dmd -cov" against the module and got zero percent unit test coverage. Perhaps adding some test code will help clarify usage patterns?

You have put a lot of work into this code so I apologize if the above comes across as picking it apart. Just some questions I had in trying to make use of the code. Hopefully some of it is helpful.

Joseph

Other related posts
http://forum.dlang.org/thread/jsnhlcbulwyjuqcqoepe@forum.dlang.org
http://forum.dlang.org/thread/dpdgcycrgfspcxenzrjf@forum.dlang.org
http://forum.dlang.org/thread/eeenynxifropasqcufdg@forum.dlang.org

February 21, 2014
On 2/21/14, 2:12 PM, Dicebot wrote:
> http://wiki.dlang.org/Review/std.lexer
>
> This is follow-up by Brian to his earlier proposal
> (http://wiki.dlang.org/Review/std.d.lexer). This time proposed module
> focuses instead on generic lexer generation as discussed in matching
> voting thread.
>
> Docs: http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html
> Code: https://github.com/Hackerpilot/Dscanner/blob/master/stdx/lexer.d
>
> Example topics to evaluate during review:
>   - Feasibility of overall design and concept
>   - Quality of documentation
>   - Consistency with existing Phobos modules
>   - Overall quality of implementation
>
> Initial review will end on March the 8th.

Can we please defer this by one week?

Thanks,

Andrei

February 21, 2014
On 2014-02-21 18:06, Andrei Alexandrescu wrote:

> Can we please defer this by one week?

Just make the review period one week longer.

-- 
/Jacob Carlborg
February 21, 2014
On Friday, 21 February 2014 at 17:15:41 UTC, Jacob Carlborg wrote:
> On 2014-02-21 18:06, Andrei Alexandrescu wrote:
>
>> Can we please defer this by one week?
>
> Just make the review period one week longer.

This. There is no real rationale behind existing default review period so extending it until March the 15th won't cause any issues.
February 21, 2014
Does this mean that you're finally getting approval to release your lexer generator?

On Friday, 21 February 2014 at 17:06:23 UTC, Andrei Alexandrescu wrote:
> Can we please defer this by one week?
>
> Thanks,
>
> Andrei

February 21, 2014
On 2/21/14, Joseph Cassman <jc7919@outlook.com> wrote:
> 1. StringCache is a custom hash table. It looks like it's primary role is to reduce some sort of duplication.

It's used for string interning.

> 3. I tried to understand the reason for and usage of the "extraFields" parameter in "TokenStructure" but couldn't figure it out. Could some more explanation of its intent and usage be provided?

You could used it to inject a toString method. Here's what I did when I used it:

alias Token = TokenStructure!(TokID,
q{
    /// Better string representation
    void toString(scope void delegate(const(char)[]) sink) const
    {
        import std.conv;
        import %s;

        sink("(");
        sink(this.line.to!string);
        sink(":");
        sink(this.column.to!string);
        sink(")");
        sink(": ");
        sink(this.text ? this.text : tokToString(this.type));
    }
}.format(__MODULE__)
);

Side-note: Note how I've had to inject an import statement to my own module because the string is mixed-in at the declaration site of a module which I can't modify, where IIRC tokToString didn't exist because it's located in *my* module. It's interesting how you can use this feature in D, IOW:

-----
module mymodule;
import other_module;
void foo() { }
mixin_elsewhere!("import mymodule; foo();");
-----

-----
module other_module;
void mixin_elsewhere(string str)()
{
    mixin(str);
}
-----

P.S. the DLexer links here: https://github.com/Hackerpilot/lexer-demo/

I've used it to learn how to use DLexer, here's my more-documented version of the above if it helps anyone, which also takes the D parser's whitespace function and uses that in order to track column numbers, it injects a toString like I've mentioned, and adds some static asserts for sanity checks:

https://github.com/AndrejMitrovic/lexer-demo
February 21, 2014
On 2/21/14, Andrej Mitrovic <andrej.mitrovich@gmail.com> wrote:
> I've used it to learn how to use DLexer, here's my more-documented version of the above if it helps anyone, which also takes the D parser's whitespace function and uses that in order to track column numbers, it injects a toString like I've mentioned, and adds some static asserts for sanity checks:
>
> https://github.com/AndrejMitrovic/lexer-demo

Hmm, maybe I should make a pull request upstream?
February 21, 2014
On 2/21/14, 9:51 PM, Brian Schott wrote:
> Does this mean that you're finally getting approval to release your
> lexer generator?

Affirmative. It'll happen Real Soon Now(tm).

Andrei

« First   ‹ Prev
1 2 3 4 5