std.d.lexer : voting thread (page 9) - D Programming Language Discussion Forum

October 08, 2013

Re: std.d.lexer : voting thread

Posted by Jakob Ovrum
in reply to Andrei Alexandrescu

On Tuesday, 8 October 2013 at 04:37:31 UTC, Andrei Alexandrescu wrote:
> So I guess it's your turn.

I was going to cook something up with `groupBy` (taken from the still-open Phobos PR #1186) and `toTypeTuple` (from Phobos PR #1472, also still open!), but the former isn't CTFEable. Blergh. I'm still adamant this is the way to go, but I'm putting away the torch for now.

October 08, 2013

Re: std.d.lexer : voting thread

Posted by Andrei Alexandrescu
in reply to Jakob Ovrum

On 10/8/13 7:02 AM, Jakob Ovrum wrote:
> On Tuesday, 8 October 2013 at 04:37:31 UTC, Andrei Alexandrescu wrote:
>> So I guess it's your turn.
>
> I was going to cook something up with `groupBy` (taken from the
> still-open Phobos PR #1186) and `toTypeTuple` (from Phobos PR #1472, also
> still open!), but the former isn't CTFEable. Blergh. I'm still adamant
> this is the way to go, but I'm putting away the torch for now.

Fair enough. (Again, it would be unfair to compare an existing design against a hypothetical one.) I suspect at some point you will need to generate some custom code, which will come as a string that you need to mixin.

But no matter. My most significant bit is, we need a trie lexer generator ONLY from the token strings, no TK_XXX user-provided symbols necessary. If all we need is one language (D) this is a non-issue because the library writer provides the token definitions. If we need to support user-provided languages, having the library manage the string -> small integer mapping becomes essential.

Andrei
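To make the "generated code as a string that you mixin" and "trie from the token strings only" points concrete, here is a minimal sketch, not the code proposed in this thread. It assumes a hypothetical token list `tokenStrings` and hypothetical helper names `genTrie` and `matchLength`; the generator runs under CTFE and emits a nested `switch` that finds the longest token at the front of the input.

```d
import std.conv : to;

// Hypothetical token set: only the strings are supplied, no TK_XXX names.
enum string[] tokenStrings = ["%", "%=", "+", "++", "+="];

// Runs at compile time and returns D source for one level of the trie.
// All entries of `tokens` agree on their first `depth` characters.
string genTrie(const(string)[] tokens, size_t depth)
{
    string cases;
    bool[256] seen;
    foreach (t; tokens)
    {
        if (t.length <= depth)
            continue;
        immutable ubyte c = cast(ubyte) t[depth];
        if (seen[c])
            continue;
        seen[c] = true;

        // Collect every token that continues with character `c`.
        const(string)[] sub;
        bool endsHere = false;
        foreach (u; tokens)
        {
            if (u.length > depth && cast(ubyte) u[depth] == c)
            {
                sub ~= u;
                if (u.length == depth + 1)
                    endsHere = true; // a complete token ends at this node
            }
        }
        cases ~= "case " ~ to!string(cast(uint) c) ~ "u:\n";
        if (endsHere)
            cases ~= "best = " ~ to!string(depth + 1) ~ ";\n";
        cases ~= genTrie(sub, depth + 1);
        cases ~= "break;\n";
    }
    if (cases.length == 0)
        return "";
    immutable d = to!string(depth);
    return "if (input.length > " ~ d ~ ") switch (cast(uint) input[" ~ d ~ "]) {\n"
        ~ cases ~ "default: break;\n}\n";
}

// Length of the longest token that prefixes `input`, or 0 if none matches.
size_t matchLength(string input)
{
    size_t best = 0;
    mixin(genTrie(tokenStrings, 0));
    return best;
}
```

Supporting another language would then only mean supplying a different `tokenStrings`; the switch itself is never written by hand.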

October 09, 2013

Re: std.d.lexer : voting thread

Posted by Walter Bright
in reply to Andrei Alexandrescu

On 10/4/2013 5:24 PM, Andrei Alexandrescu wrote:
>  [...]

Some points:

1. This is a replacement for the switch statement starting at around line 505 in advance()

https://github.com/Hackerpilot/phobos/blob/9bdb7f97bb8021f3b0d0291896b8fe21a6fead23/std/d/lexer.d

It is not a replacement for the rest of the lexer.

2. Instead of explicit token type enums, such as:

    mod, /// %

it would just be referred to as:

    tok!"%"

Andrei pointed out to me that he has fixed the latter so it resolves to a small integer - meaning it works efficiently as cases in switch statements. This removes my primary objection to it.

3. This level of abstraction combined with efficient generation cannot be currently done in any other language. Hence, it makes for a sweet showcase of what D can do.

Hence, I think we ought to adapt Brian's lexer by replacing the switch with Andrei's trie searcher, and replacing the enum TokenType with the tok!"string" syntax.
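As an illustration of point 2, the following is a minimal sketch of how a `tok!"string"` template could resolve to a small integer usable in `case` labels. It is an assumption about the general shape, not Andrei's actual implementation; `tokenStrings`, `TokenId`, `tokenId` and `describe` are hypothetical names.

```d
// Hypothetical token table; the library, not the user, owns the numbering.
static immutable string[] tokenStrings = ["%", "%=", "+", "++", "+=", "if", "else"];

alias TokenId = ubyte;

// Looks up a token string; 0 is reserved for "no token".
// When evaluated at compile time, an unknown string becomes a compile error.
TokenId tokenId(string s)
{
    foreach (i, t; tokenStrings)
        if (t == s)
            return cast(TokenId)(i + 1);
    assert(0, "unknown token: " ~ s);
}

// tok!"%" is a compile-time constant, so it can appear in a case label.
enum TokenId tok(string s) = tokenId(s);

string describe(TokenId id)
{
    switch (id)
    {
        case tok!"%":  return "modulo";
        case tok!"%=": return "modulo-assign";
        case tok!"if": return "keyword if";
        default:       return "other";
    }
}
```

Because each `tok!"..."` is an ordinary `ubyte` constant, the `switch` above compiles to the same kind of jump table an explicit enum would give.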

October 09, 2013

Re: std.d.lexer : voting thread

Posted by Brad Anderson
in reply to Walter Bright

On Wednesday, 9 October 2013 at 01:27:22 UTC, Walter Bright wrote:
> On 10/4/2013 5:24 PM, Andrei Alexandrescu wrote:
>> [...]
>
> Some points:
>
> 1. This is a replacement for the switch statement starting at around line 505 in advance()
>
> https://github.com/Hackerpilot/phobos/blob/9bdb7f97bb8021f3b0d0291896b8fe21a6fead23/std/d/lexer.d

GitHub tip: You can link to a specific line by clicking the line number and copying the resulting URL.

October 09, 2013

Re: std.d.lexer : voting thread

Posted by Andrei Alexandrescu
in reply to Walter Bright

On 10/8/13 6:26 PM, Walter Bright wrote:
> On 10/4/2013 5:24 PM, Andrei Alexandrescu wrote:
>>  [...]
>
> Some points:
>
> 1. This is a replacement for the switch statement starting at around
> line 505 in advance()
>
> https://github.com/Hackerpilot/phobos/blob/9bdb7f97bb8021f3b0d0291896b8fe21a6fead23/std/d/lexer.d
>
> It is not a replacement for the rest of the lexer.
>
> 2. Instead of explicit token type enums, such as:
>
>      mod, /// %
>
> it would just be referred to as:
>
>      tok!"%"
>
> Andrei pointed out to me that he has fixed the latter so it resolves to
> a small integer - meaning it works efficiently as cases in switch
> statements. This removes my primary objection to it.
>
> 3. This level of abstraction combined with efficient generation cannot
> be currently done in any other language. Hence, it makes for a sweet
> showcase of what D can do.
>
> Hence, I think we ought to adapt Brian's lexer by replacing the switch
> with Andrei's trie searcher, and replacing the enum TokenType with the
> tok!"string" syntax.

Thanks, that's exactly what I had in mind. Also the trie searcher should be exposed by the library so people can implement other languages.

Let me make another, more strategic, point. Projects like Rust and Go have dozens of people getting paid to work on them. In the time it takes us to crank one conventional lexer/parser for a language, they can crank five. The answer is we can't win with a conventional approach. We must leverage D's strengths to amplify our speed of execution, and in this context an integrated generic lexer generator is the ticket.

There is one thing I neglected to mention, and I apologize for that. Coming with this all on the eve of voting must be quite demotivating for Brian, who's been through all the arduous steps to get his work to production quality. I hope the compensating factor is that the proposed change is a net positive for the greater good.

Andrei

October 09, 2013

Re: std.d.lexer : voting thread

Posted by deadalnix
in reply to Andrei Alexandrescu

On Wednesday, 9 October 2013 at 03:55:42 UTC, Andrei Alexandrescu wrote:
> On 10/8/13 6:26 PM, Walter Bright wrote:
>> On 10/4/2013 5:24 PM, Andrei Alexandrescu wrote:
>>> [...]
>>
>> Some points:
>>
>> 1. This is a replacement for the switch statement starting at around
>> line 505 in advance()
>>
>> https://github.com/Hackerpilot/phobos/blob/9bdb7f97bb8021f3b0d0291896b8fe21a6fead23/std/d/lexer.d
>>
>> It is not a replacement for the rest of the lexer.
>>
>> 2. Instead of explicit token type enums, such as:
>>
>>     mod, /// %
>>
>> it would just be referred to as:
>>
>>     tok!"%"
>>
>> Andrei pointed out to me that he has fixed the latter so it resolves to
>> a small integer - meaning it works efficiently as cases in switch
>> statements. This removes my primary objection to it.
>>
>> 3. This level of abstraction combined with efficient generation cannot
>> be currently done in any other language. Hence, it makes for a sweet
>> showcase of what D can do.
>>
>> Hence, I think we ought to adapt Brian's lexer by replacing the switch
>> with Andrei's trie searcher, and replacing the enum TokenType with the
>> tok!"string" syntax.
>
> Thanks, that's exactly what I had in mind. Also the trie searcher should be exposed by the library so people can implement other languages.
>
> Let me make another, more strategic, point. Projects like Rust and Go have dozens of people getting paid to work on them. In the time it takes us to crank one conventional lexer/parser for a language, they can crank five. The answer is we can't win with a conventional approach. We must leverage D's strengths to amplify our speed of execution, and in this context an integrated generic lexer generator is the ticket.
>
> There is one thing I neglected to mention, and I apologize for that. Coming with this all on the eve of voting must be quite demotivating for Brian, who's been through all the arduous steps to get his work to production quality. I hope the compensating factor is that the proposed change is a net positive for the greater good.
>
>
> Andrei

Overall, I think this is going in the right direction. However, there is one thing I don't like about that design.

When you go through the big switch of death, you match the beginning of the string and then return to a function that has to test where it came from and act accordingly. That is kind of wasteful.

What SDC does is call a function template with the part matched by the big switch of death passed as a template argument. The nice thing about this is that it is easy to transform the compile-time argument into a runtime one by simply forwarding it (which is what is done to parse an identifier that begins with a keyword, for instance).

October 09, 2013

Re: std.d.lexer : voting thread

Posted by Brian Schott
in reply to Andrei Alexandrescu

On Wednesday, 9 October 2013 at 03:55:42 UTC, Andrei Alexandrescu wrote:
> for the greater good.

YOU CALL YOURSELVES A COMMUNITY THAT CARES?

http://www.youtube.com/watch?v=yUpbOliTHJY

October 09, 2013

Re: std.d.lexer : voting thread

Posted by Andrei Alexandrescu
in reply to Brian Schott

On 10/8/13 9:33 PM, Brian Schott wrote:
> On Wednesday, 9 October 2013 at 03:55:42 UTC, Andrei Alexandrescu wrote:
>> for the greater good.
>
> YOU CALL YOURSELVES A COMMUNITY THAT CARES?
>
> http://www.youtube.com/watch?v=yUpbOliTHJY

I swear I had that in mind when I wrote "the greater good". Awesome movie, and quite fit for the situation :o).

Andrei

October 09, 2013

Re: std.d.lexer : voting thread

Posted by Andrei Alexandrescu
in reply to deadalnix

On 10/8/13 9:32 PM, deadalnix wrote:
> Overall, I think this is going in the right direction. However, there
> is one thing I don't like about that design.
>
> When you go through the big switch of death, you match the beginning of
> the string and then return to a function that has to test where it
> came from and act accordingly. That is kind of wasteful.
>
> What SDC does is call a function template with the part matched
> by the big switch of death passed as a template argument. The nice thing
> about this is that it is easy to transform the compile-time argument into
> a runtime one by simply forwarding it (which is what is done to parse an
> identifier that begins with a keyword, for instance).

I think a bit of code would make all that much clearer.

Andrei

October 09, 2013

Re: std.d.lexer : voting thread

Posted by deadalnix
in reply to Andrei Alexandrescu

On Wednesday, 9 October 2013 at 04:38:02 UTC, Andrei Alexandrescu wrote:
> On 10/8/13 9:32 PM, deadalnix wrote:
>> Overall, I think this is going in the right direction. However, there
>> is one thing I don't like about that design.
>>
>> When you go through the big switch of death, you match the beginning of
>> the string and then return to a function that has to test where it
>> came from and act accordingly. That is kind of wasteful.
>>
>> What SDC does is call a function template with the part matched
>> by the big switch of death passed as a template argument. The nice thing
>> about this is that it is easy to transform the compile-time argument into
>> a runtime one by simply forwarding it (which is what is done to parse an
>> identifier that begins with a keyword, for instance).
>
> I think a bit of code would make all that much clearer.
>
> Andrei

Sure.

So here is the lexer generation info (this can be simplified by using the tok!"foobar" thing): http://dpaste.dzfl.pl/7ec225ee

Using this info, a huge switch-based boilerplate is generated. Each "leaf" of the huge switch tree calls a function template, passing as a template argument what has been matched so far. You can then proceed as follows:
http://dpaste.dzfl.pl/f2f0d22c

You may wonder about the "?lexComment". The boilerplate generator understands ? as an indication that lexComment may or may not return a token (depending on lexer configuration) and generates what is needed to handle that (by testing whether the function returns a token, via some static ifs).
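The "may or may not return a token" test can be sketched roughly as follows. This is not the SDC code behind the links above; the `keepComments` flag, the `Token` struct and the hand-written `nextToken` loop are hypothetical stand-ins for the lexer configuration and the generated boilerplate.

```d
struct Token
{
    string kind;
    string text;
}

// Returns a Token when comments are kept, and nothing (void) otherwise.
// `s` is the part already matched by the switch, here "//".
auto lexComment(bool keepComments, string s)(ref string input)
{
    size_t n = 0;
    while (n < input.length && input[n] != '\n')
        ++n;
    immutable text = s ~ input[0 .. n];
    input = input[n .. $];
    static if (keepComments)
        return Token("comment", text);
}

// Hand-written stand-in for one leaf of the generated switch.
Token nextToken(bool keepComments)(ref string input)
{
    while (input.length > 0)
    {
        if (input.length >= 2 && input[0 .. 2] == "//")
        {
            input = input[2 .. $];
            alias fn = lexComment!(keepComments, "//");
            // The generator's "?" handling: does this function yield a token?
            static if (is(typeof(fn(input)) == Token))
                return fn(input);
            else
            {
                fn(input); // comment consumed, no token produced
                continue;
            }
        }
        auto t = Token("char", input[0 .. 1]);
        input = input[1 .. $];
        return t;
    }
    return Token("eof", "");
}
```

The check happens entirely at compile time, so the configuration that skips comments pays nothing for the one that keeps them.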

You obviously end up with a lot of instances of lexIdentifier(string s)(), but these simply forward to lexIdentifier()(string s), and the forwarding function is trivially removed by the inliner.
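For readers without access to the pastes, the forwarding pattern can be sketched like this. It is a simplified reconstruction under assumptions, not the SDC source: `Token`, `lexKeyword` and the hand-written `lex` dispatcher are hypothetical, and only the `"if"` leaf of the switch is shown.

```d
import std.ascii : isAlphaNum;

struct Token
{
    string kind;
    string text;
}

// Runtime flavour: the already-matched prefix is an ordinary argument.
Token lexIdentifier()(string prefix, ref string input)
{
    size_t n = 0;
    while (n < input.length && (isAlphaNum(input[n]) || input[n] == '_'))
        ++n;
    auto result = Token("identifier", prefix ~ input[0 .. n]);
    input = input[n .. $];
    return result;
}

// Compile-time flavour: one instance per matched prefix, each a trivial
// forwarder that an inliner can remove.
Token lexIdentifier(string prefix)(ref string input)
{
    return lexIdentifier(prefix, input);
}

// Leaf for a keyword: if identifier characters follow, the keyword was
// only the beginning of an identifier.
Token lexKeyword(string kw)(ref string input)
{
    if (input.length > 0 && (isAlphaNum(input[0]) || input[0] == '_'))
        return lexIdentifier!kw(input);
    return Token("keyword", kw);
}

// Hand-written stand-in for the generated switch.
Token lex(ref string input)
{
    if (input.length >= 2 && input[0 .. 2] == "if")
    {
        input = input[2 .. $];
        return lexKeyword!"if"(input); // matched text passed as template argument
    }
    return lexIdentifier("", input);
}
```

The leaf never has to ask what was matched: `kw` is known at compile time inside `lexKeyword`, and it only becomes a runtime value at the single forwarding call.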
