December 13, 2013 Lexers (again)
I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work
December 13, 2013 Re: Lexers (again)
Posted in reply to Brian Schott
On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
> I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work
A problem I noticed is that you're using ubyte[], at least in the runlexer. Does it work with string and wstring, though?
Also, why is it required to pass the type of the code being lexed to the lexer?
Is there another way to make it easier to use? Or is the only way to wrap the constructor in a templated function?
There also seem to be a lot of generic type method implementations in DLexer that I would expect to be done inside the Lexer superclass (well, template, I suppose).
All in all looks promising.
December 13, 2013 Re: Lexers (again)
Posted in reply to Brian Schott
On 12/13/2013 11:17 AM, Brian Schott wrote:
> I've been working on the next attempt at a std.lexer / std.d.lexer
> recently. You can follow the progress on GitHub here:
> https://github.com/Hackerpilot/lexer-work
Looks promising.
I hope that I find some time to work on a completely generic DFA lexer generator (regex based). I found a few papers/had some ideas on how to vectorize the DFA processing to make it fast enough.
December 15, 2013 Re: Lexers (again)
Posted in reply to Brian Schott
On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
> I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work
I've ported DScanner over to this new lexer code. It's on a branch here: https://github.com/Hackerpilot/Dscanner/tree/NewLexer.
One limitation I've noticed with the new tok!"tokenName" approach is that while dmd has no problem with
case tok!"class":
it does have a problem with
goto case tok!"class":
I managed to work around this by adding new labels and "goto"-ing them instead. Is this a bug or intentional?
December 15, 2013 Re: Lexers (again)
Posted in reply to Brian Schott
On 12/15/2013 12:12 PM, Brian Schott wrote:
> On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
>> I've been working on the next attempt at a std.lexer / std.d.lexer
>> recently. You can follow the progress on GitHub here:
>> https://github.com/Hackerpilot/lexer-work
>
> I've ported DScanner over to this new lexer code. It's on a branch here:
> https://github.com/Hackerpilot/Dscanner/tree/NewLexer.
>
> One limitation I've noticed with the new tok!"tokenName" approach is
> that while dmd has no problem with
>
> case tok!"class":
>
> it does have a problem with
>
> goto case tok!"class":
>
> I managed to work around this by adding new labels and "goto"-ing them
> instead. Is this a bug or intentional?
I cannot reproduce your problem. If this does not work, it is a bug.
December 15, 2013 Re: Lexers (again)
Posted in reply to Timon Gehr
On 12/15/13 3:45 AM, Timon Gehr wrote:
> On 12/15/2013 12:12 PM, Brian Schott wrote:
>> On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
>>> I've been working on the next attempt at a std.lexer / std.d.lexer
>>> recently. You can follow the progress on GitHub here:
>>> https://github.com/Hackerpilot/lexer-work
>>
>> I've ported DScanner over to this new lexer code. It's on a branch here:
>> https://github.com/Hackerpilot/Dscanner/tree/NewLexer.
>>
>> One limitation I've noticed with the new tok!"tokenName" approach is
>> that while dmd has no problem with
>>
>> case tok!"class":
>>
>> it does have a problem with
>>
>> goto case tok!"class":
>>
>> I managed to work around this by adding new labels and "goto"-ing them
>> instead. Is this a bug or intentional?
>
> I cannot reproduce your problem. If this does not work, it is a bug.
The problem is that tok is a dynamic value. It should be a static value. Current code:
    static @property IDType tok(string symbol)()
    {
        ...
    }
It should be:
    template tok(string symbol)
    {
        enum IDType tok = ...;
    }
This is important - if the compiler thinks tok is a dynamic value, it'll generate crappy switch statements.
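A minimal sketch of the difference, not the actual lexer-work code: the token table `tokenNames`, the helper `lookupToken`, and the function `describe` are hypothetical names made up for illustration, and `IDType` is assumed to be `ubyte`. The point is only that when `tok!"..."` is an enum template, each instantiation is a compile-time constant, so it can be used in both `case` and `goto case`.

```d
alias IDType = ubyte; // assumption: the real IDType may differ

// Hypothetical token table, for illustration only.
immutable string[] tokenNames = ["class", "struct", "if"];

// Ordinary function; it can also be evaluated at compile time via CTFE.
IDType lookupToken(string symbol) pure nothrow @safe
{
    foreach (i, name; tokenNames)
        if (name == symbol)
            return cast(IDType) (i + 1);
    return 0; // 0 stands for "unknown token" in this sketch
}

// Enum template: tok!"class" is a manifest constant, not a function call.
enum IDType tok(string symbol) = lookupToken(symbol);

string describe(IDType t)
{
    switch (t)
    {
    case tok!"struct":
        goto case tok!"class"; // jumps to the case below
    case tok!"class":
        return "aggregate";
    default:
        return "other";
    }
}

void main()
{
    import std.stdio : writeln;

    static assert(tok!"class" == 1); // checked at compile time
    writeln(describe(tok!"struct")); // prints "aggregate"
}
```

Because the value is fixed at compile time, the compiler sees plain integral constants in the switch rather than property calls.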
BTW @Brian - I didn't look at this in depth yet but it's very promising work. Thanks!
Andrei
December 15, 2013 Re: Lexers (again)
Posted in reply to Andrei Alexandrescu
On 12/15/2013 05:38 PM, Andrei Alexandrescu wrote:
>>>
>>> One limitation I've noticed with the new tok!"tokenName" approach is
>>> that while dmd has no problem with
>>>
>>> case tok!"class":
>>>
>>> it does have a problem with
>>>
>>> goto case tok!"class":
>>>
>>> I managed to work around this by adding new labels and "goto"-ing them
>>> instead. Is this a bug or intentional?
>>
>> I cannot reproduce your problem. If this does not work, it is a bug.
>
> The problem is that tok is a dynamic value. It should be a static value.
Note that the spec has this to say:
http://dlang.org/statement.html#SwitchStatement
"Expression is evaluated. The result type T must be of integral type or char[], wchar[] or dchar[]. The result is compared against each of the case expressions. If there is a match, the corresponding case statement is transferred to. The case expressions must all evaluate to a constant value or array, or a runtime initialized const or immutable variable of integral type. They must be implicitly convertible to the type of the switch Expression. Case expressions must all evaluate to distinct values. Const or immutable variables must all have different names. If they share a value, the first case statement with that value gets control. There must be exactly one default statement."
Arguably, this is a questionable language design decision that should IMO be revisited anyway, but DMD clearly does not follow the spec here.
Also, there is this:
"The fourth form, goto case Expression;, transfers to the CaseStatement of the innermost enclosing SwitchStatement with a matching Expression."
It does not say anything about what kind of expression is required.
December 16, 2013 Re: Lexers (again)
Posted in reply to Brian Schott
On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
> I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work
Nitpicking... but shouldn't:
size_t line() pure nothrow const @property { return _line; }
be more like:
@property size_t line() pure nothrow const { return _line; }
to be consistent with phobos coding style?
/Jonas
December 16, 2013 Re: Lexers (again)
Posted in reply to Andrei Alexandrescu
On Sunday, 15 December 2013 at 16:38:15 UTC, Andrei Alexandrescu wrote:
> The problem is that tok is a dynamic value. It should be a static value. Current code:
This seems to have fixed the case/goto issues.
> This is important - if the compiler thinks tok is a dynamic value, it'll generate crappy switch statements.
It seems it's hard to keep dmd from generating crappy code even with this fix. I tried it with both LDC and DMD. The code from DMD takes 3.5 times as long to execute.
> BTW @Brian - I didn't look at this in depth yet but it's very promising work. Thanks!
It's based off of the gist you posted a while back. I'll have to compare this to what you(r team) came up with for Facebook's C++ analyzer.