July 05, 2012
Re: Let's stop parser Hell
On 05-Jul-12 22:23, Denis Shelomovskij wrote:
> 05.07.2012 20:28, Jonathan M Davis пишет:
>> It's all too common for someone to suggest that we should
>> do something or implement something without ever attempting to do it
>> themselves, and in general, stuff around here gets done because
>> someone really
>> wants it done, takes the time to do it, and sees it through until its
>> done and
>> in Phobos.
>
> I didn't want anyone to code anything at all! On the contrary, I
> want them to stop writing parsers, because it has only one
> consequence: one has to spend more time finding a better parser.
>

It doesn't work like that in open source. No matter what you do, people 
keep writing code ;)

-- 
Dmitry Olshansky
July 05, 2012
Re: Let's stop parser Hell
On Thursday, 5 July 2012 at 18:33:50 UTC, Dmitry Olshansky wrote:
> Count me as interested.
> CTFE needs more correctness & speed though. So to put it 
> bluntly - no, it's not possible right NOW.
> BUT it doesn't prevent us from planning and doing a proof of 
> concept. Pegged seems a good starting point even if we end up 
> re-writing it from scratch.

IMO, the first priority is to make the code generated by Pegged run 
fast, and the second is to make code generation itself fast.
July 05, 2012
Re: Let's stop parser Hell
On 7/5/12 2:33 PM, Dmitry Olshansky wrote:
> On 05-Jul-12 22:22, Andrei Alexandrescu wrote:
>> I'd really want to create a task force on this, it is of strategic
>> importance to D. In Walter's own words, no new feature is going to push
>> us forward since we're not really using the great goodies we've got, and
>> CTFE technology is the most important.
>>
>
> Count me as interested.
> CTFE needs more correctness & speed though. So to put it bluntly - no,
> it's not possible right NOW.
> BUT it doesn't prevent us from planning and doing a proof of concept.
> Pegged seems a good starting point even if we end up re-writing it from
> scratch.

Excellent point. Walter is 100% behind the general strategy and we can 
bring him to work on specific bug reports and enhancement requests if we 
make a case they are important for Pegged.

>> I also am actively opposed to a project of just translating D's
>> front-end to D and dropping it into Phobos because it would smother (a)
>> work on generic parser generators, and (b) strong, dependable
>> formalization of D's syntax.
>>
>
> Well put. It shouldn't stop people from doing parsers, IMO the more the
> merrier.

Exactly.


Andrei
July 05, 2012
Re: Let's stop parser Hell
On 7/5/12 2:38 PM, Roman D. Boiko wrote:
> On Thursday, 5 July 2012 at 18:33:50 UTC, Dmitry Olshansky wrote:
>> Count me as interested.
>> CTFE needs more correctness & speed though. So to put it bluntly - no,
>> it's not possible right NOW.
>> BUT it doesn't prevent us from planning and doing a proof of concept.
>> Pegged seems a good starting point even if we end up re-writing it
>> from scratch.
>
> IMO, first priority is to make code generated by Pegged work fast, and
> second priority - make code generation itself fast.

Well, I'd say that supporting large, realistic grammars without blowing 
up or taking overly long compilation times is the first priority.

Andrei
July 05, 2012
Re: Let's stop parser Hell
On Thu, Jul 5, 2012 at 8:28 PM, Andrei Alexandrescu
<SeeWebsiteForEmail@erdani.org> wrote:

> I'll be glad to buy for you any book you might feel you need for this.
> Again, there are few things more important for D right now than exploiting
> its unmatched-by-competition features to great ends. I don't want the lack
> of educational material to hold you down. Please continue working on this
> and let me know of what you need.

That's nice of you, if a bit extreme for a public mailing list :)
Andrei, money is no problem :)
Interest in the field of parsing is no problem.
Interest in D future is no problem.
Having a demanding job, and three children, is a problem. No, scratch
that, you know what I mean.

But hey, Roman is doing interesting things on keyword parsing right
now, and changing the parser generated by Pegged is not difficult. We
will see where this thread leads. (Roman, you should send your results
here, because I'm still taken aback by the built-in AA speed compared
to linear array look-up for 100 keywords).
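For illustration, the two lookup strategies mentioned above might be sketched like this (a minimal sketch, not Roman's actual code; the function names are hypothetical and the keyword list is truncated to a handful of entries, where the real lexer has around a hundred):

```d
import std.algorithm.searching : canFind;

// Truncated keyword list, for illustration only.
immutable string[] keywords = [
    "abstract", "alias", "align", "assert", "auto",
    "body", "bool", "break", "byte", "case", "cast"
];

// O(n) scan over the array.
bool isKeywordLinear(string id)
{
    return keywords[].canFind(id);
}

// Built-in associative array: O(1) average hash lookup.
bool[string] keywordSet;

static this()
{
    foreach (k; keywords)
        keywordSet[k] = true;
}

bool isKeywordAA(string id)
{
    return (id in keywordSet) !is null;
}

unittest
{
    assert(isKeywordLinear("cast") && isKeywordAA("cast"));
    assert(!isKeywordLinear("foo") && !isKeywordAA("foo"));
}
```

With only ~100 short strings the linear scan is not absurd, but each miss costs up to 100 string comparisons, while the AA hashes the identifier once.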

As Dmitry said, we might hit a CTFE wall: memory consumption,
computation speed, ...
(*channelling Andrei*: then we will correct whatever makes CTFE a
problem. Right)

Philippe

(Hesitating between 'The Art of the Metaobject Protocol' and
'Compilers, Techniques and Tools', right now)
July 05, 2012
Re: Let's stop parser Hell
On 2012-07-05 20:22, Andrei Alexandrescu wrote:
> I also am actively opposed to a project of just translating D's
> front-end to D and dropping it into Phobos because it would smother (a)
> work on generic parser generators, and (b) strong, dependable
> formalization of D's syntax.

I don't see why you are against this. I'm seeing this as basically the 
only chance of DMD ever being written in D. Sure, I would much rather 
have a completely new compiler written in D, with a modular approach and 
designed from scratch to be used as a library, but I'm trying to be 
realistic here.

-- 
/Jacob Carlborg
July 05, 2012
Re: Let's stop parser Hell
On Thursday, 5 July 2012 at 18:17:06 UTC, Philippe Sigaud wrote:
> As a parser proper, Pegged is awful :-) Nothing I'm ashamed of, 
> as I learn
> by coding. Hey, I just received the Dragon Book (International 
> Edition),

If you are interested in parsing, then I wouldn't recommend the 
Dragon Book, but this one instead: 
http://dickgrune.com/Books/PTAPG_2nd_Edition/Additional.html

It really is an awesome book.
July 05, 2012
Re: Let's stop parser Hell
On Thursday, 5 July 2012 at 19:54:39 UTC, Philippe Sigaud wrote:
> On Thu, Jul 5, 2012 at 8:28 PM, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> I'll be glad to buy for you any book you might feel you need 
>> for this.
>> Again, there are few things more important for D right now 
>> than exploiting
>> its unmatched-by-competition features to great ends. I don't 
>> want the lack
>> of educational material to hold you down. Please continue 
>> working on this
>> and let me know of what you need.
>
> That's nice of you, if a bit extreme for a public mailing list 
> :)
> Andrei, money is no problem :)
> Interest in the field of parsing is no problem.
> Interest in D future is no problem.
> Having a demanding job, and three children, is a problem. No, 
> scratch
> that, you know what I mean.

I have four, from 1 to 7 years old... Wouldn't call them a 
problem, though :)))

> But hey, Roman is doing interesting things on keyword parsing 
> right
> now, and changing the parser generated by Pegged is not 
> difficult. We
> will see where this thread lead. (Roman, you should send your 
> results
> here, because I'm still taken aback by the built-in AA speed 
> compared
> to linear array look-up for 100 keywords).

Well, I wouldn't call those "results" yet. Just some benchmarks. 
Here they are:

isKeyword_Dummy (baseline): 427 [microsec] total, 28 [nanosec / lookup].
isKeyword_Dictionary: 1190 [microsec] total, 214 [nanosec / lookup].
isKeyword_SwitchByLengthThenByChar: 466 [microsec] total, 83 [nanosec / lookup].
isKeyword_BinaryArrayLookup: 2722 [microsec] total, 490 [nanosec / lookup].
isKeyword_LinearArrayLookup: 13822 [microsec] total, 2490 [nanosec / lookup].
isKeyword_UnicodeTrie: 1317 [microsec] total, 237 [nanosec / lookup].
isKeyword_UnicodeTrieBoolLookup: 1072 [microsec] total, 193 [nanosec / lookup].
Total: 22949 identifiers + 5551 keywords.

isKeyword_Dummy (baseline): 2738 [microsec] total, 50 [nanosec / lookup].
isKeyword_Dictionary: 4247 [microsec] total, 242 [nanosec / lookup].
isKeyword_SwitchByLengthThenByChar: 1593 [microsec] total, 91 [nanosec / lookup].
isKeyword_BinaryArrayLookup: 14351 [microsec] total, 820 [nanosec / lookup].
isKeyword_LinearArrayLookup: 59564 [microsec] total, 3405 [nanosec / lookup].
isKeyword_UnicodeTrie: 4167 [microsec] total, 238 [nanosec / lookup].
isKeyword_UnicodeTrieBoolLookup: 3466 [microsec] total, 198 [nanosec / lookup].
Total: 104183 identifiers + 17488 keywords.

> As Dmitry said, we might hit a CTFE wall: memory consumption,
> computation speed, ...
> (*channelling Andrei*: then we will correct whatever makes CTFE 
> a
> problem. Right)
>
> Philippe
>
> (Hesitating between 'The Art of the Metaobject Protocol' and
> 'Compilers, Techniques and Tools', right now)
July 05, 2012
Re: Let's stop parser Hell
> isKeyword_Dummy (baseline): 2738 [microsec] total, 50 [ns / lookup].
This one calculates a sum of all identifier code units. Included for comparison.

> isKeyword_Dictionary: 4247 [microsec] total, 242 [ns / lookup].
Checks whether an identifier is a keyword using AA (dictionary) lookup.

> isKeyword_SwitchByLengthThenByChar: 1593 [microsec] total, 91 [ns / lookup].
Switches by identifier length at the top, then recursively switches by 
each character. Clearly a winner, but I think it can be 
improved further.
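To make the scheme concrete, here is a hand-written miniature of such a recognizer, covering just a few keywords (the real version is machine-generated and unrolled for the entire keyword set, hence its size; this sketch is purely illustrative):

```d
// Dispatch on length first, then discriminate by character.
// Each case returns directly, so only a few comparisons run
// per lookup regardless of the keyword table size.
bool isKeywordSwitch(string id)
{
    switch (id.length)
    {
    case 2:
        switch (id[0])
        {
        case 'd': return id[1] == 'o';                  // "do"
        case 'i': return id[1] == 'f' || id[1] == 'n';  // "if", "in"
        default:  return false;
        }
    case 4:
        switch (id[0])
        {
        case 'b': return id[1 .. $] == "ody";           // "body"
        case 'e': return id[1 .. $] == "lse"            // "else"
                      || id[1 .. $] == "num";           // "enum"
        default:  return false;
        }
    default:
        return false;
    }
}

unittest
{
    assert(isKeywordSwitch("if"));
    assert(isKeywordSwitch("else"));
    assert(!isKeywordSwitch("ifx"));
    assert(!isKeywordSwitch("foo"));
}
```

The length dispatch immediately rejects most identifiers, which is a big part of why this variant benchmarks so well.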

> isKeyword_BinaryArrayLookup: 14351 [microsec] total, 820 [ns / lookup].
Binary search in an ordered array of keywords.

> isKeyword_LinearArrayLookup: 59564 [microsec] total, 3405 [ns / lookup].
Same as above, but the search is linear.

> isKeyword_UnicodeTrie: 4167 [microsec] total, 238 [ns / lookup].
Looks up a keyword in a trie created by Dmitry. This will be improved.

> isKeyword_UnicodeTrieBoolLookup: 3466 [microsec] total, 198 [ns / lookup].
The same, but it only determines whether an identifier is a keyword, 
not which one.
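Dmitry's trie is a specialized, compact structure; purely to illustrate the idea behind the boolean variant, a naive pointer-based trie (my own sketch, not the benchmarked implementation) might look like:

```d
// Each node maps one character to a child; a flag marks where
// a complete keyword ends, so prefixes alone don't match.
struct TrieNode
{
    TrieNode*[char] next;
    bool terminal;
}

TrieNode root;

void insert(string word)
{
    auto node = &root;
    foreach (char c; word)
    {
        auto p = c in node.next;
        if (p is null)
        {
            auto child = new TrieNode;
            node.next[c] = child;
            node = child;
        }
        else
            node = *p;
    }
    node.terminal = true;
}

bool contains(string word)
{
    auto node = &root;
    foreach (char c; word)
    {
        auto p = c in node.next;
        if (p is null)
            return false;
        node = *p;
    }
    return node.terminal;
}

unittest
{
    insert("while");
    insert("with");
    assert(contains("while") && contains("with"));
    assert(!contains("wh"));   // prefix of a keyword, not a keyword
    assert(!contains("foo"));
}
```

A real implementation would flatten the nodes into arrays for cache friendliness, which is roughly what the Unicode trie variants above do.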

> Total: 104183 identifiers + 17488 keywords.
Analyzed the largest Phobos file (DateTime? I didn't check the name). 
Results are consistent for other files, too.
July 05, 2012
Re: Let's stop parser Hell
On 06-Jul-12 00:16, Roman D. Boiko wrote:
>> isKeyword_Dummy (baseline): 2738 [microsec] total, 50 [ns / lookup].
> This one calculates a sum of all identifier code units. Included for
> comparison.
>
>> isKeyword_Dictionary: 4247 [microsec] total, 242 [ns / lookup].
> Check whether an identifier is a keyword using AA (dictionary) lookup.
>
>> isKeyword_SwitchByLengthThenByChar: 1593 [microsec] total, 91 [ns / lookup].
> Switch by identifier length at the top, then recursively switch by each
> character. Clearly a winner, but I think it can be improved further.
>
I'd stress the fact that it's a fully unrolled & hard-coded switch that
takes a lot of pages (~72 KB), so it may easily be perfect.
Sorry, couldn't resist ;)
And I'm not sure how much it could be optimized, maybe some measly 10-30%.

>> Total: 104183 identifiers + 17488 keywords.
> Analyzed the largest Phobos file (DateTime? I didn't check the name.)
> Results are consistent for other files, too.
It is std.datetime; I've been running this benchmark against it and 
seen the same numbers :)

-- 
Dmitry Olshansky