July 05, 2012
Re: Let's stop parser Hell
On 05-Jul-12 22:23, Denis Shelomovskij wrote:
> 05.07.2012 20:28, Jonathan M Davis пишет:
>> It's all too common for someone to suggest that we should
>> do something or implement something without ever attempting to do it
>> themselves, and in general, stuff around here gets done because
>> someone really
>> wants it done, takes the time to do it, and sees it through until its
>> done and
>> in Phobos.
>
> I didn't want anyone to code anything at all! On the contrary, I
> want them to stop writing parsers, because it has only one
> consequence: one has to spend more time finding a better parser.
>

It doesn't work like that in open source. No matter what you do, people 
keep writing code ;)

-- 
Dmitry Olshansky
July 05, 2012
Re: Let's stop parser Hell
On Thursday, 5 July 2012 at 18:33:50 UTC, Dmitry Olshansky wrote:
> Count me as interested.
> CTFE needs more correctness & speed though. So to put it 
> bluntly - no, it's not possible right NOW.
> BUT it doesn't prevent us from planning and doing a proof of 
> concept. Pegged seems a good starting point even if we end up 
> re-writing it from scratch.

IMO, the first priority is to make the code generated by Pegged run 
fast, and the second is to make code generation itself fast.
July 05, 2012
Re: Let's stop parser Hell
On 7/5/12 2:33 PM, Dmitry Olshansky wrote:
> On 05-Jul-12 22:22, Andrei Alexandrescu wrote:
>> I'd really want to create a task force on this, it is of strategic
>> importance to D. In Walter's own words, no new feature is going to push
>> us forward since we're not really using the great goodies we've got, and
>> CTFE technology is the most important.
>>
>
> Count me as interested.
> CTFE needs more correctness & speed though. So to put it bluntly - no,
> it's not possible right NOW.
> BUT it doesn't prevent us from planning and doing a proof of concept.
> Pegged seems a good starting point even if we end up re-writing it from
> scratch.

Excellent point. Walter is 100% behind the general strategy and we can 
bring him to work on specific bug reports and enhancement requests if we 
make a case they are important for Pegged.

>> I also am actively opposed to a project of just translating D's
>> front-end to D and dropping it into Phobos because it would smother (a)
>> work on generic parser generators, and (b) strong, dependable
>> formalization of D's syntax.
>>
>
> Well put. It shouldn't stop people from doing parsers, IMO the more the
> merrier.

Exactly.


Andrei
July 05, 2012
Re: Let's stop parser Hell
On 7/5/12 2:38 PM, Roman D. Boiko wrote:
> On Thursday, 5 July 2012 at 18:33:50 UTC, Dmitry Olshansky wrote:
>> Count me as interested.
>> CTFE needs more correctness & speed though. So to put it bluntly - no,
>> it's not possible right NOW.
>> BUT it doesn't prevent us from planning and doing a proof of concept.
>> Pegged seems a good starting point even if we end up re-writing it
>> from scratch.
>
> IMO, first priority is to make code generated by Pegged work fast, and
> second priority - make code generation itself fast.

Well, I'd say that supporting large, realistic grammars without blowing 
up or taking overly long compilation times is the first priority.

Andrei
July 05, 2012
Re: Let's stop parser Hell
On Thu, Jul 5, 2012 at 8:28 PM, Andrei Alexandrescu
<SeeWebsiteForEmail@erdani.org> wrote:

> I'll be glad to buy for you any book you might feel you need for this.
> Again, there are few things more important for D right now than exploiting
> its unmatched-by-competition features to great ends. I don't want the lack
> of educational material to hold you down. Please continue working on this
> and let me know of what you need.

That's nice of you, if a bit extreme for a public mailing list :)
Andrei, money is no problem :)
Interest in the field of parsing is no problem.
Interest in D future is no problem.
Having a demanding job, and three children, is a problem. No, scratch
that, you know what I mean.

But hey, Roman is doing interesting things on keyword parsing right
now, and changing the parser generated by Pegged is not difficult. We
will see where this thread leads. (Roman, you should send your results
here, because I'm still taken aback by the built-in AA speed compared
to linear array look-up for 100 keywords).
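For illustration, the two lookup strategies mentioned above might be sketched like this (a minimal sketch, not Roman's actual code; the function names are hypothetical and the keyword list is truncated to a handful of entries, where the real lexer has around a hundred):

```d
import std.algorithm.searching : canFind;

// Truncated keyword list, for illustration only.
immutable string[] keywords = [
    "abstract", "alias", "align", "assert", "auto",
    "body", "bool", "break", "byte", "case", "cast"
];

// O(n) scan over the array.
bool isKeywordLinear(string id)
{
    return keywords[].canFind(id);
}

// Built-in associative array: O(1) average hash lookup.
bool[string] keywordSet;

static this()
{
    foreach (k; keywords)
        keywordSet[k] = true;
}

bool isKeywordAA(string id)
{
    return (id in keywordSet) !is null;
}

unittest
{
    assert(isKeywordLinear("cast") && isKeywordAA("cast"));
    assert(!isKeywordLinear("foo") && !isKeywordAA("foo"));
}
```

With only ~100 short strings the linear scan is not absurd, but each miss costs up to 100 string comparisons, while the AA hashes the identifier once.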

As Dmitry said, we might hit a CTFE wall: memory consumption,
computation speed, ...
(*channelling Andrei*: then we will correct whatever makes CTFE a
problem. Right)

Philippe

(Hesitating between 'The Art of the Metaobject Protocol' and
'Compilers, Techniques and Tools', right now)
July 05, 2012
Re: Let's stop parser Hell
On 2012-07-05 20:22, Andrei Alexandrescu wrote:
> I also am actively opposed to a project of just translating D's
> front-end to D and dropping it into Phobos because it would smother (a)
> work on generic parser generators, and (b) strong, dependable
> formalization of D's syntax.

I don't see why you are against this. I'm seeing this as basically the 
only chance of DMD ever being written in D. Sure, I would much rather 
have a completely new compiler written in D, with a modular approach and 
designed from scratch to be used as a library, but I'm trying to be 
realistic here.

-- 
/Jacob Carlborg
July 05, 2012
Re: Let's stop parser Hell
On Thursday, 5 July 2012 at 18:17:06 UTC, Philippe Sigaud wrote:
> As a parser proper, Pegged is awful :-) Nothing I'm ashamed of, 
> as I learn
> by coding. Hey, I just received the Dragon Book (International 
> Edition),

If you are interested in parsing, then I wouldn't recommend the 
Dragon Book, but this one instead: 
http://dickgrune.com/Books/PTAPG_2nd_Edition/Additional.html

It really is an awesome book.
July 05, 2012
Re: Let's stop parser Hell
On Thursday, 5 July 2012 at 19:54:39 UTC, Philippe Sigaud wrote:
> On Thu, Jul 5, 2012 at 8:28 PM, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> I'll be glad to buy for you any book you might feel you need 
>> for this.
>> Again, there are few things more important for D right now 
>> than exploiting
>> its unmatched-by-competition features to great ends. I don't 
>> want the lack
>> of educational material to hold you down. Please continue 
>> working on this
>> and let me know of what you need.
>
> That's nice of you, if a bit extreme for a public mailing list 
> :)
> Andrei, money is no problem :)
> Interest in the field of parsing is no problem.
> Interest in D future is no problem.
> Having a demanding job, and three children, is a problem. No, 
> scratch
> that, you know what I mean.

I have four, from 1 to 7 years old... Wouldn't call them a 
problem, though :)))

> But hey, Roman is doing interesting things on keyword parsing 
> right
> now, and changing the parser generated by Pegged is not 
> difficult. We
> will see where this thread lead. (Roman, you should send your 
> results
> here, because I'm still taken aback by the built-in AA speed 
> compared
> to linear array look-up for 100 keywords).

Well, I wouldn't call those "results" yet. Just some benchmarks. 
Here they are:

isKeyword_Dummy (baseline): 427 [microsec] total, 28 [nanosec / lookup].
isKeyword_Dictionary: 1190 [microsec] total, 214 [nanosec / lookup].
isKeyword_SwitchByLengthThenByChar: 466 [microsec] total, 83 [nanosec / lookup].
isKeyword_BinaryArrayLookup: 2722 [microsec] total, 490 [nanosec / lookup].
isKeyword_LinearArrayLookup: 13822 [microsec] total, 2490 [nanosec / lookup].
isKeyword_UnicodeTrie: 1317 [microsec] total, 237 [nanosec / lookup].
isKeyword_UnicodeTrieBoolLookup: 1072 [microsec] total, 193 [nanosec / lookup].
Total: 22949 identifiers + 5551 keywords.

isKeyword_Dummy (baseline): 2738 [microsec] total, 50 [nanosec / lookup].
isKeyword_Dictionary: 4247 [microsec] total, 242 [nanosec / lookup].
isKeyword_SwitchByLengthThenByChar: 1593 [microsec] total, 91 [nanosec / lookup].
isKeyword_BinaryArrayLookup: 14351 [microsec] total, 820 [nanosec / lookup].
isKeyword_LinearArrayLookup: 59564 [microsec] total, 3405 [nanosec / lookup].
isKeyword_UnicodeTrie: 4167 [microsec] total, 238 [nanosec / lookup].
isKeyword_UnicodeTrieBoolLookup: 3466 [microsec] total, 198 [nanosec / lookup].
Total: 104183 identifiers + 17488 keywords.

> As Dmitry said, we might hit a CTFE wall: memory consumption,
> computation speed, ...
> (*channelling Andrei*: then we will correct whatever makes CTFE 
> a
> problem. Right)
>
> Philippe
>
> (Hesitating between 'The Art of the Metaobject Protocol' and
> 'Compilers, Techniques and Tools', right now)
July 05, 2012
Re: Let's stop parser Hell
> isKeyword_Dummy (baseline): 2738 [microsec] total, 50 [ns / lookup].
This one calculates a sum of all identifier code units. Included for comparison.

> isKeyword_Dictionary: 4247 [microsec] total, 242 [ns / lookup].
Checks whether an identifier is a keyword using AA (dictionary) lookup.

> isKeyword_SwitchByLengthThenByChar: 1593 [microsec] total, 91 [ns / lookup].
Switches by identifier length at the top, then recursively switches by 
each character. Clearly a winner, but I think it can be 
improved further.
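To make the scheme concrete, here is a hand-written miniature of such a recognizer, covering just a few keywords (the real version is machine-generated and unrolled for the entire keyword set, hence its size; this sketch is purely illustrative):

```d
// Dispatch on length first, then discriminate by character.
// Each case returns directly, so only a few comparisons run
// per lookup regardless of the keyword table size.
bool isKeywordSwitch(string id)
{
    switch (id.length)
    {
    case 2:
        switch (id[0])
        {
        case 'd': return id[1] == 'o';                  // "do"
        case 'i': return id[1] == 'f' || id[1] == 'n';  // "if", "in"
        default:  return false;
        }
    case 4:
        switch (id[0])
        {
        case 'b': return id[1 .. $] == "ody";           // "body"
        case 'e': return id[1 .. $] == "lse"            // "else"
                      || id[1 .. $] == "num";           // "enum"
        default:  return false;
        }
    default:
        return false;
    }
}

unittest
{
    assert(isKeywordSwitch("if"));
    assert(isKeywordSwitch("else"));
    assert(!isKeywordSwitch("ifx"));
    assert(!isKeywordSwitch("foo"));
}
```

The length dispatch immediately rejects most identifiers, which is a big part of why this variant benchmarks so well.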

> isKeyword_BinaryArrayLookup: 14351 [microsec] total, 820 [ns / lookup].
Binary search in an ordered array of keywords.

> isKeyword_LinearArrayLookup: 59564 [microsec] total, 3405 [ns / lookup].
Same as above, but the search is linear.

> isKeyword_UnicodeTrie: 4167 [microsec] total, 238 [ns / lookup].
Looks up a keyword in a trie created by Dmitry. This will be improved.

> isKeyword_UnicodeTrieBoolLookup: 3466 [microsec] total, 198 [ns / lookup].
The same, but it only determines whether an identifier is a keyword, 
not which one.
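Dmitry's trie is a specialized, compact structure; purely to illustrate the idea behind the boolean variant, a naive pointer-based trie (my own sketch, not the benchmarked implementation) might look like:

```d
// Each node maps one character to a child; a flag marks where
// a complete keyword ends, so prefixes alone don't match.
struct TrieNode
{
    TrieNode*[char] next;
    bool terminal;
}

TrieNode root;

void insert(string word)
{
    auto node = &root;
    foreach (char c; word)
    {
        auto p = c in node.next;
        if (p is null)
        {
            auto child = new TrieNode;
            node.next[c] = child;
            node = child;
        }
        else
            node = *p;
    }
    node.terminal = true;
}

bool contains(string word)
{
    auto node = &root;
    foreach (char c; word)
    {
        auto p = c in node.next;
        if (p is null)
            return false;
        node = *p;
    }
    return node.terminal;
}

unittest
{
    insert("while");
    insert("with");
    assert(contains("while") && contains("with"));
    assert(!contains("wh"));   // prefix of a keyword, not a keyword
    assert(!contains("foo"));
}
```

A real implementation would flatten the nodes into arrays for cache friendliness, which is roughly what the Unicode trie variants above do.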

> Total: 104183 identifiers + 17488 keywords.
Analyzed the largest Phobos file (DateTime? I didn't check the name). 
Results are consistent for other files, too.
July 05, 2012
Re: Let's stop parser Hell
On 06-Jul-12 00:16, Roman D. Boiko wrote:
>> isKeyword_Dummy (baseline): 2738 [microsec] total, 50 [ns / lookup].
> This one calculates a sum of all identifier code units. Included for
> comparison.
>
>> isKeyword_Dictionary: 4247 [microsec] total, 242 [ns / lookup].
> Check whether an identifier is a keyword using AA (dictionary) lookup.
>
>> isKeyword_SwitchByLengthThenByChar: 1593 [microsec] total, 91 [ns / lookup].
> Switch by identifier length at the top, then recursively switch by each
> character. Clearly a winner, but I think it can be improved further.
>
I'd stress the fact that it's a fully unrolled & hard-coded switch that
takes a lot of pages (~72 KB), so it may easily be perfect.
Sorry, couldn't resist ;)
And I'm not sure how much it could be optimized, maybe some measly 10-30%.

>> Total: 104183 identifiers + 17488 keywords.
> Analyzed the largest Phobos file (DateTime? I didn't check the name.)
> Results are consistent for other files, too.
It is std.datetime; I've been running this benchmark against it and 
seen the same numbers :)

-- 
Dmitry Olshansky