October 02, 2014
> Chances are that I will be able to get the original GLL parser generator from one of the algorithm's authors (Adrian Johnstone). He's really helpful here. From that point, I will only have to add a frontend for generating a concrete parser, starting with Python - it already has a fully working grammar. Hopefully, I will also be able to find a suitable grammar for D; it is always a pleasure to play with the language.
>
> Otherwise, I will continue my current effort - implementing the GLL parser generator in D.

I did that during this summer, almost to the point it was self-sustaining (that is, the GLL parser generator was used to generate a parser for grammar files). I chose a range interface: input is a string, the parse forest is output as a range of parse trees.
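
Roughly, the interface looked like this (a sketch with made-up names, not the actual types):

```d
import std.range : isInputRange;

// Illustrative node and forest types; not the actual API, just the
// shape of the interface described above.
struct ParseTree
{
    string name;          // rule that matched
    string matched;       // slice of the input it covers
    ParseTree[] children;
}

// The forest is an input range of trees: one tree per derivation.
// An unambiguous input yields one tree, an ambiguous one several.
struct ParseForest
{
    ParseTree[] derivations; // the real thing would enumerate these lazily
    size_t i;

    @property bool empty() { return i >= derivations.length; }
    @property ParseTree front() { return derivations[i]; }
    void popFront() { ++i; }
}

static assert(isInputRange!ParseForest);
```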

I loved seeing it generate the different possible parse trees for ambiguous grammars, and accept left- and right-recursive grammars! GLL is a very nice algorithm.

Alas, even with some shortcuts on Kleene stars it was quite slow. I tried to use threads (spawning worker threads on alternatives), but that did not change the timings much.

I had it generate a parser for JSON and compared it to the jsonx module that Sönke presented here. Bah, it was 1000 times slower (which is all relative: it still takes only a few ms to parse a big JSON file, but that is still far from the microseconds it should need). Pegged was faster than this GLL generator, but of course still far slower than jsonx.

[Of course, Pegged can cheat: it can parse your file at compile-time, resulting in 0-µs parsing time at runtime :-)]

Also, the way I coded it, I hit CTFE limits and could not get it to parse at compile time. A shame, really.

> This is one of the reasons I prefer LL/RD parsers :-) They are easy to follow.

I would like to try an LALR compile-time generator and compile-time parser. I'm pretty sure LALR tables could be expanded directly as D code instead of being read during parsing.
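
To illustrate what I mean by expanding the tables as code (toy states and tokens, not a real grammar):

```d
enum Action { shift, reduce, accept, error }

// Table-driven form: the driver indexes this at parse time.
immutable Action[2][2] table = [
    [Action.shift,  Action.error ],
    [Action.reduce, Action.accept],
];

// Expanded form a compile-time generator could emit instead: each
// table row becomes a branch, so a "lookup" is a plain jump the
// optimizer can see through, and no table needs loading at runtime.
Action action(size_t state, size_t tok)
{
    switch (state)
    {
        case 0: return tok == 0 ? Action.shift  : Action.error;
        case 1: return tok == 0 ? Action.reduce : Action.accept;
        default: return Action.error;
    }
}
```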

Philippe

October 02, 2014
On Thursday, 2 October 2014 at 18:20:36 UTC, Philippe Sigaud via Digitalmars-d wrote:
> I did that during this summer, almost to the point it was
> self-sustaining (that is, the GLL parser generator was used to generate
> a parser for grammar files). I chose a range interface: input is a
> string, the parse forest is output as a range of parse trees.

Nice! Is it public? Github?

> Alas, even with some shortcuts on Kleene stars it was quite slow. I
> tried to use threads (spawning worker threads on alternatives), but
> that did not change the timings much.

AFAIK, multithreading is a bad idea in parsing.

Actually, in the gll-combinators Scala project they have similar speed problems. I don't know if it's a fundamental algorithm problem or implementation details that lead to the slowdowns.
October 02, 2014
>> I did that during this summer, almost to the point it was self-sustained (that is, the GLL parser generator was used to generate a parser for grammars files). I chose a range interface: input is a string, the parse forest is output as a range of parse trees.
>
>
> Nice! Is it public? Github?

No GitHub repo. I could put it alongside Pegged, I suppose. I used git internally, though.

 I was a bit disheartened by the speed I got, so I did not publish it
nor announce it here.

Note also that I saw it as an alternative engine for my own Pegged project, so I used the same way of defining grammars (some prefixes and suffixes for dropping nodes in the parse tree and so on).

I can send you the code (well the entire repo), if you wish. I did not
touch it for the past 3 months, so I already don't remember what state
it was in :-(.
Looking at the code now, it seems I'm still using Pegged to parse the
initial grammar. Bootstrapping did not go well.

Send me an email at firstname . lastname @gmail.com

(philippe sigaud)


>> Alas, even with some shortcuts on Kleene stars it was quite slow. I tried to use threads (spawning worker threads on alternatives), but that did not change the timings much.
>
>
> AFAIK, multithreading is a bad idea in parsing.

I think, as with many things in CS, parsing algorithms were developed
before multicores existed.
Spawning threads or fibers for some alternatives (A | B) seems nice to
me. I got interesting results with a purely multithreaded parser that
spawned children for each choice.
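
A minimal sketch of the idea with std.parallelism tasks - parseA and parseB are stand-ins for real sub-parsers of an (A | B) alternative:

```d
import std.parallelism : task, taskPool;

// Stand-ins for the two sub-parsers of an (A | B) alternative.
bool parseA(string input) { return input.length != 0 && input[0] == 'a'; }
bool parseB(string input) { return input.length != 0 && input[0] == 'b'; }

// Try both branches concurrently and succeed if either does. A real
// parser would keep the results of *both* branches, since with an
// ambiguous grammar both can succeed.
bool parseAlt(string input)
{
    auto ta = task!parseA(input);
    auto tb = task!parseB(input);
    taskPool.put(ta);
    taskPool.put(tb);
    // `|` rather than `||`: wait for both tasks, no short-circuit.
    return ta.yieldForce | tb.yieldForce;
}
```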


> Actually, in the gll-combinators Scala project they have similar speed problems. I don't know if it's a fundamental algorithm problem or implementation details that lead to the slowdowns.

Well, when all resulting implementations have speed problems...
I found an article by the GLL authors explaining how they coded their
algorithm for more speed, but I couldn't wrap my head around it.

Philippe
October 02, 2014
On Thursday, 2 October 2014 at 20:28:47 UTC, Philippe Sigaud via Digitalmars-d wrote:
>
> No github repo. I could put it alongside Pegged, I suppose. I used git
> internally, though.
>
>  I was a bit disheartened by the speed I got, so I did not publish it
> nor announce it here.

I have a huge collection of projects I never published :-) We all do, I guess.

> I think, as with many things in CS, parsing algorithms were developed
> before multicores existed.
> Spawning threads or fibers for some alternatives (A | B) seems nice to
> me. I got interesting results with a purely multithreaded parser that
> spawned children for each choice.

No, this is not exactly what I mean. Multithreading can be perfectly fine in some cases, very fruitful sometimes. Say, in NLP, when one has to process long or highly ambiguous strings, and the parse tree can become huge and is of importance in itself... yes. In programming language parsing, this is just a small step towards further stages in a longer pipeline. It has to be fast enough that multithreading this step is overkill.

> Well, when all resulting implementations have speed problems...

Well... This is why I want to study the original implementation. I read the story of red-black trees some time ago - it took a while to get them right, and longer to make them elegant.

BTW, there's an interesting piece of related work that should probably be taken as a positive example of generalized parsing: the Elkhound parser generator. It uses a hybrid LALR/GLR approach, thus achieving better performance. There's much more to it; I didn't go too deep into it.

> I found an article by the GLL authors explaining how they coded their
> algorithm for more speed, but I couldn't wrap my head around it.

By now, I have read Tomita's original GLR paper, the ANTLR ALL(*) paper, a few recent GLR papers, three papers on GLL, and a few related ones. It took... a while. I sort of understand the idea, but still not sure about the details :-)

What's the name of the paper you read? "Modelling GLL Parser Implementation"?
October 02, 2014
On Thursday, 2 October 2014 at 20:28:47 UTC, Philippe Sigaud via Digitalmars-d wrote:
>>> I did that during this summer, almost to the point it was
>>> self-sustained (that is, the GLL parser generator was used to generate
>>> a parser for grammars files). I chose a range interface: input is a
>>> string, the parse forest is output as a range of parse trees.
>>
>>
>> Nice! Is it public? Github?
>
> No github repo. I could put it alongside Pegged, I suppose. I used git
> internally, though.
>
>  I was a bit disheartened by the speed I got, so did not publish nor
> announced it here.
>
> Note also that I saw it as an alternative engine for my own Pegged
> project, so I used the same way of defining grammars (some prefixes
> ans suffixes for dropping nodes in the parse tree and so on).
>
> I can send you the code (well the entire repo), if you wish. I did not
> touch it for the past 3 months, so I already don't remember what state
> it was in :-(.
> Looking at the code now, it seems I'm still using Pegged to parse the
> initial grammar. Bootstrapping did not go well.
>
> Send me an email at firstname . lastname @gmail.com
>
> (philippe sigaud)
>

Check the mailbox.

Thank you
October 02, 2014
On Thursday, 2 October 2014 at 18:06:04 UTC, Vladimir Kazanov wrote:
> Chances are that I will be able to get the original GLL parser generator from one of the algorithm's authors (Adrian Johnstone). He's really helpful here. From that point, I will only have to add a frontend for generating a concrete parser, starting with Python - it already has a fully working grammar. Hopefully, I will also be able to find a suitable grammar for D; it is always a pleasure to play with the language.

Nice! If you manage to get a GLL generator for D (or C++) going I'd love to try it out.

The only general parser generator I've tested so far is the Tomita-style GLR parser called DParser, but IIRC the documentation could use some improvement:

http://sourceforge.net/projects/dparser/

October 02, 2014
On Thursday, 2 October 2014 at 21:26:15 UTC, Ola Fosheim Grøstad wrote:

>
> Nice! If you manage to get a GLL generator for D (or C++) going I'd love to try it out.

You will definitely hear about it here :-) I don't know yet if the parser itself will be worth the trouble, but a more or less complete blog post with tests/results/examples will definitely see the light of day.

> The only general parser generator I've tested so far is the Tomita-style GLR parser called DParser, but IIRC the documentation could need some improvement:
>
> http://sourceforge.net/projects/dparser/

Yes, I know this one. A scannerless GLR parser. GLRs are trendy these days :) It's easy to find many more; there's even an Emacs mode using GLR for Ada parsing.
October 03, 2014
> I have a huge collection of projects I never published :-) We all do, I guess.

Oh, the ratio is more than 100 projects for each one published...


> No, this is not exactly what I mean. Multithreading can be perfectly fine in some cases, very fruitful sometimes. Say, in NLP, when one has to process long or highly ambiguous strings, and the parse tree can become huge and is of importance in itself... yes. In programming language parsing, this is just a small step towards further stages in a longer pipeline. It has to be fast enough that multithreading this step is overkill.

I don't know. Using fibers, I'm hoping to get interesting results one day. I got hit by a workstorm before I could try fibers. OS threads were a bit too heavy and didn't work that well.


> BTW, there's an interesting piece of related work that should probably be taken as a positive example of generalized parsing: the Elkhound parser generator. It uses a hybrid LALR/GLR approach, thus achieving better performance. There's much more to it; I didn't go too deep into it.

Yes, Elkhound is interesting, their approach is nice. But it gave me the impression of having been abandoned for a few years?

>
>> I found an article by the GLL authors explaining how they coded their algorithm for more speed, but I couldn't wrap my head around it.
>
>
> By now, I have read Tomita's original GLR paper, the ANTLR ALL(*) paper, a few recent GLR papers, three papers on GLL, and a few related ones. It took... a while. I sort of understand the idea, but still not sure about the details :-)

ALL(*) is on my todo list. I tried to implement it this spring, but got
bogged down in the details. Even the white paper has some imprecisions
when you really sit down and try to code it.
I could have a look at ANTLR v4 source, but wanted to have a clean
start, right from the white paper.

> What's the name of the paper you read? "Modelling GLL Parser Implementation"?

Yes.
October 03, 2014
On Thu, Oct 2, 2014 at 11:19 PM, Vladimir Kazanov via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> Check the mailbox,
>
> Thank you

I sent it to you. I was asleep, sorry :-)
October 03, 2014
On Friday, 3 October 2014 at 05:10:45 UTC, Philippe Sigaud via Digitalmars-d wrote:


> Yes, Elkhound is interesting, their approach is nice. But it gave me
> the impression of having been abandoned for a few years?
>

Don't know and don't care, really. All I know is that Scott sort of managed to deal with the main problems of generalized parsing by avoiding them most of the time :)) Maybe a bad sign... After all, most modern parser generators and parser combinators do not use GLR, although it does sound interesting in theory.

Something tells me one has to stress this word here: *theory* :-)

>> By now, I have read the original Tomita's GLR paper, Antlr ALL(*) paper, a
>> few recent GLR papers, three papers on GLL and a few related ones . It
>> took... A while. I sort of understand the idea, but still not sure about the
>> details :-)
>
> ALL(*) is on my todo list. I tried to implement it in Spring, but got
> bogged down in the details. Even the white paper has some imprecisions
> when you really sit down and try to code it.
> I could have a look at ANTLR v4 source, but wanted to have a clean
> start, right from the white paper.
>
>> What's the name of the paper you read? "Modelling GLL Parser
>> Implementation"?
>
> Yes.

Scientists... "The algorithm is hard to implement... Okay, let's invent an esoteric paper-only language to explain things to people" :-)

Thanks a lot, by the way!

I've just skimmed through the code and the README... You did not use the packed forest representation, did you..?
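
(By packed forest I mean the SPPF the papers use: one shared node per label and input extent, with one child list per derivation. Roughly, with made-up names:)

```d
// Sketch of a shared packed parse forest (SPPF) node. Each
// (label, leftExtent, rightExtent) node exists once and is shared by
// every derivation that uses it; each element of `packings` is one
// derivation's child list, so more than one packing marks an
// ambiguity. This keeps the forest polynomial-sized, unlike a flat
// range of separate trees.
struct SppfNode
{
    string label;            // grammar symbol or slot
    size_t leftExtent;       // start position in the input
    size_t rightExtent;      // end position in the input
    SppfNode*[][] packings;  // one child list per derivation
}
```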