July 11, 2012
Re: Let's stop parser Hell
On 07/11/2012 11:55 PM, David Piepgrass wrote:
> On Tuesday, 10 July 2012 at 23:49:58 UTC, Timon Gehr wrote:
>> On 07/11/2012 01:16 AM, deadalnix wrote:
>>> On 09/07/2012 10:14, Christophe Travert wrote:
>>>> deadalnix , dans le message (digitalmars.D:171330), a écrit :
>>>>> D isn't 100% CFG. But it is close.
>>>> What makes D fail to be a CFG?
>>> type[something] <= something can be a type or an expression.
>>> typeid(something) <= same here
>>> identifier!(something) <= again
>>
>> 'something' is context-free:
>>
>> something ::= type | expression.
>
> I don't see how "type | expression" is context free. The input "Foo"
> could be a type or expression, you can't tell which without looking at
> the context.

'Context free' means that all the production rule left hand sides do
not have a context, i.e. they consist of a single non-terminal.

What you are describing is the fact that the grammar is ambiguous. The 
parser can just assume whatever works and the analyzer takes care of the
interpretation of the resulting AST. The parser does not have to
give a meaning to the code. It just turns it into a representation that
is easy to work with.
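For illustration, the parser can emit a node that just records the construct
and lets the analyzer settle the meaning later. A minimal sketch (the names
are invented):

enum ArgKind { undecided, type, expression }

struct BracketArg
{
    ArgKind kind = ArgKind.undecided; // settled during semantic analysis
    string  text;                     // the raw slice, e.g. "Foo"
}

// The parser merely records what it saw...
BracketArg parseBracketArg(string slice)
{
    return BracketArg(ArgKind.undecided, slice);
}

// ...and the analyzer decides later, e.g. by a symbol lookup.
void resolve(ref BracketArg a, bool[string] knownTypes)
{
    a.kind = (a.text in knownTypes) ? ArgKind.type : ArgKind.expression;
}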
July 23, 2012
Re: Let's stop parser Hell
On Thu, Jul 5, 2012 at 10:00 PM, Tobias Pankrath <tobias@pankrath.net> wrote:

> If you are interested in parsing, then I wouldn't recommend the dragon book,
> but this one
>
> It really is an awesome book.

This little post to say a big 'Thanks' to Tobias. I'm reading Grune's
Parsing Techniques and find it absolutely wonderful: easy to read,
going into details and limits of the many parsing algos. Very cool!

I'm playing with fully-backtracking recursive-descent parsers right
now, to deal with ambiguous grammars and the small parser works
beautifully. Hey, I just saw he gives the code on his website, nice!
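The core of such a parser is tiny. Here is a toy sketch (the grammar is
invented, only to show the try-an-alternative-then-rewind mechanism):

// Fully-backtracking recursive descent for the toy grammar  S -> "a" S | "a"
struct Parser
{
    string input;
    size_t pos;

    bool term(char c)
    {
        if (pos < input.length && input[pos] == c) { ++pos; return true; }
        return false;
    }

    bool S()
    {
        auto save = pos;
        if (term('a') && S()) return true;  // first alternative: "a" S
        pos = save;                         // rewind and try the next one
        if (term('a')) return true;         // second alternative: "a"
        pos = save;
        return false;
    }
}

void main()
{
    auto p = Parser("aaa");
    assert(p.S() && p.pos == p.input.length);
}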

Now, on to LR. Nick, here I come. I'll at last know how Goldie works.


Philippe
July 23, 2012
Re: Let's stop parser Hell
On 7/23/12 9:22 AM, Philippe Sigaud wrote:
> On Thu, Jul 5, 2012 at 10:00 PM, Tobias Pankrath<tobias@pankrath.net>  wrote:
>
>> If you are interested in parsing, then I wouldn't recommend the dragon book,
>> but this one
>>
>> It really is an awesome book.
>
> This little post to say a big 'Thanks' to Tobias. I'm reading Grune's
> Parsing Techniques and find it absolutely wonderful: easy to read,
> going into details and limits of the many parsing algos. Very cool!
>
> I'm playing with fully-backtracking recursive-descent parsers right
> now, to deal with ambiguous grammars and the small parser works
> beautifully. Hey, I just saw he gives the code on his website, nice!
>
> Now, on to LR. Nick, here I come. I'll at last know how Goldie works.

When will we see the fruit of that all in the form of D libraries?

Andrei
July 23, 2012
Re: Let's stop parser Hell
On Monday, 23 July 2012 at 13:22:47 UTC, Philippe Sigaud wrote:
> On Thu, Jul 5, 2012 at 10:00 PM, Tobias Pankrath 
> <tobias@pankrath.net> wrote:
>
>> If you are interested in parsing, then I wouldn't recommend 
>> the dragon book,
>> but this one
>>
>> It really is an awesome book.
>
> This little post to say a big 'Thanks' to Tobias. I'm reading 
> Grune's
> Parsing Techniques and find it absolutely wonderful: easy to 
> read,
> going into details and limits of the many parsing algos. Very 
> cool!

Yes, I'm also reading it and find it amazingly high-quality.
July 23, 2012
Re: Let's stop parser Hell
On Monday, 23 July 2012 at 13:22:47 UTC, Philippe Sigaud wrote:
> On Thu, Jul 5, 2012 at 10:00 PM, Tobias Pankrath 
> <tobias@pankrath.net> wrote:
>
>> If you are interested in parsing, then I wouldn't recommend 
>> the dragon book,
>> but this one
>>
>> It really is an awesome book.
>
> This little post to say a big 'Thanks' to Tobias. I'm reading 
> Grune's
> Parsing Techniques and find it absolutely wonderful: easy to 
> read,
> going into details and limits of the many parsing algos. Very 
> cool!
>

I'm glad that you like it.
July 31, 2012
Re: Let's stop parser Hell
On 07/05/2012 06:11 AM, Denis Shelomovskij wrote:
> There are more and more projects requiring parsing D code (IDE plugins,
> DCT and dfmt now).
>
> We definitely need a _standard_ fast D parser (suitable as a tokenizer).
> We already have a good one at least in Visual D. Maybe dmd's parser is
> faster. If so, it can be (easily?) rewritten in D. We also already have
> some other parsers (in C# from Mono-D etc.).
>
> What about getting one and calling it standard?
>

I know I'm a little late coming into this conversation. This seems like 
a nice thread to toss myself into. I've started working on a generic 
lexer that is based off of a defined grammar.

As I read through the thread (I unfortunately don't have enough time to 
read every post, but I skimmed through as many as I could, and read the 
ones that seemed important), it seems like we need a parser in D that 
can lex D, and provide a Range of tokens that could be consumed.

With some very minor tweaks, and a properly written Grammar class, I 
basically have it already done. D was going to be one of the first 
languages I would have written a definition for.

https://github.com/kai4785/firstfront

I haven't had time to look through Pegged, but I can immediately see 
some differences between my approach and Pegged's.

1) Pegged does compile-time or run-time code generation for your parser. 
Mine is run-time only and regex based.
2) Pegged uses PEG format, which I have been introduced to through this 
thread. One of my goals was to actually invent something like PEG. This 
will save me time :)

I would love to receive some critical feedback on my approach as well as 
the actual code base.

To build it, you'll need cmake, and cmaked2 from here:
http://code.google.com/p/cmaked2/wiki/GettingStarted

Or just dmd *.d :) I haven't tried to build it on Windows yet, but I 
don't see anything that immediately jumps out as not cross-platform. 
I've been working on it on both Fedora and CentOS.

-Kai Meyer
July 31, 2012
Re: Let's stop parser Hell
On Tuesday, 31 July 2012 at 16:10:14 UTC, Kai Meyer wrote:

> I know I'm a little late coming into this conversation. This 
> seems like a nice thread to toss myself into. I've started 
> working on a generic lexer that is based off of a defined 
> grammar.

Every helping hand is appreciated :-)

> As I read through the thread (I unfortunately don't have enough 
> time to read every post, but I skimmed through as many as I 
> could, and read the ones that seemed important), it seems like 
> we need a parser in D that can lex D, and provide a Range of 
> tokens that could be consumed.

Yes. To make this statement more precise: We need a lexer that provides 
a range of tokens and we need a parser which makes it possible to build 
an AST. Pegged combines both approaches but imposes an overhead if you 
just need a token list. However I'm not sure if this is a problem.
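To make it concrete, the lexer side could look roughly like this (just a
sketch, the names are invented):

import std.range : isInputRange;

enum TokKind { symbol, identifier, number, eof }

struct Token
{
    TokKind kind;
    string  text;   // a slice of the source
}

// Skeleton of a lexer exposed as an input range of tokens.
struct TokenRange
{
    string src;
    Token  cur;
    bool   exhausted;

    this(string source) { src = source; popFront(); }

    @property bool empty() const { return exhausted; }
    @property Token front() const { return cur; }

    void popFront()
    {
        while (src.length && (src[0] == ' ' || src[0] == '\n'))
            src = src[1 .. $];              // skip whitespace
        if (!src.length) { exhausted = true; return; }
        // real token recognition goes here; one-char tokens keep it short
        cur = Token(TokKind.symbol, src[0 .. 1]);
        src = src[1 .. $];
    }
}

static assert(isInputRange!TokenRange);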

There are already some working D-Parsers buried in this thread.

> With some very minor tweaks, and a properly written Grammar 
> class, I basically have it already done. D was going to be one 
> of the first languages I would have written a definition for.
>
> https://github.com/kai4785/firstfront

I've only glimpsed at your code. For most languages lexing is far 
more expensive than parsing and thus the lexer has to be very 
fast and I wouldn't pursue your approach and instead use 
something like ragel. It already has D output but needs a 
separate build step.
July 31, 2012
Re: Let's stop parser Hell
On Tue, Jul 31, 2012 at 7:29 PM, Tobias Pankrath <tobias@pankrath.net> wrote:
> On Tuesday, 31 July 2012 at 16:10:14 UTC, Kai Meyer wrote:
>
>> I know I'm a little late coming into this conversation. This seems like a
>> nice thread to toss myself into. I've started working on a generic lexer
>> that is based off of a defined grammar.
>
>
> Every helping hand is appreciated :-)

Hi Kai, welcome here!

>> As I read through the thread (I unfortunately don't have enough time to
>> read every post, but I skimmed through as many as I could, and read the ones
>> that seemed important), it seems like we need a parser in D that can lex D,
>> and provide a Range of tokens that could be consumed.
>
>
> Yes. To make this statement more precise: We need a lexer that provides
> a range of tokens and we need a parser which makes it possible to build
> an AST. Pegged combines both approaches but imposes an overhead if you
> just need a token list. However I'm not sure if this is a problem.

I think we need a tokenizer generator and a parser generator. They can
be grouped or not, but we need both, in Phobos.

We also need to define what's needed in a token. Kai's approach is OK,
but what's the _type field for an operator or an identifier?
Also, a lexer can fill a symbol table and produce identifier tokens
that refer to a particular entry in the symbol table.

I guess creating a tree of symbol tables according to scope visibility
is then more the job of the parser, but I'm not sure.
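To make the question concrete, I'm thinking of something along these lines
(purely a sketch, the names are invented):

enum TokenType { identifier, operator, keyword, literal }

struct Token
{
    TokenType type;
    string    text;    // slice of the source
    size_t    symbol;  // for identifiers: index into the symbol table
}

// The lexer interns identifiers as it goes.
struct SymbolTable
{
    string[]       names;
    size_t[string] index;   // name -> position in names

    size_t intern(string name)
    {
        if (auto p = name in index) return *p;
        names ~= name;
        return index[name] = names.length - 1;
    }
}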


> There are already some working D-Parsers buried in this thread.

Yes. We also need a D parser (not generic), but no one has pushed one for
Phobos inclusion yet.
Anyway, we can put generic parsers in Phobos too (LALR, LL(*), etc),
but I'd say the first priority would be to have a D lexer (producing a
range of tokens) and a D parser (consuming a range of tokens, and
executing semantic actions, like the building of an AST). Generic
parsers can come later on. Having a D parser in the standard
distribution would create much goodwill. Many people want that.


> I've only glimpsed at your code. For most languages lexing is far more
> expensive than parsing

Is that so?

> and thus the lexer has to be very fast and I wouldn't
> pursue your approach and instead use something like ragel. It already has D
> output but needs a separate build step.

Having std.lexer in Phobos would be quite good. With a pre-compiled lexer for D.

The only problem I see in Kai's approach (which is the standard one, a
prioritized list of regexes) is how to recognize tokens that are not
regular (I mean, that cannot be encoded as a regex), like nesting
comments. How does the lexer know when to stop producing a 'comment'
token?
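By hand it's easy, of course; a regex alone can't count the nesting. A small
sketch:

// Returns the length of the /+ ... +/ comment starting at the front of
// src, or 0 if it is unterminated. A regex cannot express this because
// the nesting depth is unbounded.
size_t nestingComment(string src)
{
    assert(src.length >= 2 && src[0 .. 2] == "/+");
    size_t depth = 1, i = 2;
    while (i + 1 < src.length)
    {
        if (src[i .. i + 2] == "/+")      { depth++; i += 2; }
        else if (src[i .. i + 2] == "+/") { i += 2; if (--depth == 0) return i; }
        else                              { i++; }
    }
    return 0; // unterminated
}

unittest
{
    assert(nestingComment("/+ a /+ b +/ c +/ rest") == 17);
    assert(nestingComment("/+ never closed") == 0);
}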

Philippe
July 31, 2012
Re: Let's stop parser Hell
On Tuesday, July 31, 2012 23:10:37 Philippe Sigaud wrote:
> Having std.lexer in Phobos would be quite good. With a pre-compiled lexer
> for D.

I'm actually quite far along with one now - one which is specifically written 
and optimized for lexing D. I'll probably be done with it not too long after 
the 2.060 release (though we'll see). Writing it has been going surprisingly 
quickly actually, and I've already found some bugs in the spec as a result 
(some of which have been fixed, some of which I still need to create pull 
requests for). So, regardless of what happens with my lexer, at least the spec 
will be more accurate.

- Jonathan M Davis
July 31, 2012
Re: Let's stop parser Hell
On 31-Jul-12 20:10, Kai Meyer wrote:
> On 07/05/2012 06:11 AM, Denis Shelomovskij wrote:
>> There are more and more projects requiring parsing D code (IDE plugins,
>> DCT and dfmt now).
>>
>> We definitely need a _standard_ fast D parser (suitable as a tokenizer).
>> We already have a good one at least in Visual D. Maybe dmd's parser is
>> faster. If so, it can be (easily?) rewritten in D. We also already have
>> some other parsers (in C# from Mono-D etc.).
>>
>> What about getting one and calling it standard?
>>
>
> I know I'm a little late coming into this conversation. This seems like
> a nice thread to toss myself into. I've started working on a generic
> lexer that is based off of a defined grammar.
>
> As I read through the thread (I unfortunately don't have enough time to
> read every post, but I skimmed through as many as I could, and read the
> ones that seemed important), it seems like we need a parser in D that
> can lex D, and provide a Range of tokens that could be consumed.
>
> With some very minor tweaks, and a properly written Grammar class, I
> basically have it already done. D was going to be one of the first
> languages I would have written a definition for.
>
> https://github.com/kai4785/firstfront
>
> I haven't had time to look through Pegged, but I can immediately see
> some differences between my approach and Pegged's.
>
> 1) Pegged does compile-time or run-time code generation for your parser.
> Mine is run-time only and regex based.
> 2) Pegged uses PEG format, which I have been introduced to through this
> thread. One of my goals was to actually invent something like PEG. This
> will save me time :)
>

> I would love to receive some critical feedback on my approach as well as
> the actual code base.

Okay. The critique goes as follows:

First, as the author of std.regex I'm pleased that you find it suitable 
for use in a lexer. :)

Still, there is a BIG word of warning:

A lexer based on trying out an array of regexes until one matches is 
NOT fast, and not even close to the fast ones. It is acceptable only as a 
proof-of-concept/prototype kind of thing.

To help this use case I thought of making multi-regex matching a-la:
match(text, regex1, regex2, regex3...);
then lexing would be effectively a loop of the form:

foreach(tok; match(text, regex1, regex2, regex3...)){
	// suppose match returns a tuple: N of the regex matched + the usual piece of text
	switch(tok[0]){
		...
	}
}
(with some glitches on /+ ... +/ and other non-regular stuff).

But until then, it's not to be used seriously in this context.

The reason is that each call to match has some overhead to set up the 
regex engine; it's constant, but it's huge compared to what usual lexers 
typically do. The other thing is that you should use ctRegex for this 
kind of job if you can (i.e. if the compiler doesn't explode on it).
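The difference in use is tiny (a small sketch; the pattern is arbitrary):

import std.regex;

void main()
{
    // pattern compiled at run time, each time regex() is evaluated
    auto rtNum = regex(`[0-9]+`);

    // matcher generated at compile time, no setup cost at run time
    auto ctNum = ctRegex!(`[0-9]+`);

    auto m = matchFirst("width = 42;", ctNum);
    assert(!m.empty && m[0] == "42");
}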


(keep in mind I only glimpsed the source, will post more feedback later)

>
> To build it, you'll need cmake, and cmaked2 from here:
> http://code.google.com/p/cmaked2/wiki/GettingStarted
>
> Or just dmd *.d :) I haven't tried to build it on Windows yet, but I
> don't see anything that immediately jumps out as not cross-platform.
> I've been working on it on both Fedora and CentOS.
>

rdmd for the win!


-- 
Dmitry Olshansky