May 13, 2014 RegEx for a simple Lexer
Hi there,

I read a book about an introduction to creating programming languages (really basic). The sample code is written in Ruby, but I want to rewrite the examples in D. However, the Lexer uses Ruby's regex features to scan the code.

I'm not very familiar with D's RegEx system (nor with another..), so it would be very helpful to receive some tips on how to "translate" the ruby RegEx's to D's implementation.

If in Ruby I have a string called src, I just can use this: src[/\A([A-Z]\w*)/, 1].

Would match( src, r"([A-Z]\w*)" ); essentially do the same? (I know I have to use .captures to receive the found expression)

If I also want to create a RegEx to filter string-expressions a la " xyz ", how would I do this? At least match( src, r"^\" (.*) $\" " ); doesn't seem to work and I couldn't find in the Library Reference how to change it..

Sorry if these questions seem dumb to you..

Ahh, I forgot one: In the book a parser generator like Yacc is used to create a suitable parser. Is there an equivalent for D? Or if not: is it really that hard to create a parser that is able to parse sth. like this:

    // Example
    class Foo:
        def name:
            "name"
        def asdf:
            100

    foo = Foo.new
    print( foo.nam )
    print( foo.asdf )

Thank you for helping,
Tim

May 13, 2014 Re: RegEx for a simple Lexer
Posted in reply to Tim Holzschuh | On Tuesday, 13 May 2014 at 19:53:17 UTC, Tim Holzschuh via
Digitalmars-d-learn wrote:
> If I also want to create a RegEx to filter string-expressions a la " xyz ", how would I do this?
>
> At least match( src, r"^\" (.*) $\" " ); doesn't seem to work and I couldn't find in the Library Reference how to change it..
That string literal is malformed. WYSIWYG strings (r"...") don't
know escape sequences. So, the string ends at the second quote,
and the rest is syntactical garbage to the compiler.
"^\" (.*) $\" "
would be a proper D string literal. You could also use the
alternative WYSIWYG syntax:
`^" (.*) $" `
That dollar sign looks off, though. It matches the end of the
input. You probably want to put that at the end of the regex:
"^\" (.*) \"$"
Meaning: The match has to start at the beginning of the input
(^). Matches a quote, then a space, then anything (.*), then a
space, then a quote. The match has to end at the end of the input
($).
Then again, when you're writing a tokenizer/parser, you usually
don't require an expression to span the whole input, but just
match as far as it goes. In that case, drop the dollar sign. And
think about what happens when there are quotes in the payload.
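
To make the points above concrete, here is a minimal sketch, not a definitive lexer. It assumes a reasonably recent Phobos where std.regex provides matchFirst (used here in place of match); stringPayload is a made-up helper name for illustration only.

```d
import std.regex : matchFirst, regex;

// Hypothetical helper (not from the thread): returns the payload of a
// leading `" ... "` token, or null if src does not start with one.
string stringPayload(string src)
{
    // No trailing `$`: the match must start at the beginning of the input
    // but may stop before the end, as a tokenizer usually wants.
    auto m = matchFirst(src, regex(`^" (.*) "`));
    return m.empty ? null : m[1];
}

void main()
{
    // The escaped literal and the backtick WYSIWYG literal spell the
    // same characters.
    static assert("^\" (.*) \"" == `^" (.*) "`);

    auto m = matchFirst(`" xyz " + rest`, regex(`^" (.*) "`));
    assert(m[1] == "xyz");        // text between quote+space and space+quote
    assert(m.post == " + rest");  // input left over after the token

    // With `$` the string expression has to span the whole input.
    auto whole = regex(`^" (.*) "$`);
    assert(matchFirst(`" xyz " + rest`, whole).empty);
    assert(!matchFirst(`" xyz "`, whole).empty);

    // Quotes in the payload: greedy `.*` runs to the LAST ` "` it can find.
    assert(stringPayload(`" a " " b "`) == `a " " b`);
}
```

The last assertion shows the "quotes in the payload" problem: two adjacent string expressions are swallowed as one, so a real lexer would likely want a non-greedy or negated-class pattern instead of `.*`.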

May 13, 2014 Re: RegEx for a simple Lexer
Posted in reply to Tim Holzschuh | On Tuesday, 13 May 2014 at 19:53:17 UTC, Tim Holzschuh via Digitalmars-d-learn wrote:
> Hi there,
> I read a book about an introduction to creating programming languages (really basic).
>
> The sample code is written in Ruby, but I want to rewrite the examples in D.
>
> However, the Lexer uses Ruby's regex features to scan the code.
>
> I'm not very familiar with D's RegEx system (nor with another..), so it would be very helpful to receive some tips on how to "translate" the ruby RegEx's to D's implementation.
You may find the following useful: http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html
The source of the lexer generator is located here: https://github.com/Hackerpilot/Dscanner/blob/master/std/lexer.d
D lexer: https://github.com/Hackerpilot/Dscanner/blob/master/std/d/lexer.d
There's also a parser and AST library for D in that same project. The lexer generator may not be as simple as what you're using right now, but it is very fast.

May 13, 2014 Re: RegEx for a simple Lexer
Posted in reply to anonymous | On 5/13/14, 5:43 PM, anonymous wrote:
> On Tuesday, 13 May 2014 at 19:53:17 UTC, Tim Holzschuh via
> Digitalmars-d-learn wrote:
>> If I also want to create a RegEx to filter string-expressions a la "
>> xyz ", how would I do this?
>>
>> At least match( src, r"^\" (.*) $\" " ); doesn't seem to work and I
>> couldn't find in the Library Reference how to change it..
I think he's confusing r"..." with a regular expression literal (I also confused them)
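
A small sketch of that distinction, under the assumption that matchFirst from a recent std.regex is available: r"..." is just a string, and it only becomes a pattern once it is passed to regex(). The helper name leadingConstant is made up here; it is one possible way to mirror Ruby's src[/\A([A-Z]\w*)/, 1], using ^ for the start-of-input anchor.

```d
import std.regex : matchFirst, regex;

// Hypothetical helper (not from the thread): returns the capitalized
// identifier at the very start of src, or null if there is none.
string leadingConstant(string src)
{
    // r"..." is an ordinary string literal, not a regex literal.
    static assert(is(typeof(r"^([A-Z]\w*)") == string));

    // regex() compiles the string; ^ pins the match to the start of input.
    auto m = matchFirst(src, regex(r"^([A-Z]\w*)"));
    return m.empty ? null : m[1];
}

void main()
{
    assert(leadingConstant("Foo.new") == "Foo");
    assert(leadingConstant("foo = Foo.new") is null);

    // Without the anchor the pattern is free to match later in the input,
    // which is usually not what a lexer scanning from the front wants.
    auto loose = matchFirst("foo = Foo.new", regex(r"([A-Z]\w*)"));
    assert(loose[1] == "Foo");
}
```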

May 14, 2014 Re: RegEx for a simple Lexer
Posted in reply to Tim Holzschuh | On Tuesday, 13 May 2014 at 20:02:59 UTC, Tim Holzschuh via Digitalmars-d-learn wrote:
> Still: Would it be very difficult to write a suitable parser from scratch?
See http://forum.dlang.org/post/lbnheh$2ssm$1@digitalmars.com with discussion about parsers on reddit.