Thread overview
RegEx for a simple Lexer
May 13, 2014
Tim Holzschuh
May 13, 2014
anonymous
May 13, 2014
Ary Borenszweig
May 13, 2014
Brian Schott
May 14, 2014
Kagamin
May 13, 2014
Hi there,
I read a book about an introduction to creating programming languages (really basic).

The sample code is written in Ruby, but I want to rewrite the examples in D.

However, the Lexer uses Ruby's regex features to scan the code.

I'm not very familiar with D's regex system (nor with any other), so it would be very helpful to receive some tips on how to "translate" the Ruby regexes to D's implementation.

In Ruby, if I have a string called src, I can just use this: src[/\A([A-Z]\w*)/, 1].

Would match( src, r"([A-Z]\w*)" ); essentially do the same?
(I know I have to use .captures to receive the found expression)

If I also want to create a RegEx to filter string-expressions a la " xyz ", how would I do this?

At least match( src, r"^\" (.*) $\" " ); doesn't seem to work and I couldn't find in the Library Reference how to change it..

Sorry if these questions seem dumb to you..

Ahh, I forgot one:
In the book a parser generator like Yacc is used to create a suitable parser.
Is there an equivalent for D?
Or if not: is it really that hard to create a parser that is able to parse something like this:

// Example
class Foo:
    def name:
        "name"

    def asdf:
        100

foo = Foo.new

print( foo.name )
print( foo.asdf )


Thank you for helping,
    Tim
May 13, 2014
On Tuesday, 13 May 2014 at 19:53:17 UTC, Tim Holzschuh via
Digitalmars-d-learn wrote:
> If I also want to create a RegEx to filter string-expressions a la " xyz ", how would I do this?
>
> At least match( src, r"^\" (.*) $\" " ); doesn't seem to work and I couldn't find in the Library Reference how to change it..

That string literal is malformed. WYSIWYG strings (r"...") don't
process escape sequences, so the string ends at the second quote
and the rest is syntactic garbage to the compiler.
   "^\" (.*) $\" "
would be a proper D string literal. You could also use the
alternative WYSIWYG syntax:
   `^" (.*) $" `
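
To make the difference between the literal forms concrete, here is a minimal sketch. It only compares string literals and does not touch std.regex; the variable names are made up for illustration.

```d
import std.stdio : writeln;

void main()
{
    // Ordinary literal: \" is an escape sequence for a double quote.
    string escaped = "^\" (.*) $\" ";

    // Backtick WYSIWYG literal: no escapes are processed, so a double
    // quote can appear as-is (a backtick cannot).
    string backtick = `^" (.*) $" `;

    // Both spellings produce exactly the same characters.
    assert(escaped == backtick);

    // r"..." is also WYSIWYG: the backslash stays a backslash, which is
    // handy for regex classes like \w, but the literal cannot contain ".
    assert(r"\w+" == "\\w+");
    assert(r"\w+".length == 3);

    writeln(backtick);
}
```

For regexes that contain double quotes, the backtick form is usually the least noisy choice.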

That dollar sign looks off, though. It matches the end of the
input. You probably want to put that at the end of the regex:
   "^\" (.*) \"$"
Meaning: The match has to start at the beginning of the input
(^). Matches a quote, then a space, then anything (.*), then a
space, then a quote. The match has to end at the end of the input
($).
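
A minimal sketch of that anchored pattern in use, assuming a Phobos version that provides std.regex.matchFirst. The helper name quotedPayload is hypothetical and only serves the example.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Returns the text between `" ` and ` "` when the whole input has that
/// shape, or null when it does not. (Hypothetical helper for illustration.)
string quotedPayload(string src)
{
    // ^ and $ tie the match to the start and the end of the input.
    auto m = matchFirst(src, regex(`^" (.*) "$`));
    return m.empty ? null : m[1]; // m[1] is the first capture group
}

void main()
{
    assert(quotedPayload(`" xyz "`) == "xyz");

    // Trailing text after the closing quote defeats the $ anchor.
    assert(quotedPayload(`" xyz " + 1`) is null);

    // Leading text before the opening quote defeats the ^ anchor.
    assert(quotedPayload(`x = " xyz "`) is null);

    writeln(quotedPayload(`" xyz "`));
}
```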

Then again, when you're writing a tokenizer/parser, you usually
don't require an expression to span the whole input, but just
match as far as it goes. In that case, drop the dollar sign. And
think about what happens when there are quotes in the payload.
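
As a rough sketch of the "match as far as it goes" approach: each pattern keeps the ^ anchor but drops the $, and the lexer repeatedly matches at the start of whatever input is left. The Token struct, the eat and tokenize helpers, and the token kinds are all assumptions made for this example; they are not from the book or from Phobos. The string pattern uses [^"]* instead of .* so that a match stops at the first closing quote; escaped quotes inside a string are not handled here.

```d
import std.regex : matchFirst, regex, Regex;
import std.stdio : writeln;

// Hypothetical token type for this sketch.
struct Token
{
    string kind;
    string text;
}

/// Tries `re` at the start of `src`. On success, stores capture group 1
/// in `text` and advances `src` past the match.
bool eat(ref string src, Regex!char re, out string text)
{
    auto m = matchFirst(src, re);
    if (m.empty)
        return false;
    text = m[1];
    src = m.post; // the input that follows the match
    return true;
}

Token[] tokenize(string src)
{
    // Every pattern starts with ^ so it can only match at the current position.
    auto space      = regex(`^(\s+)`);
    auto constant   = regex(`^([A-Z]\w*)`);
    auto identifier = regex(`^([a-z_]\w*)`);
    auto str        = regex(`^"([^"]*)"`);

    Token[] tokens;
    while (src.length > 0)
    {
        string text;
        if (eat(src, space, text))
            continue; // whitespace separates tokens but produces none
        if (eat(src, constant, text))
            tokens ~= Token("constant", text);
        else if (eat(src, identifier, text))
            tokens ~= Token("identifier", text);
        else if (eat(src, str, text))
            tokens ~= Token("string", text);
        else
            throw new Exception("unexpected input: " ~ src);
    }
    return tokens;
}

void main()
{
    // Without ^ the pattern finds a match anywhere in the input,
    // unlike Ruby's \A version from the original question.
    assert(!matchFirst("x Foo", regex(`([A-Z]\w*)`)).empty);
    assert(matchFirst("x Foo", regex(`^([A-Z]\w*)`)).empty);

    auto tokens = tokenize(`Foo "a" "b"`);
    assert(tokens.length == 3);
    assert(tokens[1].text == "a"); // a greedy .* would have swallowed `a" "b`
    writeln(tokens);
}
```

Building the regexes inside tokenize keeps the sketch short; a real lexer would likely construct them once and also track line and column information.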
May 13, 2014
On Tuesday, 13 May 2014 at 19:53:17 UTC, Tim Holzschuh via Digitalmars-d-learn wrote:
> Hi there,
> I read a book about an introduction to creating programming languages (really basic).
>
> The sample code is written in Ruby, but I want to rewrite the examples in D.
>
> However, the Lexer uses Ruby's regex features to scan the code.
>
> I'm not very familiar with D's regex system (nor with any other), so it would be very helpful to receive some tips on how to "translate" the Ruby regexes to D's implementation.

You may find the following useful:

http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html

The source of the lexer generator is located here: https://github.com/Hackerpilot/Dscanner/blob/master/std/lexer.d

D lexer: https://github.com/Hackerpilot/Dscanner/blob/master/std/d/lexer.d

There's also a parser and AST library for D in that same project. The lexer generator may not be as simple as what you're using right now, but it is very fast.

May 13, 2014
On 5/13/14, 5:43 PM, anonymous wrote:
> On Tuesday, 13 May 2014 at 19:53:17 UTC, Tim Holzschuh via
> Digitalmars-d-learn wrote:
>> If I also want to create a RegEx to filter string-expressions a la "
>> xyz ", how would I do this?
>>
>> At least match( src, r"^\" (.*) $\" " ); doesn't seem to work and I
>> couldn't find in the Library Reference how to change it..

I think he's confusing r"..." with a regular expression literal (I confused them at first, too).

May 14, 2014
On Tuesday, 13 May 2014 at 20:02:59 UTC, Tim Holzschuh via Digitalmars-d-learn wrote:
> Still: Would it be very difficult to write a suitable parser from scratch?

See http://forum.dlang.org/post/lbnheh$2ssm$1@digitalmars.com, which links to a discussion about parsers on reddit.