re2d lexer generator
November 25

Regular expression compiler re2c now supports D.

A short intro from the official website: re2c stands for Regular Expressions to Code. It is a free and open-source lexer generator that supports C, C++, D, Go, Haskell, Java, JavaScript, OCaml, Python, Rust, V, Zig, and can be extended to other languages by implementing a single syntax file. The primary focus of re2c is on generating fast code: it compiles regular expressions to deterministic finite automata and translates them into direct-coded lexers in the target language (such lexers are generally faster and easier to debug than their table-driven analogues). The secondary focus of re2c is on flexibility: it does not assume a fixed program template; instead, it allows the user to embed lexers anywhere in the source code and configure them to avoid unnecessary buffering and bounds checks. The internal algorithm used by re2c is based on a special kind of deterministic finite automata: lookahead TDFA. These automata are as fast as ordinary DFA, but they are also capable of performing submatch extraction with minimal overhead.
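To give a feel for what "direct-coded" means, here is a minimal hand-written sketch in D. It is not re2c output and not how re2c lays out its code; the function name isNumber and the pattern [1-9][0-9]* are chosen only for illustration. The point is that each automaton state is a position in the code (a switch on the current character) rather than a row in a transition table.

```d
// Hand-written illustration, NOT generated by re2c.
// Recognizes the pattern [1-9][0-9]* over the whole input string.
bool isNumber(string s)
{
    size_t i = 0;

    // State 0: the first character must be a digit 1-9.
    if (i >= s.length)
        return false;
    switch (s[i])
    {
        case '1': .. case '9':
            ++i;
            break;
        default:
            return false;
    }

    // State 1 (accepting): consume any further digits 0-9.
    while (i < s.length)
    {
        switch (s[i])
        {
            case '0': .. case '9':
                ++i;
                break; // leaves the switch only; the loop continues
            default:
                return false;
        }
    }
    return true;
}
```

A table-driven lexer would instead keep the transitions in an array indexed by state and character and run one generic loop over it; the direct-coded form trades that for plain control flow that a debugger can step through.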

There is a detailed user guide and an online playground with many examples.

November 25

On Monday, 25 November 2024 at 16:01:54 UTC, Ulya wrote:
> a special kind of deterministic finite automata: lookahead TDFA. These automata are as fast as ordinary DFA, but they are also capable of performing submatch extraction with minimal overhead.
>
> There is a detailed user guide and an online playground with many examples.

Hi Ulya. I don't have an account on LOR so glad you wrote here :)

Based on some examples from the playground it seems re2c is inserting #line directives.
I think it is not supported by D lang.

I've checked for example 'reuse.re'

November 25

On Monday, 25 November 2024 at 19:18:40 UTC, Sergey wrote:
> On Monday, 25 November 2024 at 16:01:54 UTC, Ulya wrote:
>> a special kind of deterministic finite automata: lookahead TDFA. These automata are as fast as ordinary DFA, but they are also capable of performing submatch extraction with minimal overhead.
>>
>> There is a detailed user guide and an online playground with many examples.
>
> Hi Ulya. I don't have an account on LOR so glad you wrote here :)
>
> Based on some examples from the playground it seems re2c is inserting #line directives.
> I think it is not supported by D lang.
>
> I've checked for example 'reuse.re'

Hi Sergey :)

I believe #line directives are supported, as described here: https://dlang.org/spec/lex.html#special-token-sequence.
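As a small sketch of what the directive does (the file name lexer.re, the line number 100, and the function name reportedLine are made up for the example): #line resets the line number, and optionally the file name, that the compiler associates with the following source line. A generator can use this so that error messages and debug info point back at the original input file instead of the generated D file.

```d
// Illustration only: the file name and line number below are arbitrary.
int reportedLine()
{
#line 100 "lexer.re"
    return __LINE__; // this source line is now treated as line 100 of lexer.re
}
```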

All examples are compiled with dmd -g -wi and tested to confirm that they produce the expected output: https://github.com/skvadrik/re2c/blob/master/examples/d/__run_all.sh#L26.

It is possible to disable line directives for an individual file using -i, or to disable them globally with this setting in the syntax file.

November 25

On Monday, 25 November 2024 at 16:01:54 UTC, Ulya wrote:
> Regular expression compiler re2c now supports D.
>
> [...]

BTW this is completely different from https://code.dlang.org/packages/re2d. The latter is a set of bindings to the re2 library, while re2c is an ahead-of-time regexp compiler (a port of a tool that has existed since 1993). The name clash is unfortunate.

November 25

On Monday, 25 November 2024 at 21:33:35 UTC, Ulya wrote:
> Hi Sergey :)
>
> I believe #line directives are supported, as described here: https://dlang.org/spec/lex.html#special-token-sequence.

Oh cool. I didn't know that and it is kinda unexpected for me :)
Thanks!