October 21, 2006 Re: Would there be interest in a SERIOUS compile-time regex parser? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bruno Medeiros | Bruno Medeiros wrote:
> Don Clugston wrote:
>>
>> It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that
>> (1) all errors would be caught at compile time, and
>> (2) there'd be a minor speedup because the compilation step would not happen at runtime, and
>> (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.
>>
>
> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?
It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.
|
October 24, 2006 Re: Would there be interest in a SERIOUS compile-time regex parser? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bruno Medeiros | Bruno Medeiros wrote:
> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of?
Generally speaking, regular expressions are compiled into a directed state graph. For example, a regex like this:
a(b|c)d
...compiles into a state-graph like this:
<mono-spaced>
b
a -< >- d
c
</mono-spaced>
The regex engine is a state-machine-crawler that knows how to navigate from node to node in the graph, including rules about when it is allowed to backtrack, etc.
So, the compiled representation of a regular expression isn't "opcodes" in the traditional sense of the word. It's just a very particular kind of state graph.
--Benji
|
October 26, 2006 Re: Would there be interest in a SERIOUS compile-time regex parser? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Don Clugston | Don Clugston wrote: > Bruno Medeiros wrote: >> Don Clugston wrote: >>> >>> It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that >>> (1) all errors would be caught at compile time, and >>> (2) there'd be a minor speedup because the compilation step would not happen at runtime, and >>> (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either. >>> >> >> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ? > > It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'. From my quick look, it is a simpler representation of the regexp string, but it still a linear (opcode) representation. It is not a graph state like I'd expect (and like benji said). (one is used/created later? I couldn't tell from std.regexp.test() ) -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D |
October 27, 2006 Re: Would there be interest in a SERIOUS compile-time regex parser? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bruno Medeiros | Bruno Medeiros wrote:
> Don Clugston wrote:
>> Bruno Medeiros wrote:
>>> Don Clugston wrote:
>>>>
>>>> It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that
>>>> (1) all errors would be caught at compile time, and
>>>> (2) there'd be a minor speedup because the compilation step would not happen at runtime, and
>>>> (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.
>>>>
>>>
>>> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?
>>
>> It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.
>
> From my quick look, it is a simpler representation of the regexp string, but it still a linear (opcode) representation. It is not a graph state like I'd expect (and like benji said). (one is used/created later? I couldn't tell from std.regexp.test() )
The graph part is done via 'goto' opcodes. But the majority of the simple graph branches get converted into filters.
|
Copyright © 1999-2021 by the D Language Foundation