October 21, 2006
Bruno Medeiros wrote:
> Don Clugston wrote:
>>
>> It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that
>> (1) all errors would be caught at compile time, and
>> (2) there'd be a minor speedup because the compilation step would not happen at runtime, and
>> (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.
>>
> 
> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?

It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.
October 24, 2006
Bruno Medeiros wrote:
> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of?

Generally speaking, regular expressions are compiled into a directed state graph. For example, a regex like this:

  a(b|c)d

...compiles into a state-graph like this:

<mono-spaced>

     b
a -<   >- d
     c

</mono-spaced>

The regex engine is a state-machine-crawler that knows how to navigate from node to node in the graph, including rules about when it is allowed to backtrack, etc.

So, the compiled representation of a regular expression isn't "opcodes" in the traditional sense of the word. It's just a very particular kind of state graph.

--Benji
October 26, 2006
Don Clugston wrote:
> Bruno Medeiros wrote:
>> Don Clugston wrote:
>>>
>>> It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that
>>> (1) all errors would be caught at compile time, and
>>> (2) there'd be a minor speedup because the compilation step would not happen at runtime, and
>>> (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.
>>>
>>
>> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?
> 
> It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.

From my quick look, it is a simpler representation of the regexp string, but it still a linear (opcode) representation. It is not a graph state like I'd expect (and like benji said). (one is used/created later? I couldn't tell from std.regexp.test() )

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
October 27, 2006
Bruno Medeiros wrote:
> Don Clugston wrote:
>> Bruno Medeiros wrote:
>>> Don Clugston wrote:
>>>>
>>>> It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that
>>>> (1) all errors would be caught at compile time, and
>>>> (2) there'd be a minor speedup because the compilation step would not happen at runtime, and
>>>> (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.
>>>>
>>>
>>> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?
>>
>> It's *nowhere near* as complicated as it sounds. If you look into the source of std.regexp, you'll see what I mean -- there's a function called 'compile'.
> 
>  From my quick look, it is a simpler representation of the regexp string, but it still a linear (opcode) representation. It is not a graph state like I'd expect (and like benji said). (one is used/created later? I couldn't tell from std.regexp.test() )

The graph part is done via 'goto' opcodes. But the majority of the simple graph branches get converted into filters.
1 2 3
Next ›   Last »