compile-time regex redux (page 2) - D Programming Language Discussion Forum

February 07, 2007

Re: compile-time regex redux

Posted by Walter Bright
in reply to Bill Baxter

Bill Baxter wrote:
> That would help I suppose, but at the same time regexps themselves have a tendency to end up being 'write-only' code. The heavy use of them in Perl is I think a large part of what gives it a rep as a write-only language. Heh heh. I just found this regexp for matching RFC 822 email addresses:
>     http://www.regular-expressions.info/email.html
> (the one at the bottom of the page)

I agree that non-trivial regexes can be pretty intimidating - but writing templates to do the same will be even more intimidating.

February 07, 2007

Re: compile-time regex redux

Posted by Walter Bright
in reply to Sean Kelly

Sean Kelly wrote:
> Just a quick comment--I want to think about this a bit more.  If we are given compile-time regular expressions it may be useful if we could obtain more information than this.  For example, I would probably also want to know where in the source string the match begins.

One idea is to have it return an array of strings, and then you'd index that to get the desired result string.

February 07, 2007

Re: compile-time regex redux

Posted by Walter Bright
in reply to kenny

kenny wrote:
> I know I'm asking for a lot, but the way templates handle strings is still kinda weird to me. Would string parsing in this sort of way be absolutely impossible with templates? I have not had good luck with it.

I just haven't thought about this enough. Certainly, however, solving the problem in a more general, D-ish way than regex would be a much bigger win. Regex works only for a subset of problems (can't do recursive descent parsing with it).

February 07, 2007

Re: compile-time regex redux

Posted by BCS
in reply to Walter Bright

Walter Bright wrote:
> kenny wrote:
> 
>> I know I'm asking for a lot, but the way templates handle strings is still kinda weird to me. Would string parsing in this sort of way be absolutely impossible with templates? I have not had good luck with it.
> 
> 
> I just haven't thought about this enough. Certainly, however, solving the problem in a more general, D-ish way than regex would be a much bigger win. Regex works only for a subset of problems (can't do recursive descent parsing with it).

As I see it, the biggest problem with compile-time parsing in D is that building non-linear structures is a pain. Tuples are implicitly concatenated when passed together, and this makes some things really hard. Allowing a tuple to be a member of another tuple would put D templates in the same class as LISP.

Another thing that might make things easier is some way to mark a template as "evaluate to value and abandon". This would cause the template to be processed, but none of the symbols generated by it would be kept, only the value. Of course, suitable restrictions would apply.
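The flattening described above can be reproduced with a minimal sketch. It assumes a current D compiler and `std.meta.AliasSeq`, a name that does not appear in this thread; `Pack` and `Unpack` are hypothetical names for one common workaround, not something proposed here.

```d
import std.meta : AliasSeq;

// An inner sequence passed as an element is spliced in, not nested.
alias Inner = AliasSeq!(1, 2);
alias Outer = AliasSeq!(Inner, 3);

static assert(Outer.length == 3); // three elements, not two
static assert(Outer[0] == 1 && Outer[1] == 2 && Outer[2] == 3);

// Hypothetical workaround: wrap the inner sequence in a template
// so that it travels as a single symbol.
template Pack(Items...)
{
    alias Unpack = Items;
}

alias Nested = AliasSeq!(Pack!(1, 2), 3);
alias First = Nested[0];

static assert(Nested.length == 2);       // the wrapper counts as one element
static assert(First.Unpack.length == 2); // the inner items are still reachable

void main() {}
```

The wrapper keeps the nesting, at the cost of an explicit unpacking step wherever the inner items are needed.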

February 07, 2007

Re: compile-time regex redux

Posted by Chris Miller
in reply to kenny

On Wed, 07 Feb 2007 12:04:10 -0500, kenny <funisher@gmail.com> wrote:
> Walter, I don't hate regex -- I just don't use it. It seems to me that to figure out regex syntax takes longer than writing quick for/while statements, and I usually forget cases in regex too...
>
> just being able to write like I can in D with compile time variables would be so much easier for me, and it would only require one template function instead of 35 to parse a simple string... for example.
>
> 1. A while back, I needed something very quickly to remove whitespace. It took me much less time with loops than I ever could have done with a regex. I want to be able to do the same in templates, if possible. I will be trying to reproduce this later, but I think that it will require a lot of templates.

I generally dislike regex for anything semi-complex. It's handy for simple things, like it's great as a find/replace feature in an editor, but anything more advanced and it's a huge pain.
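The loop-based whitespace removal kenny describes can be sketched as an ordinary function that is evaluated at compile time. This assumes a D compiler with compile-time function execution (CTFE), which none of the posts here mention; `removeWhitespace` is a hypothetical name, and the set of characters treated as whitespace is an assumption.

```d
// Plain D function; nothing in it is template-specific.
string removeWhitespace(string s)
{
    string result;
    foreach (char c; s)
    {
        // Keep every character that is not a space, tab, or line break.
        if (c != ' ' && c != '\t' && c != '\n' && c != '\r')
            result ~= c;
    }
    return result;
}

// Initializing an enum forces the call to be evaluated at compile time.
enum stripped = removeWhitespace("a b\tc \n d");
static assert(stripped == "abcd");

void main()
{
    // The same function also works at run time.
    assert(removeWhitespace(" x y ") == "xy");
}
```

Under that assumption a single function replaces the chain of templates, and the same code serves both compile-time and run-time callers.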

February 07, 2007

Re: compile-time regex redux

Posted by Chris Nicholson-Sauls
in reply to Bill Baxter

Bill Baxter wrote:
> Walter Bright wrote:
>> String mixins, in order to be useful, need an ability to manipulate strings at compile time. Currently, the core operations on strings that can be done are:
>>
>> 1) indexed access
>> 2) slicing
>> 3) comparison
>> 4) getting the length
>> 5) concatenation
>>
>> Any other functionality can be built up from these using template metaprogramming.
>>
>> The problem is that parsing strings using templates generates a large number of template instantiations, is (relatively) very slow, and consumes a lot of memory (at compile time, not runtime). For example, ParseInteger would need 4 template instantiations to parse 5678, and each template instantiation would also include the rest of the input as part of the template instantiation's mangled name.
>>
>> At some point, this will prove a barrier to large scale use of this feature.
>>
>> Andrei suggested using compile time regular expressions to shoulder much of the burden, reducing parsing of any particular token to one instantiation.
> 
> That would help I suppose, but at the same time regexps themselves have a tendency to end up being 'write-only' code. The heavy use of them in Perl is I think a large part of what gives it a rep as a write-only language. Heh heh. I just found this regexp for matching RFC 822 email addresses:
>     http://www.regular-expressions.info/email.html
> (the one at the bottom of the page)
> 
> 
> --bb

Wow... I'm actually missing hair now just from trying to read that.  My internal regexp engine crashed, too -- had a neural buffer overflow.

-- Chris Nicholson-Sauls
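
For reference, the instantiation cost described in the text quoted above can be sketched as follows. This is a hypothetical `ParseInteger`, not necessarily the one Walter has in mind: the accumulator parameter `acc` and the choice to stop at the first non-digit are assumptions.

```d
// Recursive template: each step consumes one digit and instantiates
// the template again on the remaining input.
template ParseInteger(string s, long acc = 0)
{
    static if (s.length == 0 || s[0] < '0' || s[0] > '9')
        enum long ParseInteger = acc; // no more digits: yield the result
    else
        enum long ParseInteger =
            ParseInteger!(s[1 .. $], acc * 10 + (s[0] - '0'));
}

static assert(ParseInteger!"5678" == 5678);
static assert(ParseInteger!"12ab" == 12);

void main() {}
```

In this sketch, parsing "5678" instantiates the template once per digit plus once for the terminating case, and every instantiation carries the remaining input string as a template argument, which is where the cost in mangled-name length comes from.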

February 07, 2007

Re: compile-time regex redux

Posted by Sean Kelly
in reply to Walter Bright

Walter Bright wrote:
>
> At some point, this will prove a barrier to large scale use of this feature.

I agree, though I'm not sure this feature will see large scale use either way.  Template metaprogramming is still very uncommon outside of library code.

> Andrei suggested using compile time regular expressions to shoulder much of the burden, reducing parsing of any particular token to one instantiation.
> 
> The last time I introduced core regular expressions into D, it was soundly rejected by the community and was withdrawn, and for good reasons.
> 
> But I think we now have good reasons to revisit this, at least for compile time use only. For example:
> 
>     ("aa|b" ~~ "ababb") would evaluate to "ab"
> 
> I expect one would generally only see this kind of thing inside templates, not user code.

I agree that this would eliminate the need for a lot of template library code and would speed compilation for applications using such techniques. I am still unsure whether this is sufficient to warrant its inclusion in the language, but I'm not strongly opposed to the idea. However, for this to be useful I'd like to reiterate that I would want some way to continue parsing after the match point. The most obvious would be to return an index/string pair where the index contains the position of the match in the source string, or as you mentioned, perhaps an array consisting of three slices: the source string preceding the match, the match itself, and the source string following the match.

Sean
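
The three-slice result suggested above can be sketched with the run-time `std.regex` module from Phobos. This is an illustration only: it matches at run time rather than compile time, it does not use the proposed `~~` operator, and `splitAtFirstMatch` is a hypothetical helper name.

```d
import std.regex : matchFirst, regex;

// Returns [before, match, after] for the first match,
// or an empty array when the pattern does not match.
string[] splitAtFirstMatch(string input, string pattern)
{
    auto m = matchFirst(input, regex(pattern));
    if (m.empty)
        return null;
    return [m.pre, m.hit, m.post];
}

void main()
{
    auto parts = splitAtFirstMatch("x = 42;", "[0-9]+");
    assert(parts == ["x = ", "42", ";"]);

    // The match position is the length of the preceding slice,
    // so parsing can continue from parts[2].
    assert(parts[0].length == 4);
}
```

Returning slices rather than an index/string pair gives the caller both the position and the remaining input without a second lookup.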

February 07, 2007

Re: compile-time regex redux

Posted by Andrei Alexandrescu (See Website For Email)
in reply to Sean Kelly

Sean Kelly wrote:
> Walter Bright wrote:
>>
>> At some point, this will prove a barrier to large scale use of this feature.
> 
> I agree, though I'm not sure this feature will see large scale use either way.  Template metaprogramming is still very uncommon outside of library code.

If we want to make D a language for the future, we must thoroughly rid ourselves of such a view.

>> Andrei suggested using compile time regular expressions to shoulder much of the burden, reducing parsing of any particular token to one instantiation.
>>
>> The last time I introduced core regular expressions into D, it was soundly rejected by the community and was withdrawn, and for good reasons.
>>
>> But I think we now have good reasons to revisit this, at least for compile time use only. For example:
>>
>>     ("aa|b" ~~ "ababb") would evaluate to "ab"
>>
>> I expect one would generally only see this kind of thing inside templates, not user code.
> 
> I agree that this would eliminate the need for a lot of template library code and would speed compilation for applications using such techniques. I am still unsure whether this is sufficient to warrant its inclusion in the language, but I'm not strongly opposed to the idea. However, for this to be useful I'd like to reiterate that I would want some way to continue parsing after the match point. The most obvious would be to return an index/string pair where the index contains the position of the match in the source string, or as you mentioned, perhaps an array consisting of three slices: the source string preceding the match, the match itself, and the source string following the match.

Parens will allow grouping much like in Perl. If a regex contains groupings, then the result will be a compile-time array with the matches. All you have to do then is to group subparts appropriately, e.g.:

("templated regex rocks" ~~ "([a-z]+) +(.*)")

returns a compile-time array ["templated", "regex rocks"].


Andrei
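
The grouping example above can be approximated with `std.regex.ctRegex` from Phobos. This is a sketch under stated assumptions: the pattern is compiled at compile time but the match itself runs at run time, the proposed `~~` operator is not used, and `firstWordAndRest` is a hypothetical helper name.

```d
import std.regex : ctRegex, matchFirst;

// Returns the two captured groups, or an empty array if there is no match.
string[] firstWordAndRest(string input)
{
    // The pattern is compiled into a matcher during compilation.
    auto re = ctRegex!(`([a-z]+) +(.*)`);
    auto m = matchFirst(input, re);
    if (m.empty)
        return null;
    // m[0] is the whole match; m[1] and m[2] are the parenthesized groups.
    return [m[1], m[2]];
}

void main()
{
    assert(firstWordAndRest("templated regex rocks")
        == ["templated", "regex rocks"]);
}
```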

February 07, 2007

Re: compile-time regex redux

Posted by kris
in reply to Walter Bright

Walter Bright wrote:
> String mixins, in order to be useful, need an ability to manipulate strings at compile time. Currently, the core operations on strings that can be done are:
> 
> 1) indexed access
> 2) slicing
> 3) comparison
> 4) getting the length
> 5) concatenation
> 
> Any other functionality can be built up from these using template metaprogramming.
> 
> The problem is that parsing strings using templates generates a large number of template instantiations, is (relatively) very slow, and consumes a lot of memory (at compile time, not runtime). For example, ParseInteger would need 4 template instantiations to parse 5678, and each template instantiation would also include the rest of the input as part of the template instantiation's mangled name.
> 
> At some point, this will prove a barrier to large scale use of this feature.
> 
> Andrei suggested using compile time regular expressions to shoulder much of the burden, reducing parsing of any particular token to one instantiation.
> 
> The last time I introduced core regular expressions into D, it was soundly rejected by the community and was withdrawn, and for good reasons.
> 
> But I think we now have good reasons to revisit this, at least for compile time use only. For example:
> 
>     ("aa|b" ~~ "ababb") would evaluate to "ab"
> 
> I expect one would generally only see this kind of thing inside templates, not user code.

Compile-time regex is only part of the picture, and a small one too. I rather expect we'd wind up finding that the manner in which it was exposed was just too limiting in one way or another. Exposing, as was apparently suggested, the full API of RegExp inside the compiler sounds a tad distasteful.

You'll perhaps forgive me if I question whether this is driven primarily from an academic interest?  What I mean is this: if and when D goes mainstream, perhaps just one in ten-thousand developers will actually use this kind of feature more than 5 times (and still find themselves limited). Perhaps I'm being generous with those numbers also?

What is wrong with runtime execution anyway? It sure is easier to write and maintain clean D code than (for many people) complex concepts that amount to nothing more than runtime optimizations. Isn't that true?

It would seem that adding such features does not address the type of things that would be useful to 80% of developers? Surely that should be far more important?

And, no ... I'm not just pooh poohing the idea ... I'm really serious about D getting some realistic market traction, and I don't see how adding more compile-time 'specialities' can help in any way other than generating a little bit of 'novelty' interest. Isn't this a good example of "premature optimization" ?

Surely some of the other long-term concerns, such as solid debugging support, simmering code/dataseg bloat, lib support for templates, etc., should deserve full attention instead? Surely that is a more successful approach to getting D adopted in the marketplace?

Lots of questions, and I hope you can give them serious consideration, Walter.

- Kris

February 07, 2007

Re: compile-time regex redux

Posted by Andrei Alexandrescu (See Website For Email)
in reply to kenny

kenny wrote:
> Walter, I don't hate regex -- I just don't use it. It seems to me that to figure out regex syntax takes longer than writing quick for/while statements, and I usually forget cases in regex too...

I think this is an age-old issue: if you don't know something, you find it harder to do things that way. The telling sign is that people who know _both_ simple loops and regexes do use regexes, and as a consequence are way more productive at a certain category of tasks.

> just being able to write like I can in D with compile time variables would be so much easier for me, and it would only require one template function instead of 35 to parse a simple string... for example.
> 
> 1. A while back, I needed something very quickly to remove whitespace. It took me much less time with loops than I ever could have done with a regex. I want to be able to do the same in templates, if possible. I will be trying to reproduce this later, but I think that it will require a lot of templates.
> 2. what about building associative arrays out of a string? I have this function from existing code. It didn't take too long to write. I want to be able to write something like this in templates to build assoc arrays dynamically.
> 
> I know I'm asking for a lot, but the way templates handle strings is still kinda weird to me. Would string parsing in this sort of way be absolutely impossible with templates? I have not had good luck with it. Perhaps I missed something...

That would require functional-style programming - which, of course, also seems hard before you learn it. So either way, we're hosed :o).


Andrei
