compile-time regex redux (page 4) - D Programming Language Discussion Forum

February 08, 2007

Re: compile-time regex redux

Posted by kris
in reply to Andrei Alexandrescu (See Website For Email)

Andrei Alexandrescu (See Website For Email) wrote:
> kris wrote:
>> compile-time regex is only part of the picture. A small one too. I rather expect we'd wind up finding the manner it was exposed was just too limiting in one way or another. Exposing, as was apparently suggested, the full API of RegExp inside the compiler sounds a tad distasteful.
> 
> 
> Au contraire, I think it's a definite step in the right direction. Writing programs that write programs is a great way of doing more with less effort. Various languages can do that to various extents, and it's very heartening that D is taking steps in that direction. Allowing the programmer to manipulate strings during compilation is definitely a good step.

You're saying that a 'normal' D program is not sufficiently powerful to write other programs?  Au contraire!  There's nothing wrong with doing that at runtime, rather than turning the compiler itself into an abstract virtual machine?

> 
>> You'll perhaps forgive me if I question whether this is driven primarily from an academic interest?  What I mean is this: if and when D goes mainstream, perhaps just one in ten-thousand developers will actually use this kind of feature more than 5 times (and still find themselves limited). Perhaps I'm being generous with those numbers also?
> 
> 
> Perhaps, just like me, you simply aren't in the position to evaluate them. I will note, however, a few historical trends. C++ got a shot in the arm from the STL. STL = advanced programming. Interesting. The STL did much to educate the C++ community towards code generation, which continues to be the reason why many influential gurus hang out with C++.

Are you saying that adding regex support at compile-time will take the world by storm? I hope not, because STL and its ilk are about productivity for a mass audience. Not for the few who work with DSLs on a regular basis. Besides, D is perfectly capable of DSL handling at runtime; there's just no overpowering need for it to do that at /compile-time/.

We might as well be discussing whether the compiler should embed a GUI generator. So it can be used at compile-time. There are better ways of doing that. Ways that are more accessible, more maintainable, and have a much easier learning curve. The OSX GUI builder is one fine example.

> To survive, D must compensate for its relative lack of clout and publicity by offering above and beyond what more mainstream languages offer.

To survive, D needs to get serious about being taken seriously.

Concentrating on what /might/ become a niche ideal doesn't strike me as the best approach. Compare and contrast with, say, targeting D for cell-phone devices? It's the biggest market on the planet, just sitting there /waiting/ for D to come along with the right set of features.

> 
>> What is wrong with runtime execution anyway? It sure is easier to write and maintain clean D code than (for many ppl) complex concepts that amount to nothing more than runtime optimizations. Isn't that true?
> 
> 
> No. Accommodating DSLs and generating code has more to do with correctness and avoiding duplication of source code, than anything else.

Yes and no. I will defer, of course, to your experience in the matter; but will note that there's /always/ a point of diminishing return.

> 
>> It would seem that adding such features does not address the type of things that would be useful to 80% of developers? Surely that should be far more important?
> 
> 
> No. You are missing a key point - that some code is more influential than other. 2% of programmers may write libraries that work for 90% of programmers.

Indeed. I'm partially responsible for a rather large library. And there's no way that I can see a use for this feature in there. That doesn't mean it won't get used by somebody somewhere (of course), but I put it to you that it does indicate just how little need there is for such features in the language (at compile time).

Yes, I'd personally like to see some better template handling. I'd like to see IFTI fixed - not some new feature that this (extensive) library will never use :)

> a "white hole" class is an implementation of A that implements all methods to throw, and a "black hole" class is an implementation of A that implements all methods to return the default value of the return type.
> 
> This pattern is very useful for either quick starting points for writing true classes implementing A, or as standalone degenerate implementations.
> 
> To some programmers, black and white holes might not even raise a "duplicated code" flag. They sit down and write:

That could reasonably be argued as a point of diminishing return? If it takes more effort or knowledge of how to abstract the pattern from two or more concepts, and to implement it using something unfamiliar, then 99% of developers will ignore it completely.

I fully agree that /idealistically/ such patterns would be nice to have, but that's not reality. And I can't see how this could possibly help D get notable traction, since it can also be done using the tools already available in D: contracts via interfaces or abstract base-classes.

>> Lots of questions, and I hope you can give them serious consideration, Walter.
> 
> 
> I think it's good to be sure only when there's a solid basis.

Yes, I agree. Does that work both ways (serious question) ?

- Kris

February 08, 2007

Re: compile-time regex redux

Posted by kris
in reply to Walter Bright

Walter Bright wrote:
> Walter Bright wrote:
> 
>> kris wrote:
>>
>>> Surely some of the other long-term concerns, such as solid debugging support, simmering code/dataseg bloat, lib support for templates, etc, etc, should deserve full attention instead? Surely that is a more successful approach to getting D adopted in the marketplace?
>>
>>
>> Those are all extremely important, too.
> 
> 
> I wish to add that if you look at the changelog, the bread and butter issues (see the list of bugs fixed) get a solid share of attention.

Yes, you're quite right, and I hope there was no impression given to the contrary.

The concern for many of us is purely about D getting mainstream traction. That's why we keep harping on about things that matter to the majority of developers :)

I hate to say this (because it's somewhat tricky) but I suppose it's perhaps a question of priorities? Are compile-time features so much more important than things that limit adoption of D as it is today?

D is already /jam-packed/ with features. Some of the existing ones are broken, and have been for a long time. Some of them hinder adoption. If you were a potential D user, what would you prefer to see happen?

Far be it from me, or any of us, to dictate priority; but you surely have to see that mass adoption of D ain't gonna happen because of exotic compile-time features, when run-of-the-mill issues are prevalent?

- Kris

February 08, 2007

Re: compile-time regex redux

Posted by Bill Baxter
in reply to Andrei Alexandrescu (See Website For Email)

Andrei Alexandrescu (See Website For Email) wrote:
> kenny wrote:
>> Walter, I don't hate regex -- I just don't use it. It seems to me that to figure out regex syntax takes longer than writing quick for/while statements, and I usually forget cases in regex too...
> 
> I think this is an age-old issue: if you don't know something, you find it harder to do things that way. The telling sign is that people who know _both_ simple loops and regexes do use regexes, and as a consequence are way more productive at a certain category of tasks.

Hmm.  More productive, probably.   Writing better code?  Not clear.  I would guess that in many cases the results are not as easy to maintain as non-regexp code.

Anyway, I think the question is whether compile-time regexp is really the right level of abstraction to be targeting.  Wouldn't it be infinitely better to have the compile-time code facilities be so good that you could just write a regexp parser as a compile-time D library?

I mean what is regexp, but a particular DSL?  If the new facilities are trying to make DSL's easier to create, regexp is a great target DSL.  So what compile-time language facilities do you need to implement an efficient and clean compile-time regexp library?

It would be nice if we could write more-or-less generic D code with a few compile time restrictions.  For instance you can write any function you want that takes only const values as arguments and returns a const value, and refers to only global const values and other such const-only functions.

--bb

February 08, 2007

Re: compile-time regex redux

Posted by Andrei Alexandrescu (See Website For Email)
in reply to Bill Baxter

Bill Baxter wrote:
[snip]
> My opinion about regexps is that they're too dense and full of abbreviations.  And the typical methods for creating them don't encourage encapsulation and abstraction, which are the foundations of software.  For instance, every time you look at the above you have to re-interpret what [A-Z0-9._%-] really means.  When I'm writing regular expressions I always have to have that chart next to me to remember all those \s \b \w \S \W \ codes, and then again when trying to figure out what the code does later.  There has to be a better way.  Apparently the Perl guys think so too, because they're redoing regular expressions completely for Perl 6.

(Well not completely.) That's why we should keep a close eye on those. The Perl community is much more experienced with regex usage than me and possibly yourself. I just want us to not delude ourselves with the idea that we could just sit down and write a better regex syntax just because we don't remember what \s and \b mean. (I happen to remember. :o))


Andrei

February 08, 2007

Re: compile-time regex redux

Posted by kris
in reply to Bill Baxter

Bill Baxter wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
> 
>> kenny wrote:
>>
>>> Walter, I don't hate regex -- I just don't use it. It seems to me that to figure out regex syntax takes longer than writing quick for/while statements, and I usually forget cases in regex too...
>>
>>
>> I think this is an age-old issue: if you don't know something, you find it harder to do things that way. The telling sign is that people who know _both_ simple loops and regexes do use regexes, and as a consequence are way more productive at a certain category of tasks.
> 
> 
> Hmm.  More productive, probably.   Writing better code?  Not clear.  I would guess that in many cases the results are not as easy to maintain as non-regexp code.
> 
> Anyway, I think the question is whether compile-time regexp is really the right level of abstraction to be targeting.  Wouldn't it be infinitely better to have the compile-time code facilities be so good that you could just write a regexp parser as a compile-time D library?
> 
> I mean what is regexp, but a particular DSL?  If the new facilities are trying to make DSL's easier to create, regexp is a great target DSL.  So what compile-time language facilities do you need to implement an efficient and clean compile-time regexp library?
> 
> It would be nice if we could write more-or-less generic D code with a few compile time restrictions.  For instance you can write any function you want that takes only const values as arguments and returns a const value, and refers to only global const values and other such const-only functions.
> 
> --bb

bump+

February 08, 2007

Re: compile-time regex redux

Posted by Andrei Alexandrescu (See Website For Email)
in reply to kris

kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kris wrote:
>>> compile-time regex is only part of the picture. A small one too. I rather expect we'd wind up finding the manner it was exposed was just too limiting in one way or another. Exposing, as was apparently suggested, the full API of RegExp inside the compiler sounds a tad distasteful.
>>
>>
>> Au contraire, I think it's a definite step in the right direction. Writing programs that write programs is a great way of doing more with less effort. Various languages can do that to various extents, and it's very heartening that D is taking steps in that direction. Allowing the programmer to manipulate strings during compilation is definitely a good step.
> 
> You're saying that a 'normal' D program is not sufficiently powerful to write other programs?  Au contraire!  There's nothing wrong with doing that at runtime, rather than turning the compiler itself into an abstract virtual machine?

You don't understand. Code generation is only interesting when the generated code works together with handwritten code and lives within the same symbolic ecosystem.

>>> You'll perhaps forgive me if I question whether this is driven primarily from an academic interest?  What I mean is this: if and when D goes mainstream, perhaps just one in ten-thousand developers will actually use this kind of feature more than 5 times (and still find themselves limited). Perhaps I'm being generous with those numbers also?
>>
>>
>> Perhaps, just like me, you simply aren't in the position to evaluate them. I will note, however, a few historical trends. C++ got a shot in the arm from the STL. STL = advanced programming. Interesting. The STL did much to educate the C++ community towards code generation, which continues to be the reason why many influential gurus hang out with C++.
> 
> Are you saying that adding regex support at compile-time will take the world by storm? I hope not, because STL and its ilk are about productivity for a mass audience. Not for the few who work with DSLs on a regular basis. Besides, D is perfectly capable of DSL handling at runtime; there's just no overpowering need for it to do that at /compile-time/.

I tried to provide solid argumentation and/or evidence for my statements. The post I replied to originally, and the post I am replying to now, use rhetoric and bare statements somehow implying they are drawn from common knowledge. I will not simply agree with a bare statement claiming that this is not needed or that is not necessary.

> We might as well be discussing whether the compiler should embed a GUI generator. So it can be used at compile-time. There are better ways of doing that. Ways that are more accessible, more maintainable, and have a much easier learning curve. The OSX GUI builder is one fine example.
> 
> 
>> To survive, D must compensate for its relative lack of clout and publicity by offering above and beyond what more mainstream languages offer.
> 
> To survive, D needs to get serious about being taken seriously.

Exactly what is anyone to make of this? Reminds me of Jerome K. Jerome: "Never do something shameful, my son," the mother said, "and then you'll never be ashamed of what you did."

> Concentrating on what /might/ become a niche ideal doesn't strike me as the best approach. Compare and contrast with, say, targeting D for cell-phone devices? It's the biggest market on the planet, just sitting there /waiting/ for D to come along with the right set of features.

Sure, if there are abstractions of interest to embedded programs, they are worth discussing. But again, leaving it at the level of Zen statements is only empty rhetoric.

>>> What is wrong with runtime execution anyway? It sure is easier to write and maintain clean D code than (for many ppl) complex concepts that amount to nothing more than runtime optimizations. Isn't that true?
>>
>>
>> No. Accommodating DSLs and generating code has more to do with correctness and avoiding duplication of source code, than anything else.
> 
> Yes and no. I will defer, of course, to your experience in the matter; but will note that there's /always/ a point of diminishing return.

To this I'll insert the obligatory answer that that's a truism.

>> a "white hole" class is an implementation of A that implements all methods to throw, and a "black hole" class is an implementation of A that implements all methods to return the default value of the return type.
>>
>> This pattern is very useful for either quick starting points for writing true classes implementing A, or as standalone degenerate implementations.
>>
>> To some programmers, black and white holes might not even raise a "duplicated code" flag. They sit down and write:
> 
> That could reasonably be argued as a point of diminishing return? If it takes more effort or knowledge of how to abstract the pattern from two or more concepts, and to implement it using something unfamiliar, then 99% of developers will ignore it completely.

I think any developer can write an alias statement. The point of the example was to illustrate how an advanced feature can be used by an expert to democratize efficient development.

Abstraction is hard, no question about that. (Another truism. :o)) The problem is that often, even when the abstraction is understood, reflecting it in code is a complete mess.

Walter gave another good case study: Ruby on Rails. The success of Ruby on Rails has a lot to do with its ability to express abstractions that were a complete mess to deal with in concreteland.

> I fully agree that /idealistically/ such patterns would be nice to have, but that's not reality.

I'm not sure what this statement is based on. For example, there are Perl libraries for white holes and black holes:

http://cpan.uwinnipeg.ca/htdocs/Class-BlackHole/Class/BlackHole.html
http://cpan.uwinnipeg.ca/htdocs/Class-WhiteHole/Class/WhiteHole.html

> And I can't see how this could possibly help D get notable traction, since it can also be done using the tools already available in D: contracts via interfaces or abstract base-classes.

This reflects misunderstanding of the stakes. Interfaces and abstract base classes are a recipe for _handwritten_ code. black_hole and white_hole are tools that generate code _mechanically_.
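To make "mechanically" concrete, here is a minimal sketch. It is written in Python purely as a stand-in; it is not how the Perl modules above work internally, and it is not D. The helper names `black_hole` and `white_hole` follow the terms used in this thread, while `Shape` is a hypothetical interface playing the role of A. Since Python has no per-type default value, returning None stands in for "the default value of the return type".

```python
import inspect
from abc import ABC, abstractmethod


def _generate(base, prefix, make_method):
    """Build a subclass of `base` that overrides every public method."""
    body = {}
    for name, _ in inspect.getmembers(base, inspect.isfunction):
        if not name.startswith("_"):
            method = make_method(name)
            method.__name__ = name
            body[name] = method
    # Use the metaclass of `base` so abstract-method bookkeeping is redone.
    return type(base)(prefix + base.__name__, (base,), body)


def black_hole(base):
    """Subclass whose methods swallow their arguments and return None."""
    def make_method(name):
        def method(self, *args, **kwargs):
            return None
        return method
    return _generate(base, "BlackHole", make_method)


def white_hole(base):
    """Subclass whose methods all throw."""
    def make_method(name):
        def method(self, *args, **kwargs):
            raise NotImplementedError(name)
        return method
    return _generate(base, "WhiteHole", make_method)


class Shape(ABC):  # hypothetical interface, standing in for "A"
    @abstractmethod
    def area(self): ...

    @abstractmethod
    def draw(self, canvas): ...
```

With this in place, `black_hole(Shape)()` gives an object on which every method quietly returns None, and `white_hole(Shape)()` gives one on which every method raises. Nobody writes either class by hand, and adding a method to `Shape` needs no change in the generated classes.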

>>> Lots of questions, and I hope you can give them serious consideration, Walter.
>>
>>
>> I think it's good to be sure only when there's a solid basis.
> 
> Yes, I agree. Does that work both ways (serious question) ?

I did my best to explain the basis of my opinions.


Andrei

February 08, 2007

Re: compile-time regex redux

Posted by Bill Baxter
in reply to Andrei Alexandrescu (See Website For Email)

Andrei Alexandrescu (See Website For Email) wrote:
> Bill Baxter wrote:
> [snip]
>> My opinion about regexps is that they're too dense and full of abbreviations.  And the typical methods for creating them don't encourage encapsulation and abstraction, which are the foundations of software.  For instance, every time you look at the above you have to re-interpret what [A-Z0-9._%-] really means.  When I'm writing regular expressions I always have to have that chart next to me to remember all those \s \b \w \S \W \ codes, and then again when trying to figure out what the code does later.  There has to be a better way.  Apparently the Perl guys think so too, because they're redoing regular expressions completely for Perl 6.
> 
> (Well not completely.) That's why we should keep a close eye on those. The Perl community is much more experienced with regex usage than me and possibly yourself. I just want us to not delude ourselves with the idea that we could just sit down and write a better regex syntax just because we don't remember what \s and \b mean. (I happen to remember. :o))

Yes and I don't want us to go and make Perl5-ish regular expressions part of the core D language spec without understanding how and why that very expert Perl community is changing their regular expressions in the next round.  I haven't followed developments with Perl 6 closely, though.  Just glanced at the link someone posted the other day.

I also don't want us to go make regexp part of the language spec without thoroughly ruling out the potentially much cooler ability to write that regexp parser using more fundamental but yet-to-be-invented building blocks.

--bb

February 08, 2007

Re: compile-time regex redux

Posted by Andrei Alexandrescu (See Website For Email)
in reply to Bill Baxter

Bill Baxter wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> Bill Baxter wrote:
>> [snip]
>>> My opinion about regexps is that they're too dense and full of abbreviations.  And the typical methods for creating them don't encourage encapsulation and abstraction, which are the foundations of software.  For instance, every time you look at the above you have to re-interpret what [A-Z0-9._%-] really means.  When I'm writing regular expressions I always have to have that chart next to me to remember all those \s \b \w \S \W \ codes, and then again when trying to figure out what the code does later.  There has to be a better way.  Apparently the Perl guys think so too, because they're redoing regular expressions completely for Perl 6.
>>
>> (Well not completely.) That's why we should keep a close eye on those. The Perl community is much more experienced with regex usage than me and possibly yourself. I just want us to not delude ourselves with the idea that we could just sit down and write a better regex syntax just because we don't remember what \s and \b mean. (I happen to remember. :o))
> 
> Yes and I don't want us to go and make Perl5-ish regular expressions part of the core D language spec without understanding how and why that very expert Perl community is changing their regular expressions in the next round.  I haven't followed developments with Perl 6 closely, though.  Just glanced at the link someone posted the other day.

I did. Perl 6 is going to be great. The regexes had a few warts that were overdue for a fix. The spirit remains the same, and the new full-fledged grammars will take care of the larger parsing tasks.

> I also don't want us to go make regexp part of the language spec without thoroughly ruling out the potentially much cooler ability to write that regexp parser using more fundamental but yet-to-be-invented building blocks.

I think that's a great spirit.


Andrei

February 08, 2007

Re: compile-time regex redux

Posted by Andrei Alexandrescu (See Website For Email)
in reply to Bill Baxter

Bill Baxter wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kenny wrote:
>>> Walter, I don't hate regex -- I just don't use it. It seems to me that to figure out regex syntax takes longer than writing quick for/while statements, and I usually forget cases in regex too...
>>
>> I think this is an age-old issue: if you don't know something, you find it harder to do things that way. The telling sign is that people who know _both_ simple loops and regexes do use regexes, and as a consequence are way more productive at a certain category of tasks.
> 
> Hmm.  More productive, probably.   Writing better code?  Not clear.  I would guess that in many cases the results are not as easy to maintain as non-regexp code.

I don't think the guess is that right. Following the logic of even a simple parsing task (e.g. floating-point number in all of its splendor) is horrendous. For somebody who knows regexes, the pattern is obvious in a second.
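As a rough illustration of that claim, here is a minimal sketch in Python (chosen only because it is short to run; the pattern itself carries over to Perl-style engines largely unchanged). The pattern and the helper name `is_float` are assumptions made up for this example, not taken from any library, and the grammar is one plausible reading of "floating-point number": an optional sign, digits with an optional fraction or a bare fraction, and an optional exponent.

```python
import re

# Illustrative pattern (an assumption, not a standard):
#   [+-]?                    optional sign
#   (?:\d+(?:\.\d*)?|\.\d+)  "12", "12.", "12.5" or ".5"
#   (?:[eE][+-]?\d+)?        optional exponent such as "e-3"
FLOAT_RE = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def is_float(text: str) -> bool:
    """Return True if the whole string is a floating-point literal."""
    return FLOAT_RE.fullmatch(text) is not None
```

A hand-written loop for the same job has to track whether a sign, a digit, a dot and an exponent have been seen so far, which is exactly the logic that becomes hard to follow.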

I do agree that code written by somebody who knows regexes is hard to maintain for somebody who does not know regexes, but that's pretty much self-evident and true of any other technique.

All I can say is that I got significantly enriched and more effective as a programmer at large after I sat down and understood Perl's regex bestiary. I now see my previous arguments against them as rationalizations of my resistance to go through the effort of learning. Again comparing myself with my former self, I understand it's hard to discuss relative advantages and disadvantages with someone who doesn't know them because of a bootstrap problem: I say they make code much simpler and easier to comprehend, while my former self would say exactly the opposite. It's pretty much like math notation, eating vegetables, or classical music: it's hard to bootstrap oneself into appreciating it.

> Anyway, I think the question is whether compile-time regexp is really the right level of abstraction to be targeting.  Wouldn't it be infinitely better to have the compile-time code facilities be so good that you could just write a regexp parser as a compile-time D library?

This is possible in today's D. The problem is that it would be a Pyrrhic victory: the resulting engine would be very slow and big.

I do agree that it would be nice to look into creating compile-time amenities that make such an engine fast and small.

> I mean what is regexp, but a particular DSL?  If the new facilities are trying to make DSL's easier to create, regexp is a great target DSL.  So what compile-time language facilities do you need to implement an efficient and clean compile-time regexp library?

Conceptually, you'd need the following: (1) compile-time functions, (2) compile-time mutable variables, and (3) compile-time loops. We already have the rest. Then you can write compile-time code as comfortably as writing run-of-the-mill run-time code. D is heading that way, but with small steps.

Implementation-wise, string-based templates must be made cheaper. If we get compile-time mutation, this is probably not going to be much of a problem, because much functional-style code can be written using mutation instead. I personally enjoy functional-style code, but it's not really needed during compilation and is a bit foreign to the rest of D, which remains largely imperative.

> It would be nice if we could write more-or-less generic D code with a few compile time restrictions.  For instance you can write any function you want that takes only const values as arguments and returns a const value, and refers to only global const values and other such const-only functions.

Templates already do that, albeit with a slightly odd syntax. But stay tuned, Walter is eyeing $ as the prefix to denote compile-time variables, and sure enough, compile-time functions will then emerge naturally :o).

Andrei

February 08, 2007

Re: compile-time regex redux

Posted by Walter Bright
in reply to Andrei Alexandrescu (See Website For Email)

Andrei Alexandrescu (See Website For Email) wrote:
> Walter gave another good case study: Ruby on Rails. The success of Ruby on Rails has a lot to do with its ability to express abstractions that were a complete mess to deal with in concreteland.

I found this essay to be pivotal in piquing my interest in this:

http://www.paulgraham.com/avg.html

and a related one:

http://lib.store.yahoo.net/lib/paulgraham/bbnexcerpts.txt

