compile-time regex redux (page 6) - D Programming Language Discussion Forum


February 08, 2007

Re: compile-time regex redux

Posted by Lars Ivar Igesund
in reply to Bill Baxter

Bill Baxter wrote:

> Andrei Alexandrescu (See Website For Email) wrote:
>> kenny wrote:
>>> Walter, I don't hate regex -- I just don't use it. It seems to me that to figure out regex syntax takes longer than writing quick for/while statements, and I usually forget cases in regex too...
>> 
>> I think this is an age-old issue: if you don't know something, you find it harder to do things that way. The telling sign is that people who know _both_ simple loops and regexes do use regexes, and as a consequence are way more productive at a certain category of tasks.
> 
> Hmm.  More productive, probably.   Writing better code?  Not clear.  I would guess that in many cases the results are not as easy to maintain as non-regexp code.
> 
> Anyway, I think the question is whether compile-time regexp is really the right level of abstraction to be targeting.  Wouldn't it be infinitely better to have the compile-time code facilities be so good that you could just write a regexp parser as a compile-time D library?
> 
> I mean what is regexp, but a particular DSL?  If the new facilities are trying to make DSL's easier to create, regexp is a great target DSL.  So what compile-time language facilities do you need to implement an efficient and clean compile-time regexp library?
> 
> It would be nice if we could write more-or-less generic D code with a few compile time restrictions.  For instance you can write any function you want that takes only const values as arguments and returns a const value, and refers to only global const values and other such const-only functions.
> 
> --bb

I very much agree with Bill here, because that would give such features real power across a much wider space. In addition, the question of which regex syntax should be used came up early on, which brings us to the fact that regexes look and behave differently in different settings: some are enhanced with this, some with that (and many seem to think std.regex is somewhat badly behaved). Making regexes easy to implement as (compile-time) libraries would allow for (or at least be more likely to inspire) as many variants as users feel are needed.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Dancing the Tango
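
Bill's suggestion above, an ordinary function over constant inputs that the compiler evaluates, can be sketched in present-day D, where compile-time function evaluation fills that role. The matcher below is only an illustration, not something proposed in this thread: the names `match`, `matchHere` and `matchStar` are made up here, and it understands just literals, `.`, `*`, `^` and `$`.

```d
// Hypothetical mini regex library: plain D functions, no template tricks.
bool matchHere(string re, string text)
{
    if (re.length == 0) return true;
    if (re.length >= 2 && re[1] == '*')
        return matchStar(re[0], re[2 .. $], text);
    if (re == "$") return text.length == 0;
    if (text.length != 0 && (re[0] == '.' || re[0] == text[0]))
        return matchHere(re[1 .. $], text[1 .. $]);
    return false;
}

bool matchStar(char c, string re, string text)
{
    // Try zero occurrences of c first, then consume one more each round.
    while (true)
    {
        if (matchHere(re, text)) return true;
        if (text.length == 0 || (c != '.' && text[0] != c)) return false;
        text = text[1 .. $];
    }
}

bool match(string re, string text)
{
    if (re.length != 0 && re[0] == '^')
        return matchHere(re[1 .. $], text);
    // Unanchored: try every start position, including the empty suffix.
    foreach (i; 0 .. text.length + 1)
        if (matchHere(re, text[i .. $])) return true;
    return false;
}

// static assert forces the calls to be evaluated during compilation.
static assert(match("^ab*c$", "abbbc"));
static assert(!match("^ab*c$", "abd"));
static assert(match("b.d", "abcd"));
```

Because the functions are plain D, the same `match` can also be called at run time with strings that are not known until then.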

February 08, 2007

Re: compile-time regex redux

Posted by Lars Ivar Igesund
in reply to Walter Bright

Walter Bright wrote:
> 
>> Surely some of the other long-term concerns, such as solid debugging support, simmering code/dataseg bloat, lib support for templates, etc, etc, should deserve full attention instead? Surely that is a more successful approach to getting D adopted in the marketplace?
> 
> Those are all extremely important, too.

I tend to think that these, together with a whole slew of other language improvements, are important enough for compile-time regex to be pushed back to post 2.0 (if it is still "needed" then!).

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Dancing the Tango

February 08, 2007

Re: compile-time regex redux

Posted by BCS
in reply to kenny

Reply to Kenny,

> BCS wrote:
> 
>> As I see it the biggest problem with compile time parsing in D is
>> that building non-linear structures is a pain. Tuples are implicitly
>> concatenated when passed together, and this makes some things really
>> hard. Allowing a tuple to be a member of another tuple would put D
>> templates in the same class as LISP.
>> 
>> Another thing that might make things easier is some way to mark a
>> template as "evaluate to value and abandon". This would cause the
>> template to be processed but none of the symbols generated by it
>> would be kept, only the value. Of course, suitable restrictions would
>> apply.
>> 
> so it would be like writing normal D code inside of a template? Could
> we use phobos, or do more metastring functions like find, strip, etc.
> need to be reinvented in D? Or can I take those functions and just put
> a wrapper around them? I think this is getting into the security issues
> again, but something like this:
> 

Errr. I was thinking of having this apply to the const folding stuff. Nothing that looks like runtime code would be allowed.

The point would be to reduce the overhead of evaluating stuff. After thinking about it, I'm not sure that this isn't already done for templates that consist of only const declarations.

The following would result in exactly one string being added to the compile-time data set.

meta template rmwhite(char[] c)
{
 // Strip leading spaces; stop at the first non-space character or at the end.
 static if(c.length == 0 || c[0] != ' ')
   const rmwhite = c;
 else
   const rmwhite = rmwhite!(c[1..$]);
}
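
For comparison, a version of rmwhite that compiles with a current D compiler might look like this. It is a sketch under assumptions: there is no `meta` keyword, so it uses an ordinary eponymous template with `enum`, takes a `string` parameter, and, unlike what is asked for above, makes no promise that intermediate instantiations are discarded.

```d
// Assumed modern-D rendering of rmwhite; the "evaluate and abandon"
// marker from the post has no counterpart here.
template rmwhite(string c)
{
    // Stop at the first non-space character or when the input runs out.
    static if (c.length == 0 || c[0] != ' ')
        enum rmwhite = c;
    else
        enum rmwhite = rmwhite!(c[1 .. $]);
}

static assert(rmwhite!"   abc" == "abc");
static assert(rmwhite!"abc" == "abc");
static assert(rmwhite!"   " == "");
```

Each recursion step instantiates the template with a shorter string, which is exactly the per-step overhead the post is concerned about.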

February 08, 2007

DeRailed DSL (was Re: compile-time regex redux)

Posted by kris
in reply to Andrei Alexandrescu (See Website For Email)

Andrei Alexandrescu (See Website For Email) wrote:
> kris wrote:
> 
>> Robby wrote:
>>
>>>
>>>>>
>>>>> Walter gave another good case study: Ruby on Rails. The success of Ruby on Rails has a lot to do with its ability to express abstractions that were a complete mess to deal with in concreteland.
>>>>>
>>>>
>>>> Let's look at that case study, then. The /real/ power in RoR comes from being able to dynamically bind via rich reflection. What we're talking about here does not add full reflection to D. Neither does it assist in getting D modules dynamically loaded at runtime.
>>>>
>>>> As it turns out, some of us are actively looking /specifically/ at the killer RoR for D; far beyond what RoR does. Oddly enough, our working name for it is - DeRailed -
>>>>
>>>> We have solid notions of what's needed; and several of us have built related platforms in the past. But this topic, at face value, doesn't appear to help us in any notable fashion. Perhaps you can explain this further?
>>>>
>>>> - Kris
>>>
>>>
>>>
>>> I'm having a hard time seeing the association between RoR, DSLs and the regex feature. Perhaps they're completely separate.
>>
>>
>> Me too. I failed to see any connection that would measurably assist DeRailed. And the question above was sadly left unaddressed.
> 
> 
> It's very simple. A scheme based on compile-time in(tro)spection has superior and automatic means to detect, say, mismatches between an expected database schema and the runtime reality.

Let's step back for a moment, please?

In a practical sense, the user/developer cares mostly about the extent and capability of the development facilities exposed. Yes? That includes the whole edit, compile, debug, edit cycle along with the quality of the tools and environment presented.

Whether the scheme you mention is implemented at compile-time or at runtime has little bearing in the overall practical picture; e.g. as long as the cycle is short, intuitive and effective, either approach works. At that point, it's all about practical tradeoffs instead of theoretical one-upmanship?

For example, having a DSL go and hit a database at /compile time/ sounds like an appalling reduction in /perceived/ compiler efficiency. If I have to hit the DB for every single compilation of each module with such a DSL embedded, I will simply discard the toolset. That approach would be borderline insanity :p

One might argue that such a DSL design is incorrect? OK; then what about the security aspects? Your example is talking about a DSL that can be verified at compile-time; against a database schema. Yes? How could that possibly be /permitted/ to run at compile time? It's a /gaping/ security issue. With sandboxing, any compiler 'extension' would likely have to eschew OS handles, which means no DB access, no network access, no file access, and no registry access. How does your example operate under such conditions? I fail to see that there's any practical thought behind the example given, and sincerely hope you can correct that?

Lastly: the example you give fails to meet the criteria of "measurably assist DeRailed". Even if it /were/ feasible from a security and compile-cycle efficiency standpoint, it still would have little bearing on the overall productivity of a user/developer (using DeRailed). In other words, there's a whole lot of pain for very little gain. What there is of value quickly vanishes against the far greater concerns over full-on reflection and dynamic linking.

- Kris

February 08, 2007

Re: DeRailed DSL (was Re: compile-time regex redux)

Posted by Andrei Alexandrescu (See Website For Email)
in reply to kris

kris wrote:
> For example, having a DSL go and hit a database at /compile time/ sounds like an appalling reduction in /perceived/ compiler efficiency. If I have to hit the DB for every single compilation of each module with such a DSL embedded, I will simply discard the toolset. That approach would be borderline insanity :p

Probably we haven't worked in the same environments. In the large database systems I worked with (Chase Manhattan), the schema and the basic views change very rarely, and whenever that happens, the stored procedures are spilled automatically in text files that can be read by the build process. It was entirely reasonable to recompile the system whenever that happened, and it was highly desirable to fix mismatches before the system runs.

In other cases, a dynamic approach does better. Dynamic has always been more flexible, no doubt about that. The point is that each approach has its advantages and disadvantages.

I can also do without the belligerent tone. Not knowing or not understanding does not automatically lend insanity on the interlocutor.


Andrei
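
The kind of check described in this post, a schema dump read by the build and compared against the code before the system runs, could be sketched in present-day D roughly as follows. Everything specific here is an assumption for illustration: the `column:type` dump format, the `Account` struct and the `describe` helper are invented, and the dump is a string literal standing in for a text file that the build would supply.

```d
// Stand-in for the spilled schema text; a real build could bring the
// file in with a string import instead of a literal.
enum schemaDump = "id:int\nname:string\nbalance:double";

// Hypothetical application-side record type.
struct Account
{
    int id;
    string name;
    double balance;
}

// Renders the fields of T as "name:type" lines using compile-time
// introspection, in the same assumed format as the dump.
string describe(T)()
{
    string result;
    foreach (i, FieldType; typeof(T.tupleof))
    {
        if (i > 0) result ~= "\n";
        result ~= __traits(identifier, T.tupleof[i]) ~ ":" ~ FieldType.stringof;
    }
    return result;
}

// A mismatch between struct and dump stops the build with this message.
static assert(describe!Account() == schemaDump,
              "Account does not match the schema dump");
```

Renaming a field in `Account` or changing a type in the dump turns the mismatch into a compile error rather than a run-time surprise, which is the advantage being argued for; whether that outweighs the objections raised in the thread is a separate question.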

February 08, 2007

Re: DeRailed DSL (was Re: compile-time regex redux)

Posted by kris
in reply to Andrei Alexandrescu (See Website For Email)

Andrei Alexandrescu (See Website For Email) wrote:
> kris wrote:
> 
>> For example, having a DSL go and hit a database at /compile time/ sounds like an appalling reduction in /perceived/ compiler efficiency. If I have to hit the DB for every single compilation of each module with such a DSL embedded, I will simply discard the toolset. That approach would be borderline insanity :p
> 
> 
> Probably we haven't worked in the same environments. In the large database systems I worked with (Chase Manhattan), the schema and the basic views change very rarely, and whenever that happens, the stored procedures are spilled automatically in text files that can be read by the build process. It was entirely reasonable to recompile the system whenever that happened, and it was highly desirable to fix mismatches before the system runs.

Yes, that's one approach (I wasn't trying to limit the perspective at all). Yet, the DSL would still have to read the spilled data files; at compile time? That's a blatant security breach, is it not? Or are you hinting that sort of thing is acceptable within specific organizations?

> 
> In other cases, a dynamic approach does better. Dynamic has always been more flexible, no doubt about that. The point is that each approach has its advantages and disadvantages.

Yes, I fully agree. It's just that this discourse has been heavily slanted toward compile-time instead. Trade-offs are prevalent everywhere, and I honestly feel we're not getting a clear (fully unbiased) picture of what the pros and cons are.

> 
> I can also do without the belligerent tone. Not knowing or not understanding does not automatically lend insanity on the interlocutor.

Hrm, you have me all wrong, Andrei. I got the distinct impression that particular tone was pointed in my direction instead? I hope you'll let it go by, and focus on the more valuable aspects.

So how about it? What can DSL really do, in a practical sense, for someone using a D-based DeRailed environment? We /really/ do wish to hear what the options are, and your input on that subject is valuable ...

February 08, 2007

Re: DeRailed DSL (was Re: compile-time regex redux)

Posted by Sean Kelly
in reply to Andrei Alexandrescu (See Website For Email)

Andrei Alexandrescu (See Website For Email) wrote:
> kris wrote:
>> For example, having a DSL go and hit a database at /compile time/ sounds like an appalling reduction in /perceived/ compiler efficiency. If I have to hit the DB for every single compilation of each module with such a DSL embedded, I will simply discard the toolset. That approach would be borderline insanity :p
> 
> Probably we haven't worked in the same environments. In the large database systems I worked with (Chase Manhattan)

Funny.  I worked in that building (Chase Manhattan Plaza 1) up through the end of 2000.  Was for a different firm though.

> the schema and the
> basic views change very rarely, and whenever that happens, the stored procedures are spilled automatically in text files that can be read by the build process. It was entirely reasonable to recompile the system whenever that happened, and it was highly desirable to fix mismatches before the system runs.

What I've done in the past is manage the entire schema, stored procedures and all, in a modeling system like ErWin.  From there I'll dump the lot to a series of scripts which are then applied to the DB. In this case, the DSL would be the intermediate query files, though parsing the complete SQL query syntax (since the files include transactions, etc), sounds sub-optimal.  I suppose a peripheral data format would perhaps be more appropriate for generating code based on the static representation of a DB schema.  UML perhaps?

My only concern here is that the process seems confusing and unwieldy: manage the schema in one tool, dump the data description in a meta-language to a file, and then have template code in the application parse that file during compilation to generate code.  Each of these translation points creates a potential for failure, and the process and code risks being incomprehensible and unmanageable for new employees.

Since you've established that the schema for large systems changes only rarely and that the changes are a careful and deliberate process, is it truly the best approach to attempt to automate code changes in this way?  I would think that a well-designed application or interface library could be modified manually in concert with the schema changes to produce the same result, and with a verifiable audit trail to boot.

Alternately, if the process were truly to be automated, it seems preferable to generate D code directly from the schema management application or via a standalone tool operating on the intermediate data rather than in preprocessor code during compilation.  This approach would give much more informative error messages, and the process could be easily tracked and debugged.
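
As a rough sketch of the standalone-tool route, the generator below turns an intermediate schema description into D source text. The `column:type` line format and the `generateStruct` name are assumptions made for this example; nothing in the thread specifies them, and a real tool would read the dump from disk and write the result to a file.

```d
import std.array : split;
import std.string : splitLines;

// Hypothetical generator: one "column:type" line becomes one struct field.
string generateStruct(string name, string dump)
{
    string code = "struct " ~ name ~ "\n{\n";
    foreach (line; dump.splitLines)
    {
        auto parts = line.split(":");
        if (parts.length != 2) continue;   // skip lines that do not fit the format
        code ~= "    " ~ parts[1] ~ " " ~ parts[0] ~ ";\n";
    }
    return code ~ "}\n";
}
```

The output is an ordinary source file that can be inspected, diffed and checked in, which is where the audit trail and the easier debugging would come from.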

Please note that I'm not criticizing in-language DSL parsing as a general idea so much as questioning whether this is truly the best example for the usefulness of such a feature.

> In other cases, a dynamic approach does better. Dynamic has always been more flexible, no doubt about that. The point is that each approach has its advantages and disadvantages.

I would think it is perhaps worth comparing the two here since they can both be used for the same thing (ie. customizing an application for a database), and prior discussion had already mentioned RoR as a motivating factor for these features?  Or perhaps I misunderstood.

I'll grant that the comparison isn't entirely fair because Ruby is a dynamic language while D is a static language, but since the tasks the new import/mixin intend to solve are essentially a compile-time equivalent of what is done in Ruby at run-time (as I understand it anyway--I don't have much Ruby experience), then the utility of each approach can perhaps be weighed against the other in an attempt to understand the situations where the D approach may or may not be appropriate?

> I can also do without the belligerent tone. Not knowing or not understanding does not automatically lend insanity on the interlocutor.

No offense, but this statement is a bit patronizing.  I think Kris was merely attempting to explain his position?

Sean

February 09, 2007

Re: compile-time regex redux

Posted by janderson
in reply to Walter Bright

Walter Bright wrote:
> Bill Baxter wrote:
>> That would help I suppose, but at the same time regexps themselves have a tendency to end up being 'write-only' code.  The heavy use of them in perl is I think a large part of what gives it a rep as a write-only language.   Heh heh.  I just found this regexp for matching RFC 822 email addresses:
>>     http://www.regular-expressions.info/email.html
>> (the one at the bottom of the page)
> 
> I agree that non-trivial regexes can be pretty intimidating - but writing templates to do the same will be even more intimidating.

Agreed!  They are both intimidating, which is not a good thing.  There has to be a better way.

-Joel

February 09, 2007

Re: compile-time regex redux

Posted by janderson
in reply to kenny

kenny wrote:
> BCS wrote:
>> Walter Bright wrote:
>>> kenny wrote:
>>>
>>>> I know I'm asking for a lot, but the way templates handle strings is still kinda weird to me. Would string parsing in this sort of way be absolutely impossible with templates? I have not had good luck with it.
>>>
>>>
>>> I just haven't thought about this enough. Certainly, however, solving the problem in a more general, D-ish way than regex would be a much bigger win. Regex works only for a subset of problems (can't do recursive descent parsing with it).
>>
>> As I see it the biggest problem with compile time parsing in D is that building non-linear structures is a pain. Tuples are implicitly concatenated when passed together, and this makes some things really hard. Allowing a tuple to be a member of another tuple would put D templates in the same class as LISP.
>>
>> Another thing that might make things easier is some way to mark a template as "evaluate to value and abandon". This would cause the template to be processed but none of the symbols generated by it would be kept, only the value. Of course, suitable restrictions would apply.
> 
> so it would be like writing normal D code inside of a template? Could we use phobos, or do more metastring functions like find, strip, etc. need to be reinvented in D? Or can I take those functions and just put a wrapper around them? I think this is getting into the security issues again, but something like this:
> 
> auto my_text = meta trim_whitespace(import("myxml.xml")); // obviously a better keyword should be used, and can only be used in global scope too.
> 
> where trim_whitespace is an actual function that is actually defined up in the file somewhere, and will be compiled, used on the import file, then discarded and the result stored into auto my_text? A day or so ago, someone mentioned a .rc compiler. I personally would use it to parse stuff for interface elements .. to generate them dynamically. We already have libraries to parse XML (XHTML), CSS, and other config type things. 

I don't see any security issues if the compile-time language is restrictive enough.  You could still load in files using the new import command.
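
What kenny's `meta trim_whitespace(import("myxml.xml"))` line asks for is close to what compile-time function evaluation provides in present-day D, so here is a hedged sketch of it. The function names are illustrative, there is no `meta` keyword, and a string literal stands in for the file so the example does not depend on anything on disk; with a real file the initializer would be a string import, which the compiler only allows for directories named with the -J switch.

```d
// Helper for the sketch: the whitespace characters we strip.
bool isBlank(char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r';
}

// An ordinary function; nothing about it is specific to compile time.
string trimWhitespace(string s)
{
    size_t lo = 0, hi = s.length;
    while (lo < hi && isBlank(s[lo])) ++lo;
    while (hi > lo && isBlank(s[hi - 1])) --hi;
    return s[lo .. hi];
}

// Stand-in for the file contents (assumption: real code would use a
// string import of myxml.xml here).
enum raw = "  \n<root/>\t\n";

// Initializing an enum forces the call to run during compilation.
enum myText = trimWhitespace(raw);
static assert(myText == "<root/>");
```

The same `trimWhitespace` remains callable at run time, which is the reuse kenny is hoping for.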

> It would be SUPER AWESOME to be able to just re-use them as a template
> without any other extra work!!! Oh man, that has me really excited, if
> that's possible :)
>
> woah!

I agree, this is only a sub-set of what could be done.  You could even try out / invent new language features for D before suggesting them on the newsgroup and post the code.  Of course that may be a possibility with regex as well.

-Joel
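
BCS's point about tuples, quoted above, can be seen directly with `AliasSeq` from `std.meta` in current Phobos: nested sequences are spliced into one flat list. The `Pack` wrapper below is a hypothetical workaround written for this example, not a library facility; it relies on a template instance being a single symbol that is not spliced.

```d
import std.meta : AliasSeq;

// Nesting sequences directly flattens them into one list of four items.
alias Flat = AliasSeq!(AliasSeq!(1, 2), AliasSeq!(3, 4));
static assert(Flat.length == 4);

// Hypothetical wrapper: keeps its arguments together under one symbol.
template Pack(Items...)
{
    alias items = Items;
}

// Two packs stay two elements, so a tree-like shape survives.
alias Tree = AliasSeq!(Pack!(1, 2), Pack!(3, 4));
static assert(Tree.length == 2);
static assert(Tree[0].items.length == 2);
static assert(Tree[1].items[0] == 3);
```

This gets non-linear structure at the cost of an extra `.items` hop at every level, which is roughly the "pain" being described.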

February 09, 2007

Re: compile-time regex redux

Posted by janderson
in reply to Andrei Alexandrescu (See Website For Email)

Andrei Alexandrescu (See Website For Email) wrote:
> Bill Baxter wrote:
> 
> Templates already do that, albeit with a slightly odd syntax. But stay tuned, Walter is eyeing $ as the prefix to denote compile-time variables, and sure enough, compile-time functions will then emerge naturally :o).
> 
> 
> Andrei

While it's good that Walter is considering compile-time variables, I don't see why you need a $ symbol inside a template.  Of course, if you're going to use them outside one, then you do.  I think template code could look almost the same as normal code, which would make it much more writable/readable and reusable.

-Joel
