February 17, 2006
"Georg Wrede" <georg.wrede@nospam.org> wrote in message news:43F530AC.9010101@nospam.org...
> Syntactic sugar is ok in general. But not "semantic" or "hieroglyphic" sugar. Let's see how the brand new stuff works, and whether any additional sugar ever becomes needed here!

I think the $` is pretty much dead now <g>.


February 17, 2006
"Walter Bright" <newshound@digitalmars.com> wrote...
>
> "Kris" <fu@bar.com> wrote in message news:dt3cc2$1pc7$1@digitaldaemon.com...
>>> It sucks in C, and why do I say that? I've shipped a C compiler for 22 years now, and not once, not ever, did anyone ask for a regex library for it. Regex wasn't put in the C standard, or the C++ one. Yet regex is considered a core capability of several other languages. There are many ways to interpret that - I am interpreting it as meaning that regex sucks in C, and so people seem to just never even think of using C when they need to process strings.
>>
>> I'm surprised that you'd interpret it that way. I've used regex in C for decades. There was one great implementation from, uhhh, Ian somebody from Edinburgh Uni, which generated x86 code on the fly. I used that to great effect ~ a truly impressive utility.
>
> How do you interpret the fact that it has failed to gain traction among the general C population?

I noted a few reasons previously, regarding differing approaches and mindsets between script developers and systems developers, even when the same person does both. Georg Wrede just posted some very similar reasoning too. The upshot is that (IMO) the general C population rarely has a compelling need for regex. Where regex might seem (perhaps mistakenly) like using a sledgehammer to crack a nut in C, its usage is often not given a second thought in scripts.

Speaking personally, I don't expect high performance out of a script, and don't give two hoots about Q & D hacking therein. That's not the case with systems-programming (for me), where I'm likely to use something more lightweight as appropriate. On the other hand, I've written a lot of the type of code that really benefits from the state-machinery exposed by a good regex engine. Other times I've hand-tuned my own state-machines to do the work instead. Sometimes in assembly.

As noted previously, I don't think it's a question of visibility at all ~ more a question of task, applicability, priorities, and various other cost factors.

One has to wonder how much script-regex actually leverages the power within? I'd bet a large % are completely trivial. The kind which can easily be handled by other (more efficient) means in systems languages.
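To illustrate the "completely trivial" case, here is a minimal sketch in present-day D (not the 2006 Phobos discussed in this thread). It handles what a script might write as the pattern `\.d$` with a plain suffix test. The function name stripDExtension is made up for this example; endsWith comes from std.algorithm.searching.

```d
import std.algorithm.searching : endsWith;
import std.stdio : writeln;

// Hypothetical helper: drop a trailing ".d" from a file name.
// A script might reach for the regex `\.d$`; a suffix test does the same job
// without compiling or running a pattern.
string stripDExtension(string path)
{
    return path.endsWith(".d") ? path[0 .. $ - 2] : path;
}

void main()
{
    writeln(stripDExtension("regexp.d"));
    writeln(stripDExtension("README"));
}
```

The slice path[0 .. $ - 2] refers to the original string's memory, so nothing is allocated or copied.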


February 17, 2006
Walter Bright wrote:
> "David Medlock" <noone@nowhere.com> wrote in message news:dt2mpk$17aj$1@digitaldaemon.com...
> 
>>I haven't read this whole thread, but pardon if this has been suggested.
>>Why doesn't the regular expression stuff use foreach?
> 
> 
> Why, indeed. Oskar has brought it up, and he and you are right. I'm going to reevaluate this based on the feedback in this thread. 
> 
> 

I agree with the "foreach" point/suggestion.
IMO, building regex into the language to the point where a ~~ expression automatically generates a "_match" variable is just going too far. A Match struct/class and a foreach implementation would make it much more consistent and clean.
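For readers wondering what the foreach style looks like in practice, here is a rough sketch using std.regex from present-day Phobos. That module is an assumption on my part and is not the 2006 std.regexp under discussion; the function name isoYears is invented for the example.

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

// Hypothetical helper: collect the year of every ISO date (YYYY-MM-DD) in a text.
string[] isoYears(string text)
{
    string[] years;
    auto re = regex(`(\d{4})-(\d{2})-(\d{2})`);
    // matchAll returns a range of matches, so a plain foreach walks them.
    foreach (m; matchAll(text, re))
    {
        // m[0] is the whole match; m[1] is the first capture group (the year).
        years ~= m[1];
    }
    return years;
}

void main()
{
    writeln(isoYears("from 1999-12-31 to 2006-02-17"));
}
```

Each m is an ordinary value with indexing and pre/post properties, so no implicitly declared match variable is needed.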


February 17, 2006
Walter Bright wrote:
> 
> I'm just not getting it - why should it be removed? There never was a plan to remove it. And why would an implementation of a D runtime library not want to do a regex implementation? Of course, it's a lot of work to implement a regex, but one can just copy over std.RegExp and use/adapt it as required, as the license allows that. So I am just not getting what the problem is. 

Perhaps I'm being idealistic, as I simply don't believe the runtime should rely on standard library code.  Up to now that's been achievable, but the solution for this particular feature is less clear.  But I'll drop the issue for now and mull it over a bit.


Sean


February 17, 2006
"Georg Wrede" <georg.wrede@nospam.org> wrote in message news:43F53BE5.8020900@nospam.org...
> Using regexps in C needs a total change of paradigm. Regexps are kind of "top down" things, whereas traditionally "peeking into strings" is bottom-up programming.
>
> You'd also have to learn regexps. The trivial things are trivial in C-style too, and the non-trivial stuff gets avoided because of the up-front investment. Folks rather do nested ifs and stuff.
>
> Conversely, many interpreted languages make it inefficient to do "peek" kind of programming, as compared to using regexps.

There are a lot of cool things you can do in script languages because they are interpreted, and one doesn't care about efficiency. Those things are simply incompatible with D. But I don't see any inherent advantages script languages should have in implementing regex.


February 17, 2006
Kris wrote:
> "Walter Bright" <newshound@digitalmars.com> wrote ...
>>
>> RegExp could probably remove its dependence on OutBuffer, though.
> 
> Probably. On the same topic, you've often 'lectured' about the need to decouple such that the "libraries don't end up like Java". Yet RegExp imports String too, which in turn imports all these (std.format in particular):
> 
> private import std.stdio;
> private import std.utf;
> private import std.uni;
> private import std.array;
> private import std.format;
> private import std.ctype;
> private import std.stdarg;
> 
> It's quite easy to eliminate OutBuffer and String from RegExp. There's an adjusted version of it in circulation, if you'd like to forego the effort.

For what it's worth, the latest release of Ares trims a lot of fat out of std.string, so far as runtime dependencies are concerned.  The only modules that are actually required by some portion of the runtime are:

std.ctype
std.outbuffer
std.regexp
std.string
std.utf

And outbuffer should be easy enough to remove from this list.  I'd have continued to use your modified std.regexp for this release except the deltas between the 146 and 147 versions of std.regexp were tremendous. It would have taken hours to sort out a workable merge of that file, so falling back on the new Phobos version seemed preferable.

>> It sucks in C, and why do I say that? I've shipped a C compiler for 22 years now, and not once, not ever, did anyone ask for a regex library for it. Regex wasn't put in the C standard, or the C++ one. Yet regex is considered a core capability of several other languages. There are many ways to interpret that - I am interpreting it as meaning that regex sucks in C, and so people seem to just never even think of using C when they need to process strings.
> 
> I'm surprised that you'd interpret it that way. I've used regex in C for decades. There was one great implementation from, uhhh, Ian somebody from Edinburgh Uni, which generated x86 code on the fly. I used that to great effect ~ a truly impressive utility. 

That sounds pretty cool.


Sean


February 17, 2006
"Kris" <fu@bar.com> wrote in message news:dt3foh$1s1d$1@digitaldaemon.com...
> I noted a few reasons previously, regarding differing approaches and mindsets between script developers and systems developers, even when the same person does both. Georg Wrede just posted some very similar reasoning too. The upshot is that (IMO) the general C population rarely has a compelling need for regex. Where regex might seem (perhaps mistakenly) like using a sledgehammer to crack a nut in C, its usage is often not given a second thought in scripts.

This might be a circular result - people don't use regex in C because regexes suck in C, so there is no incentive to improve them because there aren't any users. People just get used to going to another language to use regex, and never stop to think it doesn't have to be that way.

> Speaking personally, I don't expect high performance out of a script, and don't give two hoots about Q & D hacking therein. That's not the case with systems-programming (for me), where I'm likely to use something more lightweight as appropriate.

There's a lot of string processing work done in C that is not performance sensitive - like dealing with the command line arguments.

> On the other hand, I've written a lot of the type of code that really benefits from the state-machinery exposed by a good regex engine. Other times I've hand-tuned my own state-machines to do the work instead. Sometimes in assembly.

Sure. And building in some syntactic sugar for regex isn't going to sabotage optimization.

> One has to wonder how much script-regex actually leverages the power within? I'd bet a large % are completely trivial.

I agree with that.

> The kind which can easily be handled by other (more efficient) means in systems languages.

I'm not sure that efficiency is the only goal here - productivity is a big one, too, and one often uses regex in parts of the program that don't need performance. I know I sure get tired of strlen/strcmp/memcpy for routine non-performance-critical code.
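As a sketch of the productivity argument for non-performance-critical code such as command-line handling, the following uses std.regex from present-day Phobos (an assumption; it is not the std.regexp of this thread) to pick apart a "--name=value" argument. The function name parseOption and the option syntax are invented for illustration.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: parse one "--name=value" argument.
// Returns false for anything that does not have that shape.
bool parseOption(string arg, out string name, out string value)
{
    auto m = matchFirst(arg, regex(`^--([a-z][a-z0-9-]*)=(.*)$`));
    if (m.empty)
        return false;
    name = m[1];    // first capture: the option name
    value = m[2];   // second capture: everything after '='
    return true;
}

void main(string[] args)
{
    foreach (arg; args[1 .. $])
    {
        string name, value;
        if (parseOption(arg, name, value))
            writeln(name, " -> ", value);
        else
            writeln("ignored: ", arg);
    }
}
```

The pattern states the accepted shape in one line, which is the kind of convenience that hand-written strlen/strcmp code gives up.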


February 17, 2006
"Sean Kelly" <sean@f4.ca> wrote in message news:dt3hev$1taf$1@digitaldaemon.com...
> Walter Bright wrote:
>>
>> I'm just not getting it - why should it be removed? There never was a plan to remove it. And why would an implementation of a D runtime library not want to do a regex implementation? Of course, it's a lot of work to implement a regex, but one can just copy over std.RegExp and use/adapt it as required, as the license allows that. So I am just not getting what the problem is.
>
> Perhaps I'm being idealistic, as I simply don't believe the runtime should rely on standard library code.  Up to now that's been achievable, but the solution for this particular feature is less clear.  But I'll drop the issue for now and mull it over a bit.

Consider that there's no way to implement C, D, etc., without some runtime library. Just doing a long divide relies on library code. There's the startup code (you can't just jmp to main()), shutdown code, exception handling support, etc.

C/C++ have gone the odd route of making the library *part of the language*, so, for example, a compiler can recognize strlen and replace it with custom code. To my mind this gives the worst of both worlds - no syntactic sugar and no library flexibility.


February 17, 2006
"Sean Kelly" <sean@f4.ca> wrote in message news:dt3nss$22bj$1@digitaldaemon.com...
> And outbuffer should be easy enough to remove from this list.  I'd have continued to use your modified std.regexp for this release except the deltas between the 146 and 147 versions of std.regexp were tremendous. It would have taken hours to sort out a workable merge of that file, so falling back on the new Phobos version seemed preferable.

Very little actually changed; what I did was re-sort the order so it was more appealing in Ddoc format, and add the Ddoc comments.


February 17, 2006
Walter Bright wrote:

> 
> "Oskar Linde" <olREM@OVEnada.kth.se> wrote in message news:dt1ccm$2ssg$1@digitaldaemon.com...
>> $.pre
>> $.post
>> $[0]
>> $[n]
>> (or $.match(n), but why not overload opIndex?)
> 
> That was the original plan, but when _match is of type T*, the [ ] cannot be overloaded.

So why does _match have to be a pointer? Would something like this not work? (from object.d, added void *_this, opIndex and changed this->_this)

/* ***************************** _Match **************************** */

/* **
 * Default type for _match.
 * Implemented as a proxy for RegExp, so that object doesn't pull in
 * the entire std.regexp.
 */

import std.regexp;

struct _Match
{
    void *_this;

    char[] match(size_t n)
    {
        return (cast(RegExp)_this).match(n);
    }

    char[] opIndex(size_t n)
    {
        return match(n);
    }

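    _Match opNext()
    {
        // A struct cannot be null, so an empty proxy (_this == null)
        // signals that there are no further matches.
        _Match result;
        RegExp r = cast(RegExp)_this;
        if (r.opNext())
            result._this = _this;   // same RegExp, advanced to the next match
        else
            delete r;               // exhausted: release the RegExp
        return result;
    }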

    char[] pre()
    {
        return (cast(RegExp)_this).pre();
    }

    char[] post()
    {
        return (cast(RegExp)_this).post();
    }
}
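
The pointer-versus-value point can be checked in isolation. This is a minimal sketch in present-day D; MatchProxy and its groups field are made-up stand-ins, not the actual _Match from object.d. It shows that [ ] on a struct value calls opIndex, while [ ] on a pointer to the struct is plain pointer indexing.

```d
// Hypothetical stand-in for a match proxy; not the real _Match.
struct MatchProxy
{
    string[] groups;

    // Called for m[n] when m is a MatchProxy value.
    string opIndex(size_t n)
    {
        return groups[n];
    }
}

void main()
{
    auto m = MatchProxy(["whole match", "first group"]);
    assert(m[1] == "first group");      // value: opIndex is called

    MatchProxy* p = &m;
    // Pointer: p[0] is built-in pointer indexing and yields the struct itself,
    // so the overload is never consulted.
    static assert(is(typeof(p[0]) == MatchProxy));
    assert((*p)[1] == "first group");   // dereference first to reach opIndex
}
```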

/Oskar