February 16, 2006
Walter Bright wrote:

> "Kris" <fu@bar.com> wrote in message news:dt0q7n$2cuo$1@digitaldaemon.com...

>> In retrospect, much of this should probably be handled via template usage (for the different UTF types). And the converter issue can be resolved by supporting some kind of assignable or plug-in module. All of this can be handled by a templated class. I attempted to do just this with your RegExp class, but ran into problems related to how patterns are stored in the "instruction" stream (size differences between char and dchar, for example).
> 
> I don't agree. The problem I ran into with this approach is the injection of the declaration _match into the current scope.

Have you considered making this more general? I.e. for all if statements, inject a variable that takes the value of the entire condition expression. (Using _result as a placeholder for such an identifier.)

if ("..." ~~ "...") {
  _result.match(0);
}

if (myFunc()) {
  _result.whatever();
}

Why should this behavior be reserved for opMatch() only? Isn't this a very common coding pattern that could also become less verbose by this:

SomeType result;
if ( (result = getSomething())) {
        doSomethingWith(result);
}

(becoming:

if (getSomething()) {
        doSomethingWith(_result);
}

)

One suggestion would be to call _result $. Giving $ the semantics of a "scope injected value". This would go hand in hand with an earlier suggestion of changing the $ for index operations too:

Assume [] introduces a new scope, then a $ within [] would refer to whatever is being indexed.

char[] cutHeadAndTail = myString[1 .. $.length-1];
Image subImage = myImage[$.upperLeft .. $.middle];
char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

/Oskar
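
(Editorial sketch, not part of the original post.) For comparison, D as it exists today covers part of this proposal with an explicitly named variable declared in the `if` condition, and `$` inside an index or slice stands for the length of the array being indexed. The helper names `sizeOrMinusOne` and `cutHeadAndTail` below are hypothetical and chosen only for illustration; this is not the `_result` injection proposed above.

```d
import std.stdio : writeln;

// Hypothetical helper: looks up `name` and returns its size, or -1 if absent.
int sizeOrMinusOne(int[string] sizes, string name)
{
    // `p` is declared in the condition and is visible only inside the
    // if statement, much like the proposed _result, but with a name
    // chosen by the programmer. `in` yields a pointer or null.
    if (auto p = name in sizes)
        return *p;
    return -1;
}

// Drops the first and last character; `$` is the length of `s` here.
string cutHeadAndTail(string s)
{
    return s.length >= 2 ? s[1 .. $ - 1] : "";
}

void main()
{
    int[string] sizes = ["a.wav": 10, "b.txt": 20];
    writeln(sizeOrMinusOne(sizes, "a.wav"));
    writeln(sizeOrMinusOne(sizes, "c.ogg"));
    writeln(cutHeadAndTail("{body}"));
}
```

The explicit name costs a few characters compared with an injected `_result` or `$`, but it avoids the question of which enclosing condition an implicit identifier refers to when `if` statements nest.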
February 16, 2006
In article <dt088e$1svm$2@digitaldaemon.com>, Walter Bright says...
>
>D dramatically improves the convenience of string handling over C++. But while I think using the library std.regexp is straightforward, obviously it just isn't gaining traction. People like the shortcut approaches Ruby and Perl use for regular expressions, hence the new D match-expression support.
>
>So, now we have:
>
>    if (regular_expression ~~ string)
>    {
>            _match.pre
>            _match.post
>            _match.match(n)
>    }

Fairly good.

>Should we do some aliases:
>
>    $` => _match.pre
>    $' => _match.post
>    $& => _match.match(0)
>    $n => _match.match(n)
>
>?

No.
That's why I hate perl. I have to look in the manual to know what the hell $`
means, and be careful about it really being a ` and not a '.

>Syntactic sugar is often a good idea, but at what point do they become cyclamates and cause cancer in laboratory animals? Will these $ tokens render D more accessible, but perhaps too unreadable?

Yes.

All those $'$`$&$3 are useful only to make my eyes cross. If you want to use $ use it as an abbreviation of 'match', so you'll get:

$pre => _match.pre
$post => _match.post
$(0) => _match.match(0)
$(n) => _match.match(n)

So once I know that $ stands for 'match', I can easily work out what $pre, $post,
$(0) and $(3) stand for.

Ciao

---
http://www.mariottini.net/roberto/
February 16, 2006
Walter Bright wrote:

> Should we do some aliases:
> 
>     $` => _match.pre
>     $' => _match.post

` is not readily available on all keyboards. Some fonts also have problems
differentiating between the three Latin-1 ticks (` ' ´): in many fonts the
straight tick, or apostrophe ('), looks like the right tick, or acute
accent (´).

>     $& => _match.match(0)
>     $n => _match.match(n)

Is n meant to be an integer expression or a numeric literal?

> ? Syntactic sugar is often a good idea, but at what point do they become cyclamates and cause cancer in laboratory animals? Will these $ tokens render D more accessible, but perhaps too unreadable?

IMHO, both. It makes D less readable for sure. I also think this repulses more people in general than it attracts some odd perl hackers. :)

In this case, I don't even think the syntactic sugar makes the code much faster to write (which in reality, I think, is more a psychological problem than a real one).

If verbosity is to be avoided, I would suggest (as in my earlier reply to this thread) that $ replaces _match. This would give:

$.pre
$.post
$[0]
$[n]
(or $.match(n), but why not overload opIndex?)

/Oskar
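
(Editorial sketch, not part of the original post.) The `$.pre`, `$.post`, `$[n]` shape suggested here is close to what a match object looks like in the current Phobos module `std.regex`, which is assumed below instead of the 2006 `std.regexp` and the `~~` operator discussed in this thread. The input string and pattern are made up for illustration.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // Capture the base name and the extension of a file name.
    auto m = matchFirst("play song.wav now", regex(`(\w+)\.(wav|ogg)`));
    if (!m.empty)
    {
        writeln(m.pre);   // text before the match
        writeln(m.hit);   // the whole match, same as m[0]
        writeln(m[1]);    // first capture group
        writeln(m[2]);    // second capture group
        writeln(m.post);  // text after the match
    }
}
```

Here the match object has an ordinary name (`m`) rather than being injected into the scope, so indexing with `[n]` is available because `m` is a value with an overloaded index operator, not a pointer.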
February 16, 2006
Walter Bright wrote:
> "Kris" <fu@bar.com> wrote in message news:dt0q7n$2cuo$1@digitaldaemon.com...
> 
>>There seem to be multiple issues here. The first one, which you ask about, is related to the syntax. At first blush, the ~~ looks like an approximate approximation, and then making D look like a malformed Perl is surely a mistake.
> 
> 
> If you've got a better idea for tokens ~~ and !~ ?

Well, there's always "in" ...

if (".wav$" in filename)
    ...

plus the !in variation. Don't you find that somewhat more appealing?



>>What the heck is wrong with $match.pre, $match.post, $match.index(n) instead? At least they're readable :-)
> 
> 
> Nothing, really. But are they more readable than _match.pre, etc.?

I believe the shortened versions ($pre, $post, $group[n] etc) are much more readable. This type of thing is why some of us were so adamant about saving the $ sign as a prefix for meta-tags, vis-a-vis $time, $file, $line and, of course, $length


>>Additionally, I thought '~' was used for concatenation?
> 
> 
> It is.
> 
> 
>>Because '+' is overloaded in other languages? Isn't that just exactly what you're now doing with '~' ?
> 
> 
> '=' and '==' mean entirely different things. So does / and /*. I don't think ~~ need have anything to do with complement or concatenation.

The first two are at least related. But the argument is flawed: choosing arbitrary symbols for operators does not make the language easier to grasp. At least "in" has some relevant meaning to it.


>>Then, you say this is applicable only to char[]. What about wchar[] and dchar[]? Are they now relegated to second-class citizens? It's no use converting those arrays into char[] on the fly ~ apart from the heap activity and conversion that would ensue (for both operands; one of which could be rather substantial), $match.pre and friends would also have to do conversions back into the original format. Ugghh.
> 
> 
> That is a problem, one that would get solved when RegExp can do wchar and dchar. That isn't a technical problem, it's more of a getting around to it problem.

Well, since grammar supported regex has elevated itself to the top of the priority list, perhaps wchar/dchar support might tag along with it?


>>Yet another issue is with respect to case-folding (which is often used with regex expressions). You see, unicode case-folding does not follow the trivial rules of ASCII ~ you can't just call tolower() and hope for the best. Thus, there needs to be some mechanism to support alternate, more appropriate, converters.
> 
> 
> I agree that case is an issue. That's why this also works:
> 
>     if (RegExp("string", "i") ~~ "string") ...
> 
> and can work with any class type as the left operand, as long as it overloads opMatch.

That's a good solution. Do you have a unicode 'folder' ?


>>In retrospect, much of this should probably be handled via template usage (for the different UTF types). And the converter issue can be resolved by supporting some kind of assignable or plug-in module. All of this can be handled by a templated class. I attempted to do just this with your RegExp class, but ran into problems related to how patterns are stored in the "instruction" stream (size differences between char and dchar, for example).
> 
> 
> I don't agree. The problem I ran into with this approach is the injection of the declaration _match into the current scope.

I don't understand the relevance of that, Walter. What does _match have to do with the need to support utf8, utf16 and utf32?


>>I'm an advocate for potentially getting regex support into the grammar but, on the face of it, your approach just doesn't appear to be considered in a particularly thorough manner. There again, perhaps you've already addressed the above issues, and the resolution is just not currently visible?
> 
> 
> I considered many ways of doing it, and have actually been thinking about it for months. This seemed to be the most practical. I hope I answered your questions about it.

No, but the opMatch() is a good solution for that aspect.



>>Perhaps this whole thing should wait until after we see what can be done with the regex templates, so that there's some experience behind the grammar? I mean, that would surely be better than having to remove the above at some point in the future. What's the big rush with built-in regex anyway? I really do think it should wait until we have some solid experience with regex templates ~ don't you think it's rather likely we'll learn something really useful that applies directly to a built-in grammar?
> 
> 
> I don't think this takes away from the regex templates. I hope to use the regex templates in conjunction with this syntactic sugar to create optimized regex evaluation. 

Perhaps, but I really don't see the need for this sudden rush to get regex support into the grammar. Experience with regex templates is almost certain to uncover some conflict in this regard ~ one that will likely have to be compromised to fit in with the current syntax. That's just Murphy's law. What's the big hurry?
February 16, 2006
"kris" <fu@bar.org> wrote in message news:dt1cm1$2t76$1@digitaldaemon.com...
> Walter Bright wrote:
> Well, there's always "in" ...
>
> if (".wav$" in filename)
>     ...
> plus the !in variation. Don't you find that somewhat more appealing?

Not really. I think it also conflicts with 'in' already.

>>>What the heck is wrong with $match.pre, $match.post, $match.index(n) instead? At least they're readable :-)
>> Nothing, really. But are they more readable than _match.pre, etc.?
> I believe the shortened versions ($pre, $post, $group[n] etc) are much more readable. This type of thing is why some of us were so adamant about saving the $ sign as a prefix for meta-tags, vis-a-vis $time, $file, $line and, of course, $length

Fair enough. Let's see what others think.

>>>Additionally, I thought '~' was used for concatenation?
>> It is.
>>>Because '+' is overloaded in other languages? Isn't that just exactly what you're now doing with '~' ?
>> '=' and '==' mean entirely different things. So does / and /*. I don't think ~~ need have anything to do with complement or concatenation.
> The first two are at least related. But the argument is flawed: choosing arbitrary symbols for operators does not make the language easier to grasp.

It's all a matter of what you're used to. Who'd have thought that '!' for 'not' would feel natural? It was a kludge invented for C. Now it's standard.

> At least "in" has some relevant meaning to it.

It would be overloading its existing meaning, which means that it'll take semantic, rather than syntactic, analysis to disambiguate. This is potential trouble.

>> That is a problem, one that would get solved when RegExp can do wchar and dchar. That isn't a technical problem, it's more of a getting around to it problem.
> Well, since grammar supported regex has elevated itself to the top of the priority list, perhaps wchar/dchar support might tag along with it?

The thing is, RegExp has been in there from the beginning, but it has gone unused and even its existence is overlooked. I don't believe that's because it isn't useful - look at Ruby, Perl, Javascript, etc. Those languages heavily use regex. Is there something inherent about *script* languages that makes them nice for regex? I don't believe there is, I think it gets heavily used in those languages because the syntactic sugar makes it easy to use.

I've been blasted for putting strings in the language (instead of as a library String class), for putting complex numbers in, and for associative arrays. I think the results speak for these being a success. If regexes are heavily used, then the extra sugar for them becomes worthwhile as well.

Who uses regex in C++? Hardly anyone. I'm betting it's because using them sucks in C++, not because people don't use regexes.


>> I agree that case is an issue. That's why this also works:
>>
>>     if (RegExp("string", "i") ~~ "string") ...
>>
>> and can work with any class type as the left operand, as long as it overloads opMatch.
> That's a good solution. Do you have a unicode 'folder' ?

No. But that's a library issue, not a language issue. Match expressions are set up so that one can completely control their behavior with a custom class.

>>>In retrospect, much of this should probably be handled via template usage (for the different UTF types). And the converter issue can be resolved by supporting some kind of assignable or plug-in module. All of this can be handled by a templated class. I attempted to do just this with your RegExp class, but ran into problems related to how patterns are stored in the "instruction" stream (size differences between char and dchar, for example).
>> I don't agree. The problem I ran into with this approach is the injection of the declaration _match into the current scope.
> I don't understand the relevance of that, Walter. What does _match have to do with the need to support utf8, utf16 and utf32?

Nothing. But _match *does* have a lot to do with the inadequacy of a pure template solution. Not even mixins will work in a nice way here.

>> I don't think this takes away from the regex templates. I hope to use the regex templates in conjunction with this syntactic sugar to create optimized regex evaluation.
> Perhaps, but I really don't see the need for this sudden rush to get regex support into the grammar. Experience with regex templates is almost certain to uncover some conflict in this regard ~ one that will likely have to be compromised to fit in with the current syntax. That's just Murphy's law. What's the big hurry?

I thought it fit in well with D's new capability of being runnable in a script-like fashion. If this opens up a reasonably broad new range of applications that D is a good fit for, that's good. I might be wrong, of course, as I've been with the bit data type (a complete botch). Match expressions don't break anything, were not expensive to implement, and the only way to see how they'll work out is to try them.
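
(Editorial sketch, not part of the original post.) The idea that a user-defined left operand can take full control of matching, and the `".wav$" in filename` spelling suggested earlier in the thread, can both be approximated in present-day D through ordinary operator overloading. The wrapper type `Pattern` is hypothetical, and the code assumes the current `std.regex` module rather than `opMatch` and `~~` as described here.

```d
import std.regex : Regex, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical wrapper type; the name Pattern is not from the thread.
struct Pattern
{
    Regex!char re;

    this(string source, string flags = "")
    {
        re = regex(source, flags);
    }

    // Lets `pattern in text` run a search; `in` is overloadable via opBinary.
    bool opBinary(string op : "in")(string text)
    {
        return !matchFirst(text, re).empty;
    }
}

void main()
{
    auto wav = Pattern(`\.wav$`, "i");   // "i" requests case-insensitive matching
    writeln(wav in "SONG.WAV");
    writeln(wav in "song.txt");
}
```

Because the left operand is a user-defined type, the compiler resolves `in` through normal overload lookup, and a different wrapper could plug in its own case-folding or UTF handling without any change to the language.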


February 16, 2006
"Oskar Linde" <olREM@OVEnada.kth.se> wrote in message news:dt1aif$2qpd$1@digitaldaemon.com...
> Have you considered making this more general? I.e. for all if statements, inject a variable that takes the value of the entire condition expression. (Using _result as a placeholder for such an identifier.)
>
> if ("..." ~~ "...") {
>  _result.match(0);
> }
>
> if (myFunc()) {
>  _result.whatever();
> }
>
> Why should this behavior be reserved for opMatch() only? Isn't this a very common coding pattern that could also become less verbose by this:
>
> SomeType result;
> if ( (result = getSomething())) {
>        doSomethingWith(result);
> }
>
> (becoming:
>
> if (getSomething()) {
>        doSomethingWith(_result);
> }
>
> )
>
> One suggestion would be to call _result $. Giving $ the semantics of a "scope injected value". This would go hand in hand with an earlier suggestion of changing the $ for index operations too:
>
> Assume [] introduces a new scope, then a $ within [] would refer to whatever is being indexed.
>
> char[] cutHeadAndTail = myString[1 .. $.length-1];
> Image subImage = myImage[$.upperLeft .. $.middle];
> char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

I never thought of that. It's an intriguing idea.


February 16, 2006
"Oskar Linde" <olREM@OVEnada.kth.se> wrote in message news:dt1ccm$2ssg$1@digitaldaemon.com...
>>     $& => _match.match(0)
>>     $n => _match.match(n)
> Is n meant to be an integer expression or a numeric literal?

$1, $2, $3, ...

>> ? Syntactic sugar is often a good idea, but at what point do they become cyclamates and cause cancer in laboratory animals? Will these $ tokens render D more accessible, but perhaps too unreadable?
> IMHO, both. It makes D less readable for sure. I also think this repulses more people in general than it attracts some odd perl hackers. :)

I'm a little surprised at the uniformly negative reaction to the perl-ish notation. But that's good, as it makes the right way to go for D clear.

> If verbosity is to be avoided, I would suggest (as in my earlier reply to this thread) that $ replaces _match. This would give:
>
> $.pre
> $.post
> $[0]
> $[n]
> (or $.match(n), but why not overload opIndex?)

That was the original plan, but when _match is of type T*, the [ ] cannot be overloaded.


February 16, 2006
Walter Bright wrote:
> "kris" <fu@bar.org> wrote in message news:dt1cm1$2t76$1@digitaldaemon.com...
> 
>>Walter Bright wrote:
>>Well, there's always "in" ...
>>
>>if (".wav$" in filename)
>>    ...
>>plus the !in variation. Don't you find that somewhat more appealing?
> 
> 
> Not really. I think it also conflicts with 'in' already.

But not from the user's standpoint.


> It's all a matter of what you're used to. Who'd have thought that '!' for 'not' would feel natural? It was a kludge invented for C. Now it's standard.

That doesn't mean D should adopt arbitrary symbols, Walter. If you want rapid adoption, then the more you can do to make the language "approachable", the more success you'll have. There was a similar issue with === and !==, and you thankfully deprecated them :-)


>>At least "in" has some relevant meaning to it.
> 
> 
> It would be overloading its existing meaning, which means that it'll take semantic, rather than syntactic, analysis to disambiguate. This is potential trouble.

I can see that there "might" be trouble for the compiler and, if so, that would be an issue. However, for a developer, the meaning of "in" with respect to its use with AA and potentially regex-patterns is consistent. One is asking the question "does this thing on the left exist within the thing on the right". It even takes care of getting the operand ordering correct. Thus, I'd urge you to at least see if there's actually a notable problem for the compiler to handle this before writing the idea off.


> The thing is, RegExp has been in there from the beginning, but it has gone unused and even its existence is overlooked. I don't believe that's because it isn't useful - look at Ruby, Perl, Javascript, etc. Those languages heavily use regex. Is there something inherent about *script* languages that makes them nice for regex? I don't believe there is, I think it gets heavily used in those languages because the syntactic sugar makes it easy to use.

Heck, I've used regex in all manner of ways. I don't think visibility is the problem; rather, I suspect there's a limited set of domains where it applies in a systems language. Some of those can be addressed in other ways, particularly where performance is a concern; hence regex may not get used as much as it might. In scripting languages there's often a need for Q & D pattern-matching, with little regard for a potentially more efficient mechanism. Horses for courses.


> I've been blasted for putting strings in the language (instead of as a library String class), for putting complex numbers in, and for associative arrays. I think the results speak for these being a success. If regexes are heavily used, then the extra sugar for them becomes worthwhile as well.

That's getting a bit off topic, isn't it? OK, I'll go with it:

I'm an advocate for getting regex support in the grammar, but I'm certainly not an advocate for tying Phobos to the compiler (RegExp has a notable resultant import set; because of this I refactored it for Ares and Mango).

Without a clearly defined means to decouple Phobos from the compiler, you're effectively erecting barriers for other solutions to clamber over (as Sean vaguely intimated earlier). What's missing from all this built-in stuff is a clean and documented means to have it supported outside of Phobos. After all, the compiler is injecting explicit references for AA code, utf conversion code, regex code, and a variety of other things. What's next?

In short: you're (a) building more and more library functionality directly into the language without providing a means to cleanly support alternate implementations, extensions, or otherwise decouple the compiler. And (b) by doing so, you're (perhaps inadvertently) stifling some innovation and causing some headaches for the very people who are trying to help D along the road to acceptance. It would really help if you'd be somewhat sensitive to these aspects rather than persistently ignoring them.

For instance, how does one change .sort to use a different sorting algorithm? How does one change the hashing function for non-classes? How can one unhook RegExp+OutBuffer+String+Others, and replace it? etc. etc. If D is intended to be a closed-shop, Phobos-only environment, then some of us are presumably wasting our time supporting the language; right?

I don't suppose that was the answer you were looking for <g>


> Who uses regex in C++? Hardly anyone. I'm betting it's because using them sucks in C++, not because people don't use regexes.

Again, it's horses for courses. BTW, regex does not suck in C, so why C++ ?

> I thought it fit in well with D's new capability of being runnable in a script-like fashion. If this opens up a reasonably broad new range of applications that D is a good fit for, that's good. I might be wrong, of course, as I've been with the bit data type (a complete botch). Match expressions don't break anything, were not expensive to implement, and the only way to see how they'll work out is to try them. 

I figured that was the motivation. The "cost" you speak of considers only how much effort it takes you to get the functionality into the compiler, test it a bit, document the usage, and respond to the flak ;-)

BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and provide a nice library class/struct instead? You might even reuse the old code from Zortech/Zorland days.

February 16, 2006
In article <dt1eje$2uvu$2@digitaldaemon.com>, Walter Bright says...
>
>
>"Oskar Linde" <olREM@OVEnada.kth.se> wrote in message news:dt1aif$2qpd$1@digitaldaemon.com...
>> Have you considered making this more general? I.e. for all if statements, inject a variable that takes the value of the entire condition expression. (Using _result as a placeholder for such an identifier.)
>>
>> if ("..." ~~ "...") {
>>  _result.match(0);
>> }
>>
>> if (myFunc()) {
>>  _result.whatever();
>> }
>>
>> Why should this behavior be reserved for opMatch() only? Isn't this a very common coding pattern that could also become less verbose by this:
>>
>> SomeType result;
>> if ( (result = getSomething())) {
>>        doSomethingWith(result);
>> }
>>
>> (becoming:
>>
>> if (getSomething()) {
>>        doSomethingWith(_result);
>> }
>>
>> )
>>
>> One suggestion would be to call _result $. Giving $ the semantics of a "scope injected value". This would go hand in hand with an earlier suggestion of changing the $ for index operations too:
>>
>> Assume [] introduces a new scope, then a $ within [] would refer to whatever is being indexed.
>>
>> char[] cutHeadAndTail = myString[1 .. $.length-1];
>> Image subImage = myImage[$.upperLeft .. $.middle];
>> char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];
>
>I never thought of that. It's an intriguing idea.
>

Something along these lines would *most certainly* get my vote!

- Eric Anderton at yahoo
February 16, 2006
Walter Bright wrote:
> D dramatically improves the convenience of string handling over C++. But while I think using the library std.regexp is straightforward, obviously it just isn't gaining traction. People like the shortcut approaches Ruby and Perl use for regular expressions, hence the new D match-expression support.
> 
> So, now we have:
> 
>     if (regular_expression ~~ string)
>     {
>             _match.pre
>             _match.post
>             _match.match(n)
>     }
> 
> Should we do some aliases:
> 
>     $` => _match.pre
>     $' => _match.post
>     $& => _match.match(0)
>     $n => _match.match(n)
> 
> ? Syntactic sugar is often a good idea, but at what point do they become cyclamates and cause cancer in laboratory animals? Will these $ tokens render D more accessible, but perhaps too unreadable? 
> 
> 

It is a nice feature, but I don't think such a thing should be part of the language. I don't think it is that common. Maybe I am wrong... The other thing I don't like is having too many reserved words... Personally, I wouldn't try to catch up with Ruby or Perl. I believe comparing D/C/C++ with a virtual-machine or scripting language is foolish. But it depends on what the goals of D are: a larger audience or higher quality. In my opinion, trying to catch up with a scripting language is a regression. But as I said, it is a very nice feature. I will use it myself, but I wouldn't judge a language by this...