$`, $', $&, $n - sugar or cyclamates? (page 2) - D Programming Language Discussion Forum


February 16, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by pragma
in reply to Walter Bright

In article <dt0hbb$25iq$2@digitaldaemon.com>, Walter Bright says...
>
>
>"John Demme" <me@teqdruid.com> wrote in message news:dt0fvp$23bj$1@digitaldaemon.com...
>> Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and such,
>> but please no random symbols.  I like $match.pre and $length, etc... but $&
>> and $` don't mean anything to me!
>
><g>. I considered setting this up as a vote:
>
>Vote for 1:
>
>(1) If I wanted to write ugly programs I'd use Perl, not D.
>
>(2) Cool! I can now dump my Perl scripts and use D!
>

Well, assuming that your mind is made up on this way or no way, I'd have to lean toward (2).  It's there to be used, but if I object to it personally, I can abstain from using it.

Just some food for thought, as I think there's plenty left to be worked out in this concept. :)

IMHO, using "~~" as a token doesn't look right yet, but that's probably because this would be the first time that token has been used in a programming language (unless I'm mistaken).  The only thing I could possibly suggest to use differently would be at-cost ("@") symbol:

if("regular expression" @ "operand"){ /*...*/ }

This looks a little more arithmetic to my eye than "~~". :)

The dollar-sign operators look good, but "$n" seems limited to me.  Why not open this up to array-indexing so it's more compatible with foreach, arrays and other things D?  Also, what about if I want to pass the set of matches as an array?
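pragma's question about handling the matches as an array can be illustrated with present-day Phobos rather than the syntax proposed in this thread. In current `std.regex`, a `Captures` result is itself a range (the whole match followed by each group), and `matchAll` yields every match in the input. A minimal sketch, assuming a current D compiler; the date string and word list are made-up inputs:

```d
import std.array : array;
import std.regex : matchAll, matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // Made-up input: three numeric groups separated by dashes.
    auto m = matchFirst("2006-02-16", regex(`(\d+)-(\d+)-(\d+)`));

    // m[n] covers the "$n" case; m.length counts the whole match plus groups.
    writeln(m.length);

    // Captures is a range, so the groups can be copied into a plain array
    // and passed around like any other string[].
    string[] groups = m.array;
    foreach (i, g; groups)
        writeln(i, ": ", g);

    // Every match in a string, not just the first one.
    string[] words;
    foreach (c; matchAll("one two three", regex(`\w+`)))
        words ~= c.hit;
    writeln(words);
}
```

Index 0 of `groups` holds the whole match, so the numbering lines up with Perl's `$&`, `$1`, `$2`, and so on.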

The '$x' tokens are sure to lex great, but isn't this running the risk of overloading the '$' symbol a bit much (from a visual standpoint)?

if("$\w*" ~~ "hello world"){
mystring[0..$&.length] = $&; //eek!
}

Also, am I to assume that we'll get an "opProcess" operator overload to use on our classes?  As long as _match is flexible enough to accept any type, this could really work.  To my eye, the compiler could accept a custom class or struct as the _match value (kind of like an internal 'auto') so long as its namespace provides the .pre, .post, .match members.  All-in-all, it would be a rather nice side effect of all this, as things like Spirit have been difficult to implement as D has fewer operator overloads than C++.


- Eric Anderton at yahoo

February 16, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by S. Chancellor
in reply to Walter Bright

On 2006-02-15 13:59:33 -0800, "Walter Bright" <newshound@digitalmars.com> said:

> D dramatically improves the convenience of string handling over C++. But while I think using the library std.regexp is straightforward, obviously it just isn't gaining traction. People like the shortcut approaches Ruby and Perl use for regular expressions, hence the new D match-expression support.
> 
> So, now we have:
> 
>     if (regular_expression ~~ string)
>     {
>             _match.pre
>             _match.post
>             _match.match(n)
>     }
> 
> Should we do some aliases:
> 
>     $` => _match.pre
>     $' => _match.post
>     $& => _match.match(0)
>     $n => _match.match(n)
> 
> ? Syntactic sugar is often a good idea, but at what point do they become cyclamates and cause cancer in laboratory animals? Will these $ tokens render D more accessible, but perhaps too unreadable?

With this you've essentially bound syntax to the RegExp class, or are you not using that for this? I do believe I recall some statements by you in the past against standard libraries being an integral part of the computer language.  Though, I'm too lazy to dig them up right now.

My preference is that this match syntax be removed, and the aliases never see the light of day.  I use perl for this sort of stuff.

-S.
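For readers coming from Perl, the aliases proposed above correspond to four parts of a match result. The following is a minimal sketch of reading those parts with present-day Phobos `std.regex` (`matchFirst`, and `Captures.pre`, `.post`, `.hit` and indexing); it deliberately does not use the `~~` or `_match` syntax from this thread, and the input string and pattern are made-up examples:

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // Made-up input and pattern: two words separated by a space.
    auto m = matchFirst("<<hello world>>", regex(`(\w+) (\w+)`));

    if (m) // a Captures result converts to true when something matched
    {
        writeln(m.pre);  // text before the match           (Perl's $`)
        writeln(m.post); // text after the match            (Perl's $')
        writeln(m.hit);  // the whole match, same as m[0]   (Perl's $&)
        writeln(m[1]);   // first capture group             (Perl's $1)
        writeln(m[2]);   // second capture group            (Perl's $2)
    }
}
```

Here `m.pre` is `"<<"`, `m.post` is `">>"`, `m.hit` is `"hello world"`, and `m[1]` and `m[2]` are `"hello"` and `"world"`.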

February 16, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by Derek Parnell
in reply to S. Chancellor

On Wed, 15 Feb 2006 18:06:45 -0800, S. Chancellor wrote:

> My preference is that this match syntax be removed, and the aliases never see the light of day.  I use perl for this sort of stuff.

I use regular expression matching a lot in the type of programming I do, e.g. Build, and I suspect I'd find perl far too slow for the purpose.

I haven't used the std.regexp library because it doesn't really support Unicode correctly, so I've written simple functions to do some pattern matching for my needs. And as I've just found out, the new pattern matching just uses the standard library and Unicode support is not there, so I still can't use it.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 1:38:45 PM

February 16, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by Kris
in reply to Walter Bright

"Walter Bright" <newshound@digitalmars.com> wrote...
>D dramatically improves the convenience of string handling over C++. But while I think using the library std.regexp is straightforward, obviously it just isn't gaining traction. People like the shortcut approaches Ruby and Perl use for regular expressions, hence the new D match-expression support.
>
> So, now we have:
>
>    if (regular_expression ~~ string)
>    {
>            _match.pre
>            _match.post
>            _match.match(n)
>    }
>
> Should we do some aliases:
>
>    $` => _match.pre
>    $' => _match.post
>    $& => _match.match(0)
>    $n => _match.match(n)
>
> ? Syntactic sugar is often a good idea, but at what point do they become cyclamates and cause cancer in laboratory animals? Will these $ tokens render D more accessible, but perhaps too unreadable?


There seem to be multiple issues here. The first one, which you ask about, is related to the syntax. At first blush, the ~~ looks like an approximate approximation, and then making D look like a malformed Perl is surely a mistake. What the heck is wrong with $match.pre, $match.post, $match.index(n) instead? At least they're readable :-)

Additionally, I thought '~' was used for concatenation? Because '+' is overloaded in other languages? Isn't that just exactly what you're now doing with '~' ? I mean, what does a "pattern within" operation have to do with concatenation?

Then, you say this is applicable only to char[]. What about wchar[] and dchar[]? Are they now relegated to second-class citizens? It's no use converting those arrays into char[] on the fly ~ apart from the heap activity and conversion that would ensue (for both operands; one of which could be rather substantial), $match.pre and friends would also have to do conversions back into the original format. Ugghh.

Yet another issue is with respect to case-folding (which is often used with regex expressions). You see, unicode case-folding does not follow the trivial rules of ASCII ~ you can't just call tolower() and hope for the best. Thus, there needs to be some mechanism to support alternate, more appropriate, converters.

In retrospect, much of this should probably be handled via template usage (for the different UTF types). And the converter issue can be resolved by supporting some kind of assignable or plug-in module. All of this can be handled by a templated class. I attempted to do just this with your RegExp class, but ran into problems related to how patterns are stored in the "instruction" stream (size differences between char and dchar, for example).

I'm an advocate for potentially getting regex support into the grammar but, on the face of it, your approach just doesn't appear to be considered in a particularly thorough manner. There again, perhaps you've already addressed the above issues, and the resolution is just not currently visible?

Perhaps this whole thing should wait until after we see what can be done with the regex templates, so that there's some experience behind the grammar? I mean, that would surely be better than having to remove the above at some point in the future. What's the big rush with built-in regex anyway? I really do think it should wait until we have some solid experience with regex templates ~ don't you think it's rather likely we'll learn something really useful that applies directly to a built-in grammar?

- Kris
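Kris's point about `wchar[]` and `dchar[]`, and his suggestion of a templated approach, can be illustrated with present-day Phobos, where `std.regex` is templated on the character type: a pattern built from a `wstring` matches `wstring` input directly, with no round trip through `char[]`. This is a sketch assuming a current D compiler; `firstWord` is a hypothetical helper, not a library function, and the sample strings are made up:

```d
import std.conv : to;
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: return the first word of any string type.
// Only the short pattern is converted to the input's character width;
// the input itself stays in its original encoding.
S firstWord(S)(S text)
{
    auto m = matchFirst(text, regex(to!S(`\w+`)));
    if (m)
        return m.hit; // a slice of `text`, still of type S
    return S.init;
}

void main()
{
    string  a = firstWord("hello world");
    wstring b = firstWord("привет мир"w);
    dstring c = firstWord("αβγ δεζ"d);
    writeln(a, " ", b, " ", c);
}
```

The returned word is a slice of the original input, so neither the (possibly large) subject string nor the result is converted back and forth between encodings.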

February 16, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by James Dunne
in reply to Walter Bright

Walter Bright wrote:
> D dramatically improves the convenience of string handling over C++. But while I think using the library std.regexp is straightforward, obviously it just isn't gaining traction. People like the shortcut approaches Ruby and Perl use for regular expressions, hence the new D match-expression support.
> 
> So, now we have:
> 
>     if (regular_expression ~~ string)
>     {
>             _match.pre
>             _match.post
>             _match.match(n)
>     }
> 
> Should we do some aliases:
> 
>     $` => _match.pre
>     $' => _match.post
>     $& => _match.match(0)
>     $n => _match.match(n)
> 
> ? Syntactic sugar is often a good idea, but at what point do they become cyclamates and cause cancer in laboratory animals? Will these $ tokens render D more accessible, but perhaps too unreadable? 
> 
> 

I'd rather make my code easier to read than write.  I don't use regexps just for that reason.

-- 
Regards,
James Dunne

February 16, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by Dave
in reply to Walter Bright

In article <dt0hbb$25iq$2@digitaldaemon.com>, Walter Bright says...
>
>
>"John Demme" <me@teqdruid.com> wrote in message news:dt0fvp$23bj$1@digitaldaemon.com...
>> Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and such,
>> but please no random symbols.  I like $match.pre and $length, etc... but $&
>> and $` don't mean anything to me!
>
><g>. I considered setting this up as a vote:
>
>Vote for 1:
>
>(1) If I wanted to write ugly programs I'd use Perl, not D.
>
>(2) Cool! I can now dump my Perl scripts and use D!
>

I think both apply and are not mutually exclusive <g>

For me, the big part of supporting the most common regex operation in the language itself is that quick scripts using it can be kicked out without having to import something or remember the details of the RegExp class. Crazy (or lazy?), but I find that appealing when comparing it to a scripting language. So that's a vote for (2).

I've never been a big fan of most of Perl's syntactical sugar - just too easy to miss something when you're reading it, so that's a vote for (1). And besides, one will never be able to copy and paste much of anything from Perl into D so there isn't any 'sweet' benefit there either <g>

- Dave

February 16, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by Unknown W. Brackets
in reply to Walter Bright

I personally don't see why it has to be 1 or 2.  I think compromise is a great thing.

I should note first that I actually like $ in scripting languages, because it tends to make variables stand out (not hide them.)

You seem to be suggesting either using _match.match(0) (ick!) or $&.... why?  Why can't it be:

   $pre => _match.pre
   $post => _match.post
   $match => _match.match(0)
   $5 => _match.match(5)

Yes, yes, I realize this looks more like those scripting-language variables, but it's also clearer than Perl's syntax, and almost as easy to type.  I would spend more time making sure I'm pressing the right symbol than typing "pre" or some such.

Just my opinion.

-[Unknown]


> "John Demme" <me@teqdruid.com> wrote in message news:dt0fvp$23bj$1@digitaldaemon.com...
>> Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and such,
>> but please no random symbols.  I like $match.pre and $length, etc... but $&
>> and $` don't mean anything to me!
> 
> <g>. I considered setting this up as a vote:
> 
> Vote for 1:
> 
> (1) If I wanted to write ugly programs I'd use Perl, not D.
> 
> (2) Cool! I can now dump my Perl scripts and use D! 
> 
>

February 16, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by Walter Bright
in reply to pragma

"pragma" <pragma_member@pathlink.com> wrote in message news:dt0mfk$29qc$1@digitaldaemon.com...
> Also, am I to assume that we'll get an "opProcess" operator overload to use
> on our classes?

Yes, opMatch. Already done!

>  As long as _match is flexible enough to accept any type, this could
> really work.  To my eye, the compiler could accept a custom class or
> struct as the _match value (kind of like an internal 'auto') so long as
> its namespace provides the .pre, .post, .match members.

Already done!

> All-in-all, it would be a rather nice side effect of all this, as things
> like Spirit have been difficult to implement as D has fewer operator
> overloads than C++.
>
>
> - Eric Anderton at yahoo

February 16, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by Walter Bright
in reply to Derek Parnell

"Derek Parnell" <derek@psych.ward> wrote in message news:sedgdqrvihce.1s7xzb5qubodc$.dlg@40tude.net...
> I haven't used the std.regexp library because it doesn't really support
> Unicode correctly, so I've written simple functions to do some pattern
> matching for my needs. And as I've just found out, the new pattern matching
> just uses the standard library and Unicode support is not there, so I still
> can't use it.

All you need to use it with your own custom type is provide an opMatch() overload.

February 16, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by Walter Bright
in reply to Kris

"Kris" <fu@bar.com> wrote in message news:dt0q7n$2cuo$1@digitaldaemon.com...
> There seem to be multiple issues here. The first one, which you ask about, is related to the syntax. At first blush, the ~~ looks like an approximate approximation, and then making D look like a malformed Perl is surely a mistake.

Have you got a better idea for the ~~ and !~ tokens?

> What the heck is wrong with $match.pre, $match.post, $match.index(n) instead? At least they're readable :-)

Nothing, really. But are they more readable than _match.pre, etc.?

> Additionally, I thought '~' was used for concatenation?

It is.

> Because '+' is overloaded in other languages? Isn't that just exactly what you're now doing with '~' ?

'=' and '==' mean entirely different things. So does / and /*. I don't think ~~ need have anything to do with complement or concatenation.

> I mean, what does a "pattern within" operation have to do with concatenation?

Nothing at all.

> Then, you say this is applicable only to char[]. What about wchar[] and dchar[]? Are they now relegated to second-class citizens? It's no use converting those arrays into char[] on the fly ~ apart from the heap activity and conversion that would ensue (for both operands; one of which could be rather substantial), $match.pre and friends would also have to do conversions back into the original format. Ugghh.

That is a problem, one that would get solved when RegExp can do wchar and dchar. That isn't a technical problem, it's more of a getting around to it problem.

> Yet another issue is with respect to case-folding (which is often used with regex expressions). You see, unicode case-folding does not follow the trivial rules of ASCII ~ you can't just call tolower() and hope for the best. Thus, there needs to be some mechanism to support alternate, more appropriate, converters.

I agree that case is an issue. That's why this also works:

    if (RegExp("string", "i") ~~ "string") ...

and can work with any class type as the left operand, as long as it overloads opMatch.
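On the case-folding point, present-day Phobos offers both a regex flag and Unicode-aware comparison functions. The sketch below assumes current `std.regex` and `std.uni` rather than the 2006 `RegExp` class. As far as I understand it, the regex `"i"` flag is based on simple (one-to-one) case folding, while `std.uni.icmp` uses full case folding, which can map one character to several (such as `ß` to `ss`) and so goes beyond what a per-character `tolower()` can do. The sample strings are made up, apart from the `Rußland` pair, which mirrors the `icmp` documentation:

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;
import std.uni : icmp, sicmp;

void main()
{
    // Case-insensitive regex match on non-ASCII text via the "i" flag.
    auto m = matchFirst("Привет, МИР", regex("мир", "i"));
    writeln(m.hit);

    // Full case folding: "ß" folds to "ss", so these compare equal.
    writeln(icmp("Rußland", "Russland") == 0);

    // Simple case folding maps one code point to one code point,
    // so "ß" is not treated as equal to "ss" here.
    writeln(sicmp("Rußland", "Russland") == 0);
}
```

Which folding is appropriate depends on the application, which is one reason a pluggable or flag-driven mechanism, rather than a fixed `tolower()`, is useful.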

> In retrospect, much of this should probably be handled via template usage (for the different UTF types). And the converter issue can be resolved by supporting some kind of assignable or plug-in module. All of this can be handled by a templated class. I attempted to do just this with your RegExp class, but ran into problems related to how patterns are stored in the "instruction" stream (size differences between char and dchar, for example).

I don't agree. The problem I ran into with this approach is the injection of the declaration _match into the current scope.

> I'm an advocate for potentially getting regex support into the grammar but, on the face of it, your approach just doesn't appear to be considered in a particularly thorough manner. There again, perhaps you've already addressed the above issues, and the resolution is just not currently visible?

I considered many ways of doing it, and have actually been thinking about it for months. This seemed to be the most practical. I hope I answered your questions about it.

> Perhaps this whole thing should wait until after we see what can be done with the regex templates, so that there's some experience behind the grammar? I mean, that would surely be better than having to remove the above at some point in the future. What's the big rush with built-in regex anyway? I really do think it should wait until we have some solid experience with regex templates ~ don't you think it's rather likely we'll learn something really useful that applies directly to a built-in grammar?

I don't think this takes away from the regex templates. I hope to use the regex templates in conjunction with this syntactic sugar to create optimized regex evaluation.


Copyright © 1999-2021 by the D Language Foundation