$`, $', $&, $n - sugar or cyclamates? And other topics (page 6) - D Programming Language Discussion Forum

February 17, 2006

Re: $`, $', $&, $n - sugar or cyclamates? And other topics

Posted by Kris
in reply to Sean Kelly

"Sean Kelly" <sean@f4.ca> wrote
> Kris wrote:
>> "Thomas Kuehne" <thomas-dloop@kuehne.cn> wrote ...
>>> Currently, every type - including void - can be used as the type on an array element. What would be the consequences for generic programming if T -> T[] isn't guaranteed to succeed?
>>
>> Easy fix ~ change the bool alias to byte, instead of bit :-)
>
> I already use byte in some cases :-)  But it lacks the boolean value safety of bit, so I tend to litter my code with asserts just to be sure something didn't get screwed up... or simply make sure I'm only comparing to zero and not-zero.  Either way, it's more error prone than I'd like.

Yes, you're right of course. Would be just great if Walter would add a true *cough* bool *cough* type that doesn't try to pack itself when used with arrays. Packed bits are great too, but for different reasons.

February 17, 2006

Re: $`, $', $&, $n - sugar or cyclamates? And other topics

Posted by Regan Heath
in reply to Kris

On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu@bar.com> wrote:
> "Sean Kelly" <sean@f4.ca> wrote
>> Kris wrote:
>>> "Thomas Kuehne" <thomas-dloop@kuehne.cn> wrote ...
>>>> Currently, every type - including void - can be used as the type on an
>>>> array element. What would be the consequences for generic programming
>>>> if T -> T[] isn't guaranteed to succeed?
>>>
>>> Easy fix ~ change the bool alias to byte, instead of bit :-)
>>
>> I already use byte in some cases :-)  But it lacks the boolean value
>> safety of bit, so I tend to litter my code with asserts just to be sure
>> something didn't get screwed up... or simply make sure I'm only comparing
>> to zero and not-zero.  Either way, it's more error prone than I'd like.
>
> Yes, you're right of course. Would be just great if Walter would add a true
> *cough* bool *cough* type that doesn't try to pack itself when used with
> arrays.

A true bool would make several people happy.. but once one existed people would then want:

class A {}
A a = new A();
if (a) //error not boolean result.

right? That would bother me.

> Packed bits are great too, but for different reasons.

Indeed, I can think of several uses for packed bits, e.g.
 - Using them as a bunch of flags, generally boolean on/off flags.
 - Representing/dissecting packed data, e.g. TCP headers.
 - Assembling/converting data, e.g. 8-bit to 7-bit characters for SMS messages.

All of these can be done with & | ^ etc., but it would be nicer, i.e. more readable and easier to write, if we could index the data.

I've suggested this before, but would it be possible to allow array operations on the basic types: byte, short, int, long? For the same reason that bit[] does not work, these could not provide a full set of array functionality, but they could provide much that would be of use, I suspect.

Examples:

int flags;
...
if (flags[5]) //check for flag
	flags[5] = 1; //set flag

void foo(long header) {
  int length = header[0..5]; //copy bits to lvalue.
...

For the 3rd task, converting from 8-bit to 7-bit, some sort of stream that allowed bits to be sent to it and assembled would be the ideal way, I suspect.

In the end it's just syntactic sugar for & | and ^. The question is: does it make the code clearer? I think so. Does it make bit manipulation easier to code? I think so. Is that enough to make it a valuable feature?
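
For comparison, here is a rough sketch of what the indexing and slicing forms above would stand for when written out with masks and shifts. The helper names (testBit, setBit, sliceBits) are made up for illustration and are not part of Phobos or of the proposal; bit 0 is assumed to be the least significant bit.

```d
// Hypothetical helpers, named for illustration only.

/// True if bit `n` of `value` is set, roughly the proposed `flags[n]`.
bool testBit(ulong value, uint n)
{
    assert(n < 64);
    return ((value >> n) & 1) != 0;
}

/// Returns `value` with bit `n` set, roughly the proposed `flags[n] = 1`.
ulong setBit(ulong value, uint n)
{
    assert(n < 64);
    return value | (1UL << n);
}

/// Extracts bits [lo, hi) of `value`, roughly the proposed `header[lo..hi]`.
ulong sliceBits(ulong value, uint lo, uint hi)
{
    assert(lo < hi && hi <= 64);
    immutable width = hi - lo;
    // A 64-bit shift is not allowed, so the full-width mask is a special case.
    immutable mask = width == 64 ? ulong.max : (1UL << width) - 1;
    return (value >> lo) & mask;
}

unittest
{
    ulong flags = 0;
    assert(!testBit(flags, 5));
    flags = setBit(flags, 5);
    assert(testBit(flags, 5));
    assert(sliceBits(0b1011_0110, 0, 5) == 0b1_0110);
}
```

Whether `flags[5]` reads better than `testBit(flags, 5)` or the raw `(flags >> 5) & 1` is exactly the judgment call the question above is asking about.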

Regan

February 17, 2006

Re: $`, $', $&, $n - sugar or cyclamates? And other topics

Posted by Walter Bright
in reply to kris

"kris" <fu@bar.org> wrote in message news:dt1jhm$1m3$1@digitaldaemon.com...
> Walter Bright wrote:
>> Not really. I think it also conflicts with 'in' already.
> but not from the user's standpoint

Can't separate the two.

> That doesn't mean D should adopt arbitrary symbols, Walter. If you want rapid adoption, then the more you can do to make the language "approachable", the more success you'll have. There was a similar issue with === and !==, and you thankfully deprecated them :-)

Those had to go because === was indistinguishable from == in many fonts.

>> It would be overloading its existing meaning, which means that it'll take semantic, rather than syntactic, analysis to disambiguate. This is potential trouble.
> I can see that there "might" be trouble for the compiler and, if so, that would be an issue. However, for a developer, the meaning of "in" with respect to its use with AA and potentially regex-patterns is consistent.

The trouble starts when you overload the operators. Doing this with 'in' will result in problems similar to those C++ has with '+' being sometimes plus, sometimes concatenate.

> One is asking the question "does this thing on the left exist within the thing on the right". It even takes care of getting the operand ordering correct. Thus, I'd urge you to at least see if there's actually a notable problem for the compiler to handle this before writing the idea off.

It's not a problem with the compiler. It's a conceptual problem for the user. When I see 'in' I think of containers. That's completely different from regex.

> Heck, I've used regex in all manner of ways. I don't think visibility is the problem; rather, I suspect there's a limited set of domains where it applies in a systems language. Some of those can be addressed in other ways, particularly where performance is a concern; hence regex may not get used as much as it might. In scripting languages there's often a need for Q & D pattern-matching, with little regard for a potentially more efficient mechanism. Horses for courses.

Scripting languages have 3 main programming characteristics:

1) dynamic typing
2) great string handling
3) runtime script generation & execution

A lot of people turn to them because of (2). There's no reason C++ and D can't do (2) as well. C++ doesn't because the C++ community has adopted the principle of "if it can be done as a library, it must be done as a library, no matter how unbelievably wretched that might turn out." So when C++ programmers want to do strings, they switch to Perl, Ruby, Python, etc.

As to string manipulation in a systems app - is a compiler a systems app? I believe it is, and there's a bunch of tedious string manipulation in it. Everything from handling the command line arguments to manipulating file names to formatting error messages to reading config files. It's astonishing how that stuff shrinks and becomes a pleasure to code rather than tedium when the string handling sugar is applied.

I also write a number of garden variety string processing apps, such as the one that turns newsgroup postings into the "D archives". I want to do them in D. I don't want to install/learn Ruby/Python/Perl. I see no reason why D cannot dominate that problem space well.

> I'm an advocate for getting regex support in the grammar,

I thought you were arguing against that <g>.

> but I'm certainly not an advocate for tying Phobos to the compiler (RegExp
> has a notable resultant import set; because of this I refactored it for
> Ares and Mango).
> Without a clearly defined means to decouple Phobos from the compiler,
> you're effectively erecting barriers for other solutions to clamber over
> (as Sean vaguely intimated earlier). What's missing from all this built-in
> stuff is a clean and documented means to have it supported outside of
> Phobos. After all, the compiler is injecting explicit references for AA
> code, utf conversion code, regex code, and a variety of other things.
> What's next?

The compiler actually does not emit any explicit references to RegExp. It's all done by a reference to object._Match. _Match operates as a proxy to RegExp, but the compiler knows nothing about that.

> In short: you're (a) building more and more library functionality directly into the language without providing a means to cleanly support alternate implementations, extensions, or otherwise decouple the compiler. And (b) by doing so, you're (perhaps inadvertently) stifling some innovation and causing some headaches for the very people who are trying to help D along the road to acceptance. It would really help if you'd be somewhat sensitive to these aspects rather than persistently ignoring them.
>
> For instance, how does one change .sort to use a different sorting algorithm? How does one change the hashing function for non-classes? How can one unhook RegExp+OutBuffer+String+Others, and replace it? etc. etc. If D is intended to be a closed-shop, Phobos-only environment, then some of us are presumably wasting our time supporting the language; right?

Regex is non-trivial. There's no way to have any sort of language support for it without it being in the library. Anyone working on D libraries or other things is welcome to use RegExp, so I am just not understanding what the problem is. Phobos isn't a closed shop, the license on the files allows anyone to do pretty much anything they want with it.

Also, let me reiterate that the compiler does *not* emit any hardcoded references to RegExp, nor does it know anything at all about regex's. It uses object._Match, which is a proxy to whatever the language implementor wants to use.

RegExp could probably remove its dependence on OutBuffer, though.

>> Who uses regex in C++? Hardly anyone. I'm betting it's because using them sucks in C++, not because people don't use regex's.
> Again, it's horses for courses. BTW, regex does not suck in C, so why C++ ?

It sucks in C, and why do I say that? I've shipped a C compiler for 22 years now, and not once, not ever, did anyone ask for a regex library for it. Regex wasn't put in the C standard, or the C++ one. Yet regex is considered a core capability of several other languages. There are many ways to interpret that - I am interpreting it as meaning that regex sucks in C, and so people seem to just never even think of using C when they need to process strings.

> BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and provide a nice library class/struct instead? You might even reuse the old code from Zortech/Zorland days.

I know Stewart is using bit[], I want to hear his opinion first. If he says dump it, I'm agreeable.

February 17, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by Walter Bright
in reply to Julio César Carrascal Urquijo

"Julio César Carrascal Urquijo" <jcesar@phreaker.net> wrote in message news:dt28a3$o2q$1@digitaldaemon.com...
> Oskar Linde wrote:
>> char[] cutHeadAndTail = myString[1 .. $.length-1];
>> Image subImage = myImage[$.upperLeft .. $.middle];
>> char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];
>
> This is a great idea. I like it.

There is one problem with it: every time an IfStatement is added to existing code, it will break all uses of $ in the ThenStatement:

----- before --------
if (foo())
    $.bar = 3;
------ after ---------
if (foo())
{
     if (abc())
        $.bar = 3;    // uh-oh!
}
----------------------

This is of course a trivial example, but consider if the $ appeared in a large block of code.

February 17, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by Walter Bright
in reply to Sean Kelly

"Sean Kelly" <sean@f4.ca> wrote in message news:dt2cra$ssu$2@digitaldaemon.com...
> Walter Bright wrote:
>> "Kris" <fu@bar.com> wrote in message news:dt0q7n$2cuo$1@digitaldaemon.com...
>>> There seem to be multiple issues here. The first one, which you ask about, is related to the syntax. At first blush, the ~~ looks like an approximate approximation, and then making D look like a malformed Perl is surely a mistake.
>>
>> If you've got a better idea for tokens ~~ and !~ ?
>
> I'm half inclined to suggest -> for ~~, though there doesn't seem to be an obvious corresponding 'not' version.

Two cons:

1) people see -> and they're going to think the C/C++ meaning. Heck, I often mistakenly use -> in D instead of '.'. For that reason -> should never result in valid D code.

2) as you suggested, !-> doesn't look too hot :-(

February 17, 2006

Re: $`, $', $&, $n - sugar or cyclamates? And other topics

Posted by Sean Kelly
in reply to Regan Heath

Regan Heath wrote:
> On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu@bar.com> wrote:
>> "Sean Kelly" <sean@f4.ca> wrote
>>> Kris wrote:
>>>> "Thomas Kuehne" <thomas-dloop@kuehne.cn> wrote ...
>>>>> Currently, every type - including void - can be used as the type on an
>>>>> array element. What would be the consequences for generic programming
>>>>> if T -> T[] isn't guaranteed to succeed?
>>>>
>>>> Easy fix ~ change the bool alias to byte, instead of bit :-)
>>>
>>> I already use byte in some cases :-)  But it lacks the boolean value
>>> safety of bit, so I tend to litter my code with asserts just to be sure
>>> something didn't get screwed up... or simply make sure I'm only comparing
>>> to zero and not-zero.  Either way, it's more error prone than I'd like.
>>
>> Yes, you're right of course. Would be just great if Walter would add a true
>> *cough* bool *cough* type that doesn't try to pack itself when used with
>> arrays.
> 
> A true bool would make several people happy.. but once one existed people would then want:
> 
> class A {}
> A a = new A();
> if (a) //error not boolean result.
> 
> right? That would bother me.

This is only a slippery slope if we want it to be ;-)  I think the intent behind adding 'bool' was twofold: first, 'bit' loses meaning if it never actually refers to a bit, and second, it allows 'bit' to be deprecated for a while so people can change their code.

>> Packed bits are great too, but for different reasons.
> 
> Indeed, I can think of several uses for packed bits, e.g.
>  - Using them as a bunch of flags, generally boolean on/off flags.
>  - Representing/dissecting packed data, e.g. TCP headers.
>  - Assembling/converting data, e.g. 8-bit to 7-bit characters for SMS messages.
> 
> All of these can be done with & | ^ etc., but it would be nicer, i.e. more readable and easier to write, if we could index the data.

Aye.  I like the idea of packed bit arrays in general.  I just don't want them to be mandatory for the built-in boolean type--I run into too many situations where I want to do something that the existing syntax doesn't support and I'm stuck using an array of bytes instead.


Sean

February 17, 2006

Re: $`, $', $&, $n - sugar or cyclamates?

Posted by Walter Bright
in reply to David Medlock

"David Medlock" <noone@nowhere.com> wrote in message news:dt2mpk$17aj$1@digitaldaemon.com...
> I haven't read this whole thread, so pardon me if this has been suggested. Why doesn't the regular expression stuff use foreach?

Why, indeed. Oskar has brought it up, and he and you are right. I'm going to reevaluate this based on the feedback in this thread.
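
As a point of reference, present-day Phobos exposes this style through std.regex, a module that postdates this thread and is not the built-in match syntax under discussion here. The following is only a sketch under that assumption; the function name digitRuns is made up for illustration.

```d
import std.regex : matchAll, regex;

/// Collects every run of digits in `text`, visiting matches with foreach.
string[] digitRuns(string text)
{
    string[] hits;
    auto re = regex(`[0-9]+`);
    foreach (m; matchAll(text, re))
    {
        // m.hit is the matched text; m.pre and m.post are the text before
        // and after it, playing the roles of $&, $` and $' respectively.
        hits ~= m.hit;
    }
    return hits;
}

unittest
{
    assert(digitRuns("a12b345") == ["12", "345"]);
    assert(digitRuns("no digits") == []);
}
```

Each match is an ordinary loop variable, so nesting another statement inside the loop does not change what it refers to.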

February 17, 2006

Re: $`, $', $&, $n - sugar or cyclamates? And other topics

Posted by Walter Bright
in reply to Regan Heath

"Regan Heath" <regan@netwin.co.nz> wrote in message news:ops43d5lmc23k2f5@nrage.netwin.co.nz...
> In the end it's just syntactic sugar for & | and ^. The question is: does it make the code clearer? I think so. Does it make bit manipulation easier to code? I think so. Is that enough to make it a valuable feature?

I regularly do bit masking and shifting on ints. I'm so used to it, I don't think that adding sugar for it would help any.

February 17, 2006

Re: $`, $', $&, $n - sugar or cyclamates? And other topics

Posted by Sean Kelly
in reply to Walter Bright

Walter Bright wrote:
> 
> The compiler actually does not emit any explicit references to RegExp. It's all done by a reference to object._Match. _Match operates as a proxy to RegExp, but the compiler knows nothing about that.

This is really more of a library issue than a compiler issue.  My concern is that, since internal/object.d now imports std.regexp, the runtime code can no longer be built without at least a skeleton regexp module available.  And if the regexp implementation changes then the runtime must be rebuilt.  I'll admit that the current approach is probably best given that std.regexp exists and code duplication is a Bad Thing, but it still creates a language dependency on library code, even if the compiler isn't emitting RegExp calls directly.

> Regex is non-trivial. There's no way to have any sort of language support for it without it being in the library. Anyone working on D libraries or other things is welcome to use RegExp, so I am just not understanding what the problem is. Phobos isn't a closed shop, the license on the files allows anyone to do pretty much anything they want with it.

I agree.  And this works fine for Phobos.  But if Phobos is to be a template for future standard library implementations, then it should be designed in a way that allows for closed-source compiler implementations as well.

Also, what if a library writer decides to exploit the regular expression support provided by the language, and merely implements his RegExp class as a veneer over the built-in functionality?  It creates an odd sort of circular dependency.  I'd originally considered the same thing for UTF transcoding using the built-in foreach mechanism, but as that code is relatively simple it's not as much of an issue.

I assume there's no plan to remove std.regexp from Phobos now that language support is in place?

> Also, let me reiterate that the compiler does *not* emit any hardcoded references to RegExp, nor does it know anything at all about regex's. It uses object._Match, which is a proxy to whatever the language implementor wants to use.

Understood.  In fact I'll vouch for this since I've had a close look at the code.

Sean

February 17, 2006

Re: $`, $', $&, $n - sugar or cyclamates? And other topics

Posted by Derek Parnell
in reply to Regan Heath

On Fri, 17 Feb 2006 13:54:47 +1300, Regan Heath wrote:

> On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu@bar.com> wrote:
>> "Sean Kelly" <sean@f4.ca> wrote
>>> Kris wrote:
>>>> "Thomas Kuehne" <thomas-dloop@kuehne.cn> wrote ...
>>>>> Currently, every type - including void - can be used as the type on an array element. What would be the consequences for generic programming if T -> T[] isn't guaranteed to succeed?
>>>>
>>>> Easy fix ~ change the bool alias to byte, instead of bit :-)
>>>
>>> I already use byte in some cases :-)  But it lacks the boolean value
>>> safety of bit, so I tend to litter my code with asserts just to be sure
>>> something didn't get screwed up... or simply make sure I'm only comparing
>>> to zero and not-zero.  Either way, it's more error prone than I'd like.
>>
>> Yes, you're right of course. Would be just great if Walter would add a true
>> *cough* bool *cough* type that doesn't try to pack itself when used with
>> arrays.
> 
> A true bool would make several people happy.. but once one existed people would then want:
> 
> class A {}
> A a = new A();
> if (a) //error not boolean result.
> 
> right? That would bother me.

I regard the syntax

   if ( <identifier> )

as shorthand for

    if ( <identifier> != 0 )

or

    if ( <identifier> !is null)

as appropriate, so this would not fall foul of a native boolean implementation.
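
A small sketch of that reading, written against present-day D, which did later gain a distinct bool type and still accepts both forms in a condition. The function name truthy is made up for illustration.

```d
class A {}

/// Shorthand test on a class reference: true when the reference is non-null.
bool truthy(A a)
{
    if (a)
        return true;
    return false;
}

/// Shorthand test on an integer: true when the value is non-zero.
bool truthy(int x)
{
    if (x)
        return true;
    return false;
}

unittest
{
    A a = new A();
    A none = null;
    // The shorthand agrees with the explicit comparisons in every case.
    assert(truthy(a) == (a !is null));
    assert(truthy(none) == (none !is null));
    assert(truthy(3) == (3 != 0));
    assert(truthy(0) == (0 != 0));
}
```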

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
17/02/2006 12:37:40 PM
