December 28, 2011
Re: string is rarely useful as a function argument
On 12/28/2011 06:40 PM, Andrei Alexandrescu wrote:
> On 12/28/11 11:11 AM, Walter Bright wrote:
>> On 12/28/2011 4:06 AM, Peter Alexander wrote:
>>> I rarely *ever* need an immutable string. What I usually need is
>>> const(char)[].
>>> I'd say 99%+ of the time I need only a const string.
>>
>> I have a very different experience with strings. I can't even remember a
>> case where I wanted to modify an existing string (this includes all my C
>> and C++ usage of strings). It's always assemble a string at one place,
>> and then refer to that string ever after (and never modify it).
>>
>> What immutable strings make possible is treating strings as if they were
>> value types. Nearly every language I know of treats them as immutable
>> except for C and C++.
>
> I remember the day at Kahili we figured immutable(char)[] will just work
> as it needs to. It felt pretty awesome.
>
> Andrei

I agree. But I am confused, because elsewhere in this thread you 
suggest that it actually does not work as it needs to.
December 28, 2011
Re: string is rarely useful as a function argument
On Wednesday, 28 December 2011 at 19:48:28 UTC, Adam D. Ruppe 
wrote:
> On Wednesday, 28 December 2011 at 19:30:04 UTC, Andrei 
> Alexandrescu wrote:
>> Implementation would entail a change in the compiler.
>
> I don't think I agree. Wouldn't something like this work?
>
> ===
>
> struct string {
>       immutable(char)[] rep;
>       alias rep this;
>       auto opAssign(immutable(char)[] rhs) {
>               rep = rhs;
>               return this;
>       }
>
>       this(immutable(char)[] rhs) {
>               rep = rhs;
>       }
>       // disable these here so it isn't passed on to .rep
>       @disable void opSlice(){  assert(0);  };
>       @disable size_t length() {  assert(0);  };
> }
>
> ===
>
> I did some quick tests and the basics seemed ok:
>
> /* paste impl from above */
>
> import std.string : replace;
>
> void main() {
>       string a = "test"; // works
>
>       a = a.replace("test", "mang"); // works
>       // a = a[0..1]; // correctly fails to compile
>       assert(0, a); // works
> }

My thinking exactly. Of course, we can't apply "@disable" right away; 
we should start with "deprecated" to allow for a proper 
migration period.
I'd also like a transition of the string-related functions to 
this type. The previous ones can remain as simple 
wrappers/aliases/whatever for backwards compatibility.
December 28, 2011
Re: string is rarely useful as a function argument
On 12/28/2011 08:55 PM, foobar wrote:
> On Wednesday, 28 December 2011 at 19:38:53 UTC, Timon Gehr wrote:
> [snip]
>>>
>>> I'm all for making string a properly encapsulated type.
>>
>> In what way would the proposed change improve encapsulation, and why
>> would it even be desirable for such a basic data structure?
>
> I'm not sure what you are asking here. Are you asking what the
> benefits of encapsulation are?

I know the benefits of encapsulation and none of them applies here. The 
proposed change is nothing but a breaking interface change.

> This topic has been discussed to death more than
> once, and I'd suggest searching the NG archives for the details. Also, if
> you haven't already, I'd suggest reading about Unicode and its levels of
> abstraction: code points, code units, graphemes, etc.
>

'char' is a code unit. Therefore that is the level of abstraction the 
data type char[] provides.
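To make the levels of abstraction in this exchange concrete, here is a small sketch. It assumes a recent dmd/Phobos rather than the 2011 releases, and the helper names are hypothetical. It measures one string three ways: UTF-8 code units via `.length`, code points via `std.range.walkLength` (which auto-decodes narrow strings), and graphemes via `std.uni.byGrapheme`.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helpers: measure one string at three Unicode levels.
size_t codeUnits(string s) { return s.length; }                // UTF-8 code units (bytes)
size_t codePoints(string s) { return s.walkLength; }           // decoded dchars
size_t graphemes(string s) { return s.byGrapheme.walkLength; } // user-perceived characters

void main()
{
    // "noël" spelled as 'e' followed by U+0308 COMBINING DIAERESIS
    string s = "noe\u0308l";
    writeln(codeUnits(s), " code units, ",
            codePoints(s), " code points, ",
            graphemes(s), " graphemes");
    // prints: 6 code units, 5 code points, 4 graphemes
}
```

char[] gives you the first number for free; the other two cost a decoding pass, which is exactly the tradeoff being argued over.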
December 28, 2011
Re: string is rarely useful as a function argument
On Wednesday, 28 December 2011 at 20:01:15 UTC, foobar wrote:
> I'd also like a transition of the string-related functions to 
> this type. The previous ones can remain as simple 
> wrappers/aliases/whatever for backwards compatibility.

I actually like strings just the way they are... but if
we had to change, I'm sure we can do a good job in the
library relatively easily.
December 28, 2011
Re: string is rarely useful as a function argument
On 12/28/2011 08:00 PM, Andrei Alexandrescu wrote:
> On 12/28/11 12:46 PM, Walter Bright wrote:
>> On 12/28/2011 10:35 AM, Peter Alexander wrote:
>>> On 28/12/11 6:15 PM, Walter Bright wrote:
>>>> If such a change is made, then people will use const string when they
>>>> mean immutable, and the values underneath are not guaranteed to be
>>>> consistent.
>>>
>>> Then people should learn what const and immutable mean!
>>>
>>> I don't think it's fair to dismiss my suggestion on the grounds that
>>> people don't understand the language.
>>
>> People do what is convenient, and as endless experience shows, doing the
>> right thing should be easier than doing the wrong thing. If you present
>> people with a choice:
>>
>> #1: string s;
>> #2: immutable(char)[] s;
>>
>> sure as the sun rises, they will type the former, and it will be subtly
>> incorrect if string is const(char)[].
>>
>> Telling people they should know better and pick #2 instead is a strategy
>> that never works very well - not for programming, nor any other endeavor.
>
> Oh, one more thing - one good thing that could come out of this thread
> is abolition (through however slow a deprecation path) of s.length and
> s[i] for narrow strings. Requiring s.rep.length instead of s.length and
> s.rep[i] instead of s[i] would improve the quality of narrow strings
> tremendously. Also, s.rep[i] should return ubyte/ushort, not char/wchar.

Why? char and wchar are Unicode code units; ubyte/ushort are unsigned 
integral types. It is clear that char/wchar are a better match.


> Then, people would access the decoding routines on the needed occasions,
> or would consciously use the representation.
>
> Yum.
>
>
> Andrei
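For comparison, current Phobos provides something close to the proposed `.rep` as a library function, `std.string.representation`, which views a `string` as `immutable(ubyte)[]` without copying. The sketch below assumes a recent dmd/Phobos. It shows that indexing the representation yields raw bytes, while indexing the `string` itself still yields a `char` code unit.

```d
import std.stdio : writeln;
import std.string : representation;

void main()
{
    string s = "über";                 // 'ü' (U+00FC) is two bytes in UTF-8

    // representation() reinterprets the same memory; nothing is copied.
    immutable(ubyte)[] raw = s.representation;
    static assert(is(typeof(raw[0]) == immutable(ubyte)));

    assert(raw.length == s.length);    // both count UTF-8 code units: 5
    assert(raw[0] == 0xC3 && raw[1] == 0xBC); // the two bytes of 'ü'
    assert(cast(const(void)*) raw.ptr == cast(const(void)*) s.ptr); // same storage

    // Indexing the string itself gives a char code unit, not a code point.
    static assert(is(typeof(s[0]) == immutable(char)));

    writeln(raw); // prints: [195, 188, 98, 101, 114]
}
```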
December 28, 2011
Re: string is rarely useful as a function argument
On Wednesday, December 28, 2011 21:25:39 Timon Gehr wrote:
> Why? char and wchar are Unicode code units; ubyte/ushort are unsigned
> integral types. It is clear that char/wchar are a better match.

It's an issue of the correct usage being the easy path. As it stands, it's 
incredibly easy to use narrow strings incorrectly. By forcing any array of 
char or wchar to use .rep.length instead of .length, the relatively automatic 
(and generally incorrect) usage of .length on a string wouldn't immediately 
work. It would force you to work more at doing the wrong thing. Unfortunately, 
walkLength isn't necessarily any easier than .rep.length, but it does force 
people to look into why they can't do .length, which will generally better 
educate them and will hopefully reduce the misuse of narrow strings.

If we make rep ubyte[] and ushort[] for char[] and wchar[] respectively, then 
we reinforce the fact that you shouldn't operate on chars or wchars. It also 
makes it simple for the compiler to never allow you to use length on char[] or
wchar[], since it doesn't have to worry about whether you got that char[] or 
wchar[] from a rep property or not.

Now, I don't know if this is really a good move at this point. If we were to 
really do this right, we'd need to disallow indexing and slicing of the char[] 
and wchar[] as well, which would break that much more code. It also pretty 
quickly makes it look like string should be its own type rather than an array, 
since it's acting less and less like an array. Not to mention, even the 
correct usage of .rep would become rather irritating (e.g. slicing it when you 
know that the indices that you're dealing with aren't going to cut into any
code points), because you'd have to cast from ubyte[] to char[] whenever you 
did that.

So, I think that the general sentiment behind this is a good one, but I don't 
know if the exact idea is ultimately a good one - particularly at this stage 
in the game. If we're going to make a change like this which would break as 
much code as this would, we'd need to be _very_ certain that it's what we want 
to do.
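The cost of the "correct usage of .rep" described above can be sketched as follows. This assumes the proposal is modeled with today's `std.string.representation`; the `.rep` property itself is hypothetical. The hypothetical helper `keyOf` cuts a `key:value` string at the first `':'`. Because `':'` is ASCII, the cut cannot land inside a multi-byte code point, which is the "indices you know are safe" case.

```d
import std.string : indexOf, representation;

// Hypothetical helper: the part of "key:value" before the first ':'.
string keyOf(string s)
{
    immutable i = s.indexOf(':');   // code-unit index, or -1 if absent
    if (i < 0)
        return s;

    // Today: slice the string directly; no copy, result is still a string.
    string direct = s[0 .. i];

    // Modeled .rep route: slice the raw bytes, then cast back to string.
    // The cast is only valid because ':' is a single-byte character.
    string viaRep = cast(string) s.representation[0 .. i];

    assert(direct == viaRep);
    return direct;
}

void main()
{
    assert(keyOf("größe:42") == "größe");
    assert(keyOf("plain") == "plain");
}
```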

- Jonathan M Davis
December 28, 2011
Re: string is rarely useful as a function argument
On Wednesday, December 28, 2011 10:27:15 Andrei Alexandrescu wrote:
> I'm afraid you're wrong here. The current setup is very good, and much
> better than one in which "string" would be an alias for const(char)[].
> 
> The problem is escaping. A function that transitorily operates on a
> string indeed does not care about the origin of the string, but storing
> a string inside an object is a completely different deal. The setup
> 
> class Query
> {
>      string name;
>      ...
> }
> 
> is safe, minimizes data copying, and never causes surprises to anyone
> ("I set the name of my query and a little later it's all messed up!").
> 
> So immutable(char)[] is the best choice for a correct string abstraction
> compared against both char[] and const(char)[]. In fact it's in a way
> good that const(char)[] takes longer to type, because it also carries
> larger liabilities.
> 
> If you want to create a string out of a char[] or const(char)[], use
> std.conv.to or the unsafe assumeUnique.

Agreed. And for a number of functions, taking const(char)[] would be worse, 
because they would have to dup or idup the string, whereas with 
immutable(char)[], they can safely slice it without worrying about its value 
changing.
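Andrei's `Query` point can be turned into a small runnable sketch. `ConstQuery` is a hypothetical counter-example whose field is `const(char)[]`. It reproduces the "I set the name of my query and a little later it's all messed up!" failure when the caller reuses its buffer. The `string` field forces an explicit conversion: `std.conv.to` copies, while `std.exception.assumeUnique` casts without copying and nulls the source reference.

```d
import std.conv : to;
import std.exception : assumeUnique;

class Query
{
    string name; // immutable(char)[]: the characters can never change
}

// Hypothetical counter-example: const only promises that *this* code won't write.
class ConstQuery
{
    const(char)[] name;
}

void main()
{
    char[] buf = "alpha".dup;      // a mutable buffer owned by the caller

    auto cq = new ConstQuery;
    cq.name = buf;                 // compiles: char[] converts to const(char)[]
    buf[0] = 'A';                  // the caller reuses its buffer...
    assert(cq.name == "Alpha");    // ...and the stored name changed too

    auto q = new Query;
    // q.name = buf;               // does not compile: char[] is not a string
    q.name = buf.to!string;        // copies the characters
    buf[0] = 'x';
    assert(q.name == "Alpha");     // unaffected by later writes to buf

    // If no other reference to the buffer exists, skip the copy.
    char[] tmp = "beta".dup;
    q.name = assumeUnique(tmp);    // the ref overload also sets tmp to null
    assert(tmp is null);
    assert(q.name == "beta");
}
```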

I think that if we want to make it so that immutable(char)[] isn't forced as 
much, then we need to make proper use of templates (which also can allow you 
to not force char over wchar or dchar) and inout - and perhaps in some cases, 
a templated function could allow you to indicate what type of character you 
want returned. But in general, string is by far the most useful and least 
likely to cause bugs with slicing. So, I think that string should remain 
immutable(char)[].
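The suggestion to lean on templates and inout can be sketched as follows. `afterLastSlash` and `afterLast` are hypothetical helpers that return a slice of their argument. With `inout`, one function body serves `char[]`, `const(char)[]` and `string` callers, and each caller gets back the qualifier it passed in. A `string` caller can therefore still store the result without copying. The templated variant also accepts `wchar` and `dchar` arrays; for simplicity it assumes an ASCII separator.

```d
// One body for mutable, const, and immutable char arrays; no copy is made.
inout(char)[] afterLastSlash(inout(char)[] path)
{
    foreach_reverse (i, c; path)   // iterates code units, not code points
        if (c == '/')
            return path[i + 1 .. $];
    return path;
}

// Also templated on the character type (char, wchar, or dchar).
inout(C)[] afterLast(C)(inout(C)[] text, dchar sep)
{
    assert(sep < 0x80, "this sketch handles ASCII separators only");
    foreach_reverse (i, c; text)
        if (c == sep)
            return text[i + 1 .. $];
    return text;
}

void main()
{
    string s = "usr/lib/libphobos.a";
    string tail = afterLastSlash(s);     // still a string: safe to store
    assert(tail == "libphobos.a");

    char[] buf = "tmp/out.txt".dup;
    char[] mtail = afterLastSlash(buf);  // mutable in, mutable out
    mtail[0] = 'O';
    assert(buf == "tmp/Out.txt");        // it is a slice of buf

    wstring w = "a/b"w;
    assert(afterLast(w, '/') == "b"w);
}
```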

- Jonathan M Davis
December 28, 2011
Re: string is rarely useful as a function argument
On Wednesday, December 28, 2011 19:25:15 Jakob Ovrum wrote:
> Also, 'in char[]', which is conceptually much safer, isn't that
> much longer to type.
> 
> It would be cool if 'scope' was actually implemented apart from
> an optimization though.

in char[] is _not_ safer than immutable(char)[]. In fact it's _less_ safe. 
It's also far more restrictive. Many, many functions return a portion of the
string that they are passed in. That slicing would be impossible with scope, 
and because in char[] makes no guarantees about the elements not changing 
after the function call, you'd often have to dup or idup it in order to avoid 
bugs. immutable(char)[] avoids all of that. You can safely slice it without 
having to worry about duping it to avoid it changing out from under you.
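Here is a rough sketch of this argument, using a hypothetical `Registry` that remembers the first word of each entry. The `string` version stores a slice directly. The `in char[]` version (where `in` is at least `const`) cannot rely on the caller leaving the characters alone, so a correct implementation has to `idup` before storing.

```d
// Hypothetical registry that remembers the first word of each entry.
struct Registry
{
    string[] words;

    private static size_t firstSpace(const(char)[] s)
    {
        foreach (i, c; s)
            if (c == ' ')
                return i;
        return s.length;
    }

    // immutable input: keeping a slice is safe and allocation-free
    void add(string s)
    {
        words ~= s[0 .. firstSpace(s)];
    }

    // const input: the caller may still mutate its buffer, so copy
    void addConst(in char[] s)
    {
        words ~= s[0 .. firstSpace(s)].idup;
    }
}

void main()
{
    Registry r;
    string line = "hello world";
    r.add(line);
    assert(r.words[0].ptr == line.ptr);   // a slice, not a copy

    char[] buf = "good morning".dup;
    r.addConst(buf);
    buf[0] = 'G';
    assert(r.words[1] == "good");         // the copy is unaffected
}
```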

- Jonathan M Davis
December 28, 2011
Re: string is rarely useful as a function argument
Apparently my previous post was lost. Apologies if this comes out twice.

On 12/28/2011 09:39 PM, Jonathan M Davis wrote:
> On Wednesday, December 28, 2011 21:25:39 Timon Gehr wrote:
>> Why? char and wchar are Unicode code units; ubyte/ushort are unsigned
>> integral types. It is clear that char/wchar are a better match.
>
> It's an issue of the correct usage being the easy path. As it stands, it's
> incredibly easy to use narrow strings incorrectly. By forcing any array of
> char or wchar to use .rep.length instead of .length, the relatively automatic
> (and generally incorrect) usage of .length on a string wouldn't immediately
> work. It would force you to work more at doing the wrong thing. Unfortunately,
> walkLength isn't necessarily any easier than .rep.length, but it does force
> people to look into why they can't do .length, which will generally better
> educate them and will hopefully reduce the misuse of narrow strings.
>

I was educated enough not to make that mistake, because I read the 
entire language specification before deciding the language was awesome 
and downloading the compiler. I find it strange that the product should 
be made less usable because we do not expect users to read the manual. 
But it is of course a valid point.

> If we make rep ubyte[] and ushort[] for char[] and wchar[] respectively, then
> we reinforce the fact that you shouldn't operate on chars or wchars.

There is nothing wrong with operating at the code unit level. Efficient 
slicing is very desirable.

> It also
> makes it simple for the compiler to never allow you to use length on char[] or
> wchar[], since it doesn't have to worry about whether you got that char[] or
> wchar[] from a rep property or not.
>
> Now, I don't know if this is really a good move at this point. If we were to
> really do this right, we'd need to disallow indexing and slicing of the char[]
> and wchar[] as well, which would break that much more code. It also pretty
> quickly makes it look like string should be its own type rather than an array,
> since it's acting less and less like an array.

Exactly. It is acting less and less like an array of code units. But it 
*is* an array of code units. If the general consensus is that we need a 
string data type that acts at a different abstraction level by default 
(with which I'd disagree, but apparently I don't have a popular opinion 
here), then we need a string type in the standard library to do that. 
Changing the language so that an array of code units stops behaving like 
an array of code units is not a solution.

> Not to mention, even the
> correct usage of .rep would become rather irritating (e.g. slicing it when you
> know that the indices that you're dealing with aren't going to cut into any
> code points), because you'd have to cast from ubyte[] to char[] whenever you
> did that.
>
> So, I think that the general sentiment behind this is a good one, but I don't
> know if the exact idea is ultimately a good one - particularly at this stage
> in the game. If we're going to make a change like this which would break as
> much code as this would, we'd need to be _very_ certain that it's what we want
> to do.
>
> - Jonathan M Davis

I agree.
December 28, 2011
Re: string is rarely useful as a function argument
On Wednesday, 28 December 2011 at 20:49:54 UTC, Jonathan M Davis 
wrote:
> On Wednesday, December 28, 2011 19:25:15 Jakob Ovrum wrote:
>> Also, 'in char[]', which is conceptually much safer, isn't that
>> much longer to type.
>> 
>> It would be cool if 'scope' was actually implemented apart from
>> an optimization though.
>
> in char[] is _not_ safer than immutable(char)[].

I didn't say it was. Please read more closely.