March 07, 2009
Walter Bright wrote:
> If I may restate your case, it is that given a function that does something with character arrays:
> 
> int foo(string s);
> 
> and you wish to pass a mutable character array to it. If foo was declared as:
> 
> int foo(const(char)[] s);
> 
> then it would just work. So why is it declared immutable(char)[] when that isn't actually necessary?
> 
> The answer is to encourage the use of immutable strings. I believe the future of programming will tend towards ever more use of immutable data, as immutable data:
> 
> 1. is implicitly sharable between threads

In fact const data is also implicitly sharable between threads. This is because shared is not implicitly convertible to const. No?

Andrei
March 08, 2009
Sat, 07 Mar 2009 15:19:50 -0800, Andrei Alexandrescu wrote:

> To recap, if an API takes a string and all you have is a char[], DO NOT CAST IT. Call .idup - better safe than sorry. The API may evolve and store a reference for later. Case in point: the up-and-coming std.stdio.File constructor initially was:
> 
> this(in char[] filename);
> 
> Later on I decided to save the filename for error message reporting and the like. Now I had two choices:
> 
> (1) Leave the signature unchanged and issue an idup:
> 
> this.filename = to!string(filename); // issues an idup
> 
> (2) Change the signature to
> 
> this(string filename);
> 
> Now all client code that DID pass a string in the first place (the vast majority) was safe _and_ efficient. The minority of client code was the code that had a char[] or a const(char)[] at hand. That code did not compile, so it had to insert a to!string on the caller side.
> 
> As has been copiously shown in other languages, the need for character-level mutable strings is rather rare. So most of the time you will not traffic in char[], but instead you'll have an immutable(char)[] to start with. This further erodes the legitimacy of your concern.

My file names are constructed most of the time.  And most of the time they are simple char[]s.

It is not obvious that File should store the file name.  It's not strictly necessary.  It's an *implementation detail.*  Now you expose this implementation detail through the class interface, and you do this without any good reason.  You save a 150 byte allocation per file. Nice.

I can understand when a hash takes an immutable key.  It's in the hash's contract.  Various lazy functions could take immutable input to guarantee correct lazy execution.  But I think that overall use of immutable types should be rare and thoroughly thought-out.  They should be used only when it's absolutely, provably necessary.  That's why I think aliasing string as immutable is a mistake.  It felt wrong when I discovered D a year ago, and it feels wrong now.
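
For readers following along, the two constructor designs under discussion can be sketched roughly as below. This is only an illustration: FileCopying and FileSharing are hypothetical stand-ins, not the real std.stdio.File, and the file names are made up.

```d
import std.conv : to;

// Hypothetical stand-in for option (1): accept any character array
// and copy it in order to keep it.
struct FileCopying
{
    string filename;

    this(in char[] filename)
    {
        this.filename = to!string(filename); // allocates a copy for char[] input
    }
}

// Hypothetical stand-in for option (2): require immutable data and
// store the reference without copying.
struct FileSharing
{
    string filename;

    this(string filename)
    {
        this.filename = filename;
    }
}

void main()
{
    char[] mutableName = "log.txt".dup;
    string fixedName = "data.txt";

    auto a = FileCopying(mutableName);      // compiles; the copy is hidden inside
    auto c = FileSharing(fixedName);        // compiles; nothing is copied
    // auto e = FileSharing(mutableName);   // does not compile: char[] is not a string
    auto d = FileSharing(mutableName.idup); // the caller pays for the copy explicitly

    mutableName[0] = 'X';                   // later mutation by the caller

    assert(a.filename == "log.txt");        // unaffected, it holds its own copy
    assert(d.filename == "log.txt");        // unaffected, it holds the idup'ed copy
    assert(c.filename.ptr is fixedName.ptr); // same memory as the caller's string
}
```

With option (1) the allocation is hidden from the caller; with option (2) it shows up in the signature, which is the exposure objected to above.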
March 08, 2009
Sergey Gromov wrote:
> Sat, 07 Mar 2009 15:19:50 -0800, Andrei Alexandrescu wrote:
> 
>> To recap, if an API takes a string and all you have is a char[], DO NOT CAST IT. Call .idup - better safe than sorry. The API may evolve and store a reference for later. Case in point: the up-and-coming std.stdio.File constructor initially was:
>>
>> this(in char[] filename);
>>
>> Later on I decided to save the filename for error message reporting and the like. Now I had two choices:
>>
>> (1) Leave the signature unchanged and issue an idup:
>>
>> this.filename = to!string(filename); // issues an idup
>>
>> (2) Change the signature to
>>
>> this(string filename);
>>
>> Now all client code that DID pass a string in the first place (the vast majority) was safe _and_ efficient. The minority of client code was the code that had a char[] or a const(char)[] at hand. That code did not compile, so it had to insert a to!string on the caller side.
>>
>> As has been copiously shown in other languages, the need for character-level mutable strings is rather rare. So most of the time you will not traffic in char[], but instead you'll have an immutable(char)[] to start with. This further erodes the legitimacy of your concern.
> 
> My file names are constructed most of the time.  And most of the time
> they are simple char[]s.

Ehm. Mine are also constructed, but somehow come in string format, e.g.:

string basename;
...
auto f = File(basename ~ ".txt");

> It is not obvious that File should store the file name.  It's not
> strictly necessary.  It's an *implementation detail.*  Now you expose
> this implementation detail through the class interface, and you do this
> without any good reason.  You save a 150 byte allocation per file.
> Nice.

It's just an example, the point being that this way things are always fast and safe. In many cases there's much more at stake and you can't rely on idioms that allocate memory needlessly.

> I can understand when a hash takes an immutable key.  It's in the hash's
> contract.  Various lazy functions could take immutable input to
> guarantee correct lazy execution.  But I think that overall use of
> immutable types should be rare and thoroughly thought-out.  They should
> be used only when it's absolutely, provably necessary.  That's why I
> think aliasing string as immutable is a mistake.  It felt wrong when I
> discovered D a year ago, and it feels wrong now.

That may be because you are writing C in D. Immutable strings should allow solid coding without much friction.


Andrei
March 08, 2009
Andrei Alexandrescu Wrote:

> Sergey Gromov wrote:
> > Sat, 07 Mar 2009 15:19:50 -0800, Andrei Alexandrescu wrote:
> > 
> >> To recap, if an API takes a string and all you have is a char[], DO NOT CAST IT. Call .idup - better safe than sorry. The API may evolve and store a reference for later. Case in point: the up-and-coming std.stdio.File constructor initially was:
> >>
> >> this(in char[] filename);
> >>
> >> Later on I decided to save the filename for error message reporting and the like. Now I had two choices:
> >>
> >> (1) Leave the signature unchanged and issue an idup:
> >>
> >> this.filename = to!string(filename); // issues an idup
> >>
> >> (2) Change the signature to
> >>
> >> this(string filename);
> >>
> >> Now all client code that DID pass a string in the first place (the vast majority) was safe _and_ efficient. The minority of client code was the code that had a char[] or a const(char)[] at hand. That code did not compile, so it had to insert a to!string on the caller side.
> >>
> >> As has been copiously shown in other languages, the need for character-level mutable strings is rather rare. So most of the time you will not traffic in char[], but instead you'll have an immutable(char)[] to start with. This further erodes the legitimacy of your concern.
> > 
> > My file names are constructed most of the time.  And most of the time they are simple char[]s.
> 
> Ehm. Mine are also constructed, but somehow come in string format, e.g.:
> 
> string basename;
> ...
> auto f = File(basename ~ ".txt");
> 
> > It is not obvious that File should store the file name.  It's not strictly necessary.  It's an *implementation detail.*  Now you expose this implementation detail through the class interface, and you do this without any good reason.  You save a 150 byte allocation per file. Nice.
> 
> It's just an example, the point being that this way things are always fast and safe. In many cases there's much more at stake and you can't rely on idioms that allocate memory needlessly.

Your example above does allocate memory. A mutable string could potentially avoid allocating to append ".txt".


> > I can understand when a hash takes an immutable key.  It's in the hash's contract.  Various lazy functions could take immutable input to guarantee correct lazy execution.  But I think that overall use of immutable types should be rare and thoroughly thought-out.  They should be used only when it's absolutely, provably necessary.  That's why I think aliasing string as immutable is a mistake.  It felt wrong when I discovered D a year ago, and it feels wrong now.
> 
> That may be because you are writing C in D. Immutable strings should allow solid coding without much friction.
> 
> 
> Andrei

March 08, 2009
Andrei Alexandrescu wrote:
> Walter Bright wrote:
>> If I may restate your case, it is that given a function that does something with character arrays:
>>
>> int foo(string s);
>>
>> and you wish to pass a mutable character array to it. If foo was declared as:
>>
>> int foo(const(char)[] s);
>>
>> then it would just work. So why is it declared immutable(char)[] when that isn't actually necessary?
>>
>> The answer is to encourage the use of immutable strings. I believe the future of programming will tend towards ever more use of immutable data, as immutable data:
>>
>> 1. is implicitly sharable between threads
> 
> In fact const data is also implicitly sharable between threads.

No. You have to declare it "shared const" to make it sharable between threads.
March 08, 2009
Derek Parnell wrote:
> On Sat, 07 Mar 2009 14:43:50 -0800, Walter Bright wrote:
> 
>> int foo(const(char)[] s)
>>
>> what if foo() keeps a private reference to s (which it might if it does lazy evaluation)? Now I, as a caller, mutate s[] and muck up foo. So, to fix it, I do:
>>
>> foo(s.dup);    // defensive copy in case foo keeps a reference to s
> 
> In foo's defence, if it takes a private reference, then it should also take
> a copy.

Yup, and as I said, an extra copy "just in case".

> In fact, should it be allowed to take a private reference of data
> which might be modified after it returns?
> 

Instead of adding more complexity to const so it acts more like immutable, why not just use immutable <g> ?
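
A minimal sketch of the hazard being described, assuming two hypothetical consumers that keep a reference to their input (ConstKeeper and ImmutableKeeper are illustrative names, not library types):

```d
// Hypothetical consumer that keeps a const view of its input.
struct ConstKeeper
{
    const(char)[] kept;

    this(const(char)[] s) { kept = s; } // accepts mutable or immutable data
}

// Hypothetical consumer that keeps an immutable reference.
struct ImmutableKeeper
{
    string kept;

    this(string s) { kept = s; }        // accepts only immutable data
}

void main()
{
    char[] buf = "hello".dup;

    auto c = ConstKeeper(buf);          // no copy: c.kept aliases buf
    auto i = ImmutableKeeper(buf.idup); // one copy, made only because the caller needs it

    buf[0] = 'j';                       // the caller mutates its own buffer

    assert(c.kept == "jello");          // the const view changed underneath the keeper
    assert(i.kept == "hello");          // the immutable data cannot change
}
```

A caller who already holds a string passes it to ImmutableKeeper with no copy at all, which is the case the "just in case" .dup cannot optimise.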
March 08, 2009
Sergey Gromov wrote:
> But I think that overall use of
> immutable types should be rare and thoroughly thought-out.  They should
> be used only when it's absolutely, provably necessary.

I suggest that that's exactly backwards <g>. Mutable types should be the rare, carefully considered ones.

> That's why I
> think aliasing string as immutable is a mistake.  It felt wrong when I
> discovered D a year ago, and it feels wrong now.

I know it feels wrong. That's the C background talking. I went through the same thing. It's sort of like OOP if you're used to C. It takes a while before it clicks; in the meantime, it feels wrong and stupid.
March 08, 2009
Walter Bright Wrote:

> If I may restate your case, it is that given a function that does something with character arrays:
> 
> int foo(string s);
> 
> and you wish to pass a mutable character array to it. If foo was declared as:
> 
> int foo(const(char)[] s);
> 
> then it would just work. So why is it declared immutable(char)[] when that isn't actually necessary?

No, that's not the problem at all. The problem is this line in object.d:

    alias invariant (char) [] string;

There are two interesting features here. It's what D calls a string, and it's invariant, making the declaration that whoever has a reference to that string can hold onto it forever without ever expecting its contents to be modified or destroyed.

So, while building my string I use a function which replaces matching substrings in a string with another string. If that function declares its parameters as strings, it tells the reader that they can never change over the course of the program, because the function may retain references to them. That is a strong, highly prescriptive statement, and one a replace function has no need to make. So I would expect the function to be declared like this:

   const (char) [] replace (const (char) [] s, const (char) [] from, const (char) [] to)

But it's not. std.string.replace is implemented like this:

   string replace (string s, string from, string to)

This is for a number of reasons. It's easiest to assume that the default is going to be the correct one. The const syntax is hard to read, so it's avoided, and "string" is more readily descriptive than "const (char) []". So, I pass my mutable string to std.string.replace, which only accepts invariant data.

This wouldn't be too bad, because const is worthless when optimising, but if invariant is going to be given any weight then we must never cause data to be cast to invariant unless it's actually invariant data. So, the sensible default is "const (char) []" for strings, a selection of aliases in object.d for the others, and safe casting templates in object.d.
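
To make the const-parameter version concrete, here is a rough sketch. replaceAll is a hypothetical helper written for this post, not the std.string.replace implementation, and it returns a fresh mutable array rather than a const one.

```d
// Hypothetical helper: const parameters accept char[], const(char)[]
// and string alike, and promise only that this function won't modify them.
char[] replaceAll(const(char)[] s, const(char)[] from, const(char)[] to)
{
    if (from.length == 0)
        return s.dup; // nothing to search for; hand back a copy

    char[] result;
    size_t i = 0;
    while (i < s.length)
    {
        // Compare the slice starting at i against the pattern.
        if (i + from.length <= s.length && s[i .. i + from.length] == from)
        {
            result ~= to;
            i += from.length;
        }
        else
        {
            result ~= s[i];
            ++i;
        }
    }
    return result;
}

void main()
{
    char[] mutablePath = "a/b/c".dup; // built up at run time
    string fixedPath = "a/b/c";       // invariant data

    // Both calls compile without a cast or a defensive copy.
    assert(replaceAll(mutablePath, "/", "\\") == "a\\b\\c");
    assert(replaceAll(fixedPath, "/", "::") == "a::b::c");
}
```

Because the result is freshly allocated, the caller can keep it mutable or convert it to invariant data once, at the point where that guarantee is actually needed.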
March 08, 2009
Jason House wrote:
> Andrei Alexandrescu Wrote:
> 
>> Sergey Gromov wrote:
>>> Sat, 07 Mar 2009 15:19:50 -0800, Andrei Alexandrescu wrote:
>>>
>>>> To recap, if an API takes a string and all you have is a char[], DO NOT CAST IT. Call .idup - better safe than sorry. The API may evolve and store a reference for later. Case in point: the up-and-coming std.stdio.File constructor initially was:
>>>>
>>>> this(in char[] filename);
>>>>
>>>> Later on I decided to save the filename for error message reporting and the like. Now I had two choices:
>>>>
>>>> (1) Leave the signature unchanged and issue an idup:
>>>>
>>>> this.filename = to!string(filename); // issues an idup
>>>>
>>>> (2) Change the signature to
>>>>
>>>> this(string filename);
>>>>
>>>> Now all client code that DID pass a string in the first place (the vast majority) was safe _and_ efficient. The minority of client code was the code that had a char[] or a const(char)[] at hand. That code did not compile, so it had to insert a to!string on the caller side.
>>>>
>>>> As has been copiously shown in other languages, the need for character-level mutable strings is rather rare. So most of the time you will not traffic in char[], but instead you'll have an immutable(char)[] to start with. This further erodes the legitimacy of your concern.
>>> My file names are constructed most of the time.  And most of the time
>>> they are simple char[]s.
>> Ehm. Mine are also constructed, but somehow come in string format, e.g.:
>>
>> string basename;
>> ...
>> auto f = File(basename ~ ".txt");
>>
>>> It is not obvious that File should store the file name.  It's not
>>> strictly necessary.  It's an *implementation detail.*  Now you expose
>>> this implementation detail through the class interface, and you do this
>>> without any good reason.  You save a 150 byte allocation per file.
>>> Nice.
>> It's just an example, the point being that this way things are always fast and safe. In many cases there's much more at stake and you can't rely on idioms that allocate memory needlessly.
> 
> Your example above does allocate memory. A mutable string could potentially avoid allocating to append ".txt".

It does, and for a good reason - File stores an alias of it. If it didn't have to, it would have accepted const, in which case a mutable string would have sufficed.

Andrei
March 08, 2009
Walter Bright wrote:
> Andrei Alexandrescu wrote:
>> Walter Bright wrote:
>>> If I may restate your case, it is that given a function that does something with character arrays:
>>>
>>> int foo(string s);
>>>
>>> and you wish to pass a mutable character array to it. If foo was declared as:
>>>
>>> int foo(const(char)[] s);
>>>
>>> then it would just work. So why is it declared immutable(char)[] when that isn't actually necessary?
>>>
>>> The answer is to encourage the use of immutable strings. I believe the future of programming will tend towards ever more use of immutable data, as immutable data:
>>>
>>> 1. is implicitly sharable between threads
>>
>> In fact const data is also implicitly sharable between threads.
> 
> No. You have to declare it "shared const" to make it sharable between threads.

Sorry, I got confused. What I meant was that a function accepting a const T can count on other threads leaving T alone, which is the converse of what you say. Cool!

Andrei