March 08, 2009
Andrei Alexandrescu wrote:

> Jason House wrote:
>> Andrei Alexandrescu Wrote:
>> 
>>> Sergey Gromov wrote:
>>>> Sat, 07 Mar 2009 15:19:50 -0800, Andrei Alexandrescu wrote:
>>>>
>>>>> To recap, if an API takes a string and all you have is a char[], DO NOT CAST IT. Call .idup - better safe than sorry. The API may evolve and store a reference for later. Case in point: the up-and-coming std.stdio.File constructor initially was:
>>>>>
>>>>> this(in char[] filename);
>>>>>
>>>>> Later on I decided to save the filename for error message reporting and the such. Now I had two choices:
>>>>>
>>>>> (1) Leave the signature unchanged and issue an idup:
>>>>>
>>>>> this.filename = to!string(filename); // issues an idup
>>>>>
>>>>> (2) Change the signature to
>>>>>
>>>>> this(string filename);
>>>>>
>>>>> Now all client code that DID pass a string in the first place (the vast majority) was safe _and_ efficient. The minority of client code was the code that had a char[] or a const(char)[] at hand. That code did not compile, so it had to insert a to!string on the caller side.
>>>>>
>>>>> As has been copiously shown in other languages, the need for character-level mutable strings is rather rare. So most of the time you will not traffic in char[], but instead you'll have an immutable(char)[] to start with. This further erodes the legitimacy of your concern.
>>>> My file names are constructed most of the time.  And most of the time they are simple char[]s.
>>> Ehm. Mine are also constructed, but somehow come in string format, e.g.:
>>>
>>> string basename;
>>> ...
>>> auto f = File(basename ~ ".txt");
>>>
>>>> It is not obvious that File should store the file name.  It's not strictly necessary.  It's an *implementation detail.*  Now you expose this implementation detail through the class interface, and you do this without any good reason.  You save a 150 byte allocation per file. Nice.
>>> It's just an example, the point being that there things are always fast and safe. In many cases there's much more at stake and you can't rely on idioms that allocate memory needlessly.
>> 
>> Your example above does allocate memory. A mutable string could potentially avoid allocating to append ".txt"
> 
> It does, and for a good reason - File stores an alias of it. If it didn't have to, it would have accepted const, in which case a mutable string would have sufficed.


I think you missed my point, but I was partly being a devil's advocate...  I probably should not be fanning the flames for this thread, so I'll be quiet
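
For readers following the .idup versus cast advice quoted above, here is a minimal sketch written against current D. The `Logger` struct is hypothetical and only stands in for any API that takes a string and keeps a reference to it; it is not the std.stdio.File constructor discussed in the thread.

```d
import std.stdio : writeln;

// Hypothetical stand-in for an API that stores its string argument.
struct Logger
{
    string name;
    this(string name) { this.name = name; }
}

void main()
{
    char[] buf = "data".dup;       // mutable buffer owned by the caller

    auto safe = Logger(buf.idup);  // independent immutable copy, no cast
    buf[0] = 'D';                  // the caller later reuses its buffer

    writeln(safe.name);            // the stored copy is unaffected
    assert(safe.name == "data");
}
```

Had the caller written `cast(string) buf` instead, the stored name would alias the buffer, and the later write would change data that the type system treats as immutable.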

March 08, 2009
Andrei Alexandrescu wrote:
> Sorry, I got confused. What I meant was that a function accepting a const T can count on other threads leaving T alone, which is the converse of what you say.

Yes.

 Cool!
> 
> Andrei
March 08, 2009
Burton Radons wrote:
> This wouldn't be too bad because const is worthless when optimising,
> but if invariant is going to be given any weight then we must never
> cause data to be casted to invariant unless if it's actually
> invariant data. So, the sensible default is "const (char) []" for
> strings, a selection of aliases in object.d for the others, and safe
> casting templates in object.d.

What I interpret from this is that you see strings as fundamentally mutable character arrays, and sometimes in special cases you can make them immutable. I propose turning that view on its head - regard strings as fundamentally immutable, and having a mutable char array is a rare thing that only appears in isolated places in the program.

In the find() example, the implementation of it actually uses a mutable char[] to build the result. When the result is done, it is converted to immutable and "published" by returning it. The mutable array never escapes the function; it is completely sandboxed in.

What sold me on immutable strings was going through my code and looking to see where I *actually* was mutating the strings in place rather than just passing them around or storing them or copying them into another buffer. It turns out it was a vanishingly small number. I was startled. Not only that, those places could be, with a minor bit of refactoring, further reduced in number without sacrifice. I stacked this against the gain by eliminating all those places that were doing copies, and it was clear that immutable strings as default was a winner.
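
As an illustration of the "build mutable, publish immutable" pattern described above, here is a small sketch in current D. The function `replaceChar` is a made-up example, not the find() implementation the post refers to, and the use of `std.exception.assumeUnique` is one possible way to do the final conversion without a copy.

```d
import std.exception : assumeUnique;

// Hypothetical helper: returns s with every `from` replaced by `to`.
string replaceChar(string s, char from, char to)
{
    char[] buf = s.dup;            // private mutable copy, never escapes
    foreach (ref c; buf)
        if (c == from) c = to;
    return assumeUnique(buf);      // safe only because buf is the sole reference
}

void main()
{
    assert(replaceChar("a-b-c", '-', '+') == "a+b+c");
}
```

The mutable array stays local to the function, so callers only ever see an immutable result.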
March 08, 2009
Walter Bright Wrote:

> Burton Radons wrote:
> > This wouldn't be too bad because const is worthless when optimising, but if invariant is going to be given any weight then we must never cause data to be casted to invariant unless if it's actually invariant data. So, the sensible default is "const (char) []" for strings, a selection of aliases in object.d for the others, and safe casting templates in object.d.
> 
> What I interpret from this is that you see strings as fundamentally mutable character arrays, and sometimes in special cases you can make them immutable. I propose turning that view on its head - regard strings as fundamentally immutable, and having a mutable char array is a rare thing that only appears in isolated places in the program.

No, I don't. You are misunderstanding me, and I'm not sure why or how. Here's a (contrived) example of where my concern may come into play:

   int [] a = new int [1];

   a [0] = 1;

   auto b = cast (invariant (int) []) a;

   a [0] += b [0];
   a [0] += b [0];
   writef ("%s\n", a [0]);
   // Normal result: 4.
   // Optimiser which assumes invariant data can't change: 3

Yes, the code is an abuse of the const system. THAT'S EXACTLY MY POINT. Casting mutable data to invariant leads to situations like these. Only data which will never change can be made invariant. Putting "alias invariant (char) [] string" in object.d induces these situations and makes it seem like it's a good idea.
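
For contrast, the same sequence can be written with a copy instead of a cast. This is only a sketch in current D, where `immutable` is the present-day spelling of the `invariant` keyword used in the thread. Because `b` shares no storage with `a`, the result does not depend on what an optimiser assumes.

```d
import std.stdio : writefln;

void main()
{
    int[] a = new int[1];
    a[0] = 1;

    immutable(int)[] b = a.idup;   // copy: b never changes afterwards

    a[0] += b[0];
    a[0] += b[0];
    writefln("%s", a[0]);          // prints 3
    assert(a[0] == 3 && b[0] == 1);
}
```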
March 08, 2009
Burton Radons wrote:
> Walter Bright Wrote:
> 
>> Burton Radons wrote:
>>> This wouldn't be too bad because const is worthless when
>>> optimising, but if invariant is going to be given any weight then
>>> we must never cause data to be casted to invariant unless if it's
>>> actually invariant data. So, the sensible default is "const
>>> (char) []" for strings, a selection of aliases in object.d for
>>> the others, and safe casting templates in object.d.
>> What I interpret from this is that you see strings as fundamentally
>>  mutable character arrays, and sometimes in special cases you can
>> make them immutable. I propose turning that view on its head -
>> regard strings as fundamentally immutable, and having a mutable
>> char array is a rare thing that only appears in isolated places in
>> the program.
> 
> No, I don't. You are misunderstanding me, and I'm not sure why or
> how.

I guess I just cannot figure out where you're coming from.

> Here's a (contrived) example of where my concern may come into
> play:
> 
> int [] a = new int [1];
> 
> a [0] = 1;
> 
> auto b = cast (invariant (int) []) a;
> 
> a [0] += b [0];
> a [0] += b [0];
> writef ("%s\n", a [0]);
> // Normal result: 4.
> // Optimiser which assumes invariant data can't change: 3
> 
> Yes, the code is an abuse of the const system. THAT'S EXACTLY MY
> POINT. Casting mutable data to invariant leads to situations like
> these. Only data which will never change can be made invariant.
> Putting "alias invariant (char) [] string" in object.d induces these
> situations and makes it seem like it's a good idea.

I'm still not understanding you, because this is a contrived example that I cannot see the point of nor can I see where it would be legitimately used.
March 08, 2009
On Sat, 07 Mar 2009 23:40:52 -0800, Walter Bright wrote:

> I'm still not understanding you, because this is a contrived example that I cannot see the point of nor can I see where it would be legitimately used.

I can see Burton's concern, and I'm very surprised that the compiler allows this to happen. Here is a slightly more explicit version of Burton's code.

import std.stdio;
void main()
{
  int [] a = new int [1];

   a [0] = 1;

   invariant (int) [] b = cast (invariant (int) []) a;

   writef ("a=%s b=%s\n", a [0], b[0]);
   a [0] += b [0];
   writef ("a=%s b=%s\n", a [0], b[0]);
   a [0] += b [0];
   writef ("a=%s b=%s\n", a [0], b[0]);
}

The problem is that we have declared 'b' as invariant, but the program is
allowed to change it. That is the issue.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
March 08, 2009
Derek Parnell wrote:
> import std.stdio;
> void main()
> {
>   int [] a = new int [1];
> 
>    a [0] = 1;
> 
>    invariant (int) [] b = cast (invariant (int) []) a;
> 
>    writef ("a=%s b=%s\n", a [0], b[0]);
>    a [0] += b [0];
>    writef ("a=%s b=%s\n", a [0], b[0]);
>    a [0] += b [0];
>    writef ("a=%s b=%s\n", a [0], b[0]);
> }
> 
> The problem is that we have declared 'b' as invariant, but the program is
> allowed to change it. That is the issue.
> 

The following also compiles:

char c;
int* p = cast(int*)&c;
*p = 5;

and is clearly buggy code. Whenever you use a cast, the onus is on the programmer to know what they are doing. The cast is an escape from the typing system.
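
To make the "escape from the typing system" point concrete, the following sketch (current D, using `immutable` in place of the thread's `invariant`) checks which conversions the compiler accepts without a cast. It only illustrates the conversion rules and takes no position on the disagreement in this thread.

```d
void main()
{
    // Mutable elements do not implicitly convert to immutable;
    // only an explicit cast (or a copy via .idup) gets you there.
    static assert(!is(int[] : immutable(int)[]));

    // Both mutable and immutable arrays convert to const without a cast.
    static assert(is(int[] : const(int)[]));
    static assert(is(immutable(int)[] : const(int)[]));
}
```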
March 08, 2009
Walter Bright Wrote:

> Burton Radons wrote:
> > Walter Bright Wrote:
> > 
> >> Burton Radons wrote:
> >>> This wouldn't be too bad because const is worthless when optimising, but if invariant is going to be given any weight then we must never cause data to be casted to invariant unless if it's actually invariant data. So, the sensible default is "const (char) []" for strings, a selection of aliases in object.d for the others, and safe casting templates in object.d.
> >> What I interpret from this is that you see strings as fundamentally
> >>  mutable character arrays, and sometimes in special cases you can
> >> make them immutable. I propose turning that view on its head -
> >> regard strings as fundamentally immutable, and having a mutable
> >> char array is a rare thing that only appears in isolated places in
> >> the program.
> > 
> > No, I don't. You are misunderstanding me, and I'm not sure why or how.
> 
> I guess I just cannot figure out where you're coming from.
> 
> > Here's a (contrived) example of where my concern may come into
> > play:
> > 
> > int [] a = new int [1];
> > 
> > a [0] = 1;
> > 
> > auto b = cast (invariant (int) []) a;
> > 
> > a [0] += b [0];
> > a [0] += b [0];
> > writef ("%s\n", a [0]);
> > // Normal result: 4.
> > // Optimiser which assumes invariant data can't change: 3
> > 
> > Yes, the code is an abuse of the const system. THAT'S EXACTLY MY POINT. Casting mutable data to invariant leads to situations like these. Only data which will never change can be made invariant. Putting "alias invariant (char) [] string" in object.d induces these situations and makes it seem like it's a good idea.
> 
> I'm still not understanding you, because this is a contrived example that I cannot see the point of nor can I see where it would be legitimately used.

Obviously I made it contrived so that it's as clear as possible what the issue is. In reality, it will be going through more layers. Here's one layer:

   int [] a = new int [1];
   a [0] = 1;

   invariant (int) [] func (invariant (int) [] a) { return a; }

   auto b = func (cast (invariant (int) []) a);

Notice this has the same pattern as std.string.replace; that's why I did that cast.

   a [0] += b [0];
   a [0] += b [0];
   writef ("%s\n", a [0]);
   // Not optimised: 4.
   // Assuming b cannot be modified: 3.

When this actually crops up in bugs the reality will be far more complex and practically impossible to discover.

I think I've stated this warning a half-dozen times in the last three days, and that's it, I'm done.
March 08, 2009
On Sun, 08 Mar 2009 00:36:06 -0800, Walter Bright wrote:

> Derek Parnell wrote:
>> import std.stdio;
>> void main()
>> {
>>   int [] a = new int [1];
>> 
>>    a [0] = 1;
>> 
>>    invariant (int) [] b = cast (invariant (int) []) a;
>> 
>>    writef ("a=%s b=%s\n", a [0], b[0]);
>>    a [0] += b [0];
>>    writef ("a=%s b=%s\n", a [0], b[0]);
>>    a [0] += b [0];
>>    writef ("a=%s b=%s\n", a [0], b[0]);
>> }
>> 
>> The problem is that we have declared 'b' as invariant, but the program is
>> allowed to change it. That is the issue.
>> 
> 
> The following also compiles:
> 
> char c;
> int* p = cast(int*)&c;
> *p = 5;
> 
> and is clearly buggy code. Whenever you use a cast, the onus is on the programmer to know what they are doing. The cast is an escape from the typing system.

Walter, you have side-stepped the problem in question by talking about a totally different problem.

Burton's code says "b is invariant", but the program allows it to be changed. Your code does NOT say that any of those variables is invariant. The problem is NOT with the cast (although that is a totally different issue). The problem is that the code says "invariant" but the data gets changed anyhow. The method of changing the data is not the issue. The issue is that it gets changed at all.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
March 08, 2009
Derek Parnell wrote:
> Walter, you have side-stepped the problem in question by talking about a
> totally different problem.

It's the same issue. When you use a cast, you are subverting the type system. That means you have to be sure you are doing it right. The compiler cannot help you.

> Burton's code says "b is invariant", but the program allows it to be
> changed. Your code does NOT say that any of those variables is invariant.
> The problem is NOT with the cast (although that is a totally different
> issue). The problem is that the code says "invariant" but the data gets
> changed anyhow. The method of changing the data is not the issue. The issue
> is that it gets changed at all.

When you cast something to immutable, you can no longer change it. It's a one-way ticket.