March 07, 2009
Burton Radons wrote:
> That's what we said about strings in 1.0. You modify it, you copy it,
> or you tell the user. The gentleman's agreement worked perfectly and
> that came without a mess of keywords, without implicit or explicit
> restrictions on behaviour, without having to condition templates.

The one flaw in it was the behavior I consistently saw of "I'm copying the string just to be sure I own it and nobody else changes it." D was meant for copy-on-write, which means copy the string *only* if you change it. No defensive copying. No "just in case" copying. The gentleman's agreement failed as far as I could tell.

With immutable strings, the gentleman's agreement is enforced.
March 07, 2009
Walter Bright wrote:
> Burton Radons wrote:
>> That's what we said about strings in 1.0. You modify it, you copy it,
>> or you tell the user. The gentleman's agreement worked perfectly and
>> that came without a mess of keywords, without implicit or explicit
>> restrictions on behaviour, without having to condition templates.
> 
> The one flaw in it was the behavior I consistently saw of "I'm copying the string just to be sure I own it and nobody else changes it." D was meant for copy-on-write, which means copy the string *only* if you change it. No defensive copying. No "just in case" copying. The gentleman's agreement failed as far as I could tell.
> 
> With immutable strings, the gentleman's agreement is enforced.

What about automatic, built-in copy on write?
March 07, 2009
grauzone wrote:
> Walter Bright wrote:
>> Burton Radons wrote:
>>> That's what we said about strings in 1.0. You modify it, you copy it,
>>> or you tell the user. The gentleman's agreement worked perfectly and
>>> that came without a mess of keywords, without implicit or explicit
>>> restrictions on behaviour, without having to condition templates.
>>
>> The one flaw in it was the behavior I consistently saw of "I'm copying the string just to be sure I own it and nobody else changes it." D was meant for copy-on-write, which means copy the string *only* if you change it. No defensive copying. No "just in case" copying. The gentleman's agreement failed as far as I could tell.
>>
>> With immutable strings, the gentleman's agreement is enforced.
> 
> What about automatic, built-in copy on write?

Then it would happen even when you *know* you're the only one with a reference. Worse, it'd happen multiple times if you modify multiple characters in a row...
March 07, 2009
grauzone wrote:
> Walter Bright wrote:
>> Burton Radons wrote:
>>> That's what we said about strings in 1.0. You modify it, you copy it,
>>> or you tell the user. The gentleman's agreement worked perfectly and
>>> that came without a mess of keywords, without implicit or explicit
>>> restrictions on behaviour, without having to condition templates.
>>
>> The one flaw in it was the behavior I consistently saw of "I'm copying the string just to be sure I own it and nobody else changes it." D was meant for copy-on-write, which means copy the string *only* if you change it. No defensive copying. No "just in case" copying. The gentleman's agreement failed as far as I could tell.
>>
>> With immutable strings, the gentleman's agreement is enforced.
> 
> What about automatic, built-in copy on write?

No go with threads. COW sounded like a great idea for std::string in ancient times when threads were a rarity. Today, virtually all C++ implementations actively dropped COW and replaced it with eager copy + small string optimization for short strings. D really has the best of all worlds solution.

Andrei
March 07, 2009
Walter Bright Wrote:

> Burton Radons wrote:
> > That's what we said about strings in 1.0. You modify it, you copy it, or you tell the user. The gentleman's agreement worked perfectly and that came without a mess of keywords, without implicit or explicit restrictions on behaviour, without having to condition templates.
> 
> The one flaw in it was the behavior I consistently saw of "I'm copying the string just to be sure I own it and nobody else changes it." D was meant for copy-on-write, which means copy the string *only* if you change it. No defensive copying. No "just in case" copying. The gentleman's agreement failed as far as I could tell.
> 
> With immutable strings, the gentleman's agreement is enforced.

Am I going to become a broken record on this? Because "invariant (char) []" is the string type, data that is going to be mutable will always find its way into that type in order to deal with an API which WILL use string as its arguments, not writing out "const (char) []". It gives me no information about the future of the object while removing the apparent need for the gentleman's agreement. Therefore I have no way of knowing the actual pedigree of the string I've been given. It may be invariant, it may be mutable.

I want this to be addressed directly. Exactly how am I wrong on this point? Is it not conceivable that mutable data gets cast to invariant in this case?
March 07, 2009
grauzone wrote:
> What about automatic, built-in copy on write?

Can't do it efficiently without hardware support.
March 07, 2009
On Sat, 07 Mar 2009 17:08:58 -0500, Burton Radons wrote:

> Am I going to become a broken record on this? Because
> "invariant (char) []" is the string type, data that
> is going to be mutable will always find its way into
> that type in order to deal with an API which WILL use
> string as its arguments, not writing out
> "const (char) []".

I'm starting to think that 'string' for function parameters should be a rare thing. For a function to insist that it only receives immutable data sounds like the function is worried that it might accidentally change data. And that sounds like a bug to me. It is shifting the responsibility for the data's integrity to the caller.

> It gives me no information about
> the future of the object while removing the apparent
> need for the gentleman's agreement. Therefore I have
> no way of knowing the actual pedigree of the
> string I've been given. It may be invariant, it
> may be mutable.

But why would your function care about that? Let's assume your function's signature is 'const' for its parameters because it does not intend to modify any of them. If the caller passes invariant data then your function cannot modify the arguments. If the caller passes mutable data, the compiler won't allow your function to modify the parameters either, due to the const signature. So why is it important that the function should know the mutability of the passed data?
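A minimal sketch of that point, assuming a hypothetical helper named `countSpaces` (not from any library). A `const(char)[]` parameter accepts both mutable and immutable arguments, and the compiler rejects any write through it:

```d
import std.stdio;

// Hypothetical helper: counts spaces without modifying its argument.
size_t countSpaces(const(char)[] s)
{
    size_t n = 0;
    foreach (c; s)
        if (c == ' ')
            ++n;
    // s[0] = 'x';   // would not compile: cannot modify const data
    return n;
}

void main()
{
    char[] mutableText = "a b c".dup;   // mutable copy of a literal
    string immutableText = "d e";       // immutable(char)[]

    writeln(countSpaces(mutableText));    // accepts mutable data
    writeln(countSpaces(immutableText));  // accepts immutable data
}
```

What the const signature does not promise is that the caller will leave the data alone afterwards, which only matters if the function keeps a reference.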

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
March 07, 2009
If I may restate your case, it is that you are given a function that does something with character arrays:

int foo(string s);

and you wish to pass a mutable character array to it. If foo was declared as:

int foo(const(char)[] s);

then it would just work. So why is it declared immutable(char)[] when that isn't actually necessary?

The answer is to encourage the use of immutable strings. I believe the future of programming will tend towards ever more use of immutable data, as immutable data:

1. is implicitly sharable between threads
2. is more conducive to static analysis of programs
3. makes it easier for programmers to understand code
4. enables better code generation
5. allows taking a private reference to it without needing to make a copy

const(char)[], on the other hand, still leaves us with the temptation to make a copy "just in case". If I, as a user, see:

int foo(const(char)[] s)

what if foo() keeps a private reference to s (which it might if it does lazy evaluation)? Now I, as a caller, mutate s[] and muck up foo. So, to fix it, I do:

foo(s.dup);    // defensive copy in case foo keeps a reference to s

But the implementor of foo() doesn't know it's getting its own private copy, so the first line of foo() is:

int foo(const(char)[] s)
{
    s = s.dup;   // make sure we own a copy
}

so the defensive, robust code has TWO unnecessary copies.
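
For contrast, here is a rough sketch of the same situation with an immutable parameter. The `LazyLength` type and its members are hypothetical names used only for illustration. Because the data cannot change, neither side needs a defensive copy:

```d
import std.stdio;

// Hypothetical type that keeps a reference to its input for later use.
struct LazyLength
{
    private string text;   // immutable(char)[]: nobody can change it later

    this(string s)
    {
        text = s;          // no .dup needed: the data cannot be mutated
    }

    size_t evaluate() const
    {
        return text.length;
    }
}

void main()
{
    string s = "hello";
    auto lazyLen = LazyLength(s);       // no defensive copy by the caller either
    // s[0] = 'j';                      // would not compile: s is immutable
    writeln(lazyLen.evaluate());
    assert(lazyLen.text.ptr is s.ptr);  // same underlying data, zero copies
}
```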
March 07, 2009
On Sat, 07 Mar 2009 14:43:50 -0800, Walter Bright wrote:

> int foo(const(char)[] s)
> 
> what if foo() keeps a private reference to s (which it might if it does lazy evaluation)? Now I, as a caller, mutate s[] and muck up foo. So, to fix it, I do:
> 
> foo(s.dup);    // defensive copy in case foo keeps a reference to s

In foo's defence, if it takes a private reference, then it should also take a copy. In fact, should it be allowed to take a private reference of data which might be modified after it returns?

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
March 07, 2009
Burton Radons wrote:
> Walter Bright Wrote:
> 
>> Burton Radons wrote:
>>> That's what we said about strings in 1.0. You modify it, you copy
>>> it, or you tell the user. The gentleman's agreement worked
>>> perfectly and that came without a mess of keywords, without
>>> implicit or explicit restrictions on behaviour, without having to
>>> condition templates.
>> The one flaw in it was the behavior I consistently saw of "I'm
>> copying the string just to be sure I own it and nobody else changes
>> it." D was meant for copy-on-write, which means copy the string
>> *only* if you change it. No defensive copying. No "just in case"
>> copying. The gentleman's agreement failed as far as I could tell.
>> 
>> With immutable strings, the gentleman's agreement is enforced.
> 
> Am I going to become a broken record on this? Because "invariant
> (char) []" is the string type, data that is going to be mutable will
> always find its way into that type in order to deal with an API which
> WILL use string as its arguments, not writing out "const (char) []".
> It gives me no information about the future of the object while
> removing the apparent need for the gentleman's agreement. Therefore I
> have no way of knowing the actual pedigree of the string I've
> been given. It may be invariant, it may be mutable.
> 
> I want this to be addressed directly. Exactly how am I wrong on this
> point? Is it not conceivable that mutable data gets cast to
> invariant in this case?

It is conceivable by means of a cast. I've explained that casts can break any of D's guarantees, so it is nothing new that you can masquerade a mutable string as an immutable one. If there were a means to implicitly convert a mutable string into an immutable one, you'd have a case. But it looks like you're either not understanding something or using a double standard when it comes to casting as applied to immutability in particular.

There is one point where we are forced to do something gauche: assumeUnique. We could have avoided that by introducing a "unique" notion, but we thought we'd simplify the language by not doing so. So far the uses of assumeUnique seem to be idiomatic and contained enough to not be a threat, so it seems to have been a passable engineering decision.
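
A small sketch of the idiomatic, contained use meant here, assuming a hypothetical builder function `makeGreeting`. In current Phobos, assumeUnique is found in std.exception; the module may have differed when this was written. The function fills a buffer that nobody else can see and only then hands it out as immutable:

```d
import std.exception : assumeUnique;
import std.stdio;

// Hypothetical builder: fills a fresh buffer, then returns it as a string.
string makeGreeting(string name)
{
    char[] buf = new char[](7 + name.length);
    buf[0 .. 7] = "Hello, ";   // element-wise copy into the first 7 chars
    buf[7 .. $] = name;        // copy the name into the remainder
    // buf is the only reference to this memory, so the promise made to
    // assumeUnique holds. The compiler does not check it.
    return assumeUnique(buf);
}

void main()
{
    writeln(makeGreeting("world"));
}
```

The unchecked promise stays inside one short function, which is what keeps this kind of use manageable.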

To recap, if an API takes a string and all you have is a char[], DO NOT CAST IT. Call .idup - better safe than sorry. The API may evolve and store a reference for later. Case in point: the up-and-coming std.stdio.File constructor initially was:

this(in char[] filename);

Later on I decided to save the filename for error message reporting and the like. Now I had two choices:

(1) Leave the signature unchanged and issue an idup:

this.filename = to!string(filename); // issues an idup

(2) Change the signature to

this(string filename);

Now all client code that DID pass a string in the first place (the vast majority) was safe _and_ efficient. The minority of client code was the code that had a char[] or a const(char)[] at hand. That code did not compile, so it had to insert a to!string on the caller side.
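
Option (2) can be sketched as follows. `NamedResource` is a hypothetical stand-in written for this illustration, not the real std.stdio.File:

```d
import std.conv : to;
import std.stdio;

// Hypothetical stand-in for a file wrapper that remembers its name.
struct NamedResource
{
    string filename;   // kept for error messages

    this(string filename)
    {
        this.filename = filename;   // no copy: the data is immutable
    }
}

void main()
{
    string literal = "config.txt";
    auto a = NamedResource(literal);            // common case: no copy at all
    assert(a.filename.ptr is literal.ptr);

    char[] buffer = "data.bin".dup;             // caller only has mutable data
    // auto b = NamedResource(buffer);          // would not compile
    auto b = NamedResource(to!string(buffer));  // explicit copy on the caller side
    buffer[0] = 'X';                            // later mutation cannot affect b
    writeln(b.filename);
}
```

The copy happens only at the call sites that hold mutable data, and it is visible in the source there.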

As has been copiously shown in other languages, the need for character-level mutable strings is rather rare. So most of the time you will not traffic in char[], but instead you'll have an immutable(char)[] to start with. This further erodes the legitimacy of your concern.

I have no idea how to make this any clearer. I explained it so many times and in so many ways, even I understood it :o).


Andrei