March 07, 2009
Walter Bright wrote:
> "Nobody notices they are immutable, they just work."
> 
> So what is it about immutability that makes strings "just work" in a natural and intuitive manner? The insight is that it enables strings, which are reference types, to behave exactly as if they were value types.
> 
> After all, it never occurs to anyone to think that the integer 123 could be a "mutable" integer and perhaps be 133 the next time you look at it. If you put 123 into a variable, it stays 123. It's immutable. People intuitively expect strings to behave the same way. Only C programmers expect that once they assign a string to a variable, that string may change in place.
> 
> C has it backwards by making strings mutable, and it's one of the main reasons why dealing with strings in C is such a gigantic pain. But as a longtime C programmer, I was so used to it that I didn't notice what a pain it was until I started using other languages where string manipulation was a breeze.

I could fall into an infinite loop while agreeing with you.

Cheers
March 07, 2009
Walter Bright wrote:
...
> The way to do strings in D is to have them be immutable. If you are building a string by manipulating its parts, start with mutable, when finished then convert it to immutable and 'publish' it to the rest of the program. Mutable char[] arrays should only exist as temporaries. This is exactly the opposite of the way one does it in C, but if you do it this way, you'll find you never need to defensively dup the string "just in case" and things just seem to naturally work out.
> 
> I tend to agree that if you try to do strings the C way in D2, you'll probably find it to be a frustrating experience.

That is a really helpful insight. It also means string programming is a bit different in D2 than in D1.

At some point, it might be helpful to add a short 'how to program with strings' introduction to the D documentation. After all, it is a major feature of D and a departure from C and C++.




March 07, 2009
On Sat, 07 Mar 2009 10:48:13 +0100, Lutger wrote:

> Walter Bright wrote:

>> I tend to agree that if you try to do strings the C way in D2, you'll probably find it to be a frustrating experience.

> That is a really helpful insight. It also means string programming is a bit different in D2 than in D1.

Tell me about it! When I converted Bud to D2, it was a nightmare. It took many, many hours of edit-compile cycles to get a clean compile. Then debugging it took ages due to still trying to think in D1 string terms, which gave me lots of weird and wrong strings during run time.

After a lot of trial and error, I finally grokked the D2 string concept and got on top of the issue. But I wanted to have the same source support D1 and D2, which led to a whole new set of horrors.

The lessons I've learned from this exercise include ...
(a) Wait until D2 is stabilised.
(b) Use a text macro processor if you want one source to support D1 and D2.
(c) Any project that is not a simple application might be better
rewritten in D2 than converted from D1.
(d) D2 strings are a useful idea. However, one still needs const(char)[]
and char[] types, so useful mnemonics for these are a good idea.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
March 07, 2009
Walter Bright Wrote:

> When we first got into what to do with strings and const/immutable/mutable, I was definitely in the camp that strings should be mutable char[], or at worst const(char)[]. The thing is, Andrei pointed out to me, languages that are considered very good at dealing with strings (like Perl) use immutable strings. The fascinating thing about strings in such languages is:
> 
> "Nobody notices they are immutable, they just work."

That's what we said about strings in 1.0. You modify it, you copy it, or you tell the user. The gentleman's agreement worked perfectly and that came without a mess of keywords, without implicit or explicit restrictions on behaviour, without having to condition templates.

Perl would be more powerful if its strings were mutable, not less, although not by much due to the interpreter.
March 07, 2009
Walter Bright:

This is an interesting topic.
I like immutability, but sometimes I also like mutability.

>languages that are considered very good at dealing with strings (like Perl) use immutable strings. The fascinating thing about strings in such languages is: "Nobody notices they are immutable, they just work."<

Languages that have immutable strings often have:
- "String interning", to improve performance...
- A good garbage collector, to cope with the increased allocation-deallocation traffic.
- Sometimes the garbage collector is able to see that two unrelated strings are equal, and keep only one of them. Experiments have shown this greatly reduces the memory used by many Java programs.
- Strings often keep their hash value stored beside them, so it's computed only once, the first time you actually need the hash value (this also means the hash value is initialized to an invalid value).
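
The last point can be sketched in D roughly as follows. This is only an illustration, not how any particular runtime does it: the `HashedString` name is hypothetical, and a boolean flag stands in for the "invalid value" sentinel mentioned above. The caching is only safe because the text can never change afterwards.

```d
// Hypothetical wrapper: an immutable string plus a lazily computed hash.
struct HashedString
{
    string text;              // immutable(char)[]: cannot change after creation
    private size_t cached;    // meaningful only once `computed` is true
    private bool computed;    // stands in for an "invalid value" sentinel

    size_t hash()
    {
        if (!computed)
        {
            cached = hashOf(text);   // hashOf comes from D's object module
            computed = true;
        }
        return cached;
    }
}

void main()
{
    auto h = HashedString("Hello");
    auto first = h.hash();            // computed here
    assert(h.hash() == first);        // served from the cache afterwards
}
```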

People notice such strings are immutable. Usually it's fine, but once in a while it's a pain.

Note that in Python you usually try to avoid looping over single chars because it's far too slow, so you use string methods and regular expressions as much as possible. But I like a lower-level language because it gives me the *freedom* to read and process single chars efficiently. (Python is implemented in C, and writing the Python interpreter itself in a language that has only immutable strings would probably be a pain.)

Languages like Python also always offer you an escape. For example, the Python standard library has a mutable char array (Python 3 is different: its built-ins are immutable unicode strings + mutable arrays of bytes + maybe an immutable array of bytes):

This is Python 2.5:

>>> from array import array
>>> # mutable array of chars
>>> s = array("c", "Hello")
>>> s
array('c', 'Hello')
>>> s[2] = "X"
>>> s
array('c', 'HeXlo')
>>> # mutable array of unicode chars
>>> t = array("u", u"Hello")
>>> t
array('u', u'Hello')
>>> t[1]= u"Y"
>>> t
array('u', u'HYllo')


Also note that Ruby, which is a very good language for string processing, allows mutable strings. So the situation isn't as clear-cut as you think.

To compare such matters across languages you can also take a look at this page: http://merd.sourceforge.net/pixel/language-study/various/mutability-and-sharing/


>After all, it never occurs to anyone to think that the integer 123 could be a "mutable" integer and perhaps be 133 the next time you look at it.<

Because they are small numbers. With the multi-precision GMP library you can mutate numbers in place because this becomes useful when you manage huge numbers.

Note that there are Python bindings for GMP. They manage numbers in an immutable way to respect the Python style, but that's not very efficient; see the explanation in the middle of this page:
http://gmpy.sourceforge.net/


>The way to do strings in D is to have them be immutable. If you are building a string by manipulating its parts, start with mutable, when finished then convert it to immutable and 'publish' it to the rest of the program.<

Seems acceptable.


>you'll find you never need to defensively dup the string "just in case" and things just seem to naturally work out.<

If you put strings in an associative array as keys, you usually want them to be immutable to keep their correct place in the hash and avoid big troubles.
For this purpose Python has mutable and immutable arrays (named list and tuple), and you can only use tuples as dictionary keys.
So D's built-in associative arrays may appreciate immutable arrays too :-)
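
A minimal D sketch of that point, assuming an `int[string]` table; the `record` helper is made up for illustration. The key stored in the table is an immutable copy, so later edits to the scratch buffer cannot move it out of its hash slot.

```d
// Hypothetical helper: counts a word, storing an immutable copy as the key.
void record(ref int[string] counts, const(char)[] word)
{
    counts[word.idup] += 1;   // .idup makes an immutable(char)[] copy
}

void main()
{
    int[string] counts;           // keys are immutable(char)[]
    char[] buf = "apple".dup;     // mutable scratch buffer

    record(counts, buf);
    buf[0] = 'A';                 // mutating the buffer leaves the stored key alone
    record(counts, buf);

    assert(counts["apple"] == 1);
    assert(counts["Apple"] == 1);
    // counts[buf] = 2;           // expected not to compile: char[] is not a string
}
```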

Bye,
bearophile
March 07, 2009
Walter Bright:

>The way to do strings in D is to have them be immutable. If you are building a string by manipulating its parts, start with mutable, when finished then convert it to immutable and 'publish' it to the rest of the program.<

Most of the time this seems acceptable.
But if such text is very long (for example, 20 MB) and you want to pass it around for various functions to process and modify it, then you may want to keep it mutable (this is quite an uncommon situation, but it has happened to me during genomic data processing).

Bye,
bearophile
March 07, 2009
On Sat, 07 Mar 2009 07:11:45 -0500, Burton Radons wrote:

> Perl would be more powerful if its strings
> were mutable, not less, although not by much
> due to the interpreter.

I think we have a terminology issue.

We have character arrays (some fixed length, others variable length - doesn't matter). In D's world view, data can be invariant (nothing gets to change it), const (other routines can modify it but this routine will not), or mutable (anything can change it). So in D we have some character arrays that are invariant (e.g. literals), some that are const, and some that are mutable. It is a pity that D's term "string" is being used in discussions as if it is synonymous with character array - but it is not. It only refers to certain types of character arrays - the invariant ones. We really need some simple terms for const and mutable character arrays.
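
The three kinds of character array can be sketched as below. This uses the later `immutable` spelling, for which `string` is an alias of `immutable(char)[]`; the `blankFirst` routine is hypothetical and only stands in for "some other routine that holds a mutable reference".

```d
// Hypothetical routine that holds a mutable reference and edits in place.
void blankFirst(char[] a)
{
    if (a.length) a[0] = '_';
}

void main()
{
    char[] m = "abc".dup;      // mutable: anything holding it can change it
    const(char)[] c = m;       // const: a read-only view of the same storage
    string s = "abc";          // immutable(char)[]: nothing gets to change it

    blankFirst(m);
    assert(c == "_bc");        // the const view observes the change made via m
    assert(s == "abc");        // the immutable data is unaffected

    // blankFirst(c);          // does not compile: const cannot become mutable
    // blankFirst(s);          // does not compile: neither can immutable
}
```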

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
March 07, 2009
Walter Bright Wrote:

> When we first got into what to do with strings and const/immutable/mutable, I was definitely in the camp that strings should be mutable char[], or at worst const(char)[]. The thing is, Andrei pointed out to me, languages that are considered very good at dealing with strings (like Perl) use immutable strings. The fascinating thing about strings in such languages is:
> 
> "Nobody notices they are immutable, they just work."
> 
> So what is it about immutability that makes strings "just work" in a natural and intuitive manner? The insight is that it enables strings, which are reference types, to behave exactly as if they were value types.
> 
> After all, it never occurs to anyone to think that the integer 123 could be a "mutable" integer and perhaps be 133 the next time you look at it. If you put 123 into a variable, it stays 123. It's immutable. People intuitively expect strings to behave the same way. Only C programmers expect that once they assign a string to a variable, that string may change in place.
> 
> C has it backwards by making strings mutable, and it's one of the main reasons why dealing with strings in C is such a gigantic pain. But as a longtime C programmer, I was so used to it that I didn't notice what a pain it was until I started using other languages where string manipulation was a breeze.
> 
> The way to do strings in D is to have them be immutable. If you are building a string by manipulating its parts, start with mutable, when finished then convert it to immutable and 'publish' it to the rest of the program. Mutable char[] arrays should only exist as temporaries. This is exactly the opposite of the way one does it in C, but if you do it this way, you'll find you never need to defensively dup the string "just in case" and things just seem to naturally work out.

So your suggestion is to do something like:

string manipulate() {
  char[] buf = read_20M_string_data();
  initialization_mangling(buf);
  // D has no C-style casts; this is safe only if no mutable reference to buf survives.
  return cast(string) buf;
}
string text_20M = manipulate();

As long as we don't want to mangle the storage later in the program.
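
A runnable sketch of the same build-then-publish pattern, assuming a current Phobos where `assumeUnique` lives in `std.exception`. The `buildGreeting` helper and its contents are invented for illustration.

```d
import std.exception : assumeUnique;

// Hypothetical helper: builds the text in a private mutable buffer.
string buildGreeting(string name)
{
    char[] buf = "hello, ".dup;   // mutable working copy
    buf ~= name;                  // append while the buffer is still private
    buf[0] = 'H';                 // in-place edit
    // buf is the only reference to the data, so it can be published as
    // immutable without copying; assumeUnique also sets buf to null.
    return assumeUnique(buf);
}

void main()
{
    string s = buildGreeting("world");
    string t = s;                 // shares storage, yet nobody can alter it
    assert(s == "Hello, world");
    assert(t is s);               // same pointer and length, no defensive dup
    // s[0] = 'h';                // does not compile: elements are immutable
}
```

Compared with a bare `cast(string)`, `assumeUnique` documents the intent and clears the mutable reference, which makes it harder to mangle the storage later by accident.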

March 07, 2009
Derek Parnell wrote:
> On Sat, 07 Mar 2009 07:11:45 -0500, Burton Radons wrote:
> 
>> Perl would be more powerful if its strings
>> were mutable, not less, although not by much
>> due to the interpreter.
> 
> I think we have a terminology issue.
> 
> We have character arrays (some fixed length, others variable length -
> doesn't matter). In D's world view, data can be invariant (nothing gets to
> change it), const (other routines can modify it but this routine will not),
> or mutable (anything can change it). So in D we have some character arrays
> that are invariant (e.g. literals), some are const, and some are mutable. It
> is a pity that D's term "string" is being used in discussions as if it is
> synonymous with character array - but it is not. It only refers to certain
> types of character arrays - the invariant ones. We really need some simple
> terms for const and mutable character arrays.
> 

I don't think char[] is half bad. const(char)[] is a mouthful, but most of the time those are function parameters, where the handy in char[] applies.
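
As a small illustration of the `in char[]` form (the `startsWithHash` function is made up for the example): the parameter is const inside the function, so callers can pass mutable buffers and immutable strings alike.

```d
// Hypothetical function: `in` makes the parameter const inside the body.
bool startsWithHash(in char[] line)
{
    return line.length > 0 && line[0] == '#';
}

void main()
{
    char[] scratch = "#tmp".dup;   // mutable buffer
    string fixed = "plain";        // immutable string

    assert(startsWithHash(scratch));
    assert(!startsWithHash(fixed));
}
```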

Andrei
March 07, 2009
On Sat, 07 Mar 2009 07:09:34 -0800, Andrei Alexandrescu wrote:


> I don't think char[] is half bad. const(char)[] is a mouthful, but most of the time those are function parameters, where the handy in char[] applies.

I was thinking more of unambiguous words we can use in discussions, and not so much of aliases in our code.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell