String literals
August 04, 2005
While thinking about D's differing use of the ``const'' keyword (which is apparently more similar to C# than C or C++) I came to write a small test program, which I've included below.

import std.stdio;

char[] get_string() {
	return "testing, testing";
}

int main(char[][] args)
{
	char[] z = get_string();
	writef("first: '%s'\n", z);
	z[3] = 'X';
	writef("second: '%s'\n", z);
	z = get_string();
	writef("third: '%s'\n", z);
	return 0;
}

Compiled with GDC 0.15 built on GCC 3.4.4, this produces code that crashes after the first call to writef. The apparent reason is that string literals are included in the text segment of ELF binaries and are thus read-only. In C++ (AFAIK) this is made explicit to the programmer by string literals having type 'const char[]' (decaying to 'const char *'), so the compiler complains when compilation of code like this is attempted. In C the literal's type is not const-qualified, but writing to it is still undefined behaviour.

However, since D doesn't have a C-like concept of constness, this doesn't so much as pop a warning. (Personally, I'd have expected some sort of clever copy-on-write semantics to be applied in the subscript assignment; this would have been appropriately D-ish.) This behaviour leads to the interesting (in the Chinese proverb sense) situation where there are char arrays that can be modified and char arrays that cannot; furthermore, there is no way[1] to distinguish between the two!

I'm pretty sure that this qualifies as a language design bug, or alternatively a compiler implementation bug if a COW semantic was defined. In the former case, would it be too much to consider the addition of a Java/C#-ish "string" type as part of the language?



[1] besides looking at the address of the first element of such an array
    and trying to figure out whether it falls in the text segment or not.
    This would be non-portable to say the least.

-- 
Kalle A. Sandstro"m                                        ksandstr@iki.fi
DB9D 0C39:              F4FF 4535 B501 4C79 B1DF  03F6 27D1 BF12 DB9D 0C39
void *truth = &truth;                              http://iki.fi/ksandstr/
August 04, 2005
On Thu, 4 Aug 2005 23:28:20 +0300, Kalle A. Sandstrom wrote:

> While thinking about D's differing use of the ``const'' keyword (which is apparently more similar to C# than C or C++) I came to write a small test program, which I've included below.
> 
> import std.stdio;
> 
> char[] get_string() {
> 	return "testing, testing";
> }
> 
> int main(char[][] args)
> {
> 	char[] z = get_string();
> 	writef("first: '%s'\n", z);
> 	z[3] = 'X';
> 	writef("second: '%s'\n", z);
> 	z = get_string();
> 	writef("third: '%s'\n", z);
> 	return 0;
> }
> 
[snip]

> 
> [1] besides looking at the address of the first element of such an array
>     and trying to figure out whether it falls in the text segment or not.
>     This would be non-portable to say the least.

Yes. String literals are write-protected on Linux and unprotected on Windows. The same code above does not crash on Windows.

-- 
Derek Parnell
Melbourne, Australia
5/08/2005 7:18:43 AM
August 05, 2005
Kalle A. Sandstrom wrote:

> int main(char[][] args)
> {
> 	char[] z = get_string();
> 	writef("first: '%s'\n", z);
> 	z[3] = 'X';

Since you don't "own" z here, you are supposed to .dup it first... (CoW)

> 	writef("second: '%s'\n", z);
> 	z = get_string();
> 	writef("third: '%s'\n", z);
> 	return 0;
> }
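
A minimal sketch of the ".dup it first" convention, for illustration only. It is not the original program: the literal is copied inline rather than returned from get_string(), so that the example stands on its own. In the original program the equivalent change would be to write char[] z = get_string().dup; before the subscript assignment.

```d
import std.stdio;

int main()
{
	// Take a private, writable copy of the literal before modifying it.
	// .dup allocates a new array, so the write below never touches the
	// (possibly read-only) memory holding the literal itself.
	char[] z = "testing, testing".dup;
	z[3] = 'X';
	writef("second: '%s'\n", z);   // prints: second: 'tesXing, testing'

	// A later copy of the same literal is unaffected by the earlier write.
	char[] y = "testing, testing".dup;
	writef("third: '%s'\n", y);    // prints: third: 'testing, testing'
	return 0;
}
```

The cost is one allocation per copy, which is why the convention leaves the decision to whoever actually intends to write to the array.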

[...]

> I'm pretty sure that this qualifies as a language design bug, or
> alternatively a compiler implementation bug if a COW semantic was
> defined. In the former case, would it be too much to consider the
> addition of a Java/C#-ish "string" type as part of the language?

It's a language design "bug" if you want to call it that, and a source
of much debate regarding adding such a "readonly" attribute back to D...

Walter has said that he doesn't want a string type, preferring char[]
(or wchar[] or dchar[], but that's another discussion, regarding UTF).

Meanwhile, as you said, there is no way to distinguish between the two.
--anders