August 04, 2005
String literals
While thinking about D's differing use of the ``const'' keyword (which
is apparently more similar to C# than C or C++) I came to write a small
test program, which I've included below.

import std.stdio;

char[] get_string() {
	return "testing, testing";
}

int main(char[][] args)
{
	char[] z = get_string();
	writef("first: '%s'\n", z);
	z[3] = 'X';
	writef("second: '%s'\n", z);
	z = get_string();
	writef("third: '%s'\n", z);
	return 0;
}

Compiled with GDC 0.15 built on GCC 3.4.4, this produces code that
crashes after the first call to writef. The apparent reason is that
string literals are included in the text segment of ELF binaries and are
thus read-only. In C++ (and in C with GCC's -Wwrite-strings, AFAIK) this
is made explicit to the programmer by string literals being treated as
arrays of const char, causing rather significant warnings to be printed
when compilation of code like this is attempted.

However, since D doesn't have a C-like concept of constness, this
doesn't so much as pop a warning. (Personally, I'd have expected some
sort of a clever copy-on-write semantic to be applied in the subscript
assignment; this would have been appropriately D-ish.) This behaviour
leads to the interesting (in the Chinese proverb sense) situation where
there are char arrays that can be modified and char arrays which cannot;
furthermore there is no way[1] to distinguish between the two!

I'm pretty sure that this qualifies as a language design bug, or
alternatively a compiler implementation bug if a COW semantic was
defined. In the former case, would it be too much to consider the
addition of a Java/C#-ish "string" type as part of the language?



[1] besides looking at the address of the first element of such an array
   and trying to figure out whether it falls in the text segment or not.
   This would be non-portable to say the least.

-- 
Kalle A. Sandstro"m                                        ksandstr@iki.fi
DB9D 0C39:              F4FF 4535 B501 4C79 B1DF  03F6 27D1 BF12 DB9D 0C39
void *truth = &truth;                              http://iki.fi/ksandstr/
August 04, 2005
Re: String literals
On Thu, 4 Aug 2005 23:28:20 +0300, Kalle A. Sandstrom wrote:

> While thinking about D's differing use of the ``const'' keyword (which
> is apparently more similar to C# than C or C++) I came to write a small
> test program, which I've included below.
> 
> import std.stdio;
> 
> char[] get_string() {
> 	return "testing, testing";
> }
> 
> int main(char[][] args)
> {
> 	char[] z = get_string();
> 	writef("first: '%s'\n", z);
> 	z[3] = 'X';
> 	writef("second: '%s'\n", z);
> 	z = get_string();
> 	writef("third: '%s'\n", z);
> 	return 0;
> }
> 
[snip]

> 
> [1] besides looking at the address of the first element of such an array
>     and trying to figure out whether it falls in the text segment or not.
>     This would be non-portable to say the least.

Yes. String literals are write-protected on Linux and unprotected on
Windows. The same code above does not crash on Windows.

-- 
Derek Parnell
Melbourne, Australia
5/08/2005 7:18:43 AM
August 05, 2005
Re: String literals
Kalle A. Sandstrom wrote:

> int main(char[][] args)
> {
> 	char[] z = get_string();
> 	writef("first: '%s'\n", z);
> 	z[3] = 'X';

Since you don't "own" z here, you are supposed to .dup it first... (CoW)

> 	writef("second: '%s'\n", z);
> 	z = get_string();
> 	writef("third: '%s'\n", z);
> 	return 0;
> }
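
For illustration, a minimal sketch of the copy-before-write convention
described above. This is not code from the original post: it assumes the
simplest case, copying the literal directly with .dup instead of going
through get_string(), and it uses writefln rather than writef.

```d
import std.stdio;

int main()
{
	// The literal itself may live in read-only memory, so take a
	// private, writable copy with .dup before modifying anything.
	char[] z = "testing, testing".dup;
	writefln("first: '%s'", z);

	z[3] = 'X';	// writes to the copy, not to the literal
	writefln("second: '%s'", z);

	// A fresh copy of the literal is unaffected by the earlier write.
	char[] again = "testing, testing".dup;
	writefln("third: '%s'", again);
	return 0;
}
```

The copy costs an allocation, which is why the convention is to .dup only
at the point where you are about to write to data you do not own.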

[...]

> I'm pretty sure that this qualifies as a language design bug, or
> alternatively a compiler implementation bug if a COW semantic was
> defined. In the former case, would it be too much to consider the
> addition of a Java/C#-ish "string" type as part of the language?

It's a language design "bug" if you want to call it that, and a source
of much debate regarding adding such a "readonly" attribute back to D...

Walter has said that he doesn't want a string type, preferring char[].
(or wchar[] or dchar[], but that's another discussion - regarding UTF)

Meanwhile, as you said, there is no way to distinguish between the two.
--anders