Why are string literals zero-terminated?
January 25, 2005
Why are D string literals '\0' terminated ?

Isn't the implicit length field supposed
to make that termination unnecessary now ?

For instance, if I use:

string2.d:
> char* cstr = "alpha";
> char[] str = "alpha";

Then I get one pointer to the characters:

> __D7string24cstrPa:
> 	.long	LC0

That's alright, just pointing to the literal:

> LC0:
> 	.ascii "alpha\0"

But the D string points at that same \0-terminated literal:

> __D7string23strAa:
> 	.long	5
> 	.long	LC0

Doesn't that just waste a char, now that the
hack in toStringz has proved to be dangerous?

Or is there some internal routine relying on the
fact that they are indeed zero-terminated?

AFAIK, this only applies to the three string array
types in D (char[], wchar[], dchar[]), not other arrays.

--anders
January 25, 2005
Earlier, I wrote:

> Why are D string literals '\0' terminated ?

Never mind, it's just to make the implicit cast
to (char*) possible, for use with C functions...

Otherwise one would always have to use toStringz,
even with string literals (such as for printf).
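
A minimal sketch of that difference, written against present-day D rather than the 2005 compiler discussed here (the module paths and the helper name cLength are assumptions for illustration, not anything from this thread). The literal goes straight to printf, while an array built at run time is passed through toStringz first:

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the length a C function would see
// once the array has been passed through toStringz.
size_t cLength(const(char)[] s)
{
    return strlen(toStringz(s));
}

void main()
{
    // A string literal is zero-terminated and converts
    // implicitly to a C string pointer.
    printf("alpha\n");

    // An array filled at run time carries no such guarantee,
    // so it is terminated explicitly before the C call.
    char[] buf = new char[3];
    buf[] = 'x';
    printf("%s\n", toStringz(buf));
}
```

Here toStringz returns a pointer to a terminated copy, so the C side never reads past the three characters.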


Test code:
> static const byte[4] XXXX = [ 'X', 'X', 'X', 'X' ];
> 
> static const char[4] cABC = "abc\n";
> static const byte[4] bABC = [ 'a', 'b', 'c', '\n' ];
> 
> static const byte[4] YYYY = [ 'Y', 'Y', 'Y', 'Y' ];
> 
> void main()
> {
>   char* chello;
>   byte* bhello;
> 
>   chello = cABC;
>   bhello = bABC;
> 
>   printf(chello);
>   printf(cast(char*) bhello);
> }

And as far as I can determine, this goes for *all*
char/wchar/dchar arrays - not just the literals ?
i.e. even if I create the array using new char[#]
(but not for byte[]/short[]/int[], and the others)


But if toStringz() doesn't check the '\0' contract,
and all string arrays are zero-terminated anyway,
then of what use is it? Just avoiding null params?

That could be done much more simply, if that's the case:
> char *stringz(char[] str) { return str ? str : ""; }

Or, if null is not a possibility, just "str.ptr"...
(or "cast(char *) str", for DMD before version 0.107)

All assuming that D strings are zero-terminated,
since that seems to be the current case - right ?

--anders
January 25, 2005
Anders F Björklund wrote:

> All assuming that D strings are zero-terminated,
> since that seems to be the current case - right ?

I was just rambling: I forgot all about the quirks of
the allocator with strings whose lengths are exactly
16, 32, 64, 128, 256, 512, 1024, and so on.

Please ignore (but toStringz still needs fixing).

--anders
January 25, 2005
(this was not true:)
> And as far as I can determine, this goes for *all*
> char/wchar/dchar arrays - not just the literals ?
> i.e. even if I create the array using new char[#]
> (but not for byte[]/short[]/int[], and the others)

And here are the simplified test cases that
show when a char[] is *not* zero-terminated:

1)
Lengths of 16, 32, 64, 128, 256, 512, 1024, etc.

> void main()
> {
>         char[] x = new char[16];
>         char[] string = new char[16];
>         char[] y = new char[16];
>         for (int i = 0; i < 16; i++)
>         {
>                 x[i] = 'X';
>                 string[i] = 'a' + i;
>                 y[i] = 'Y';
>         }
>         printf("%s\n", cast(char*) string);
> }

2)
Slices, of already existing strings / arrays.

> void main()
> {
> 	char[] hello = "hello";
> 	char[] string = hello[0..3];
> 	printf("%s\n", string.ptr);
> }

There could be more examples of this, as well.

String literals are still terminated with a '\0',
which is a good thing, even if sometimes confusing.
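
The slice case can be checked directly. This is a rough sketch in present-day D, not the 2005 dialect used in the test cases above; the helper charAfter is hypothetical, and reading one element past a slice is only safe here because the slice is known to sit inside a longer literal:

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the character stored just past the end of a slice.
// Only valid when the caller knows that memory is readable.
char charAfter(const(char)[] s)
{
    return s.ptr[s.length];
}

void main()
{
    string hello = "hello";
    string slice = hello[0 .. 3];

    // Past the slice comes the next character of the literal, not '\0'.
    assert(charAfter(slice) == 'l');
    // The literal itself is followed by a terminator.
    assert(charAfter(hello) == '\0');

    // A C function given the raw pointer runs on to the literal's end.
    assert(strlen(slice.ptr) == 5);
    // Going through toStringz yields a properly terminated copy.
    assert(strlen(toStringz(slice)) == 3);
}
```

So a slice shares storage with its parent, and only the parent's terminator is there for a C function to find.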

--anders