Why are string literals zero-terminated?
Jul 20, 2010
awishformore
July 20, 2010
Following this discussion on announce, I was wondering why string literals are zero-terminated. Or to re-formulate, why only string literals are zero-terminated. Why that inconsistency? What's the rationale behind it? Does anyone know?

/Max

>>>> Did you test with a string that was not in the code itself, e.g. from a
>>>> config file?
>>>> String literals are null terminated so you wouldn't have had an issue if
>>>> all your strings were literals.
>>>> UTF-8 doesn't contain the string length, so you will run into problems
>>>> eventually.
>>>>
>>>> You have to use toStringz or your own null terminator. Unless of course
>>>> you know that the function will always be
>>>> taking string literals. But even then leaving something like that up to
>>>> the programmer to remember is not exactly
>>>> foolproof.
>>>>
>>>> Enjoy.
>>>> ~Rory
>>>
>>> Hey again and thanks for the hint. I tried finding something on the DM
>>> page about string literals being null terminated and while the section
>>> about string literals didn't even mention it, it was said some place
>>> else.
>>>
>>> That explains why using string literals works even though I expected
>>> it to fail. It's indeed good to know and adding std.string.toStringz
>>> is probably a good idea ;). Thanks.
>>>
>>> Greetings, Max.
>>
>> Sure, I must admit it is annoying when the same code can do different
>> things just because of where the data came
>> from. It would be easier to notice the bug if D never added a null on
>> literals, but then there would also be a lot more
>> uses of toStringz.
>>
>> I think if you want to test it you can do:
>> auto s = "blah";
>> open(s[0..$].dup.ptr); // duplicating it should put it somewhere else
>> // just slicing will not test
>
> When thinking about it, it makes sense to have string literals null terminated in order to have C functions work with them. However, I wonder about some stuff, for instance:
>
> string s = "string";
> // is s == "string\0" now?
> char[] c = cast(char[])s;
> // is c[6] == '\0' now?
> char* p = s.ptr;
> // is *(p+6) == '\0' now?
>
> I think use of the zero terminator should be consistent. Either make every string (and char[] for that matter) zero terminated in the underlying memory for backwards compatibility with C or leave it to the user in all cases.
>
> /Max

Perhaps the null terminator is there simply because it's there in the executable file?
A zero byte also often follows a dynamic array because D always initializes memory, and
an allocation is often larger than what was requested, so the unused remainder stays zero.
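
For what it's worth, the three questions in the quoted snippet can be checked directly. This is only a minimal sketch, assuming a reasonably recent D compiler and the behaviour that string literals get a '\0' stored right after them; nothing here is specific to any one compiler.

```d
import std.stdio : writeln;

void main()
{
    string s = "string";

    // The array itself has six elements; the terminator is not one of them,
    // so s is not equal to "string\0" (which has length 7).
    assert(s.length == 6);
    assert(s != "string\0");

    // Casting to char[] changes the type, not the length, so index 6 is
    // out of bounds (a RangeError when bounds checks are enabled).
    // c is only read here; writing through it would be undefined
    // behaviour because the literal is immutable.
    auto c = cast(char[]) s;
    assert(c.length == 6);

    // Going through the raw pointer skips the bounds check. For a string
    // literal, the byte just past the end is the '\0' the compiler appended.
    auto p = s.ptr;
    assert(*(p + 6) == '\0');

    writeln("all checks passed");
}
```

So the answers are no, out of bounds, and yes, in that order, but the last one holds only because s was initialized from a literal.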
July 20, 2010
On Tue, 20 Jul 2010 14:59:18 +0200, awishformore wrote:

> Following this discussion on announce, I was wondering why string literals are zero-terminated. Or to re-formulate, why only string literals are zero-terminated. Why that inconsistency? What's the rationale behind it? Does anyone know?

So you can pass them to C functions.

-Lars
July 20, 2010
On Tue, 20 Jul 2010 13:26:56 +0000, Lars T. Kyllingstad wrote:

> On Tue, 20 Jul 2010 14:59:18 +0200, awishformore wrote:
> 
>> Following this discussion on announce, I was wondering why string literals are zero-terminated. Or to re-formulate, why only string literals are zero-terminated. Why that inconsistency? What's the rationale behind it? Does anyone know?
> 
> So you can pass them to C functions.

Note that even though string literals are zero terminated, the actual string (the array, that is) doesn't contain the zero character.  It's located at the memory position immediately following the string.

  string s = "hello";
  assert (s[$-1] != '\0');  // Last character of s is 'o', not '\0'
  assert (s.ptr[s.length] == '\0');

Why is it only so for literals?  That is because the compiler can only guarantee the zero-termination of string literals.  The memory following a string in general could contain anything.

  string s = getStringFromSomewhere();
  // I have no idea where s is coming from, so I don't
  // know whether it is zero-terminated or not.  Better
  // make sure.
  someCFunction(toStringz(s));
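
To make the difference concrete, here is a small self-contained sketch that uses C's strlen (via core.stdc.string) as a stand-in for someCFunction. The variable names are just for illustration, and the slice case assumes the literal is stored contiguously with its terminator, as described above.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    // A literal can go straight to C: .ptr points at zero-terminated data.
    string lit = "hello";
    assert(strlen(lit.ptr) == 5);

    // A slice is a different story: the byte after it is 'l', not '\0',
    // so C keeps reading until the end of the original literal.
    string part = lit[0 .. 2];       // "he"
    assert(part.length == 2);
    assert(strlen(part.ptr) == 5);   // C sees "hello", not "he"

    // toStringz returns a zero-terminated pointer with the right content.
    auto z = toStringz(part);
    assert(strlen(z) == 2);
}
```

The slice case is the mild version of the bug: it happens to hit a terminator a few bytes later. For a string built at run time there may be no terminator nearby at all.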

-Lars
July 20, 2010
On 20.07.2010 15:38, Lars T. Kyllingstad wrote:
> On Tue, 20 Jul 2010 13:26:56 +0000, Lars T. Kyllingstad wrote:
>
>> On Tue, 20 Jul 2010 14:59:18 +0200, awishformore wrote:
>>
>>> Following this discussion on announce, I was wondering why string
>>> literals are zero-terminated. Or to re-formulate, why only string
>>> literals are zero-terminated. Why that inconsistency? What's the
>>> rationale behind it? Does anyone know?
>>
>> So you can pass them to C functions.
>
> Note that even though string literals are zero terminated, the actual
> string (the array, that is) doesn't contain the zero character.  It's
> located at the memory position immediately following the string.
>
>    string s = "hello";
>    assert (s[$-1] != '\0');  // Last character of s is 'o', not '\0'
>    assert (s.ptr[s.length] == '\0');
>
> Why is it only so for literals?  That is because the compiler can only
> guarantee the zero-termination of string literals.  The memory following
> a string in general could contain anything.
>
>    string s = getStringFromSomewhere();
>    // I have no idea where s is coming from, so I don't
>    // know whether it is zero-terminated or not.  Better
>    // make sure.
>    someCFunction(toStringz(s));
>
> -Lars

Hey.

Yes, that indeed makes a lot of sense.

I didn't actually try those asserts because I'm currently not on a dev machine, but what you point out basically is the behaviour I was hoping for.

Thanks for clearing this up.

/Max