January 25, 2005 Why are string literals zero-terminated?

Why are D string literals '\0' terminated? Isn't the implicit length field supposed to make that termination unnecessary now?

For instance, if I use:

string2.d:
> char* cstr = "alpha";
> char[] str = "alpha";

Then I get one pointer to the characters:
> __D7string24cstrPa:
>     .long LC0

That's alright, just pointing to the literal:
> LC0:
>     .ascii "alpha\0"

But the D string is also terminated with a \0:
> __D7string23strAa:
>     .long 5
>     .long LC0

Doesn't that just waste a char, now that the hack in toStringz has been proved dangerous? Or is there some internal routine using the fact that they are indeed zero-terminated?

AFAIK, it's just the three string arrays in D (char[], wchar[], dchar[]) - not other arrays.

--anders
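A minimal sketch of the same observation in current D rather than the 2005 compiler discussed in this thread. It assumes today's `string` type and the `core.stdc` bindings; the helper name `literalIsTerminated` is made up for illustration. It checks that the literal's length field does not count the terminator, while the byte just past the end is still '\0'.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;

// Hypothetical helper: true if the literal "alpha" has length 5
// and is nevertheless followed by a '\0' in memory.
bool literalIsTerminated()
{
    string str = "alpha";         // length field is 5, terminator not counted
    const(char)* cstr = "alpha";  // literals convert implicitly to a C pointer

    // Reading one past the end is only safe here because str refers
    // to a string literal; do not do this with arbitrary arrays.
    return str.length == 5
        && str.ptr[str.length] == '\0'
        && strlen(cstr) == 5;
}

void main()
{
    printf("literal terminated: %d\n", literalIsTerminated() ? 1 : 0);
}
```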
January 25, 2005 Re: Why are string literals zero-terminated?

Posted in reply to Anders F Björklund

Earlier, I wrote:
> Why are D string literals '\0' terminated?

Never mind, it's just to make the implicit cast to (char*) possible, for use with C functions... Otherwise one would have to use toStringz always, even with string literals (such as for printf).

Test code:
> static const byte[4] XXXX = [ 'X', 'X', 'X', 'X' ];
>
> static const char[4] cABC = "abc\n";
> static const byte[4] bABC = [ 'a', 'b', 'c', '\n' ];
>
> static const byte[4] YYYY = [ 'Y', 'Y', 'Y', 'Y' ];
>
> void main()
> {
>     char* chello;
>     byte* bhello;
>
>     chello = cABC;
>     bhello = bABC;
>
>     printf(chello);
>     printf(cast(char*) bhello);
> }

And as far as I can determine, this goes for *all* char/wchar/dchar arrays - not just the literals? I.e. even if I create the array using new char[#] (but not for byte[]/short[]/int[], and the others).

But if toStringz() doesn't check the '\0' contract - and all string arrays are zero-terminated anyway, then of what use is it? Just avoiding null params? That could be done much simpler, if that's the case:
> char *stringz(char[] str) { return str ? str : ""; }

Or, if null is not a possibility, just "str.ptr"... (or "cast(char *) str", for DMD before version 0.107)

All assuming that D strings are zero-terminated, since that seems to be the current case - right?

--anders
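For comparison, a small sketch of how the conversion is usually written in current D, assuming today's `std.string.toStringz` (not the 2005 version criticised in this thread). The helper name `cLength` is hypothetical. It passes a non-literal array through `toStringz` before handing it to a C function, so the C side does not depend on whatever happens to follow the array in memory.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: length as seen by C after conversion with toStringz.
size_t cLength(const(char)[] s)
{
    // toStringz returns a pointer to a zero-terminated version of s.
    return strlen(toStringz(s));
}

void main()
{
    // A slice of a literal: the byte after it is 'l', not '\0'.
    const(char)[] slice = "hello"[0 .. 3];
    printf("%s\n", toStringz(slice));
}
```

The cost is a possible copy per call, which is the trade-off for not relying on the terminator being there.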
January 25, 2005 Re: Why are string literals zero-terminated?

Posted in reply to Anders F Björklund

Anders F Björklund wrote:
> All assuming that D strings are zero-terminated,
> since that seems to be the current case - right?

Just rambling, forgot all about the quirks of the allocator with strings of sizes 16, 32, etc.
> (16, 32, 64, 128, 256, 512, 1024, and so on)

Please ignore. (but toStringz still needs fixing)

--anders
January 25, 2005 Re: Why are string literals zero-terminated?

Posted in reply to Anders F Björklund

(this was not true:)
> And as far as I can determine, this goes for *all*
> char/wchar/dchar arrays - not just the literals?
> i.e. even if I create the array using new char[#]
> (but not for byte[]/short[]/int[], and the others)

And here are the simplified test cases that show when a char[] is *not* zero-terminated:

1) Lengths of 16, 32, 64, 128, 256, 512, 1024, etc.
> void main()
> {
>     char[] x = new char[16];
>     char[] string = new char[16];
>     char[] y = new char[16];
>     for (int i = 0; i < 16; i++)
>     {
>         x[i] = 'X';
>         string[i] = 'a' + i;
>         y[i] = 'Y';
>     }
>     printf("%s\n", cast(char*) string);
> }

2) Slices of already existing strings / arrays.
> void main()
> {
>     char[] hello = "hello";
>     char[] string = hello[0..3];
>     printf("%s\n", string.ptr);
> }

There could be more examples of this, as well.

String literals are still terminated with a '\0'. Which is a good thing, even if sometimes confusing.

--anders
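One way to print such a slice without depending on a terminator is to give printf an explicit length. This is a sketch in current D, assuming the `core.stdc.stdio` bindings; the helper name `formatSlice` is made up for illustration. The `%.*s` precision limits how many characters C reads, so the slice `hello[0..3]` comes out as "hel" rather than running on into the rest of the literal.

```d
import core.stdc.stdio : printf, snprintf;

// Hypothetical helper: format a slice into buf using an explicit
// precision, returning the number of characters snprintf reports.
int formatSlice(char[] buf, const(char)[] s)
{
    // "%.*s" takes an int precision followed by the pointer.
    return snprintf(buf.ptr, buf.length, "%.*s", cast(int) s.length, s.ptr);
}

void main()
{
    string hello = "hello";
    const(char)[] part = hello[0 .. 3];  // not followed by '\0'

    // Same idea applied directly to printf.
    printf("%.*s\n", cast(int) part.length, part.ptr);
}
```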
Copyright © 1999-2021 by the D Language Foundation