What exactly are the String literals in D and how do they work?
August 15, 2021

So when I'm doing something like the following: string name = "John";
Then what's the actual type of the literal "John"?
In the chapter Calling C functions in the "Interfacing with C" page, the following is said:

> Strings are not 0 terminated in D. See "Data Type Compatibility" for more information about this. However, string literals in D are 0 terminated.

Which is really interesting and makes me suppose that "John" is a string literal right?
However, when I'm writing something like the following: char *name = "John";
then D will complain with the following message:

> Error: cannot implicitly convert expression "John" of type string to char*

Which is interesting because this works in C. If I use const char* instead, it will work. I suppose that this has to do with the fact that string is an alias for immutable(char)[], but still this has to mean that the actual type of a LITERAL string is string (aka immutable(char)[]).

Another thing I can do is cast the literal to a char* but I'm wondering what's going on under the hood in this case. Is casting executed at compile time or at runtime? So am I going to have an extra runtime cost having to first construct a string and then ALSO cast it to a string literal?

I hope all that makes sense and that someone can answer, lol

August 15, 2021

On Sunday, 15 August 2021 at 06:10:53 UTC, rempas wrote:

> So when I'm doing something like the following: string name = "John";
> Then what's the actual type of the literal "John"?

unittest {
    pragma(msg, typeof("John"));  // string
    pragma(msg, is(typeof("John") == immutable(char)[]));  // true
}

> In the chapter Calling C functions in the "Interfacing with C" page, the following is said:
>
> > Strings are not 0 terminated in D. See "Data Type Compatibility" for more information about this. However, string literals in D are 0 terminated.

void zerort(string s) {
    assert(s.ptr[s.length] == '\0');
}

unittest {
    zerort("John"); // assertion success
    string s = "Jo";
    s ~= "hn";
    zerort(s); // assertion failure
}

If a function takes a string as a runtime parameter, it might not be NUL terminated. This might be more obvious with substrings:

unittest {
    string j = "John";
    string s = j[0..2];
    assert(s == "Jo");
    assert(s.ptr == j.ptr);
    assert(s.ptr[s.length] == 'h'); // it's h-terminated
}

> Which is really interesting and makes me suppose that "John" is a string literal right?
> However, when I'm writing something like the following: char *name = "John";
> then D will complain with the following message:
>
> > Error: cannot implicitly convert expression "John" of type string to char*
>
> Which is interesting because this works in C.

Well, kinda:

void mutate(char *s) {
    s[0] = 'X';
}

int main() {
    char *s = "John";
    mutate(s); // segmentation fault
}

char* is just the wrong type, it suggests mutability where mutability ain't.
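
For comparison, here is a small sketch of which pointer types a D string literal converts to without a cast. The helper name `firstChar` is made up for the illustration; the rest only relies on the literal conversions discussed above.

```d
/// Hypothetical helper for this sketch: reads a literal through a const(char)*.
char firstChar() {
    const(char)* q = "John"; // implicit conversion, no cast needed
    return q[0];
}

unittest {
    immutable(char)* p = "John"; // also accepted without a cast
    assert(p[4] == '\0');        // the literal's trailing NUL
    assert(firstChar() == 'J');

    // A mutable char* is rejected at compile time.
    static assert(!__traits(compiles, { char* r = "John"; }));

    // The conversion applies to literals only; a string variable needs .ptr.
    string s = "John";
    static assert(!__traits(compiles, { const(char)* t = s; }));
    const(char)* u = s.ptr;
    assert(u[0] == 'J');
}
```

Compiling with `-unittest -main` should be enough to run it.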

> If I use const char* instead, it will work. I suppose that this has to do with the fact that string is an alias for immutable(char)[], but still this has to mean that the actual type of a LITERAL string is string (aka immutable(char)[]).
>
> Another thing I can do is cast the literal to a char* but I'm wondering what's going on under the hood in this case.

The same thing as in C:

void mutate(char *s) {
    s[0] = 'X';
}

void main() {
    char* s = cast(char*) "John";
    mutate(s); // program killed by signal 11
}

> Is casting executed at compile time or at runtime?

Compile-time. std.conv.to is what you'd use at runtime. Here though, what you want is dup to get a char[], which you can then take the pointer of if you want:

unittest {
    char* s = "John".dup.ptr;
    s[0] = 'X'; // no segfaults
    assert(s[0..4] == "Xohn"); // ok
}

> So am I going to have an extra runtime cost having to first construct a string and then ALSO cast it to a string literal?
>
> I hope all that makes sense and that someone can answer, lol

August 15, 2021

On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:

> On Sunday, 15 August 2021 at 06:10:53 UTC, rempas wrote:
>
> unittest {
>     char* s = "John".dup.ptr;
>     s[0] = 'X'; // no segfaults
>     assert(s[0..4] == "Xohn"); // ok
> }
>
> > So am I going to have an extra runtime cost having to first construct a string and then ALSO cast it to a string literal?

In the above case, "John" is a string that's compiled into the resulting executable and loaded into read-only memory, and when this code is reached that string is duplicated, at runtime, to create a copy in writable memory.
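
A rough way to see the difference between sharing the literal and duplicating it is to compare pointers. This is only a sketch; `sameMemory` is a made-up helper name.

```d
/// Hypothetical helper: true if both slices start at the same address.
bool sameMemory(const(char)[] a, const(char)[] b) {
    return a.ptr is b.ptr;
}

unittest {
    string a = "John";
    string b = a;      // copies the pointer and length only
    char[] c = a.dup;  // allocates and copies the 4 bytes at runtime

    assert(sameMemory(a, b));
    assert(!sameMemory(a, c));

    c[0] = 'X';        // only the writable copy changes
    assert(a == "John" && c == "Xohn");
}
```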

August 15, 2021

On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:

> unittest {
>     pragma(msg, typeof("John"));  // string
>     pragma(msg, is(typeof("John") == immutable(char)[]));  // true
> }

Still don't know what "pragma" does but thank you.

> void zerort(string s) {
>     assert(s.ptr[s.length] == '\0');
> }
>
> unittest {
>     zerort("John"); // assertion success
>     string s = "Jo";
>     s ~= "hn";
>     zerort(s); // assertion failure
> }
>
> If a function takes a string as a runtime parameter, it might not be NUL terminated. This might be more obvious with substrings:
>
> unittest {
>     string j = "John";
>     string s = j[0..2];
>     assert(s == "Jo");
>     assert(s.ptr == j.ptr);
>     assert(s.ptr[s.length] == 'h'); // it's h-terminated
> }

That's interesting!

> void mutate(char *s) {
>     s[0] = 'X';
> }
>
> int main() {
>     char *s = "John";
>     mutate(s); // segmentation fault
> }
>
> char* is just the wrong type, it suggests mutability where mutability ain't.

I mean that in C, we can assign a string literal to a char* and also to a const char* without getting a compilation error, while in D we can only assign it to a const char*. I suppose that's because C does an implicit conversion. I wasn't talking about mutating a string literal.

> Compile-time. std.conv.to is what you'd use at runtime. Here though, what you want is dup to get a char[], which you can then take the pointer of if you want:
>
> unittest {
>     char* s = "John".dup.ptr;
>     s[0] = 'X'; // no segfaults
>     assert(s[0..4] == "Xohn"); // ok
> }

Well, that one didn't work out really well for me. Using .dup.ptr didn't add a null terminator, while cast(char*) did. So I suppose the first way is better when you want a C-like char* and not a D-like char[].

August 15, 2021
On 15/08/2021 8:11 PM, rempas wrote:
> Still don't know what "pragma" does but thank you.

pragma is a set of commands to the compiler that may be compiler specific.

In the case of the msg command, it tells the compiler to print a message during compilation (typically to stderr).
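
A minimal sketch of pragma(msg) next to static assert, assuming nothing beyond the language itself. Both are evaluated while compiling; the first prints, the second fails the build if the condition is false.

```d
// Printed by the compiler during the build; nothing here runs at program start.
pragma(msg, "literal type: ", typeof("John").stringof);

// static assert is the checking counterpart: it stops compilation
// instead of printing when the condition does not hold.
static assert(is(typeof("John") == string));
static assert(is(string == immutable(char)[]));
```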
August 15, 2021
On Sunday, 15 August 2021 at 08:17:47 UTC, rikki cattermole wrote:
>
> pragma is a set of commands to the compiler that may be compiler specific.
>
> In the case of the msg command, it tells the compiler to print a message during compilation (typically to stderr).

Thanks man!
August 15, 2021

On Sunday, 15 August 2021 at 07:47:27 UTC, jfondren wrote:

> On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:
>
> > On Sunday, 15 August 2021 at 06:10:53 UTC, rempas wrote:
> >
> > unittest {
> >     char* s = "John".dup.ptr;
> >     s[0] = 'X'; // no segfaults
> >     assert(s[0..4] == "Xohn"); // ok
> > }
> >
> > > So am I going to have an extra runtime cost having to first construct a string and then ALSO cast it to a string literal?
>
> In the above case, "John" is a string that's compiled into the resulting executable and loaded into read-only memory, and when this code is reached that string is duplicated, at runtime, to create a copy in writable memory.

Probably a more useful way to think about this is to consider what happens in a loop:

void static_lifetime() @nogc {
    foreach (i; 0 .. 100) {
        string s = "John";
        // some code
    }
}

^^ At runtime a slice is created on the stack 100 times, with a pointer to the 'J' of the literal, a length of 4, etc. The cost of this doesn't change with the length of the literal, and the bytes of the literal aren't copied, so this code would be just as fast if the string were megabytes in length.

void dynamically_allocated() { // no @nogc
    foreach (i; 0 .. 100) {
        char[] s = "John".dup;
        // some code
    }
}

^^ Here, the literal is copied into freshly GC-allocated memory a hundred times, and a slice is made from that.

And for completeness:

void stack_allocated() @nogc {
    foreach (i; 0 .. 100) {
        char[4] raw = "John";
        char[] s = raw[0..$];
        // some code
    }
}

^^ Here, a static array is constructed on the stack a hundred times, and the literal is copied into the array, and then a slice is constructed on the stack with a pointer into the array on the stack, a length of 4, etc. This doesn't use the GC, but the stack is limited in size and now you have to worry about the slice getting copied elsewhere and outliving the data on the stack:

char[] stack_allocated() @nogc {
    char[] ret;
    foreach (i; 0 .. 100) {
        char[4] raw = "John";
        char[] s = raw[0 .. $];
        ret = s;
    }
    return ret; // errors with -preview=dip1000
}

void main() {
    import std.stdio : writeln;

    char[] s = stack_allocated();
    writeln(s); // prints garbage
}
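
One way around the dangling slice, sketched here under the assumption that a GC allocation is acceptable, is to return a copy of the stack data. The name `heapCopy` is made up for the example.

```d
/// Hypothetical variant of stack_allocated that returns a GC-owned copy.
char[] heapCopy() {
    char[4] raw = "John"; // lives in this function's stack frame
    return raw[].dup;     // the copy outlives the frame
}

unittest {
    char[] s = heapCopy();
    assert(s == "John");  // still valid after heapCopy has returned
    s[0] = 'X';
    assert(s == "Xohn");
}
```

The cost is the same as in dynamically_allocated above: one allocation and one copy per call, so the function can no longer be @nogc.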
August 15, 2021

On Sunday, 15 August 2021 at 08:11:39 UTC, rempas wrote:

> On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:
>
> > unittest {
> >     char* s = "John".dup.ptr;
> >     s[0] = 'X'; // no segfaults
> >     assert(s[0..4] == "Xohn"); // ok
> > }
>
> Well, that one didn't work out really well for me. Using .dup.ptr didn't add a null terminator

dup() isn't aware of the NUL since that's outside the slice of the string. It only copies the chars in "John". You can use toStringz to ensure NUL termination:
https://dlang.org/phobos/std_string.html#.toStringz
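
A short sketch of toStringz with a runtime-built string and with a substring, measured the way C would measure it. `cLength` is a made-up helper; `strlen` comes from druntime's C bindings.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

/// Hypothetical helper: the length of s as a C function would see it.
size_t cLength(string s) {
    // toStringz returns an immutable(char)* that is guaranteed to end in NUL.
    return strlen(s.toStringz);
}

unittest {
    string s = "Jo";
    s ~= "hn";                       // built at runtime, no NUL guaranteed
    assert(cLength(s) == 4);
    assert(cLength(s[0 .. 2]) == 2); // the substring is terminated after "Jo"
}
```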

August 15, 2021

On Sunday, 15 August 2021 at 08:47:39 UTC, jfondren wrote:

> dup() isn't aware of the NUL since that's outside the slice of the string. It only copies the chars in "John". You can use toStringz to ensure NUL termination:
> https://dlang.org/phobos/std_string.html#.toStringz

Is there anything bad about just casting it to char* that I should be aware of?

August 15, 2021

On Sunday, 15 August 2021 at 08:51:19 UTC, rempas wrote:

> On Sunday, 15 August 2021 at 08:47:39 UTC, jfondren wrote:
>
> > dup() isn't aware of the NUL since that's outside the slice of the string. It only copies the chars in "John". You can use toStringz to ensure NUL termination:
> > https://dlang.org/phobos/std_string.html#.toStringz
>
> Is there anything bad about just casting it to char* that I should be aware of?

External C libraries expect strings to be null terminated, so if you do use .dup, use .toStringz as well.
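
Since toStringz returns an immutable(char)*, a C function that writes into its argument needs a mutable, NUL-terminated copy instead. The following is one possible sketch; `mutableCString` is a made-up helper, not a Phobos function.

```d
import core.stdc.string : strlen;

/// Hypothetical helper: a writable, NUL-terminated copy of s for C APIs.
char* mutableCString(const(char)[] s) {
    char[] buf = new char[s.length + 1]; // one extra byte for the terminator
    buf[0 .. s.length] = s[];            // copy the characters
    buf[s.length] = '\0';                // char.init is 0xFF, so set the NUL explicitly
    return buf.ptr;
}

unittest {
    char* p = mutableCString("John");
    p[0] = 'X';                  // writable, unlike a cast literal
    assert(strlen(p) == 4);      // terminated, unlike a plain .dup.ptr
    assert(p[0 .. 4] == "Xohn");
}
```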
