August 15, 2021

On Sunday, 15 August 2021 at 08:53:50 UTC, Tejas wrote:

> External C libraries expect strings to be null terminated, so if you do use .dup, use .toStringz as well.

Yeah, yeah, I got that. My question is whether I should avoid cast(char*) and use .toStringz when both do the exact same thing.

August 15, 2021

On Sunday, 15 August 2021 at 08:56:07 UTC, rempas wrote:

> On Sunday, 15 August 2021 at 08:53:50 UTC, Tejas wrote:
>
>> External C libraries expect strings to be null terminated, so if you do use .dup, use .toStringz as well.
>
> Yeah, yeah, I got that. My question is whether I should avoid cast(char*) and use .toStringz when both do the exact same thing.

They don't do the same thing. toStringz always copies, always GC-allocates, and always NUL-terminates. cast(char*) only does what you want in the case that you're applying it to a string literal. But in that case you shouldn't cast, you should just write:

const char* s = "John";

If you need to cast the const away to work with a C API, doing that separately, at the point of the call to the C function, makes it clearer what you're doing and what the risks are there (does the C function modify the string? If so this will segfault).
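A minimal sketch of the three cases, in case it helps. The function legacy_len is hypothetical: it only stands in for a C API that was declared with a non-const parameter but does not write through it.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical stand-in for a C function declared with a non-const
// parameter. It only reads the string.
extern (C) size_t legacy_len(char* p)
{
    return strlen(p);
}

void main()
{
    // A string literal converts implicitly to const(char)*; no cast needed.
    const char* s = "John";
    assert(strlen(s) == 4);

    // If the C signature demands char*, cast the const away at the call
    // site, where the risk is visible. Safe only because legacy_len
    // does not modify the data.
    assert(legacy_len(cast(char*) s) == 4);

    // A string built at run time has no guaranteed terminator, so let
    // toStringz produce a NUL-terminated pointer for C.
    string first = "John";
    string full = first ~ " Doe";
    const(char)* z = full.toStringz;
    assert(strlen(z) == 8);
}
```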

August 15, 2021

On Sunday, 15 August 2021 at 08:11:39 UTC, rempas wrote:

> I mean that in C, we can assign a string literal to a char* and also to a const char* without getting a compilation error, while in D we can only assign it to a const char*. I suppose that's because of C doing implicit conversion. I didn't talk about mutating a string literal.

The D string is an alias for immutable(char)[], immutable contents of a mutable array reference (immutable(char[]) would mean the array reference is also immutable). You don't want to assign that to a char*, because then you'd be able to mutate the contents of the string, thereby violating the contract of immutable. (immutable means the data to which it's applied, in this case the contents of an array, will not be mutated through any reference anywhere in the program.)

Assigning it to const(char)* is fine, because const means the data can't be mutated through that particular reference (pointer in this case). And because strings in C are quite frequently represented as const(char)*, especially in function parameter lists, D string literals are implicitly convertible to const(char)* and also NUL-terminated. So you can do something like puts("Something") without worry.
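To make that concrete, here is a small sketch. The variable names are just for illustration, and the static asserts check at compile time which assignments the compiler accepts.

```d
import core.stdc.stdio : puts;

void main()
{
    // string is an alias for immutable(char)[]
    static assert(is(string == immutable(char)[]));

    string name = "John";

    // Nothing here converts implicitly to a mutable char*.
    static assert(!__traits(compiles, { char* p = name.ptr; }));
    static assert(!__traits(compiles, { char* p = "John"; }));

    // const(char)* is fine, and the literal is NUL-terminated for C.
    const(char)* c = "John";
    puts(c);
    puts("Something");
}
```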

This blog post may be helpful:

https://dlang.org/blog/2021/05/24/interfacing-d-with-c-strings-part-one/

August 15, 2021

On Sunday, 15 August 2021 at 09:01:17 UTC, jfondren wrote:

> They don't do the same thing. toStringz always copies, always GC-allocates, and always NUL-terminates. cast(char*) only does what you want in the case that you're applying it to a string literal. But in that case you shouldn't cast, you should just write:
>
> const char* s = "John";
>
> If you need to cast the const away to work with a C API, doing that separately, at the point of the call to the C function, makes it clearer what you're doing and what the risks are there (does the C function modify the string? If so this will segfault).

Yeah, I won't cast when I have a const char*. I already mentioned that it works without a cast for const variables ;)

August 15, 2021

On Sunday, 15 August 2021 at 09:06:14 UTC, Mike Parker wrote:

> The D string is an alias for immutable(char)[], immutable contents of a mutable array reference (immutable(char[]) would mean the array reference is also immutable). You don't want to assign that to a char*, because then you'd be able to mutate the contents of the string, thereby violating the contract of immutable. (immutable means the data to which it's applied, in this case the contents of an array, will not be mutated through any reference anywhere in the program.)
>
> [...]

Thanks a lot for the info!

August 15, 2021
Lots of great information and pointers already. I will try from another angle. :)

On 8/14/21 11:10 PM, rempas wrote:

> So when I'm doing something like the following: `string name = "John";`
> Then what's the actual type of the literal `"John"`?

As you say and as the code shows, there are two constructs in that line. The right-hand side is a string literal. The left-hand side is a 'string'.

>> Strings are not 0 terminated in D. See "Data Type Compatibility" for
>> more information about this. However, string literals in D are 0
>> terminated.

The string literal is embedded into the compiled program as 5 bytes in this case: 'J', 'o', 'h', 'n', '\0'. That's the right-hand side of your code above.

'string' is an array in D and arrays are stored as the following pair:

  size_t length;    // The number of elements
  T * ptr;          // The pointer to the first element

(This is called a "fat pointer".)

So, if we assume that the literal 'John' was placed at memory location 0x1000, then the left-hand side of your code will satisfy the following conditions:

  assert(name.length == 4);    // <-- NOT 5
  assert(cast(size_t) name.ptr == 0x1000);

The important part to note is that even though the string literal is stored as 5 bytes, the string's length is 4.

As others said, when we add a character to a string, there is no '\0' involved. Only the newly added char will be added.

Functions in D do not need the '\0' sentinel to know where the string ends. The end is already known from the 'length' property.
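Here is a small sketch of the points above. Reading name.ptr[4] is only valid on the assumption that name still refers to a string literal; it is not something to rely on for arbitrary strings.

```d
void main()
{
    // An array reference is a length plus a pointer.
    static assert(string.sizeof == size_t.sizeof + (void*).sizeof);

    string name = "John";

    // The slice covers only the four characters...
    assert(name.length == 4);

    // ...but the literal's storage has a '\0' right after them.
    // Valid only because name came from a literal.
    assert(name.ptr[4] == '\0');

    // Appending involves no sentinel; the length just grows by one.
    name ~= '!';
    assert(name.length == 5);
    assert(name == "John!");
}
```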

Ali

August 15, 2021

On 8/15/21 2:10 AM, rempas wrote:

> So when I'm doing something like the following: string name = "John";
> Then what's the actual type of the literal "John"?
> In the chapter Calling C functions in the "Interfacing with C" page, the following is said:
>
>> Strings are not 0 terminated in D. See "Data Type Compatibility" for more information about this. However, string literals in D are 0 terminated.
>
> Which is really interesting and makes me suppose that "John" is a string literal right?
> However, when I'm writing something like the following: char *name = "John";,
> then D will complain with the following message:
>
>> Error: cannot implicitly convert expression "John" of type string to char*
>
> Which is interesting because this works in C. If I use const char* instead, it will work. I suppose that this has to do with the fact that string is an alias for immutable(char[]) but still this has to mean that the actual type of a LITERAL string is of type string (aka immutable(char[])).
>
> Another thing I can do is cast the literal to a char* but I'm wondering what's going on under the hood in this case. Is casting executed at compile time or at runtime? So am I going to have an extra runtime cost having to first construct a string and then ALSO cast it to a string literal?
>
> I hope all that makes sense and that someone can answer, lol

Lots of great responses in this thread!

I wanted to stress that a string literal is sort of magic. It has extra type information inside the compiler that is not available in the normal type system. Namely that "this is a literal, and so can morph into other things".

To give you some examples:

string s = "John";
immutable(char)* cs = s; // nope
immutable(char)* cs2 = "John"; // OK!
wstring ws = s; // nope
wstring ws2 = "John"; // OK!

What is going on? Because the compiler knows this is a string literal, it can modify the type (and possibly the data itself) at will to match what you are assigning it to. In the case of zero-terminated C strings, it allows usage as a pointer instead of a D array. In the case of different width strings (wstring uses 16-bit code-units), it can actually transform the underlying data to what you wanted.

Note that even when you do lose that "literal" magic by assigning to a variable, you can still rely on D always putting a terminating zero in the data segment for a string literal. So it's valid to just do:

string s = "John";
printf(s.ptr);

As long as you know the string came from a literal.
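If you want to check this yourself, here is a sketch that asks the compiler which conversions it accepts and then hands the literal-backed string to C. It uses strlen rather than printf only to keep the check self-contained.

```d
import core.stdc.string : strlen;

void main()
{
    string s = "John";

    // A literal can become a pointer or a wider string type...
    static assert( __traits(compiles, { immutable(char)* p = "John"; }));
    static assert( __traits(compiles, { wstring w = "John"; }));

    // ...but a string variable cannot.
    static assert(!__traits(compiles, { immutable(char)* p = s; }));
    static assert(!__traits(compiles, { wstring w = s; }));

    // The data behind s is still the zero-terminated literal,
    // so C can find the end on its own.
    assert(strlen(s.ptr) == 4);

    // The literal was transformed to 16-bit code units for the wstring.
    wstring ws = "John";
    assert(ws.length == 4);
}
```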

-Steve
