How to convert "string" to const(wchar)* ?
January 29, 2020
How do I convert a "string" to const(wchar)*?
The code below produces strange, garbled characters:

cast(wchar*) str
January 29, 2020
On Wednesday, 29 January 2020 at 05:17:03 UTC, Marcone wrote:
> How do I convert a "string" to const(wchar)*?
> The code below produces strange, garbled characters:
>
> cast(wchar*) str

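This seems to work:

import std.conv : to;
import std.stdio : writeln;

void main()
{
    string s = "test ğüişçöıı";
    wstring wstr = s.to!wstring; // transcodes UTF-8 to UTF-16

    const(wchar)* str = wstr.ptr;
    // Do not print the const(wchar)* itself; slice it to the known length.
    writeln(str[0 .. wstr.length]);
}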
January 28, 2020
On Tuesday, January 28, 2020 10:17:03 PM MST Marcone via Digitalmars-d-learn wrote:
> How do I convert a "string" to const(wchar)*?
> The code below produces strange, garbled characters:
>
> cast(wchar*) str

Of course it is. string is immutable(char)[], and the characters are in UTF-8. immutable(wchar)[] would be UTF-16. Even casting between those two types would result in nonsense, because UTF-8 and UTF-16 are different encodings. Casting between array or pointer types basically causes one type to be interpreted as the other. It doesn't convert the underlying data in any fashion. Also, strings aren't null-terminated in D, so having a pointer to a random string could result in a buffer overflow when you try to iterate through the string via pointer as is typical in C code. D code just uses the length property of the string.
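
To make the difference between reinterpreting and converting concrete, here is a minimal sketch. The helper names unitsAfterCast and unitsAfterConversion are made up for illustration; only the cast and std.conv.to come from the discussion above.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: reinterprets the UTF-8 bytes as 16-bit units.
// Nothing is transcoded; the byte count must be even or the cast fails at run time.
size_t unitsAfterCast(string s)
{
    auto reinterpreted = cast(const(wchar)[]) s;
    return reinterpreted.length;
}

// Hypothetical helper: transcodes UTF-8 to UTF-16 into a new array.
size_t unitsAfterConversion(string s)
{
    wstring converted = s.to!wstring;
    return converted.length;
}

void main()
{
    string s = "abcd"; // four UTF-8 code units, four bytes
    writeln(unitsAfterCast(s));       // 2: each wchar swallows two bytes
    writeln(unitsAfterConversion(s)); // 4: one UTF-16 code unit per character
}
```

The cast halves the length because it only changes how the same four bytes are read, which is why the result looks like random characters.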

I assume that with const(wchar)*, you want it to be a null-terminated string
of const(wchar). For that, what you basically need is a const(wchar)[] with
a null terminator, and then you need to get a pointer to its first
character. So, if you were to do that yourself, you'd end up with something
like

wstring wstr = to!wstring(str) ~ '\0';
const(wchar)* cwstr = wstr.ptr;

or more likely

auto wstr = to!wstring(str) ~ '\0';
auto cwstr = wstr.ptr;

The function in the standard library for simplifying that is toUTF16z:

https://dlang.org/phobos/std_utf.html#toUTF16z

Then you can just do

auto cwstr = str.toUTF16z();
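
As a rough check that the result really is null-terminated, the sketch below walks the pointer the way C code would. lengthUntilNull is a hypothetical helper written for this example, not something from Phobos.

```d
import std.stdio : writeln;
import std.utf : toUTF16z;

// Hypothetical helper: counts UTF-16 code units up to the terminator,
// as a C function receiving the pointer would.
size_t lengthUntilNull(const(wchar)* p)
{
    size_t n = 0;
    while (p[n] != '\0')
        ++n;
    return n;
}

void main()
{
    string str = "test ğüişçöıı";
    const(wchar)* cwstr = str.toUTF16z();

    immutable n = lengthUntilNull(cwstr);
    writeln(n);             // number of UTF-16 code units before the terminator
    writeln(cwstr[0 .. n]); // slice before printing, never print the raw pointer
}
```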

However, if you're doing this to pass a null-terminated string of UTF-16 characters to a C program (e.g. to the Windows API), be aware that if that function stores that pointer anywhere, you will need to also store it in your D code, because toUTF16z allocates a dynamic array to hold the string that you're getting a pointer to, and if a C function holds on to that pointer, the D GC won't see that it's doing that. And if the D GC doesn't see any references to that array anywhere, it will likely collect that memory. As long as you're passing it to a C function that just operates on the memory and returns, it's not a problem, but it can definitely be a problem if the C function stores that pointer even after the function has returned. Keeping a pointer to that memory in your D code fixes that problem, because then the D GC can see that that memory is still referenced and thus should not be collected.
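
One way to keep such a pointer reachable from D is to store it in something that lives at least as long as the C side uses it. This is only a sketch: RetainedWString is a hypothetical wrapper, and the commented-out C call is a placeholder, not a real API.

```d
import std.utf : toUTF16z;

// Hypothetical wrapper: holds the D-side reference for as long as
// the C side may still use the pointer.
struct RetainedWString
{
    private const(wchar)* ptr_; // visible to the GC, so the array stays alive

    this(string s)
    {
        ptr_ = s.toUTF16z();
    }

    // Hand this to the C function that stores the pointer.
    const(wchar)* ptr() const
    {
        return ptr_;
    }
}

void main()
{
    auto title = RetainedWString("hello");
    // someCFunctionThatStoresPointer(title.ptr()); // hypothetical C call
    assert(title.ptr()[0 .. 5] == "hello"w);
    assert(title.ptr()[5] == '\0');
}
```

The wrapper only helps while it is itself reachable, so in real code it would typically be a member of a long-lived object or a module-level variable rather than a local that goes out of scope.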

- Jonathan M Davis



January 29, 2020
On Wednesday, 29 January 2020 at 06:53:15 UTC, Jonathan M Davis wrote:
> On Tuesday, January 28, 2020 10:17:03 PM MST Marcone via Digitalmars-d-learn wrote:
>> [...]
>
> Of course it is. string is immutable(char)[], and the characters are in UTF-8. immutable(wchar)[] would be UTF-16. Even casting between those two types would result in nonsense, because UTF-8 and UTF-16 are different encodings. Casting between array or pointer types basically causes one type to be interpreted as the other. It doesn't convert the underlying data in any fashion. Also, strings aren't null-terminated in D, so having a pointer to a random string could result in a buffer overflow when you try to iterate through the string via pointer as is typical in C code. D code just uses the length property of the string.
>
> [...]

+ Just a reminder that string literals are null-terminated.
January 29, 2020
On Wednesday, January 29, 2020 12:16:29 AM MST Ferhat Kurtulmuş via Digitalmars-d-learn wrote:
> On Wednesday, 29 January 2020 at 06:53:15 UTC, Jonathan M Davis wrote:
> > On Tuesday, January 28, 2020 10:17:03 PM MST Marcone via Digitalmars-d-learn wrote:
> >> [...]
> >
> > Of course it is. string is immutable(char)[], and the characters are in UTF-8. immutable(wchar)[] would be UTF-16. Even casting between those two types would result in nonsense, because UTF-8 and UTF-16 are different encodings. Casting between array or pointer types basically causes one type to be interpreted as the other. It doesn't convert the underlying data in any fashion. Also, strings aren't null-terminated in D, so having a pointer to a random string could result in a buffer overflow when you try to iterate through the string via pointer as is typical in C code. D code just uses the length property of the string.
> >
> > [...]
>
> + Just a reminder that string literals are null-terminated.

Yes, but unless you're using them directly, it doesn't really matter. Their null character is one past their end and thus is not actually part of the string itself as far as the type system is concerned. So, something as simple as str ~ "foo" would mean that you weren't dealing with a null-terminated string. You can do something like

printf("answer: %d\n", 42);

but if you mutate the string at all or create a new string from it, then you're not dealing with a string with a null-terminator one past its end anymore. Certainly, converting a string to wstring is not going to result in the wstring being null-terminated without a null terminator being explicitly appended to it.

Ultimately, that null-terminator one past the end of string literals is pretty much just useful for being able to pass string literals directly to C functions without having to explicitly put a null terminator on their end.
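
A small sketch of that distinction, using the char equivalents (strlen from core.stdc.string and toStringz from std.string). cLength is a hypothetical helper added for the example.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: length as a C function would measure it,
// after explicitly obtaining a null-terminated pointer.
size_t cLength(string s)
{
    return strlen(s.toStringz);
}

void main()
{
    // A literal can go straight to C: it converts to const(char)* and
    // has a '\0' one past its end.
    assert(strlen("hello") == 5);

    // A string built at run time has no such guarantee, so terminate it explicitly.
    string part = "wor";
    string built = part ~ "ld";
    assert(cLength(built) == 5);
}
```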

- Jonathan M Davis