April 05, 2016 Convert wchar* to wstring? | ||||
---|---|---|---|---|
| ||||
I'm sorry for this total newbie question, but for some reason this is eluding me. I must be overlooking something obvious, but I haven't been able to figure this out and haven't found anything helpful.
I am invoking an entry point in a D DLL from C# (via extern (C)), and one of the parameters is a string. This works just fine for ANSI, but I'm having trouble with the Unicode equivalent.
For ANSI, the message parameter is char*, and string info = to!string(message) produces the correct string.
For Unicode, I assumed this would be wchar_t*, as it is in C++. (In C++ you can just pass the wchar_t* value to the wstring constructor.) So I tried wchar_t*, wchar* and dchar* as well. When the message parameter is wchar*, wstring info = to!wstring(message) populates the string with the _address_ of the wchar*. So when message was in the debugger as 0x00000000035370e8 L"Writing Exhaustive unit tests is exhausting.", the wstring info variable ended up as {length=7 ptr=0x000000001c174a20 L"35370E8" }. The dstring*/wchar_t* version had equivalent results.
Again, I'm sure I'm missing something obvious, but I poked at this problem with various types, casts, Phobos library string conversions, and I'm just stumped! :)
thanks,
Thalamus |
April 05, 2016 Re: Convert wchar* to wstring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thalamus | On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
> I'm sorry for this total newbie question, but for some reason this is eluding me. I must be overlooking something obvious, but I haven't been able to figure this out and haven't found anything helpful.
>
> I am invoking an entry point in a D DLL from C# (via extern (C)), and one of the parameters is a string. This works just fine for ANSI, but I'm having trouble with the Unicode equivalent.
>
> For ANSI, the message parameter is char*, and string info = to!string(message) produces the correct string.
>
> For Unicode, I assumed this would be wchar_t*, as it is in C++. (In C++ you can just pass the wchar_t* value to the wstring constructor.) So I tried wchar_t*, wchar* and dchar* as well. When the message parameter is wchar*, wstring info = to!wstring(message) populates the string with the _address_ of the wchar*. So when message was in the debugger as 0x00000000035370e8 L"Writing Exhaustive unit tests is exhausting.", the wstring info variable ended up as {length=7 ptr=0x000000001c174a20 L"35370E8" }. The dstring*/wchar_t* version had equivalent results.
>
> Again, I'm sure I'm missing something obvious, but I poked at this problem with various types, casts, Phobos library string conversions, and I'm just stumped! :)
>
> thanks,
> Thalamus
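I can't give you a full code example, but you could try this:
1. Using a loop, count the wchar elements (code units, not bytes) until you find 0 (zero). (This works only if the string is NULL-terminated; otherwise you need to know the length already.)
2. Allocate a dynamic array of that length: auto mystring = new wchar[calculated_length]; (A static array, wchar[calculated_length] mystring;, would need a compile-time length.)
3. Copy the content from the wchar* into your array: mystring[0 .. calculated_length] = wcharptr[0 .. calculated_length];
4. If you want, you can cast mystring to convert it to wstring. This is safe only if nothing else will modify the array afterwards; std.exception.assumeUnique expresses that intent.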
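The four steps above could be sketched roughly as follows. This is only an illustration: the function name `copyNulTerminated` and the variable names are made up for the example, and it assumes the buffer really is NUL-terminated.

```d
import std.exception : assumeUnique;

// Hypothetical helper (not from the thread): copies a NUL-terminated
// UTF-16 buffer into a freshly allocated wstring.
wstring copyNulTerminated(const(wchar)* wcharptr)
{
    if (wcharptr is null)
        return null;

    // Step 1: count code units (not bytes) up to the terminator.
    size_t calculatedLength = 0;
    while (wcharptr[calculatedLength] != 0)
        ++calculatedLength;

    // Step 2: allocate a dynamic array of that length.
    auto mystring = new wchar[calculatedLength];

    // Step 3: copy the code units out of the caller's buffer.
    mystring[] = wcharptr[0 .. calculatedLength];

    // Step 4: nothing else references mystring, so it can be
    // treated as immutable without a second copy.
    return assumeUnique(mystring);
}
```

If the buffer is not terminated, the loop in step 1 reads past the end, so pass an explicit length instead whenever the caller can provide one.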
|
April 05, 2016 Re: Convert wchar* to wstring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thalamus | On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
> When the message parameter is wchar*, wstring info = to!wstring(message) populates the string with the _address_ of the wchar*. So when message was in the debugger as 0x00000000035370e8 L"Writing Exhaustive unit tests is exhausting.", the wstring info variable ended up as {length=7 ptr=0x000000001c174a20 L"35370E8" }.
`wchar*` is a raw pointer. D APIs generally expect a dynamic array, also known as a "slice", which packs the pointer together with an explicit `length` field. You can easily get a slice from a pointer using D's convenient slicing syntax:
https://dlang.org/spec/arrays.html#slicing
wchar* cw;
size_t cw_len; // be sure to use the right length, or you'll suffer buffer overruns.
wchar[] dw = cw[0 .. cw_len];
Slicing is extremely fast, because it does not allocate any new heap memory: `dw` is still pointing to the same chunk of memory as `cw`. D APIs that work with text will often accept a mutable character array like `dw` without issue.
However, `wstring` in D is an alias for `immutable(wchar)[]`. In the example above, `dw` cannot be immutable because it is reusing the same mutable memory chunk as `cw`. If the D code you want to interface with requires a real `wstring`, you'll need to copy the text into a new immutable memory chunk:
wstring wstr = dw.idup; // idup is short for "immutable duplicate"
`idup` will allocate heap memory, so if you care about performance and memory usage, don't use it unless you actually need it.
You can also combine both steps into a one-liner:
wstring wstr = cw[0 .. cw_len].idup; |
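To make the difference between a slice and an `idup` copy concrete, here is a small sketch. The helper names `viewOf` and `copyOf` are invented for this illustration and are not part of Phobos.

```d
// Hypothetical helpers illustrating a view versus a copy.

// Returns a slice that aliases the caller's buffer; no allocation.
wchar[] viewOf(wchar* cw, size_t cwLen)
{
    return cw[0 .. cwLen];
}

// Returns an independent immutable copy; allocates on the GC heap.
wstring copyOf(wchar* cw, size_t cwLen)
{
    return cw[0 .. cwLen].idup;
}
```

If the original buffer is modified after both calls, the slice returned by `viewOf` sees the change, while the `wstring` returned by `copyOf` keeps the old text. That is why a copy is needed when the source memory may change or go away.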
April 05, 2016 Re: Convert wchar* to wstring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thalamus | On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
> I'm sorry for this total newbie question, but for some reason this is eluding me. [...]
You've been given the right answer by the other participants, but I'd like to share this simple helper range from my user lib:
import std.traits : isPointer, isSomeChar, PointerTarget;
auto nullTerminated(C)(C c)
if (isPointer!C && isSomeChar!(PointerTarget!(C)))
{
    struct NullTerminated(C)
    {
        private C _front;
        ///
        this(C c) { _front = c; }
        ///
        @property bool empty() { return *_front == 0; }
        ///
        @property auto front() { return *_front; }
        ///
        void popFront() { ++_front; }
        /// Returns a copy of the range, as forward ranges require.
        auto save() { return this; }
    }
    return NullTerminated!C(c);
}
The idea is to get rid of the conversion and to process the pointer directly in Phobos functions. |
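As a rough illustration of how such a range might be consumed, the sketch below restates the helper at module level so the example stands alone, then feeds it to `walkLength` and a `foreach` loop. The consumer functions `codeUnitCount` and `collect` are hypothetical names, and the input is assumed to be NUL-terminated.

```d
import std.range : walkLength;
import std.traits : isPointer, isSomeChar, PointerTarget;

// Module-level variant of the helper, repeated so this example is self-contained.
struct NullTerminated(C)
if (isPointer!C && isSomeChar!(PointerTarget!C))
{
    private C _front;
    @property bool empty() { return *_front == 0; }
    @property auto front() { return *_front; }
    void popFront() { ++_front; }
    auto save() { return this; }
}

auto nullTerminated(C)(C c)
{
    return NullTerminated!C(c);
}

// Hypothetical consumer: counts code units without copying anything.
size_t codeUnitCount(const(wchar)* message)
{
    return nullTerminated(message).walkLength;
}

// Hypothetical consumer: builds a wstring one code unit at a time.
wstring collect(const(wchar)* message)
{
    wstring result;
    foreach (wchar unit; nullTerminated(message))
        result ~= unit;
    return result;
}
```

Note that the range yields UTF-16 code units, not decoded code points, so algorithms that care about whole characters would still need a decoding step.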
April 05, 2016 Re: Convert wchar* to wstring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thalamus | On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
> I'm sorry for this total newbie question, but for some reason this is eluding me. I must be overlooking something obvious, but I haven't been able to figure this out and haven't found anything helpful.
In case you haven't done so already, you'll also have to use CharSet = CharSet.Unicode in the DllImport attribute.
|
April 05, 2016 Re: Convert wchar* to wstring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to tsbockman | On Tuesday, 5 April 2016 at 07:10:50 UTC, tsbockman wrote:
> You can also combine both steps into a one-liner:
>
> wstring wstr = cw[0 .. cw_len].idup;
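This looks like it should do the trick too, but std.conv only treats char* as a C-style string; for a wchar* it formats the pointer value, which is the behavior described in the original post: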
import std.conv : to;
auto wstr = to!wstring(cw);
|
April 05, 2016 Re: Convert wchar* to wstring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thalamus | On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
> I am invoking an entry point in a D DLL from C# (via extern (C)), and one of the parameters is a string. This works just fine for ANSI, but I'm having trouble with the Unicode equivalent.
>
> When the message parameter is wchar*, wstring info = to!wstring(message) populates the string with the _address_ of the wchar*. So when message was in the debugger as 0x00000000035370e8 L"Writing Exhaustive unit tests is exhausting.", the wstring info variable ended up as {length=7 ptr=0x000000001c174a20 L"35370E8" }. The dstring*/wchar_t* version had equivalent results.
Strings passed from C# are pinned, but temporary. You probably want to receive them as immutable (StringBuilder is for mutable string buffers). It's also easier to just pass the string length from the C# side:
C#:
[DllImport(...)]
static extern void dfunc(string s, int len);
dfunc(s, s.Length);
D:
extern(C) void dfunc(immutable(wchar)* s, int len)
{
    wstring ws = s[0..len];
}
Since the string is temporary, you'll have to idup it if you want to retain it after the call finishes.
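A minimal sketch of retaining the string, assuming the same `dfunc` signature as above. The module-level variable `lastMessage` and the null/length guard are additions made for this illustration only.

```d
// Module-level storage that outlives the call. Module-level variables are
// thread-local by default in D, so each calling thread sees its own copy.
wstring lastMessage;

// Same signature as in the post above; the body is a hypothetical illustration.
extern(C) void dfunc(immutable(wchar)* s, int len)
{
    if (s is null || len <= 0)
    {
        lastMessage = null;
        return;
    }

    // s[0 .. len] only views the caller's temporary buffer; idup copies it
    // into GC-owned memory that stays valid after the call returns.
    lastMessage = s[0 .. len].idup;
}
```

Storing `s[0 .. len]` without the `idup` would leave `lastMessage` pointing at memory that the C# marshaller may release or move once the call returns.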
|
April 05, 2016 Re: Convert wchar* to wstring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thalamus | Thanks everyone! You've all been very helpful. |
April 05, 2016 Re: Convert wchar* to wstring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thalamus | On Tuesday, 5 April 2016 at 11:26:44 UTC, Thalamus wrote:
> Thanks everyone! You've all been very helpful.
For anyone who has the same question and happens on this thread, I wanted to post what I finally came up with. I combined the information everyone in this thread gave me with what I saw in Phobos source for the to!string() implementation, closely following the latter. The important to!string() code is in the toImpl implementation in conv.d at line 880. The existing code uses strlen, but that's an ANSI function. Fortunately, D has wcslen available, too.
import core.stdc.stddef; // For wchar_t. This is defined differently for Windows vs POSIX.
import core.stdc.wchar_; // For wcslen.
wstring toWstring(wchar_t* value)
{
    return value ? cast(wstring) value[0..wcslen(value)].dup : null;
}
The Phobos code notes that this operation is unsafe, because there's no guarantee the string is null-terminated as it should be. That's definitely true. The only outcome you can be really sure is accurate is an access violation. :)
thanks!
Thalamus
|
April 05, 2016 Re: Convert wchar* to wstring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thalamus | On 05.04.2016 20:44, Thalamus wrote:
> import core.stdc.stddef; // For wchar_t. This is defined differently for Windows vs POSIX.
> import core.stdc.wchar_; // For wcslen.
Aside: D has syntax for "// For wchar_t.": `import core.stdc.stddef: wchar_t;`.
> wstring toWstring(wchar_t* value)
> {
>     return value ? cast(wstring) value[0..wcslen(value)].dup : null;
> }
wchar_t is not wchar. wstring is not (portably) compatible with a wchar_t array.
If you actually have a wchar_t* and you want a wstring as opposed to a wchar_t[], then you will potentially have to do some converting.
If you have a wchar*, then don't use wcslen, as that's defined in terms of wchar_t. There may be some function for finding the first null wchar from a wchar*, but I don't know it, and writing out a loop isn't exactly hard:
----
wstring toWstring(const(wchar)* value)
{
    if (value is null) return null;
    auto cursor = value;
    while (*cursor != 0) ++cursor;
    return value[0 .. cursor - value].idup;
}
---- |
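For the other case mentioned above, where the input really is a `wchar_t*`, one possible approach is to measure it with `wcslen` and let `std.conv.to` do any transcoding. This is a sketch under that assumption; `fromWcharT` is a made-up name, and the input must be NUL-terminated.

```d
import core.stdc.stddef : wchar_t;
import core.stdc.wchar_ : wcslen;
import std.conv : to;

// Hypothetical helper: converts a NUL-terminated wchar_t buffer to a wstring.
// wchar_t is an alias for wchar on Windows and dchar on POSIX, so the
// conversion is a plain copy on one and a UTF-32 to UTF-16 transcode on the other.
wstring fromWcharT(const(wchar_t)* value)
{
    if (value is null)
        return null;

    // wcslen is defined in terms of wchar_t, so it matches this pointer type.
    const(wchar_t)[] units = value[0 .. wcslen(value)];

    // to!wstring accepts either wchar or dchar arrays and always copies.
    return units.to!wstring;
}
```

Because the conversion goes through `to!wstring` instead of a cast, the same source should behave consistently on both platforms.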
Copyright © 1999-2021 by the D Language Foundation