converting D's string to use with C API with unicode - D Programming Language Discussion Forum

December 05, 2020

converting D's string to use with C API with unicode

Posted by Jack

So in D I have a struct like this:

>struct ProcessResult
>{
>	string[] output;
>	bool ok;
>}

In order to use `output` from the C WinAPI with Unicode, I need to convert each string to wchar* so that I can access it from C as wchar_t*. Is that right, or am I missing anything?


>struct ProcessResult
>{
>	string[] output;
>	bool ok;
>
>	C_ProcessResult toCResult()
>	{
>		auto r = C_ProcessResult();
>		r.ok = this.ok; // just copy, no conversion needed
>		foreach(s; this.output)
>			r.output ~= cast(wchar*)s.ptr;
>		return r;
>	}
>}

>version(Windows) extern(C) export
>struct C_ProcessResult
>{
>	wchar*[] output;
>	bool ok;
>}

December 05, 2020

Re: converting D's string to use with C API with unicode

Posted by IGotD-
in reply to Jack

I would just use std.encoding

https://dlang.org/phobos/std_encoding.html

and use transcode

https://dlang.org/phobos/std_encoding.html#transcode

December 05, 2020

Re: converting D's string to use with C API with unicode

Posted by IGotD-
in reply to IGotD-

Forget my previous post; I didn't see the arrays.

extern(C) has no knowledge of D arrays, so I think you need to use wchar** instead of []. Keep in mind that you need to store the lengths as well, unless you use zero-terminated strings.
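
A minimal sketch of the layout suggested above, assuming the C side expects an array of zero-terminated UTF-16 strings plus an element count. The field name `outputLength` and the exact field order are assumptions for illustration; the real C header decides them.

```d
// Hypothetical C-compatible layout: no D slices, only a bare pointer
// plus an explicit element count. Each entry is assumed to point to a
// zero-terminated UTF-16 string.
extern(C) struct C_ProcessResult
{
    wchar** output;    // C side: wchar_t** (wchar_t is 16 bits on Windows)
    int outputLength;  // number of entries in `output`
    bool ok;
}

void main()
{
    // The field is a plain pointer, which C understands directly.
    static assert(is(typeof(C_ProcessResult.output) == wchar**));

    // Default initialization gives a null pointer, zero count, false flag.
    C_ProcessResult r;
    assert(r.output is null && r.outputLength == 0 && !r.ok);
}
```

With this shape, the C code can loop from 0 to `outputLength` and treat each entry as a `wchar_t*`.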

December 05, 2020

Re: converting D's string to use with C API with unicode

Posted by tsbockman
in reply to Jack

On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
>>version(Windows) extern(C) export
>>struct C_ProcessResult
>>{
>>	wchar*[] output;

In D, `T[]` (where T is some element type, `wchar*` in this case) is a slice structure that bundles a length and a pointer together. It is NOT the same thing as `T[]` in C. You will get memory corruption if you try to use `T[]` directly when interfacing with C.

Instead, you must use a bare pointer, plus a separate length/size if the C API accepts one. I'm guessing that `C_ProcessResult.output` should have type `wchar**`, but I can't say for sure without seeing the Windows API documentation or C header file in which the C structure is detailed.

>>	bool ok;
>>}
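
The difference between a D slice and a bare pointer can be checked directly. This is a small sketch, not tied to any particular Windows API, showing that a slice carries a length next to its pointer and how the two halves would be passed to C separately.

```d
void main()
{
    // A D slice is a (length, pointer) pair, twice the size of a pointer.
    static assert((wchar*[]).sizeof == 2 * size_t.sizeof);
    static assert((wchar**).sizeof == size_t.sizeof);

    // To hand a slice to C, pass its two halves separately.
    wchar*[] items = new wchar*[](3);
    wchar** firstElement = items.ptr;  // bare pointer to element 0
    size_t count = items.length;       // element count, passed alongside

    assert(firstElement is &items[0]);
    assert(count == 3);
}
```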

>>struct ProcessResult
>>{
>>	string[] output;
>>	bool ok;
>>
>>	C_ProcessResult toCResult()
>>	{
>>		auto r = C_ProcessResult();
>>		r.ok = this.ok; // just copy, no conversion needed
>>		foreach(s; this.output)
>>			r.output ~= cast(wchar*)s.ptr;

This is incorrect, and will corrupt memory. `cast(wchar*)` is a reinterpret cast, and an invalid one at that. It says, "just take my word for it, the data at the address stored in `s.ptr` is UTF16 encoded." But, that's not true: the data is UTF8 encoded, because `s` is a `string`, so this will thoroughly confuse things and not do what you want at all. The text will be garbled and you will likely trigger a buffer overrun on the C side of things.

What you need to do instead is allocate a separate array of `wchar[]`, and then use the UTF8 to UTF16 conversion algorithm to fill the new `wchar[]` array based on the `char` elements in `s`.

The conversion algorithm is non-trivial, but the `std.encoding` module can do it for you.

>>		return r;
>>	}
>>}
>
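
To illustrate why the cast cannot work, here is a short sketch of a real conversion. It uses `transcode` from `std.encoding`, as suggested above, and `toUTF16` from `std.utf` for comparison; the sample text is arbitrary.

```d
import std.encoding : transcode;
import std.utf : toUTF16;

void main()
{
    string s = "na\u00EFve";    // UTF-8: U+00EF takes two bytes
    assert(s.length == 6);      // six UTF-8 code units

    wstring viaEncoding;
    transcode(s, viaEncoding);  // std.encoding: fills a new UTF-16 string
    wstring viaUtf = s.toUTF16; // std.utf: returns a new UTF-16 string

    assert(viaEncoding.length == 5);  // five UTF-16 code units
    assert(viaEncoding == viaUtf);
    assert(viaUtf == "na\u00EFve"w);
}
```

The code unit counts differ between the two encodings, which is why reinterpreting the UTF-8 bytes as `wchar` produces garbage rather than the same text.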

Note also that when exchanging heap-allocated data (such as most strings or arrays) with a C API, you must figure out who is responsible for de-allocating the memory at the proper time - and NOT BEFORE. If you allocate memory with D's GC (using `new` or the slice concatenation operators `~` and `~=`), watch out that you keep a reference to it alive on the D side until after the C API is completely done with it. Otherwise, D's GC may not realize it's still in use, and may de-allocate it early, causing memory corruption in a way that is very difficult to debug.
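
One way to keep GC memory alive while C holds the only pointer to it is to register it as a GC root. The sketch below assumes that approach; `pinForC` and `unpinFromC` are hypothetical helper names, while `GC.addRoot` and `GC.removeRoot` come from `core.memory`.

```d
import core.memory : GC;
import std.utf : toUTF16z;

// Hypothetical helper: convert to zero-terminated UTF-16 and tell the GC
// that the buffer is in use even if no D reference to it remains.
const(wchar)* pinForC(string s)
{
    const(wchar)* p = s.toUTF16z;  // GC-allocated buffer
    GC.addRoot(p);
    return p;
}

// Hypothetical helper: call once the C side is completely done with `p`.
void unpinFromC(const(wchar)* p)
{
    GC.removeRoot(p);
}

void main()
{
    auto p = pinForC("Abc");
    GC.collect();                  // the rooted buffer survives a collection
    assert(p[0] == 'A' && p[3] == 0);
    unpinFromC(p);
}
```

In this small demo the local variable `p` would also keep the buffer reachable; the root matters when the pointer is stored only on the C side.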

December 05, 2020

Re: converting D's string to use with C API with unicode

Posted by Jack
in reply to tsbockman

I totally forgot to malloc() the strings and the array. I haven't written C in a while and completely forgot about this. Thank you all so much for your answers.

My code now looks like this, but there is still memory corruption. Could anyone help point out where it is?

>struct ProcessResult
>{
>	string[] output;
>	bool ok;
>
>	C_ProcessResult* toCResult()
>	{
>		import core.stdc.stdlib : malloc, free;
>		import core.stdc.string : memcpy;
>		import core.exception : onOutOfMemoryError;
>		import std.encoding : transcode;

>		auto mem = malloc(C_ProcessResult.sizeof);
>		if(!mem) {
>			onOutOfMemoryError();
>		}
>		auto r = cast(C_ProcessResult*) mem;
>		r.ok = this.ok;
>		r.outputLength = cast(int) output.length;
>		r.output = cast(wchar**) malloc((wchar*).sizeof * output.length);
>		if(!r.output) {
>			onOutOfMemoryError();
>		}
>		foreach(i; 0..output.length) {
>			wstring ws;
>			transcode(output[i], ws);
>			auto s = malloc(ws.length + 1);
>			if(!s) {
>				onOutOfMemoryError();
>			}
>			memcpy(s, ws.ptr, ws.length);
>			r.output[i] = cast(wchar*)s;
>		}
>		return r;
>	}
>}

December 05, 2020

Re: converting D's string to use with C API with unicode

Posted by tsbockman
in reply to Jack

On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
> my code now look like this, still there's a memory corrupt. Could anyone help point out where is it?
>
> ...
>
>> foreach(i; 0..output.length) {
>>     wstring ws;
>>     transcode(output[i], ws);
>>     auto s = malloc(ws.length + 1);
>>     if(!s) {
>>         onOutOfMemoryError();
>>     }
>>     memcpy(s, ws.ptr, ws.length);

`ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)

Also, I think you need to manually zero-terminate `s`. You allocate space to do so, but don't actually use it. (I believe that transcode will only zero-terminate the destination if the source argument is already zero-terminated.)

>>     r.output[i] = cast(wchar*)s;
>> }
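
Putting both points together, a corrected version might look like the sketch below. It assumes the C-side layout implied by the quoted code (`output`, `outputLength`, `ok`); `dupToCWString` and `freeCResult` are hypothetical helper names, and `toUTF16` from `std.utf` stands in for `transcode`.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;
import std.utf : toUTF16;

// Assumed C-side layout; the field names follow the code quoted above.
extern(C) struct C_ProcessResult
{
    wchar** output;
    int outputLength;
    bool ok;
}

// Hypothetical helper: copy `ws` into malloc'd memory and zero-terminate it.
wchar* dupToCWString(const(wchar)[] ws)
{
    // Sizes handed to malloc and memcpy are in bytes, not elements.
    auto p = cast(wchar*) malloc((ws.length + 1) * wchar.sizeof);
    if (!p) onOutOfMemoryError();
    if (ws.length)
        memcpy(p, ws.ptr, ws.length * wchar.sizeof);
    p[ws.length] = '\0';  // use the extra slot as the terminator
    return p;
}

C_ProcessResult* toCResult(string[] output, bool ok)
{
    auto r = cast(C_ProcessResult*) malloc(C_ProcessResult.sizeof);
    if (!r) onOutOfMemoryError();
    r.ok = ok;
    r.outputLength = cast(int) output.length;
    // One extra slot so the allocation is never zero-sized.
    r.output = cast(wchar**) malloc((wchar*).sizeof * (output.length + 1));
    if (!r.output) onOutOfMemoryError();
    foreach (i, s; output)
        r.output[i] = dupToCWString(s.toUTF16);
    r.output[output.length] = null;
    return r;
}

// Hypothetical counterpart: the owner frees everything exactly once.
void freeCResult(C_ProcessResult* r)
{
    if (!r) return;
    foreach (i; 0 .. r.outputLength)
        free(r.output[i]);
    free(r.output);
    free(r);
}

void main()
{
    auto r = toCResult(["Abc", "na\u00EFve"], true);
    scope (exit) freeCResult(r);
    assert(r.outputLength == 2 && r.ok);
    assert(r.output[1][0 .. 5] == "na\u00EFve"w);
    assert(r.output[1][5] == 0);
}
```

Because everything here comes from `malloc`, the D GC never touches it; the cost is that some side has to call the matching free function.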

December 06, 2020

Re: converting D's string to use with C API with unicode

Posted by Jack
in reply to tsbockman

On Saturday, 5 December 2020 at 23:31:31 UTC, tsbockman wrote:
> On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
>> my code now look like this, still there's a memory corrupt. Could anyone help point out where is it?
>>
>> ...
>>
>>> foreach(i; 0..output.length) {
>>>     wstring ws;
>>>     transcode(output[i], ws);
>>>     auto s = malloc(ws.length + 1);
>>>     if(!s) {
>>>         onOutOfMemoryError();
>>>     }
>>>     memcpy(s, ws.ptr, ws.length);
>
> `ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)

How do I get this size in bytes from wstring?

> Also, I think you need to manually zero-terminate `s`. You allocate space to do so, but don't actually use it. (I believe that transcode will only zero-terminate the destination if the source argument is already zero-terminated.)
>
>>>     r.output[i] = cast(wchar*)s;
>>> }

I'll fix that.

December 06, 2020

Re: converting D's string to use with C API with unicode

Posted by Виталий Фадеев
in reply to Jack

Drawing a string via WinAPI, as an example:

// UTF-16. wchar*
wstring ws = "Abc"w;
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );

// UTF-8. char*
string s = "Abc";
import std.utf : toUTF16;
wstring ws = s.toUTF16; // toUTF16 returns a wstring, not a string
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );

// UTF-32. dchar*
dstring ds = "Abc"d;
import std.utf : toUTF16;
wstring ws = ds.toUTF16; // toUTF16 returns a wstring, not a string
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );

A single character:
// UTF-16. wchar
wchar wc = 'A';
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) &wc, 1, NULL );

// UTF-32. dchar
dchar dc = 'A';
import std.utf : encode;
wchar[ 2 ] ws;
auto l = encode( ws, dc );
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) l, NULL );

//
// Font API
string face = "Arial";
LOGFONT lf;
import std.utf : toUTF16;
wstring wface = face.toUTF16;
lf.lfFaceName[ 0 .. wface.length ] = wface[]; // copy UTF-16 code units; must stay shorter than LF_FACESIZE
HFONT hfont = CreateFontIndirect( &lf );

// Common case
LPWSTR toLPWSTR( string s ) nothrow // wchar_t*. UTF-16
{
    import std.utf : toUTFz, toUTF16z, UTFException;
    try                        { return toUTFz!( LPWSTR )( s ); }
    catch ( UTFException e )   { return cast( LPWSTR ) "ERR"w.ptr; }
    catch ( Exception e )      { return cast( LPWSTR ) "ERR"w.ptr; }
}
alias toLPWSTR toPWSTR;
alias toLPWSTR toLPOLESTR;
alias toLPWSTR toPOLESTR;

// WinAPI
string windowName = "Abc";
HWND hwnd =
    CreateWindowEx(
        ...
        windowName.toLPWSTR,
        ...
    );
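
The zero-terminating helpers from `std.utf` can be tried without any Windows headers. The sketch below uses `const(wchar)*` directly, on the assumption that it is what `LPCWSTR` aliases to in the Windows bindings.

```d
import std.utf : toUTF16z, toUTFz;

void main()
{
    string s = "Abc";

    // toUTFz converts to the encoding implied by the pointer type
    // and appends a terminating zero.
    const(wchar)* p = toUTFz!(const(wchar)*)(s);
    assert(p[0 .. 3] == "Abc"w);
    assert(p[3] == 0);

    // toUTF16z is the UTF-16 shorthand.
    const(wchar)* q = s.toUTF16z;
    assert(q[0 .. 4] == "Abc\0"w);
}
```

Both results are allocated by the GC, so the earlier advice about keeping a reference alive applies if the C side stores the pointer.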

December 06, 2020

Re: converting D's string to use with C API with unicode

Posted by tsbockman
in reply to Jack

On Sunday, 6 December 2020 at 02:07:10 UTC, Jack wrote:
> On Saturday, 5 December 2020 at 23:31:31 UTC, tsbockman wrote:
>> On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
>>>>     wstring ws;
>>>>     transcode(output[i], ws);
>>>>     auto s = malloc(ws.length + 1);
>>>>     if(!s) {
>>>>         onOutOfMemoryError();
>>>>     }
>>>>     memcpy(s, ws.ptr, ws.length);
>>
>> `ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)
>
> How do I get this size in bytes from wstring?

`ws.length * wchar.sizeof` should do it. `wstring` is just an alias for `immutable(wchar)[]`, and the `length` property is the number of `wchar` elements in the slice.
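
A quick sketch of the element count versus the byte count, using an arbitrary three-character string:

```d
void main()
{
    static assert(is(wstring == immutable(wchar)[]));
    static assert(wchar.sizeof == 2);

    wstring ws = "Abc"w;
    assert(ws.length == 3);                 // wchar elements
    assert(ws.length * wchar.sizeof == 6);  // bytes, as memcpy expects

    // Reinterpreting the slice as bytes yields the same count.
    auto bytes = cast(const(ubyte)[]) ws;
    assert(bytes.length == 6);
}
```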

December 06, 2020

Re: converting D's string to use with C API with unicode

Posted by Jack
in reply to Виталий Фадеев

On Sunday, 6 December 2020 at 04:41:56 UTC, Виталий Фадеев wrote:
> Drawing string via WinAPI. As example.
>
> // UTF-16. wchar*
> wstring ws = "Abc"w;
> ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );
>
> // UTF-8. char*
> string s = "Abc";
> import std.utf : toUTF16;
> wstring ws = s.toUTF16;
> ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );
>
> // UTF-32. dchar*
> dstring ds = "Abc"d;
> import std.utf : toUTF16;
> wstring ws = ds.toUTF16;
> ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );
>
> One char.
> // UTF-16. wchar
> wchar wc = 'A';
> ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) &wc, 1, NULL );
>
> // UTF-32. dchar
> dchar dc = 'A';
> import std.utf : encode;
> wchar[ 2 ] ws;
> auto l = encode( ws, dc );
> ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) l, NULL );
>
> //
> // Font API
> string face = "Arial";
> LOGFONT lf;
> import std.utf : toUTF16;
> wstring wface = face.toUTF16;
> lf.lfFaceName[ 0 .. wface.length ] = wface[];
> HFONT hfont = CreateFontIndirect( &lf );
>
> // Common case
> LPWSTR toLPWSTR( string s ) nothrow // wchar_t*. UTF-16
> {
>     import std.utf : toUTFz, toUTF16z, UTFException;
>     try                        { return toUTFz!( LPWSTR )( s ); }
>     catch ( UTFException e )   { return cast( LPWSTR ) "ERR"w.ptr; }
>     catch ( Exception e )      { return cast( LPWSTR ) "ERR"w.ptr; }
> }

I didn't know about toUTFz!( LPWSTR ). I'll save everything else for further reference, since I'll be using WinAPI for a while. Thanks!

> alias toLPWSTR toPWSTR;
> alias toLPWSTR toLPOLESTR;
> alias toLPWSTR toPOLESTR;

That's interesting; I didn't know about using multiple aliases.

> // WinAPI
> string windowName = "Abc";
> HWND hwnd =
>     CreateWindowEx(
>         ...
>         windowName.toLPWSTR,
>         ...
>     );

Copyright © 1999-2021 by the D Language Foundation