December 05, 2020 converting D's string to use with C API with unicode | ||||
---|---|---|---|---|
| ||||
So in D I have a struct like this:

>struct ProcessResult
>{
> string[] output;
> bool ok;
>}

In order to use output from C WINAPI with unicode, I need to convert each string to wchar* so that I can access it from C with wchar_t*. Is that right or am I missing anything?

>struct ProcessResult
>{
> string[] output;
> bool ok;
>
> C_ProcessResult toCResult()
> {
> auto r = C_ProcessResult();
> r.ok = this.ok; // just copy, no conversion needed
> foreach(s; this.output)
> r.output ~= cast(wchar*)s.ptr;
> return r;
> }
>}

>version(Windows) extern(C) export
>struct C_ProcessResult
>{
> wchar*[] output;
> bool ok;
>}
|
December 05, 2020 Re: converting D's string to use with C API with unicode | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jack | On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
> So in D I have a struct like this:
>
>>struct ProcessResult
>>{
>> string[] output;
>> bool ok;
>>}
>
> In order to use output from C WINAPI with unicode, I need to convert each string to wchar* so that I can access it from C with wchar_t*. Is that right or am I missing anything?
>
>
>>struct ProcessResult
>>{
>> string[] output;
>> bool ok;
>>
>> C_ProcessResult toCResult()
>> {
>> auto r = C_ProcessResult();
>> r.ok = this.ok; // just copy, no conversion needed
>> foreach(s; this.output)
>> r.output ~= cast(wchar*)s.ptr;
>> return r;
>> }
>>}
>
>>version(Windows) extern(C) export
>>struct C_ProcessResult
>>{
>> wchar*[] output;
>> bool ok;
>>}
I would just use std.encoding
https://dlang.org/phobos/std_encoding.html
and use transcode
https://dlang.org/phobos/std_encoding.html#transcode
|
December 05, 2020 Re: converting D's string to use with C API with unicode | ||||
---|---|---|---|---|
| ||||
Posted in reply to IGotD- | On Saturday, 5 December 2020 at 20:12:52 UTC, IGotD- wrote:
> On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
>> So in D I have a struct like this:
>>
>>>struct ProcessResult
>>>{
>>> string[] output;
>>> bool ok;
>>>}
>>
>> In order to use output from C WINAPI with unicode, I need to convert each string to wchar* so that I can access it from C with wchar_t*. Is that right or am I missing anything?
>>
>>
>>>struct ProcessResult
>>>{
>>> string[] output;
>>> bool ok;
>>>
>>> C_ProcessResult toCResult()
>>> {
>>> auto r = C_ProcessResult();
>>> r.ok = this.ok; // just copy, no conversion needed
>>> foreach(s; this.output)
>>> r.output ~= cast(wchar*)s.ptr;
>>> return r;
>>> }
>>>}
>>
>>>version(Windows) extern(C) export
>>>struct C_ProcessResult
>>>{
>>> wchar*[] output;
>>> bool ok;
>>>}
>
> I would just use std.encoding
>
> https://dlang.org/phobos/std_encoding.html
>
> and use transcode
>
> https://dlang.org/phobos/std_encoding.html#transcode
Forget my previous post; I didn't see the arrays.
extern(C) code has no knowledge of D arrays, so I think you need to use wchar** instead of []. Keep in mind that you need to store the lengths as well, unless you use zero-terminated strings.
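
As a rough illustration of the pointer-plus-length layout suggested here, the following is a minimal sketch. The field name `outputLength` and the use of `size_t` are assumptions made for the example, not something taken from a real C header; the matching C declaration would need the same field order and types.

```d
// Hypothetical C-facing layout; field names and types are illustrative assumptions.
extern (C) struct C_ProcessResult
{
    wchar** output;      // pointer to the first of `outputLength` string pointers
    size_t outputLength; // element count, since a bare pointer carries no length
    bool ok;
}

// A D slice is a (length, pointer) pair, so it is twice the size of a pointer.
static assert((wchar*[]).sizeof == 2 * (wchar**).sizeof);

void main()
{
    auto slice = new wchar*[](2);    // D-side slice of two (null) pointers
    C_ProcessResult r;
    r.output = slice.ptr;            // hand C only the raw pointer...
    r.outputLength = slice.length;   // ...and pass the length separately
    assert(r.outputLength == 2);
}
```

The `static assert` is there to make the size difference between a slice and a bare pointer visible at compile time.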
|
December 05, 2020 Re: converting D's string to use with C API with unicode | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jack | On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
>>version(Windows) extern(C) export
>>struct C_ProcessResult
>>{
>> wchar*[] output;

In D, `T[]` (where T is some element type, `wchar*` in this case) is a slice structure that bundles a length and a pointer together. It is NOT the same thing as `T[]` in C. You will get memory corruption if you try to use `T[]` directly when interfacing with C.

Instead, you must use a bare pointer, plus a separate length/size if the C API accepts one. I'm guessing that `C_ProcessResult.output` should have type `wchar**`, but I can't say for sure without seeing the Windows API documentation or C header file in which the C structure is detailed.

>> bool ok;
>>}
>>struct ProcessResult
>>{
>> string[] output;
>> bool ok;
>>
>> C_ProcessResult toCResult()
>> {
>> auto r = C_ProcessResult();
>> r.ok = this.ok; // just copy, no conversion needed
>> foreach(s; this.output)
>> r.output ~= cast(wchar*)s.ptr;

This is incorrect, and will corrupt memory. `cast(wchar*)` is a reinterpret cast, and an invalid one at that. It says, "just take my word for it, the data at the address stored in `s.ptr` is UTF16 encoded." But, that's not true: the data is UTF8 encoded, because `s` is a `string`, so this will thoroughly confuse things and not do what you want at all. The text will be garbled and you will likely trigger a buffer overrun on the C side of things.

What you need to do instead is allocate a separate array of `wchar[]`, and then use the UTF8 to UTF16 conversion algorithm to fill the new `wchar[]` array based on the `char` elements in `s`. The conversion algorithm is non-trivial, but the `std.encoding` module can do it for you.

>> return r;
>> }
>>}
>

Note also that when exchanging heap-allocated data (such as most strings or arrays) with a C API, you must figure out who is responsible for de-allocating the memory at the proper time - and NOT BEFORE.
If you allocate memory with D's GC (using `new` or the slice concatenation operators `~` and `~=`), watch out that you keep a reference to it alive on the D side until after the C API is completely done with it. Otherwise, D's GC may not realize it's still in use, and may de-allocate it early, causing memory corruption in a way that is very difficult to debug. |
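
One possible way to keep GC memory alive while only C holds a pointer to it is to register it as a GC root. The sketch below assumes a hypothetical C API that stores the pointer for a while; it uses `core.memory.GC.addRoot` and `GC.removeRoot`, and `std.utf.toUTF16z` for the conversion (a substitution for `std.encoding`, chosen because it also zero-terminates).

```d
import core.memory : GC;
import std.utf : toUTF16z;

void main()
{
    // toUTF16z allocates a zero-terminated UTF-16 copy on the GC heap.
    const(wchar)* p = "Abc".toUTF16z;

    // Pin the allocation for as long as (hypothetical) C code may hold
    // the only remaining pointer to it.
    GC.addRoot(p);
    scope (exit) GC.removeRoot(p);   // unpin once C is done with the pointer

    // ... pass `p` to the C API here ...
    assert(p[0] == 'A' && p[3] == 0);
}
```

A pointer that stays in a live D variable for the whole call does not need this; the root matters when the C side keeps the pointer after the D reference is gone.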
December 05, 2020 Re: converting D's string to use with C API with unicode | ||||
---|---|---|---|---|
| ||||
Posted in reply to tsbockman | I totally forgot to malloc() the strings and the array. I haven't done C in a while and completely forgot about this. Thank you all so much for your answers. My code now looks like this, but there's still memory corruption. Could anyone help point out where it is?

>struct ProcessResult
>{
> string[] output;
> bool ok;
>
> C_ProcessResult* toCResult()
> {
> import core.stdc.stdlib : malloc, free;
> import core.stdc.string : memcpy;
> import core.exception : onOutOfMemoryError;
> import std.encoding : transcode;
> auto mem = malloc(C_ProcessResult.sizeof);
> if(!mem) {
> onOutOfMemoryError();
> }
> auto r = cast(C_ProcessResult*) mem;
> r.ok = this.ok;
> r.outputLength = cast(int) output.length;
> r.output = cast(wchar**) malloc((wchar*).sizeof * output.length);
> if(!r.output) {
> onOutOfMemoryError();
> }
> foreach(i; 0..output.length) {
> wstring ws;
> transcode(output[i], ws);
> auto s = malloc(ws.length + 1);
> if(!s) {
> onOutOfMemoryError();
> }
> memcpy(s, ws.ptr, ws.length);
> r.output[i] = cast(wchar*)s;
> }
> return r;
> }
>}
|
December 05, 2020 Re: converting D's string to use with C API with unicode | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jack | On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
> My code now looks like this, but there's still memory corruption. Could anyone help point out where it is?
>
> ...
>
>> foreach(i; 0..output.length) {
>> wstring ws;
>> transcode(output[i], ws);
>> auto s = malloc(ws.length + 1);
>> if(!s) {
>> onOutOfMemoryError();
>> }
>> memcpy(s, ws.ptr, ws.length);

`ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)

Also, I think you need to manually zero-terminate `s`. You allocate space to do so, but don't actually use it. (I believe that transcode will only zero-terminate the destination if the source argument is already zero-terminated.)

>> r.output[i] = cast(wchar*)s;
>> }
|
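
The two points above (byte counts for `malloc`/`memcpy`, and explicit zero-termination) could be combined into a small helper along these lines. This is only a sketch: the name `toMallocedUTF16z` is made up for illustration, and it uses `std.utf.toUTF16` instead of `transcode`.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;
import std.utf : toUTF16;

// Hypothetical helper: returns a malloc'ed, zero-terminated UTF-16 copy of `s`.
// The caller (or the C side) must release it with free().
wchar* toMallocedUTF16z(string s)
{
    wstring ws = s.toUTF16;                      // real UTF-8 to UTF-16 conversion
    immutable bytes = ws.length * wchar.sizeof;  // memcpy counts bytes, not wchars
    auto p = cast(wchar*) malloc(bytes + wchar.sizeof); // extra room for the terminator
    if (p is null)
        onOutOfMemoryError();
    if (bytes != 0)
        memcpy(p, ws.ptr, bytes);
    p[ws.length] = 0;                            // terminate explicitly
    return p;
}

void main()
{
    auto p = toMallocedUTF16z("h\u00E9");   // 3 UTF-8 bytes, 2 UTF-16 code units
    scope (exit) free(p);
    assert(p[0] == 'h' && p[1] == 0x00E9 && p[2] == 0);
}
```

Because the memory comes from `malloc`, the GC never touches it, so ownership can be handed to the C side as long as someone eventually calls `free`.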
December 06, 2020 Re: converting D's string to use with C API with unicode | ||||
---|---|---|---|---|
| ||||
Posted in reply to tsbockman | On Saturday, 5 December 2020 at 23:31:31 UTC, tsbockman wrote:
> On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
>> My code now looks like this, but there's still memory corruption. Could anyone help point out where it is?
>>
>> ...
>>
>>> foreach(i; 0..output.length) {
>>> wstring ws;
>>> transcode(output[i], ws);
>>> auto s = malloc(ws.length + 1);
>>> if(!s) {
>>> onOutOfMemoryError();
>>> }
>>> memcpy(s, ws.ptr, ws.length);
>
> `ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)

How do I get this size in bytes from a wstring?

> Also, I think you need to manually zero-terminate `s`. You allocate space to do so, but don't actually use it. (I believe that transcode will only zero-terminate the destination if the source argument is already zero-terminated.)
>
>>> r.output[i] = cast(wchar*)s;
>>> }

I'll fix that.
|
December 06, 2020 Re: converting D's string to use with C API with unicode | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jack | On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
> So in D I have a struct like this:
>
>>struct ProcessResult
>>{
>> string[] output;
>> bool ok;
>>}
>
> In order to use output from C WINAPI with unicode, I need to convert each string to wchar* so that I can access it from C with wchar_t*. Is that right or am I missing anything?
>
>
>>struct ProcessResult
>>{
>> string[] output;
>> bool ok;
>>
>> C_ProcessResult toCResult()
>> {
>> auto r = C_ProcessResult();
>> r.ok = this.ok; // just copy, no conversion needed
>> foreach(s; this.output)
>> r.output ~= cast(wchar*)s.ptr;
>> return r;
>> }
>>}
>
>>version(Windows) extern(C) export
>>struct C_ProcessResult
>>{
>> wchar*[] output;
>> bool ok;
>>}
Drawing a string via WinAPI, as an example:
// UTF-16. wchar*
wstring ws = "Abc"w;
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );
// UTF-8. char*
string s = "Abc";
import std.utf : toUTF16;
wstring ws = s.toUTF16; // toUTF16 returns a wstring, not a string
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );
// UTF-32. dchar*
dstring ds = "Abc"d;
import std.utf : toUTF16;
wstring ws = ds.toUTF16;
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );
A single character:
// UTF-16. wchar
wchar wc = 'A';
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) &wc, 1, NULL );
// UTF-32. dchar
dchar dc = 'A';
import std.utf : encode;
wchar[ 2 ] ws;
auto l = encode( ws, dc ); // number of wchars written (1 or 2)
ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) l, NULL );
//
// Font API
string face = "Arial";
LOGFONTW lf; // the W variant, so lfFaceName holds wchars
import std.utf : toUTF16;
wstring wface = face.toUTF16;
lf.lfFaceName[ 0 .. wface.length ] = wface[]; // slice by the UTF-16 length, not face.length
lf.lfFaceName[ wface.length ] = 0; // zero-terminate the name
HFONT hfont = CreateFontIndirectW( &lf );
// Common case
LPWSTR toLPWSTR( string s ) nothrow // wchar_t*. UTF-16
{
import std.utf : toUTFz, toUTF16z, UTFException;
try { return toUTFz!( LPWSTR )( s ); }
catch ( UTFException e ) { return cast( LPWSTR ) "ERR"w.ptr; }
catch ( Exception e ) { return cast( LPWSTR ) "ERR"w.ptr; }
}
alias toLPWSTR toPWSTR;
alias toLPWSTR toLPOLESTR;
alias toLPWSTR toPOLESTR;
// WinAPI
string windowName = "Abc";
HWND hwnd =
CreateWindowEx(
...
windowName.toLPWSTR,
...
);
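
For the font-name case, copying into a fixed-size `wchar` buffer can be tried without any Windows headers. In the sketch below, `faceName` is a stand-in for `LOGFONTW.lfFaceName` (32 `wchar`s, i.e. `LF_FACESIZE`); the buffer and variable names are assumptions for illustration only.

```d
import std.utf : toUTF16;

void main()
{
    // Stand-in for LOGFONTW.lfFaceName; zero-filled so the copy stays terminated.
    wchar[32] faceName = 0;

    wstring w = "Arial".toUTF16;
    // The UTF-16 length is what matters, and one slot must remain for the terminator.
    assert(w.length < faceName.length);
    faceName[0 .. w.length] = w[];

    assert(faceName[0] == 'A');
    assert(faceName[5] == 0);   // terminator directly after the 5 copied wchars
}
```

Checking the length before the slice assignment matters because a mismatched or overlong slice copy fails at run time instead of truncating.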
|
December 06, 2020 Re: converting D's string to use with C API with unicode | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jack | On Sunday, 6 December 2020 at 02:07:10 UTC, Jack wrote:
> On Saturday, 5 December 2020 at 23:31:31 UTC, tsbockman wrote:
>> On Saturday, 5 December 2020 at 21:55:13 UTC, Jack wrote:
>>>> wstring ws;
>>>> transcode(output[i], ws);
>>>> auto s = malloc(ws.length + 1);
>>>> if(!s) {
>>>> onOutOfMemoryError();
>>>> }
>>>> memcpy(s, ws.ptr, ws.length);
>>
>> `ws.length` is the length in `wchar`s, but `memcpy` expects the size in bytes. (This is because it takes `void*` pointers as inputs, and so does not know the element type or its size.)
>
> How do I get this size in bytes from wstring?
`ws.length * wchar.sizeof` should do it. `wstring` is just an alias for `immutable(wchar)[]`, and the `length` property is the number of `wchar` elements in the slice.
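
A quick check of the difference between element count and byte count, as a small self-contained sketch:

```d
void main()
{
    wstring ws = "Abc"w;

    assert(ws.length == 3);                  // counted in wchar elements
    assert(ws.length * wchar.sizeof == 6);   // counted in bytes

    // Reinterpreting the slice as bytes gives the same figure.
    assert((cast(const(ubyte)[]) ws).length == 6);

    // wstring is a mutable slice of immutable wchar elements.
    static assert(is(wstring == immutable(wchar)[]));
}
```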
|
December 06, 2020 Re: converting D's string to use with C API with unicode | ||||
---|---|---|---|---|
| ||||
Posted in reply to Виталий Фадеев | On Sunday, 6 December 2020 at 04:41:56 UTC, Виталий Фадеев wrote:
> On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
>> So in D I have a struct like this:
>>
>>>struct ProcessResult
>>>{
>>> string[] output;
>>> bool ok;
>>>}
>>
>> In order to use output from C WINAPI with unicode, I need to convert each string to wchar* so that I can access it from C with wchar_t*. Is that right or am I missing anything?
>>
>>
>>>struct ProcessResult
>>>{
>>> string[] output;
>>> bool ok;
>>>
>>> C_ProcessResult toCResult()
>>> {
>>> auto r = C_ProcessResult();
>>> r.ok = this.ok; // just copy, no conversion needed
>>> foreach(s; this.output)
>>> r.output ~= cast(wchar*)s.ptr;
>>> return r;
>>> }
>>>}
>>
>>>version(Windows) extern(C) export
>>>struct C_ProcessResult
>>>{
>>> wchar*[] output;
>>> bool ok;
>>>}
>
> Drawing a string via WinAPI, as an example:
>
> // UTF-16. wchar*
> wstring ws = "Abc"w;
> ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );
>
> // UTF-8. char*
> string s = "Abc";
> import std.utf : toUTF16;
> wstring ws = s.toUTF16;
> ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );
>
> // UTF-32. dchar*
> dstring ds = "Abc"d;
> import std.utf : toUTF16;
> wstring ws = ds.toUTF16;
> ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) ws.length, NULL );
>
> A single character:
> // UTF-16. wchar
> wchar wc = 'A';
> ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) &wc, 1, NULL );
>
> // UTF-32. dchar
> dchar dc = 'A';
> import std.utf : encode;
> wchar[ 2 ] ws;
> auto l = encode( ws, dc );
> ExtTextOutW( hdc, x, y, 0, &clipRect, cast( LPCWSTR ) ws.ptr, cast( uint ) l, NULL );
>
> //
> // Font API
> string face = "Arial";
> LOGFONTW lf;
> import std.utf : toUTF16;
> wstring wface = face.toUTF16;
> lf.lfFaceName[ 0 .. wface.length ] = wface[];
> lf.lfFaceName[ wface.length ] = 0;
> HFONT hfont = CreateFontIndirectW( &lf );
>
> // Common case
> LPWSTR toLPWSTR( string s ) nothrow // wchar_t*. UTF-16
> {
> import std.utf : toUTFz, toUTF16z, UTFException;
> try { return toUTFz!( LPWSTR )( s ); }
> catch ( UTFException e ) { return cast( LPWSTR ) "ERR"w.ptr; }
> catch ( Exception e ) { return cast( LPWSTR ) "ERR"w.ptr; }
> }

I didn't know about toUTFz!( LPWSTR ). I'll save everything else for further reference, since I'll be using WinAPI for a while. Thanks!

> alias toLPWSTR toPWSTR;
> alias toLPWSTR toLPOLESTR;
> alias toLPWSTR toPOLESTR;

That's interesting, I didn't know about using multiple aliases.

> // WinAPI
> string windowName = "Abc";
> HWND hwnd =
> CreateWindowEx(
> ...
> windowName.toLPWSTR,
> ...
> );
|
Copyright © 1999-2021 by the D Language Foundation