Thread overview
wstring hex literals
Sep 20, 2017
jmh530
Sep 20, 2017
Neia Neutuladh
Sep 20, 2017
jmh530
September 20, 2017
I don't have any trouble making strings or dstrings from hex, but I run into problems with wstrings. My knowledge of UTF-16 is limited, but I don't see anything wrong with the code below, and yet I get errors on the hex string literal.

unittest
{
    wchar data = 0x03C0;
    auto data2 = x"03C0"w;
    static assert(typeof(data2) == wstring);
}

testing_utf16.d(5): Error: Truncated UTF-8 sequence
testing_utf16.d(6):        while evaluating: static assert((_error_) == (wstring))
Failed: ["dmd", "-unittest", "-v", "-o-", "testing_utf16.d", "-I."]
September 20, 2017
On Wednesday, 20 September 2017 at 15:04:08 UTC, jmh530 wrote:
> testing_utf16.d(5): Error: Truncated UTF-8 sequence
> testing_utf16.d(6):        while evaluating: static assert((_error_) == (wstring))
> Failed: ["dmd", "-unittest", "-v", "-o-", "testing_utf16.d", "-I."]

https://dlang.org/spec/lex.html#hex_strings says:

> The string literals are assembled as UTF-8 char arrays, and the postfix is applied to convert to wchar or dchar as necessary as a final step.

This isn't the friendliest thing ever and is contrary to my expectations too. You basically have to encode your string into UTF-8 and then paste the hex of that in.
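
To make that concrete, here is a minimal sketch of which bytes the literal would need, and why the original `03C0` is rejected. It avoids hex literals entirely so that it does not depend on how a particular compiler version treats them. `utf8Bytes` is a hypothetical helper written for this illustration, not something from the spec or from Phobos; `representation` and `validate` are the Phobos functions from `std.string` and `std.utf`.

```d
import std.exception : assertThrown;
import std.string : representation;
import std.utf : UTFException, validate;

/// Hypothetical helper (not from the thread): the raw UTF-8 bytes of a
/// string, i.e. what a hex literal would have to spell out.
immutable(ubyte)[] utf8Bytes(string s)
{
    return s.representation;
}

unittest
{
    // U+03C0 (π) is a single UTF-16 code unit, but two bytes in UTF-8.
    assert(utf8Bytes("\u03c0") == [0xCF, 0x80]);

    // The bytes 03 C0 from the original literal are not valid UTF-8:
    // C0 announces a two-byte sequence, but nothing follows it.
    immutable(ubyte)[] raw = [0x03, 0xC0];
    assertThrown!UTFException(validate(cast(string) raw));
}
```

So, going by the spec text quoted above, the literal would presumably have to be written with the bytes `CF80` rather than `03C0`. I have not checked that across compiler versions.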

What should work is escape sequences:

    wstring str = "\u03c0"w;
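
A slightly fuller sketch of the same idea, as a self-contained unittest. It assumes nothing beyond `std.utf.toUTF16` from Phobos. Note that the type check goes through `is(...)`, since a type cannot be compared with `==` directly.

```d
import std.utf : toUTF16;

unittest
{
    // The escape sequence names the code point, so no manual UTF-8
    // encoding is needed.
    wstring str = "\u03c0"w;
    static assert(is(typeof(str) == wstring));

    // One UTF-16 code unit holding U+03C0.
    assert(str.length == 1);
    assert(str[0] == 0x03C0);

    // Converting a UTF-8 string at run time gives the same result.
    assert("\u03c0".toUTF16 == str);
}
```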
September 20, 2017
On Wednesday, 20 September 2017 at 16:26:46 UTC, Neia Neutuladh wrote:
> On Wednesday, 20 September 2017 at 15:04:08 UTC, jmh530 wrote:
>> testing_utf16.d(5): Error: Truncated UTF-8 sequence
>> testing_utf16.d(6):        while evaluating: static assert((_error_) == (wstring))
>> Failed: ["dmd", "-unittest", "-v", "-o-", "testing_utf16.d", "-I."]
>
> https://dlang.org/spec/lex.html#hex_strings says:
>
>> The string literals are assembled as UTF-8 char arrays, and the postfix is applied to convert to wchar or dchar as necessary as a final step.
>
> This isn't the friendliest thing ever and is contrary to my expectations too. You basically have to encode your string into UTF-8 and then paste the hex of that in.
>
> What should work is escape sequences:
>
>     wstring str = "\u03c0"w;

I see, thanks. I missed that bit on UTF-8. I was a little confused.