Thread overview
Encoding issue.
Jul 04, 2020
Jowei Dei
Jul 05, 2020
Ogi
Jul 05, 2020
Adam D. Ruppe
July 04, 2020
I'm writing a console program. I get a File object from stdin, read the raw bytes of the input from it, and then decode them with the decode method. This works very well for English, but when I enter Chinese, decoding throws an exception and the program stops. I checked: my system is Windows 10 x64, and the console code page is 936 (GBK, the Chinese national standard encoding). Is there any good way to convert my console input to the internal string of D?
July 05, 2020
This should switch Windows cmd encoding to UTF-8:

import core.sys.windows.windows : SetConsoleOutputCP;
SetConsoleOutputCP(65001); // 65001 is the UTF-8 code page

July 05, 2020
On Saturday, 4 July 2020 at 15:16:22 UTC, Jowei Dei wrote:
> Is there any good way to convert my console input to the
> internal string of D?

The best thing to do is to use the wide-char versions of the Windows API:

http://dpldocs.info/this-week-in-d/Blog.Posted_2019_11_25.html#tip-of-the-week

Notice the ReadConsoleW and WideCharToMultiByte function calls. ReadConsoleW reads input from the Windows console as UTF-16 wide chars. You can use those directly in D as the type `wstring`, or you can convert them to a plain UTF-8 `string` via the WideCharToMultiByte function, as seen in the example code in my blog.
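
A minimal sketch of that approach, not the code from the blog post. The helper names `utf16ToUtf8` and `readConsoleText` and the 1024-character buffer are assumptions made for illustration, and it assumes stdin is a real console:

```d
version (Windows):

import core.sys.windows.windows;

/// Converts UTF-16 text to a UTF-8 string via WideCharToMultiByte.
/// (Hypothetical helper name, chosen for this sketch.)
string utf16ToUtf8(const(wchar)[] text)
{
    if (text.length == 0)
        return "";

    // First call: ask how many bytes the UTF-8 result needs.
    int needed = WideCharToMultiByte(CP_UTF8, 0, text.ptr, cast(int) text.length,
                                     null, 0, null, null);
    if (needed <= 0)
        throw new Exception("WideCharToMultiByte failed");

    // Second call: convert into a buffer of exactly that size.
    auto buffer = new char[](needed);
    int written = WideCharToMultiByte(CP_UTF8, 0, text.ptr, cast(int) text.length,
                                      buffer.ptr, needed, null, null);
    if (written <= 0)
        throw new Exception("WideCharToMultiByte failed");

    return cast(string) buffer[0 .. written];
}

/// Reads console input (at most one buffer's worth) and returns it as UTF-8.
/// The result normally still ends with the "\r\n" typed by the user.
string readConsoleText()
{
    HANDLE input = GetStdHandle(STD_INPUT_HANDLE);
    wchar[1024] wide; // assumed limit for this sketch
    DWORD charsRead;

    if (!ReadConsoleW(input, wide.ptr, cast(DWORD) wide.length, &charsRead, null))
        throw new Exception("ReadConsoleW failed (is stdin a console?)");

    return utf16ToUtf8(wide[0 .. charsRead]);
}
```

Calling `readConsoleText()` from `main` and stripping the trailing line ending should give an ordinary D `string`, regardless of the console code page.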


You could also do some conversions, like changing the console code page (I do NOT recommend this) or converting from the current code page to UTF-8. The other answer suggested SetConsoleOutputCP; this is for output, and since you need input, the function is SetConsoleCP. https://docs.microsoft.com/en-us/windows/console/setconsolecp

But while that looks like the easiest way, it changes a global setting in the console that remains after your program returns, and it is subtly buggy with regard to font selection, copy/paste, and other issues.
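
If you try the code page route anyway, a sketch like the following at least puts the old input code page back when you are done. The wrapper name `withUtf8ConsoleInput` is an assumption for illustration, and restoring the setting does nothing about the other bugs mentioned above:

```d
version (Windows):

import core.sys.windows.windows;

/// Runs `work` with the console input code page set to UTF-8 and restores
/// the previous code page afterwards, even if `work` throws.
/// (Hypothetical wrapper, chosen for this sketch.)
void withUtf8ConsoleInput(scope void delegate() work)
{
    // Remember the current input code page; 0 means no console is attached.
    immutable UINT previous = GetConsoleCP();

    if (!SetConsoleCP(CP_UTF8))
        throw new Exception("SetConsoleCP failed");

    // Undo the global change on every exit path.
    scope (exit) SetConsoleCP(previous);

    work();
}
```

The restore only runs if the program exits normally or by exception; if the process is killed, the console keeps the changed code page.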


To convert input yourself, use this function to get the current console code page:

https://docs.microsoft.com/en-us/windows/console/getconsolecp

And pass that as the CodePage argument to MultiByteToWideChar

https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar


This gives you a `wstring`.
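
A minimal sketch of that conversion, assuming you already have the raw input bytes. The function names `decodeCodePage` and `decodeConsoleInput` are made up for illustration:

```d
version (Windows):

import core.sys.windows.windows;

/// Decodes bytes in the given Windows code page into UTF-16.
/// (Hypothetical helper name, chosen for this sketch.)
wstring decodeCodePage(const(ubyte)[] bytes, uint codePage)
{
    if (bytes.length == 0)
        return ""w;

    auto src = cast(const(char)*) bytes.ptr;
    int srcLen = cast(int) bytes.length;

    // First call: query the required length in UTF-16 code units.
    int needed = MultiByteToWideChar(codePage, 0, src, srcLen, null, 0);
    if (needed <= 0)
        throw new Exception("MultiByteToWideChar failed");

    // Second call: decode into a buffer of that size.
    auto buffer = new wchar[](needed);
    int written = MultiByteToWideChar(codePage, 0, src, srcLen, buffer.ptr, needed);
    if (written <= 0)
        throw new Exception("MultiByteToWideChar failed");

    return cast(wstring) buffer[0 .. written];
}

/// Decodes bytes read from stdin using the console's current input code page.
/// If no console is attached, GetConsoleCP returns 0, which
/// MultiByteToWideChar treats as the system ANSI code page (CP_ACP).
wstring decodeConsoleInput(const(ubyte)[] bytes)
{
    return decodeCodePage(bytes, GetConsoleCP());
}
```

With code page 936, for example, the GBK bytes `D6 D0 CE C4` should decode to "中文". If you need a UTF-8 `string` afterwards, `std.conv.to!string` on the result is one way to get it.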


But it really is better to just let Windows do this for you by calling ReadConsoleW to get the input in the first place. This works in all console cases and avoids the bug. The only worry here is that it does NOT work if the user pipes data to your program:

other_program | your_program.exe

will fail on ReadConsoleW. So you will have to check for that in an if statement and fall back to readln or whatever. The blog post linked above discusses this in more detail.
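
One way to write that check, as a sketch rather than the blog's code: GetConsoleMode only succeeds on a console handle, so its failure is taken here to mean stdin was redirected. The names `stdinIsConsole` and `readInputLine` and the 1024-character buffer are assumptions for illustration:

```d
version (Windows):

import core.sys.windows.windows;
import std.stdio : readln;
import std.utf : toUTF8;

/// True when standard input is an interactive console rather than a pipe or file.
bool stdinIsConsole()
{
    DWORD mode;
    // GetConsoleMode fails for handles that are not consoles.
    return GetConsoleMode(GetStdHandle(STD_INPUT_HANDLE), &mode) != 0;
}

/// Reads one chunk of input as UTF-8, choosing the API based on where stdin
/// comes from. (Hypothetical helper name, chosen for this sketch.)
string readInputLine()
{
    if (!stdinIsConsole())
    {
        // Piped or redirected input: ReadConsoleW would fail, so use readln.
        // No code page conversion happens here; the bytes are taken as-is.
        return readln();
    }

    wchar[1024] wide; // assumed limit for this sketch
    DWORD charsRead;
    if (!ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), wide.ptr,
                      cast(DWORD) wide.length, &charsRead, null))
        throw new Exception("ReadConsoleW failed");

    // The console delivers UTF-16; convert it to D's usual UTF-8 string.
    return toUTF8(wide[0 .. charsRead]);
}
```

Note that in the piped case the data arrives in whatever encoding the other program wrote, so it may still need converting if that is not UTF-8.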