unicode characters are not printed correctly on the windows command line?
December 22, 2019
hi all.

been learning d for the last few years but suddenly realised...

when i use this code:

writeln('♥');

the output displayed on the windows command line is "ÔÖÑ" [it works fine when piped directly into a text file, however].

i've looked about in this forum, but all that i could find was people in 2016[!] saying the codepage had to be altered - clearly nonsense, since Rust [which i am also learning] has no problem whatsoever displaying "♥".

is there any function i can call or setting i can adjust to get D to do the same, or do i have to wait for something to be fixed in the language / compiler itself?

best regards

moth [su.angel-island.zone]

December 22, 2019
On 22/12/2019 7:11 PM, moth wrote:
> hi all.
> 
> been learning d for the last few years but suddenly realised...
> 
> when i use this code:
> 
> writeln('♥');
> 
> the output displayed on the windows command line is "ÔÖÑ" [it works fine when piped directly into a text file, however].
> 
> i've looked about in this forum, but all that i could find was people in 2016[!] saying the codepage had to be altered - clearly nonsense, since Rust [which i am also learning] has no problem whatsoever displaying "♥".

This is not nonsense. This is the correct solution if that is what you intend for your program to do.

Not everybody will want this. They may have set the code page themselves in some way. It may not have even occurred within a D application!

It's best we leave it as the default to play nice with other applications and libraries.
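
For readers who do want the code-page route, here is a minimal sketch of it. It assumes the Windows bindings that ship with druntime (`core.sys.windows.windows`); the helper name `useUtf8Console` is made up for illustration. It restores the previous code page on exit, precisely because of the concern above about not stepping on other programs sharing the console. Whether the glyph actually renders can still depend on the console font.

```d
import std.stdio : stdout, writeln;

version (Windows) import core.sys.windows.windows;

/// Hypothetical helper: switches the console output code page to UTF-8
/// and returns the previous one. Returns 0 where there is nothing to change.
uint useUtf8Console()
{
    version (Windows)
    {
        immutable previous = GetConsoleOutputCP();
        SetConsoleOutputCP(65001); // 65001 is the UTF-8 code page
        return previous;
    }
    else
    {
        return 0;
    }
}

void main()
{
    immutable previous = useUtf8Console();
    scope (exit)
    {
        stdout.flush(); // show pending output while UTF-8 is still active
        version (Windows)
        {
            // Put the user's original code page back for whatever runs next.
            if (previous != 0)
                SetConsoleOutputCP(previous);
        }
    }

    writeln('♥');
}
```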

> is there any function i can call or setting i can adjust to get D to do the same, or do i have to wait for something to be fixed in the language / compiler itself?
> 
> best regards
> 
> moth [su.angel-island.zone]
> 

Not a bug. This is a known issue on the Windows side for people new to developing natively for it.

I just checked the terminal emulator I use, ConEmu, and yeah, it doesn't need any settings changes to make Unicode "just work". It's conhost, with its legacy behaviour, that you are facing.
December 22, 2019
On Sunday, 22 December 2019 at 06:25:42 UTC, rikki cattermole wrote:
> On 22/12/2019 7:11 PM, moth wrote:
>
>> is there any function i can call or setting i can adjust to get D to do the same, or do i have to wait for something to be fixed in the language / compiler itself?
>> 
>
> Not a bug. This is a known issue on the Windows side for people new to developing natively for it.

Yes, and it's not just D programs. And setting the code page isn't always perfect, as it matters which font cmd is configured to use. Google for "windows command prompt unicode output".

MS has updated the command prompt to support Unicode, but I don't know how to use it:

https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/

If you're on Windows 10, there's also Windows Terminal, which was released on the app store in June:

https://devblogs.microsoft.com/commandline/windows-terminal-preview-v0-7-release/
December 22, 2019
On Sunday, 22 December 2019 at 06:25:42 UTC, rikki cattermole wrote:
> Not a bug.

No, Phobos is *clearly* in the wrong here. There is a proper fix.

http://dpldocs.info/this-week-in-d/Blog.Posted_2019_11_25.html#unicode

Use the correct WriteConsoleW API instead of the ancient ASCII API. WriteConsoleW works without changing any settings. (On old versions of Windows you may have to install fonts to display it, but new ones come with them preinstalled.)
December 22, 2019
On Sunday, 22 December 2019 at 06:11:13 UTC, moth wrote:
> is there any function i can call or setting i can adjust to get D to do the same, or do i have to wait for something to be fixed in the language / compiler itself?

It isn't the language/compiler per se, it is the library calling the wrong function. See the code in the link in my last email - if you call the Windows WriteConsoleW function directly it will do what you want. The rest of the surrounding code in the link is to handle conversions and pipes to files.
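
A rough sketch of that idea, not the code from the linked post: the function name `consoleWrite` is invented here, and the Windows declarations are assumed to come from druntime's `core.sys.windows.windows`. It converts to UTF-16 and calls WriteConsoleW when stdout is a real console, and falls back to ordinary byte output when stdout is redirected to a file or pipe.

```d
import std.stdio : stdout, write;
import std.utf : toUTF16;

version (Windows) import core.sys.windows.windows;

/// Hypothetical helper. Returns true if the text went through WriteConsoleW,
/// false if it was written as plain bytes via std.stdio.
bool consoleWrite(string text)
{
    version (Windows)
    {
        auto handle = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD mode;
        // GetConsoleMode succeeds only for console handles, not files or pipes.
        if (GetConsoleMode(handle, &mode))
        {
            stdout.flush();            // keep ordering with already-buffered output
            const wide = text.toUTF16; // the W functions expect UTF-16
            DWORD written;
            // A single call is enough for short strings; very large buffers
            // might need to be written in pieces.
            WriteConsoleW(handle, wide.ptr, cast(DWORD) wide.length, &written, null);
            return true;
        }
    }
    // Redirected output, or not Windows at all: UTF-8 bytes are what we want.
    write(text);
    return false;
}

void main()
{
    consoleWrite("♥\n");
}
```

No code page is touched here, so nothing has to be restored afterwards.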

December 22, 2019
On 12/22/19 8:40 AM, Adam D. Ruppe wrote:
> On Sunday, 22 December 2019 at 06:25:42 UTC, rikki cattermole wrote:
>> Not a bug.
> 
> No, Phobos is *clearly* in the wrong here. There is a proper fix.

Phobos doesn't call the wrong function, libc does. Phobos uses fwrite for output.

> http://dpldocs.info/this-week-in-d/Blog.Posted_2019_11_25.html#unicode

You need to address that in DMC. I wonder, does MSVCRT have the same problem?

-Steve
December 22, 2019
On Sunday, 22 December 2019 at 18:41:16 UTC, Steven Schveighoffer wrote:
> Phobos doesn't call the wrong function, libc does. Phobos uses fwrite for output.

There is allegedly a way to set fwrite to do the translations on MSVCRT:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=vs-2019

but trying it here it throws invalid parameter exception so idk.

Regardless, I'm pretty well of the opinion that fwrite is the wrong thing to do anyway. fwrite writes bytes to a file, but we want to write strings to the console. There's other functions that do that.

There is the worry of mixing stuff from C and keeping the buffer consistent, but it could always just flush() before doing its thing too. Or maybe even merge the buffers, idk what the MS runtime supports for that.

or maybe i'm missing something and _setmode is a viable solution.


But whatever we do, passing the buck isn't solving anything. Windows has supported Unicode console output since NT 4.0 in 1996.. just have to call the right function, and whether it is Phobos calling it or druntime or the CRT, someone just needs to do it!
December 22, 2019
On 12/22/19 5:04 PM, Adam D. Ruppe wrote:
> On Sunday, 22 December 2019 at 18:41:16 UTC, Steven Schveighoffer wrote:
>> Phobos doesn't call the wrong function, libc does. Phobos uses fwrite for output.
> 
> There is allegedly a way to set fwrite to do the translations on MSVCRT:
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=vs-2019 

Looks like you need to switch to "wprintf". I'm not sure, but I think we rely only on fwrite, for which there is no "w" equivalent.

> but trying it here it throws invalid parameter exception so idk.

Not surprised ;)

Here's a cool feature of Windows:

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fwide?view=vs-2019

Basically does nothing, all parameters ignored (and yes, we use this function in Phobos, assuming it does something).

But let me just say, the fact that there is some "mode" you have to set, like binary mode, that makes unicode work is unsettling. I hate libc streams...

> 
> Regardless, I'm pretty well of the opinion that fwrite is the wrong thing to do anyway. fwrite writes bytes to a file, but we want to write strings to the console. There's other functions that do that.

Preaching to the choir here. I wanted to rip out libc reliance a decade ago.

> There is the worry of mixing stuff from C and keeping the buffer consistent, but it could always just flush() before doing its thing too. Or maybe even merge the buffers, idk what the MS runtime supports for that.

This is the crux. Some people gotta have their printf. And if you do different types of buffered streams, the result even from single-threaded output looks like garbage. The only solution is to wrap FILE *. And I do mean only. I looked into trying to hook the buffers. There's no reliable way without knowing all the implementation details.

> or maybe i'm missing something and _setmode is a viable solution.

_setmode is on a file descriptor. That already is a red flag to me, as there are no file descriptors in the OS. Windows uses handles. So this has some weird library "translation" happening underneath. Ugh.

> But whatever we do, passing the buck isn't solving anything. Windows has supported Unicode console output since NT 4.0 in 1996.. just have to call the right function, and whether it is Phobos calling it or druntime or the CRT, someone just needs to do it!

Hey, you can always just call the function yourself! Just make an output stream that writes with the right function, and then you can use formattedWrite instead of writef.
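
A minimal sketch of what that could look like, assuming druntime's Windows bindings; `ConsoleSink` is a made-up name, not anything in Phobos. It is an output range whose `put` forwards to WriteConsoleW on a console and to `stdout` otherwise, so `formattedWrite` can drive it.

```d
import std.format : formattedWrite;
import std.stdio : stdout;
import std.utf : toUTF16;

version (Windows) import core.sys.windows.windows;

/// Hypothetical output range: formattedWrite hands it pieces of text via put().
struct ConsoleSink
{
    void put(const(char)[] chunk)
    {
        version (Windows)
        {
            auto handle = GetStdHandle(STD_OUTPUT_HANDLE);
            DWORD mode;
            if (GetConsoleMode(handle, &mode)) // true only for a real console
            {
                stdout.flush();             // do not overtake buffered libc output
                const wide = chunk.toUTF16; // throws if chunk is not valid UTF-8
                DWORD written;
                WriteConsoleW(handle, wide.ptr, cast(DWORD) wide.length, &written, null);
                return;
            }
        }
        // Files, pipes and non-Windows systems get the bytes unchanged.
        stdout.write(chunk);
    }
}

void main()
{
    ConsoleSink sink;
    sink.formattedWrite("%s x %d\n", "♥", 3);
}
```

This sink is unbuffered and makes one API call per chunk, so it is only meant to show the shape of the approach.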

To fix Phobos, we just(!) need to remove libc as the underlying stream implementation.

I had at one point agreement from Walter to make a "backwards-compatible-ish" mechanism for file/streams. But it's not pretty, and was convoluted. At the time, I was struggling getting what would become iopipe to be usable on its own, and I eventually quit worrying about that aspect of it.

We have the basic building blocks with https://github.com/MartinNowak/io and https://github.com/schveiguy/iopipe. It would be cool to get this into Phobos, but it's a lot of work.

I bet Rust just skips libc altogether.

-Steve
December 23, 2019
On Sunday, 22 December 2019 at 22:47:43 UTC, Steven Schveighoffer wrote:
> To fix Phobos, we just(!) need to remove libc as the underlying stream implementation.
>
> I had at one point agreement from Walter to make a "backwards-compatible-ish" mechanism for file/streams. But it's not pretty, and was convoluted. At the time, I was struggling getting what would become iopipe to be usable on its own, and I eventually quit worrying about that aspect of it.
>
> We have the basic building blocks with https://github.com/MartinNowak/io and https://github.com/schveiguy/iopipe. It would be cool to get this into Phobos, but it's a lot of work.
>
> I bet Rust just skips libc altogether.
>
> -Steve

I don't have the ingenuity, intelligence, or experience that many of you possess, but I have *a lot* of time on my hands for something like this. I assume I should start with std.stdio's source code and the source of the aforementioned projects, but some guidance would be very helpful, if not essential. D has been quite useful to me since I stumbled upon it, and I think it's time to give back in some way. (I'd do it financially, but I'm poor, haha.) Anyway, if anybody wants to take me up on this offer, just let me know!
December 23, 2019
On Sun, Dec 22, 2019 at 10:04:20PM +0000, Adam D. Ruppe via Digitalmars-d-learn wrote: [...]
> Regardless, I'm pretty well of the opinion that fwrite is the wrong thing to do anyway. fwrite writes bytes to a file, but we want to write strings to the console. There's other functions that do that.
[...]

Would it make sense for std.stdio.write* (the package global functions, as opposed to File.write*) to use the Windows console output functions instead of proxying to libc?

Alternatively, we could change std.stdio.File to check if the current file descriptor is the console (fd == stdout && stdout == console, however you figure that out in Windows), and silently switch to the Windows console output functions instead of libc.  We *are* already wrapping libc's FILE*, so why not wrap the Windows console output functions as well?
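
One possible way to do the "is this File the console?" check, as a sketch only: it assumes `File.windowsHandle` from std.stdio and the GetConsoleMode binding from druntime, and the function name `isWindowsConsole` is invented. GetConsoleMode fails for handles that are not consoles, which is what makes it usable as a test.

```d
import std.stdio : File, stdout, writeln;

version (Windows) import core.sys.windows.windows;

/// Hypothetical check: true when `f` is attached to a Windows console
/// rather than a file or pipe. Always false elsewhere, where libc stays in charge.
bool isWindowsConsole(File f)
{
    version (Windows)
    {
        DWORD mode;
        return GetConsoleMode(f.windowsHandle, &mode) != 0;
    }
    else
    {
        return false;
    }
}

void main()
{
    writeln(isWindowsConsole(stdout)
        ? "console: could switch to WriteConsoleW"
        : "redirected: keep writing bytes");
}
```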

Mixing raw libc printf with std.stdio.write* is a bad idea anyway; do we really need to support that??  Though calling fflush(stdout) may not be amiss, just to alleviate sudden breakage and ensuing complaints.

And of course, this only applies to Windows. On Posix libc is pretty much still the standard way of working with console output.


T

-- 
VI = Visual Irritation