Thread overview
Activating UTF-8 in Windows Console: CHCP
Dec 21, 2004
Simon Buchan
Re: Activating UTF-8 in Windows Console: CHCP - Or maybe not...
Dec 21, 2004
Simon Buchan
Dec 22, 2004
Roberto Mariottini
Dec 22, 2004
Simon Buchan
Dec 23, 2004
Geoff Hickey
December 21, 2004
The console command chcp changes the current console's code page, so
chcp 65001 tells the console to use UTF-8. Make sure you aren't using
raster fonts!
(/me makes "whee!" noises while running around in circles)
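
For anyone who would rather do this from inside a program than by typing chcp first, here is a minimal sketch. It assumes the Win32 call SetConsoleOutputCP (declared in druntime's core.sys.windows.wincon) is the right knob for the output side. The names utf8CodePage and useUtf8Console are made up for illustration, and the font caveat above still applies.

```d
import std.stdio : writeln;

version (Windows)
{
    import core.sys.windows.wincon : SetConsoleOutputCP;
}

/// Code page number for UTF-8, the same value passed to chcp.
enum uint utf8CodePage = 65001;

/// Tries to switch the console's output code page to UTF-8.
/// Returns true on success. On non-Windows systems this does nothing
/// and returns true (assumption: the terminal already expects UTF-8).
bool useUtf8Console()
{
    version (Windows)
    {
        return SetConsoleOutputCP(utf8CodePage) != 0;
    }
    else
    {
        return true;
    }
}

void main()
{
    if (useUtf8Console())
        writeln("äöüßÄÖÜ");
    else
        writeln("could not switch the console to code page 65001");
}
```

This only touches the output code page of the console the program is attached to; whether the text then displays correctly still depends on the console font.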

-- 
"Unhappy Microsoft customers have a funny way of becoming Linux,
Salesforce.com and Oracle customers." - www.microsoft-watch.com:
"The Year in Review: Microsoft Opens Up"
--
"I plan on at least one critical patch every month, and I haven't been disappointed."
- Adam Hansen, manager of security at Sonnenschein Nath & Rosenthal LLP
(Quote from http://www.eweek.com/article2/0,1759,1736104,00.asp)
--
"It's been a challenge to "reteach or retrain" Web users to pay for content, said Pizey"
-Wired website: "The Incredible Shrinking Comic"
December 21, 2004
On Tue, 21 Dec 2004 22:41:40 +1300, Simon Buchan <currently@no.where> wrote:

> The console command chcp can change the current console's codepage, meaning
> chcp 65001 will tell the console to use UTF-8. Make sure you aren't using
> raster fonts!
> (/me makes "whee!" noises while running around in circles)
>

Although it seems that writing fails when the console is not using raster fonts?
That is, output written while in raster fonts is fine, and stays fine (and
displays correctly!) after you switch to Lucida Console, but writing while
already in Lucida Console errors out. WTF?

It gives "Unable to write to stream" after writing, and simply ignores
the special characters.
Does anyone know what's going on?

December 22, 2004
In article <opsjcqjqqhjccy7t@simon.mshome.net>, Simon Buchan says...
>
>The console command chcp can change the current console's codepage, meaning
>chcp 65001 will tell the console to use UTF-8. Make sure you aren't using
>raster fonts!
>(/me makes "whee!" noises while running around in circles)

This gives even stranger results: trying to writef a UTF-8 string terminates the program.

See the following transcript for details:

--------------------------------------------------------------------------
C:\Down\dlang>ver

Microsoft Windows XP [Versione 5.1.2600]

C:\Down\dlang>chcp 850
Tabella codici attiva: 850

C:\Down\dlang>type testUTF.d
´╗┐import std.stdio;
import std.c.stdio;
import std.c.windows.windows;

extern (Windows)
{
    export BOOL CharToOemW(
        LPCWSTR lpszSrc,  // string to translate
        LPSTR lpszDst     // translated string
    );
}

int main()
{
    puts("-- untranslated --");
    puts("├ñ├Â├╝├ƒ├ä├û├£");
    writef("├ñ├Â├╝├ƒ├ä├û├£\n");

    puts("-- translated --");
    wchar[] mess = "├ñ├Â├╝├ƒ├ä├û├£";
    char[] OEMmess = new char[mess.length];
    CharToOemW(mess, OEMmess);
    puts(OEMmess);
    writef(OEMmess);

    return 0;
}
C:\Down\dlang>testUTF.exe
-- untranslated --
├ñ├Â├╝├ƒ├ä├û├£
├ñ├Â├╝├ƒ├ä├û├£
-- translated --
äöüßÄÖÜ
Error: invalid UTF-8 sequence

C:\Down\dlang>chcp 65001
Tabella codici attiva: 65001

C:\Down\dlang>type testUTF.d
import std.stdio;
import std.c.stdio;
import std.c.windows.windows;

extern (Windows)
{
    export BOOL CharToOemW(
        LPCWSTR lpszSrc,  // string to translate
        LPSTR lpszDst     // translated string
    );
}

int main()
{
    puts("-- untranslated --");
    puts("äöüßÄÖÜ");
    writef("äöüßÄÖÜ\n");

    puts("-- translated --");
    wchar[] mess = "äöüßÄÖÜ";
    char[] OEMmess = new char[mess.length];
    CharToOemW(mess, OEMmess);
    puts(OEMmess);
    writef(OEMmess);

    return 0;
}
C:\Down\dlang>testUTF.exe
-- untranslated --
äöüßÄÖÜ

C:\Down\dlang>
--------------------------------------------------------------------------

Ciao
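
One possible reading of the code page 850 half of the transcript, offered as a sketch rather than a diagnosis: the source file is stored as UTF-8, so under code page 850 each two-byte UTF-8 sequence shows up as two unrelated glyphs (0xC3 0xA4 for "ä" becomes "├ñ"). The "translated" buffer holds code page 850 bytes, which are not well-formed UTF-8, and that would be consistent with writef reporting "invalid UTF-8 sequence". The lookup table below covers only the bytes in this test string, and the function names (cp850ToUnicode, renderAsCp850, isValidUtf8) are hypothetical helpers, not anything from Phobos.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Partial code page 850 table: only the byte values used in this example.
dchar cp850ToUnicode(ubyte b)
{
    switch (b)
    {
        case 0xC3: return '\u251C'; // box-drawing "├"
        case 0xA4: return '\u00F1'; // "ñ"
        case 0xB6: return '\u00C2'; // "Â"
        case 0xBC: return '\u255D'; // box-drawing "╝"
        case 0x9F: return '\u0192'; // "ƒ"
        case 0x84: return '\u00E4'; // "ä"
        case 0x96: return '\u00FB'; // "û"
        case 0x9C: return '\u00A3'; // "£"
        default:   return '?';      // byte not covered by this partial table
    }
}

/// Shows how a console using code page 850 would display raw bytes.
dstring renderAsCp850(const(ubyte)[] bytes)
{
    dstring result;
    foreach (b; bytes)
        result ~= cp850ToUnicode(b);
    return result;
}

/// Returns true if the bytes form well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // The test string as it sits in a UTF-8 source file.
    string text = "äöüßÄÖÜ";
    auto utf8 = cast(const(ubyte)[]) text;

    // The same text in code page 850: one byte per character.
    immutable ubyte[] oem = [0x84, 0x94, 0x81, 0xE1, 0x8E, 0x99, 0x9A];

    writeln(renderAsCp850(utf8)); // the garbage seen in the "type" output
    writeln(isValidUtf8(utf8));   // true
    writeln(isValidUtf8(oem));    // false
}
```

The three characters in front of the first import in the 850 listing fit the same pattern: they are the bytes 0xEF 0xBB 0xBF, a UTF-8 byte order mark, displayed through code page 850. None of this explains why the program stops silently under code page 65001.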


December 22, 2004
On Wed, 22 Dec 2004 13:46:27 +0000 (UTC), Roberto Mariottini <Roberto_member@pathlink.com> wrote:

> In article <opsjcqjqqhjccy7t@simon.mshome.net>, Simon Buchan says...
>>
>> The console command chcp can change the current console's codepage, meaning
>> chcp 65001 will tell the console to use UTF-8. Make sure you aren't using
>> raster fonts!
>> (/me makes "whee!" noises while running around in circles)
>
> This gives even stranger results: trying to writef a UTF-8 string terminates
> the program.

See my reply above: I may have been too hasty...
This may have something to do with surrogates, etc...
Putting cmd.exe in raster fonts, running the program, then
changing the font to Lucida Console displays the UTF-8 text correctly,
but changing it back results in corruption. WTF?

Does anyone know of a command shell completely separate from the Windows one?
I suppose one could use MSYS (or MinGW?) or something...

December 23, 2004
> 
> Does anyone know of a command shell completely separate from the Windows one?
> I suppose one could use MSYS (or MinGW?) or something...
> 

4NT might work. You can download it here: ftp://jpsoft.com/4nt/. It's not free, but there's a trial download.

I'm not a 4NT user myself, but quite a few programmers I know are. I have no idea if it supports UTF-8. It does support Unicode, but that might just mean UTF-16.
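
To make the UTF-8 versus UTF-16 distinction concrete, here is a small sketch in D using the test string from earlier in the thread. It only compares code unit counts and says nothing about what 4NT actually does.

```d
import std.stdio : writefln;
import std.utf : toUTF16;

void main()
{
    // D string literals are UTF-8 by default: these 7 characters
    // take two code units (bytes) each.
    string utf8 = "äöüßÄÖÜ";

    // Re-encoded as UTF-16, each of them fits in one 16-bit code unit.
    wstring utf16 = toUTF16(utf8);

    writefln("UTF-8 code units:  %s", utf8.length);  // 14
    writefln("UTF-16 code units: %s", utf16.length); // 7
}
```

A shell that "supports Unicode" through UTF-16 would hand programs the second form, so a program that writes raw UTF-8 bytes could still end up garbled there.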

- Geoff Hickey