November 19, 2004 Re: Character encoding problem | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ben Hinkle | Ben Hinkle wrote:
<snip>
> Are you sure your command window is set to use UTF-8? On Windows I think you
> change it by going to the "Regional Settings" control panel.
In Windows 98, a command prompt is still a plain old MS-DOS window. As such, it can't possibly use UTF-8, as this would break the essential one-to-one mapping between bytes and on-screen character positions.
I don't know how different this really is in Windows 2000/XP....
Stewart.
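To illustrate the byte-to-character mismatch described above, here is a minimal C sketch. The function names `utf8_codepoints` and `show_mismatch` are made up for this illustration, and the counting assumes the input is valid UTF-8. It shows that the seven German letters from the test program occupy fourteen bytes in UTF-8, so a console that draws one screen cell per byte cannot display them correctly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string by counting
   every byte that is not a continuation byte (bit pattern 10xxxxxx).
   Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Print the byte count and the character count of the thread's test string. */
void show_mismatch(void)
{
    /* "äöüßÄÖÜ" spelled out as UTF-8 bytes, so the encoding of this
       source file does not matter. */
    const char *german =
        "\xC3\xA4\xC3\xB6\xC3\xBC\xC3\x9F\xC3\x84\xC3\x96\xC3\x9C";
    printf("%zu bytes, %zu characters\n",
           strlen(german), utf8_codepoints(german));
}
```

Calling `show_mismatch()` from `main` prints `14 bytes, 7 characters`.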
|
November 19, 2004 Re: Character encoding problem | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ilya Minkov | On Fri, 19 Nov 2004 17:03:36 +0100, Ilya Minkov <minkov@cs.tum.edu> wrote:
>> Are you sure your command window is set to use UTF-8? On Windows I think you
>> change it by going to the "Regional Settings" control panel.
>
> That doesn't matter - or rather I think there is nothing to configure. The problem is, he misuses Mozilla for something wrong. He should rather use a programmer's editor which supports UTF-8, for example SciTE. In this example, also go to File -> Encoding -> UTF-8.
I've just downloaded SciTE and have done what you suggested. I admit that using Mozilla for encoding issues is not very elegant. SciTE doesn't change anything, though. I still get garbage. By the way, is there a D plugin for SciTE?
> The output will be another problem - either multi-character garbage (C functions) or automatically converted to local codepage (D native Unicode functions)
|
November 19, 2004 Re: Character encoding problem | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stewart Gordon | On Fri, 19 Nov 2004 16:02:17 +0000, Stewart Gordon <smjg_1998@yahoo.com> wrote:
> You can include MS-DOS characters in a string, but only as escape codes. In your case (assuming your code page is 437, 850, 852, 853 or 857):
>
> puts("\x84\x94\x81\xE1\x8E\x99\x9A");
>
> Since the whole point of this is for outputting to MS-DOS, you could argue that this is appropriate use of non-Unicode characters in a string.
Yep, that works. Maybe this is more portable (with the source file encoded as UTF-8):
import std.c.stdio;

int main()
{
    version(Win32)
        puts("\x84\x94\x81\xE1\x8E\x99\x9A"); // MS-DOS code page bytes, as suggested above
    else
        puts("äöüßÄÖÜ"); // passed through as the UTF-8 bytes of the source file
    return 0;
}
What do you think?!
|
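Instead of typing the escape codes by hand, the same bytes could be produced by a small lookup table. The following C sketch is only an illustration: `to_dos_byte` and `to_dos_string` are hypothetical helpers, the table covers only ASCII and the seven letters whose byte values are given in the post above, and every other character is replaced by `?`.

```c
#include <assert.h>
#include <stddef.h>

/* Map one Unicode code point to its MS-DOS code page byte.
   Only ASCII and the seven German letters from the thread are covered;
   anything else becomes '?'. */
unsigned char to_dos_byte(unsigned long cp)
{
    switch (cp) {
    case 0x00E4: return 0x84; /* a with diaeresis */
    case 0x00F6: return 0x94; /* o with diaeresis */
    case 0x00FC: return 0x81; /* u with diaeresis */
    case 0x00DF: return 0xE1; /* sharp s */
    case 0x00C4: return 0x8E; /* A with diaeresis */
    case 0x00D6: return 0x99; /* O with diaeresis */
    case 0x00DC: return 0x9A; /* U with diaeresis */
    default:     return cp < 0x80 ? (unsigned char)cp : (unsigned char)'?';
    }
}

/* Convert n code points to a NUL-terminated string of DOS bytes.
   The caller must provide room for n + 1 bytes in out. */
void to_dos_string(const unsigned long *in, size_t n, char *out)
{
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; ++i)
        out[i] = (char)to_dos_byte(in[i]);
    out[n] = '\0';
}
```

Passing the code points U+00E4, U+00F6, U+00FC, U+00DF, U+00C4, U+00D6, U+00DC through `to_dos_string` yields the byte sequence used in the `puts` call above.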
November 19, 2004 [PATCH] Re: Character encoding problem | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thomas Kuehne |
Here is a patch that enables GDC-0.8 and DMD-0.106 to handle UTF-8/16/32 source files with and without a BOM.
Thomas
--- gdc-0.8/d/dmd/module.c 2004-10-02 19:19:31.000000000 +0200
+++ gdc-0.8d/d/dmd/module.c 2004-11-19 19:19:09.522419400 +0100
@@ -241,6 +241,7 @@
* EF BB BF UTF-8
*/
+ int haveNoBom=0;
if (buf[0] == 0xFF && buf[1] == 0xFE)
{
if (buflen >= 4 && buf[2] == 0 && buf[3] == 0)
@@ -257,6 +258,7 @@
fatal();
}
+ pu-=haveNoBom;
dbuf.reserve(buflen / 4);
while (++pu < pumax)
{ unsigned u;
@@ -292,6 +294,7 @@
fatal();
}
+ pu-=haveNoBom;
dbuf.reserve(buflen / 2);
while (++pu < pumax)
{ unsigned u;
@@ -354,6 +357,8 @@
* figure out the encoding.
*/
+ haveNoBom=1;
+
if (buflen >= 4)
{ if (buf[1] == 0 && buf[2] == 0 && buf[3] == 0)
{ // UTF-32LE
Thomas Kuehne wrote on Fri, 19 Nov 2004 14:19:33 +0000 (UTC):
>>> Let's try to track down the real problem.
>>>
>>> change the string into "\u00E4\u00F6\u00FC\u00DF" (ae)(oe)(ue)(ss).
>>>
>>> If the output is still garbage try printf instead of puts.
>>
>>I've tested the above string. The result for both puts and printf is that either it doesn't compile or it outputs garbage:
>>
>>MS-DOS/Western (ISO-8859-1), UTF-16, UTF-8
>>compile fine but output garbage under MS-DOS
>>(Windows 98 SE, German edition)
>
> Clearly seems to be a shell problem.
>
>>Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
>>(1) "semicolon expected, not '.'"
>>(1) no identifier for declarator
>
> This is a known problem. If you use UTF-16/32 without a BOM (byte order mark) the current dmd assumes UTF-8 and subsequently fails.
>
> http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_16be
> http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_16le
> http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_32be
> http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_32le
|
November 19, 2004 Re: [PATCH] Re: Character encoding problem | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thomas Kuehne |
Thomas Kuehne wrote on Fri, 19 Nov 2004 19:26:25 +0100:
>>>Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
>>>(1) "semicolon expected, not '.'"
>>>(1) no identifier for declarator
>>
>> This is a known problem. If you use UTF-16/32 without a BOM (byte order mark) the current dmd assumes UTF-8 and subsequently fails.
The real problem was that the compiler skipped the bytes of a nonexistent BOM, so the first character of a file without a BOM was lost.
Thomas
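The idea behind the patch can be sketched in C as follows. This is not the dmd code: `detect_encoding` and the `encoding` enum are names invented for this illustration, and the BOM-less branch assumes (as the "figure out the encoding" comment in the patch context suggests) that the first character of the file is ASCII. The function reports how many BOM bytes there are, so a decoder skips them only when they actually exist.

```c
#include <assert.h>
#include <stddef.h>

enum encoding { ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE, ENC_UTF32LE, ENC_UTF32BE };

/* Detect the encoding of a source buffer and store the number of BOM
   bytes in *bom_len. Without a BOM, *bom_len is 0 and the first
   character is assumed to be ASCII, so the zero bytes around it
   reveal the encoding. */
enum encoding detect_encoding(const unsigned char *buf, size_t len,
                              size_t *bom_len)
{
    assert(buf != NULL && bom_len != NULL);

    /* Check the 4-byte UTF-32 marks before the 2-byte UTF-16 marks,
       because FF FE 00 00 begins with the UTF-16LE mark FF FE. */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0 && buf[3] == 0) {
        *bom_len = 4;
        return ENC_UTF32LE;
    }
    if (len >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return ENC_UTF32BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }

    /* No BOM: nothing to skip. */
    *bom_len = 0;
    if (len >= 4) {
        if (buf[1] == 0 && buf[2] == 0 && buf[3] == 0)
            return ENC_UTF32LE;
        if (buf[0] == 0 && buf[1] == 0 && buf[2] == 0)
            return ENC_UTF32BE;
    }
    if (len >= 2) {
        if (buf[1] == 0)
            return ENC_UTF16LE;
        if (buf[0] == 0)
            return ENC_UTF16BE;
    }
    return ENC_UTF8;
}
```

A decoder would then start reading at `buf + bom_len`. For a BOM-less file that offset is 0, which corresponds to what the `pu-=haveNoBom` lines in the patch achieve.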
|
November 19, 2004 Re: Character encoding problem | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mathias Bierschenk | "Mathias Bierschenk" <Mathias.Bierschenk@web.de> wrote in message news: opshp0d1h29gaiaw@dialin-145-254-035-176.arcor-ip.net...
> By the way, is there a D plugin for SciTE?
You'll find it here:
http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport#SciTE
|
November 19, 2004 Re: Character encoding problem | ||||
---|---|---|---|---|
| ||||
Posted in reply to Valéry Croizier | On Fri, 19 Nov 2004 22:08:56 +0100, Valéry Croizier <valery@freesurf.fr> wrote:
>> By the way, is there a D plugin for SciTE?
>
> You'll find it there
> http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport#SciTE
Thanks!
|
November 19, 2004 Re: Character encoding problem | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mathias Bierschenk | "Mathias Bierschenk" <Mathias.Bierschenk@web.de> wrote in message news:opshpm3zlo9gaiaw@dialin-212-144-051-051.arcor-ip.net...
> How can I print German characters? I've tried the following simple program:
>
> import std.c.stdio;
>
> int main()
> {
>     puts("äöüßÄÖÜ"); // German characters
>
>     return 0;
> }
>
> As the normal MS-DOS EDIT encoding didn't work (Windows 98 SE, German edition) I tried Mozilla to save the source code file with different character encodings but none worked as expected. Here's what I tried using the current DMD version:
>
> MS-DOS encoding as performed by Microsoft's EDIT editor:
Using Microsoft Notepad, click on "Save As" and under encoding, select "UTF-8". Then, use std.stdio.writef() instead of std.c.stdio.puts(), and it should work.
|
November 19, 2004 Re: Character encoding problem | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mathias Bierschenk | "Mathias Bierschenk" <Mathias.Bierschenk@web.de> wrote in message news:opshpm3zlo9gaiaw@dialin-212-144-051-051.arcor-ip.net...
| How can I print German characters? I've tried the following simple program:
|
| import std.c.stdio;
|
| int main()
| {
|     puts("äöüßÄÖÜ"); // German characters
|
|     return 0;
| }
|
| As the normal MS-DOS EDIT encoding didn't work (Windows 98 SE, German
| edition) I tried Mozilla to save the source code file with different
| character encodings but none worked as expected. Here's what I tried using
| the current DMD version:
|
| MS-DOS encoding as performed by Microsoft's EDIT editor:
| (5) "invalid UTF-sequence"
|
| Western (ISO-8859-1):
| (5) "invalid UTF-sequence"
|
| Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
| (1) "semicolon expected, not '.'"
| (1) no identifier for declarator
|
| Unicode (UTF-16 and UTF-8):
| both compile fine but output garbage under MS-DOS
| (Windows 98 SE, German edition)
I was investigating the same thing recently. What I really wanted was a Windows console that did Unicode, but I couldn't find one. However, I came across a C++ program which allows you to output UTF-16 strings (wchar * in C++ on Windows).
Translated to D, the program looks like this:
import std.file;
import std.string;
import std.utf;
import win32.winbase;
import win32.wincon;
import win32.winnls;

void main ()
{
    wchar [] tmp_w = toUTF16(cast(char[])"carlos andrés");
    wchar * szwOut = tmp_w;
    DWORD dwBytesWritten;
    DWORD fdwMode;
    HANDLE outHandle = GetStdHandle(STD_OUTPUT_HANDLE);

    // Output goes to a real console: write the UTF-16 text directly.
    if( (GetFileType(outHandle) & FILE_TYPE_CHAR) && GetConsoleMode( outHandle, &fdwMode) )
        WriteConsoleW( outHandle, szwOut, wcslen(szwOut), &dwBytesWritten, null);
    else
    {
        // Output is redirected: convert to a multibyte string and write the bytes.
        int nOutputCP = GetConsoleOutputCP();
        //int charCount = WideCharToMultiByte(nOutputCP, 0, szwOut, -1, null, 0, null, null);
        //char* szaStr = new char[charCount];
        //WideCharToMultiByte( nOutputCP, 0, szwOut, -1, szaStr, charCount, null, null);
        char [] tmp = toUTF8(tmp_w);
        char * szaStr = toMBSz(tmp);
        int charCount = tmp.length;
        WriteFile(outHandle, szaStr, charCount-1, &dwBytesWritten, null);
    }
}
It uses Y Tomino's Win32 headers. The encoding in which the source file is saved doesn't seem to matter. I really don't remember where I found the original, so you can use this code as you want since it's not mine.
For Linux, I don't think there's any problem since it uses UTF-8 by default (at least with RedHat-based distros, in my experience).
BTW, if someone knows about a Unicode console for Windows, please let me know.
-----------------------
Carlos Santander Bernal
|
November 19, 2004 Re: Character encoding problem | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mathias Bierschenk | Mathias Bierschenk wrote:
> I've just downloaded SciTE and have done what you suggested. I admit that using Mozilla for encoding issues is not very elegant. SciTE doesn't change anything, though. I still get garbage.
Ah, I missed that you had already gotten as far as producing garbage output. :) Well, I'll see what could be wrong. In general, non-NT Windows has received little consideration in the Phobos implementation, because these Windows versions are not very Unicode compatible.
-eye
|
Copyright © 1999-2021 by the D Language Foundation