writef doesn't work on Windows XP console
December 01, 2004
Hi.
I can't make writef work on Windows XP using non-7bit-ASCII characters.

The attached test program outputs:

-- untranslated --
├ñ├Â├╝├ƒ├ä├û├£
├ñ├Â├╝├ƒ├ä├û├£
-- translated --
äöüßÄÖÜ
Error: invalid UTF-8 sequence

Test program:

import std.stdio;
import std.c.stdio;
import std.c.windows.windows;

extern (Windows)
{
  export BOOL CharToOemW(
    LPCWSTR lpszSrc,  // string to translate
    LPSTR lpszDst     // translated string
  );
}

int main()
{
   puts("-- untranslated --");
   puts("äöüßÄÖÜ");
   writef("äöüßÄÖÜ\n");

   puts("-- translated --");
   wchar[] mess = "äöüßÄÖÜ";
   char[] OEMmess = new char[mess.length];
   CharToOemW(mess, OEMmess);
   puts(OEMmess);
   writef(OEMmess);

   return 0;
}


December 01, 2004
"Roberto Mariottini" <Roberto_member@pathlink.com> wrote in message news:cok6li$1pkp$1@digitaldaemon.com...
> Hi.
> I can't make writef work on Windows XP using non-7bit-ASCII characters.
>
> The attached test program outputs:
>
> -- untranslated --
> ├ñ├Â├╝├ƒ├ä├û├£
> ├ñ├Â├╝├ƒ├ä├û├£
> -- translated --
> äöüßÄÖÜ
> Error: invalid UTF-8 sequence
>
> Test program:
>
> import std.stdio;
> import std.c.stdio;
> import std.c.windows.windows;
>
> extern (Windows)
> {
>  export BOOL CharToOemW(
>    LPCWSTR lpszSrc,  // string to translate
>    LPSTR lpszDst     // translated string
>  );
> }
>
> int main()
> {
>   puts("-- untranslated --");
>   puts("äöüßÄÖÜ");
>   writef("äöüßÄÖÜ\n");
>
>   puts("-- translated --");
>   wchar[] mess = "äöüßÄÖÜ";
>   char[] OEMmess = new char[mess.length];
>   CharToOemW(mess, OEMmess);
>   puts(OEMmess);
>   writef(OEMmess);
>
>   return 0;
> }
>
>

This is expected behavior. writef takes UTF-8 strings, hence the "invalid UTF-8 sequence" error: the OEM-translated string is not valid UTF-8.
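
A minimal sketch of why the translated string is rejected, written for a current D compiler rather than the 2004 one (an assumption). `isValidUtf8` is a hypothetical helper built on `std.utf.validate`, and the byte values assume OEM code page 850, where each of these letters is a single byte:

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Returns true if `bytes` form a well-formed UTF-8 sequence.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "äöü" encoded as UTF-8: two bytes per character.
    immutable ubyte[] utf8 = [0xC3, 0xA4, 0xC3, 0xB6, 0xC3, 0xBC];
    // "äöü" in OEM code page 850 (assumed): one byte per character.
    immutable ubyte[] oem = [0x84, 0x94, 0x81];

    writeln(isValidUtf8(utf8)); // true
    writeln(isValidUtf8(oem));  // false
}
```

The OEM bytes fall in the range that UTF-8 reserves for continuation bytes, so they can never start a character.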


December 01, 2004
In article <coketl$25kv$1@digitaldaemon.com>, Ben Hinkle says...
>
[...]
>> int main()
>> {
>>   puts("-- untranslated --");
>>   puts("äöüßÄÖÜ");
>>   writef("äöüßÄÖÜ\n");
^^^^^^^^^^^^^^^^^^^^
>>   puts("-- translated --");
>>   wchar[] mess = "äöüßÄÖÜ";
>>   char[] OEMmess = new char[mess.length];
>>   CharToOemW(mess, OEMmess);
>>   puts(OEMmess);
>>   writef(OEMmess);
>>
>>   return 0;
>> }
>>
>>
>
>This is expected behavior. writef takes UTF-8 strings, hence the "invalid UTF-8 sequence" error: the OEM-translated string is not valid UTF-8.

The first writef uses a UTF-8 string, but it doesn't print what is expected. One of the two calls should work, but neither does.

Ciao


December 01, 2004
Ben Hinkle wrote:

>> I can't make writef work on Windows XP using non-7bit-ASCII characters.
>[...snip...]
> This is expected behavior. writef takes UTF-8 strings, hence the "invalid UTF-8 sequence" error: the OEM-translated string is not valid UTF-8.

The moral of the story is that 8-bit (non-UTF-8) strings should be declared as ubyte[],
even if that means casting to a pointer before passing them to C routines:

> ubyte[] OEMmess = new ubyte[mess.length];
> CharToOemW(mess, cast(LPSTR) OEMmess);
> puts(cast(char *) OEMmess);

The C "char" type corresponds to "byte" in D, confusingly enough.
As Ben says, the D char type only accepts valid UTF-8 code units...

--anders

PS. No, it doesn't help that the C routines are declared as (char *)
    when they really take (ubyte *) arguments. It's just a shortcut
    to avoid having to translate the C function declarations to D...

    And of course, it also works just fine for ASCII-only strings.
    (a char[] can be directly converted to char *, iff it is ASCII)
    With non-US-ASCII characters, it doesn't work - as you've seen.
December 01, 2004
Roberto Mariottini wrote:

> The first writef uses a UTF-8 string, but it doesn't print what is expected.
> One of the two calls should work, but neither does.

It works just fine, but you *have* to set your console to UTF-8.
D does *not* support consoles or shells which are not Unicode... :(

Simple example:
> import std.stdio;
> void main()
> {
>   writefln("äöüßÄÖÜ");
> }

In UTF-8 Terminal mode, this prints:
> äöüßÄÖÜ

In Latin-1 Terminal mode, you get:
> Ã¤Ã¶Ã¼ÃÃÃÃ
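
The garbage follows directly from the encoding, and the sketch below reproduces it under stated assumptions: it targets a current D compiler, and `misreadAsLatin1` is a hypothetical helper that treats every UTF-8 code unit as one Latin-1 character, which is what a Latin-1 terminal does with the raw bytes.

```d
import std.stdio : writeln;

/// Reinterprets each UTF-8 code unit as one Latin-1 character.
dstring misreadAsLatin1(string utf8)
{
    dchar[] shown;
    foreach (ubyte unit; cast(immutable(ubyte)[]) utf8)
        shown ~= cast(dchar) unit; // Latin-1 byte N is code point U+00NN
    return shown.idup;
}

void main()
{
    writeln("ä".length);           // 2: one character, two UTF-8 code units
    writeln(misreadAsLatin1("ä")); // Ã¤
}
```

For the uppercase letters and ß, the second byte lands in the Latin-1 control range, so only the leading Ã is visible.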

I assume it prints similar garbage on a non-Unicode XP console?
(Being a Mac user myself, I have no idea how to change it on Windows.)

--anders
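
On Windows, one way to switch the console to UTF-8 from inside the program is the Win32 call `SetConsoleOutputCP` with code page 65001 (running `chcp 65001` at the prompt has the same effect). The following is a sketch only: it assumes a current D compiler with the `core.sys.windows` bindings, `useUtf8Console` and `restoreConsole` are hypothetical helper names, and whether the glyphs actually render still depends on the console font.

```d
import std.stdio : writeln;

version (Windows)
{
    import core.sys.windows.wincon : GetConsoleOutputCP, SetConsoleOutputCP;

    /// Switches console output to code page 65001 (UTF-8).
    /// Returns the code page that was active before.
    uint useUtf8Console()
    {
        immutable previous = GetConsoleOutputCP();
        SetConsoleOutputCP(65001);
        return previous;
    }

    /// Puts the console back on the given code page.
    void restoreConsole(uint codePage)
    {
        SetConsoleOutputCP(codePage);
    }
}
else
{
    // Other platforms: assume the terminal already expects UTF-8.
    uint useUtf8Console() { return 0; }
    void restoreConsole(uint codePage) {}
}

void main()
{
    immutable previous = useUtf8Console();
    scope (exit) restoreConsole(previous); // leave the console as we found it

    writeln("äöüßÄÖÜ");
}
```

Restoring the previous code page on exit keeps other programs in the same console window unaffected.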
December 01, 2004
Anders F Björklund wrote:
> Roberto Mariottini wrote:
> 
>> The first writef uses a UTF-8 string, but it doesn't print what is expected.
>> One of the two calls should work, but neither does.
> 
> It works just fine, but you *have* to set your console to UTF-8.
> D does *not* support consoles or shells which are not Unicode... :(
<snip>

A while back I suggested writing some classes to do text file I/O, which would have conversion capabilities built in.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/6089

I guess it would extend to console I/O.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.
December 01, 2004
Stewart Gordon wrote:

> A while back I suggested writing some classes to do text file I/O, which would have conversion capabilities built in.
> 
> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/6089
> 
> I guess it would extend to console I/O.

Sounds like a good idea. I have some very small encoding additions...
(just a lookup for each supported charset, without entire icu/iconv)

http://www.algonet.se/~afb/d/mapping.zip

And I think it should use char[] instead of dchar/dchar[], but that's
rather minor (and it should probably overload all three string types)

--anders
December 01, 2004
Anders F Björklund wrote:
> Stewart Gordon wrote:
> 
>> A while back I suggested writing some classes to do text file I/O, which would have conversion capabilities built in.
>>
>> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/6089
<snip>
> And I think it should use char[] instead of dchar/dchar[], but that's
> rather minor (and it should probably overload all three string types)

I wrote that, but then discovered that the 'norm' (if Phobos is anything to go by) is for strings to be manipulated as UTF-8, while dchar gets used for individual characters.  Maybe it should 'normally' use char[].  After all, that's the most compact for text in alphabets below U+0800.

But you're probably right that it should overload the lot.
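
The size argument can be checked with `std.utf.codeLength`. This is a small sketch for a current D compiler (an assumption); `utf8Bytes` and `utf16Bytes` are hypothetical helper names.

```d
import std.stdio : writefln;
import std.utf : codeLength;

/// Bytes needed to store one code point as UTF-8.
size_t utf8Bytes(dchar c)
{
    return codeLength!char(c) * char.sizeof;
}

/// Bytes needed to store one code point as UTF-16.
size_t utf16Bytes(dchar c)
{
    return codeLength!wchar(c) * wchar.sizeof;
}

void main()
{
    // 'a' is ASCII, 'ä' lies below U+0800, '€' (U+20AC) lies above it.
    foreach (dchar c; "aä€"d)
        writefln("U+%04X: %s bytes as UTF-8, %s bytes as UTF-16",
                 cast(uint) c, utf8Bytes(c), utf16Bytes(c));
}
```

Below U+0800 UTF-8 never needs more bytes than UTF-16, and for ASCII it needs half as many; from U+0800 upward within the 16-bit range, UTF-8 takes three bytes where UTF-16 takes two.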

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.
December 01, 2004
Stewart Gordon wrote:

>> And I think it should use char[] instead of dchar/dchar[], but that's
>> rather minor (and it should probably overload all three string types)
> 
> I wrote that, but then discovered that the 'norm' (if Phobos is anything to go by) is for strings to be manipulated as UTF-8, while dchar gets used for individual characters.  Maybe it should 'normally' use char[]. After all, that's the most compact for text in alphabets below U+0800.

Or one can do like Java and use wchar[] and wchar, and ignore the bloat
for ASCII strings - and hack in support for surrogates some other way...

Most of the consoles mentioned only support old 16-bit Unicode anyway ?

> But you're probably right that it should overload the lot.

It's the D way :-)

--anders
December 01, 2004
"Stewart Gordon" <smjg_1998@yahoo.com> wrote in message news:cokn8o$2id0$1@digitaldaemon.com...
> Anders F Björklund wrote:
> > Roberto Mariottini wrote:
> >
> >> The first writef uses a UTF-8 string, but it doesn't print what is
> >> expected.
> >> One of the two calls should work, but neither does.
> >
> > It works just fine, but you *have* to set your console to UTF-8.
> > D does *not* support consoles or shells which are not Unicode... :(
> <snip>
>
> A while back I suggested writing some classes to do text file I/O, which would have conversion capabilities built in.
>
> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/6089

Note that std.stream now has BOM support: call readBOM or writeBOM in
EndianStream. Now that you mention it, it might be nice to make another
Stream subclass that adds support for the "native" encodings. It sounds fun,
so I'll give it a shot. It should be pretty easy, actually, since you just
override writeString and writeStringW to call some OS function that converts
the string or char from UTF to the native encoding.
Supporting arbitrary encodings would probably be left to non-Phobos
libraries, since they would presumably require something like ICU or
libiconv. So basically what I have in mind is that, to write to stdout with
the native encoding, you'd have to write:

import std.stream;
...
stdoutn = NativeTextStream(stdout);
stdoutn.writef(<some utf encoded string>);

-Ben
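
To make the idea concrete, here is a rough sketch of such a wrapper. It is not the proposed NativeTextStream: `Latin1Writer` and `toLatin1` are hypothetical names, the code targets a current D compiler, and Latin-1 stands in for the native encoding because the conversion fits in a few lines. A real Windows version would presumably ask the OS to convert to the console code page instead.

```d
import std.stdio : File, stdout;

/// Transcodes UTF-8 to Latin-1, substituting '?' for any character
/// that Latin-1 cannot represent.
immutable(ubyte)[] toLatin1(string utf8)
{
    ubyte[] bytes;
    foreach (dchar c; utf8) // foreach decodes the UTF-8 into code points
        bytes ~= c <= 0xFF ? cast(ubyte) c : cast(ubyte) '?';
    return bytes.idup;
}

/// Hypothetical stand-in for a native-encoding output stream.
struct Latin1Writer
{
    File sink;

    /// Converts the UTF-8 text and writes the raw bytes to the sink.
    void write(string utf8)
    {
        sink.rawWrite(toLatin1(utf8));
    }
}

void main()
{
    auto native = Latin1Writer(stdout);
    native.write("äöüßÄÖÜ\n"); // emits 8 bytes instead of the 15 UTF-8 bytes
}
```

The caller keeps working with UTF-8 strings throughout; only the last step before the device changes encoding.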

