Thread overview
Displaying non UTF-8 8 bit character codes with writefln()
Oct 04, 2007
Graham
Oct 04, 2007
Regan Heath
Oct 04, 2007
Graham
Oct 05, 2007
Stewart Gordon
Oct 05, 2007
Graham
Oct 05, 2007
Stewart Gordon
Oct 05, 2007
Regan Heath
Oct 05, 2007
Stewart Gordon
Oct 05, 2007
Regan Heath
Oct 05, 2007
Stewart Gordon
Oct 04, 2007
Graham
Oct 04, 2007
Aziz K.
October 04, 2007
Is there an easy way of displaying non UTF-8 8 bit codes with writefln() ?

E.g. code like:

writefln("elapsed time %.9f \&micro;S", elapsed_time);

On a windows system displays output like:

elapsed time 2.598202392 µS

(displayed when running in a cmd.exe window)

The µ is character codes 0xC2 0xB5 for the UTF-8 encoding of µ.
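A minimal sketch for checking those bytes, assuming a current D compiler and Phobos rather than the 2007 toolchain used in this thread. The helper name `hexBytes` is made up for illustration; it only dumps the UTF-8 code units that a string literal actually contains.

```d
import std.stdio;

// Hypothetical helper: the UTF-8 code units of a string as
// space-separated uppercase hex.
string hexBytes(string s)
{
    import std.format : format;
    import std.string : representation;

    // representation() exposes the string as immutable(ubyte)[];
    // "%(%02X %)" formats each element, separated by a space.
    return format("%(%02X %)", s.representation);
}

void main()
{
    writeln(hexBytes("\u00B5"));   // code units of U+00B5 (micro sign)
    writeln(hexBytes("S"));        // plain ASCII stays a single byte
}
```

For U+00B5 this prints `C2 B5`, which is the two-byte sequence the console then renders as two separate characters when it is not set to a UTF-8 code page.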

Code like:
writefln("elapsed time %.9f \u00B5S", elapsed_time);

displays the same

and code like:
writefln("elapsed time %.9f \xB5S", elapsed_time);

understandably displays the run-time error:
Error: 4invalid UTF-8 sequence

trying a Wysiwyg string like: writefln("elapsed time %.9f " r"µ" "S", elapsed_time);

displays a compiler error: invalid UTF-8 sequence

Is there any simple way to output a non UTF-8 string containing the B5 character code without the C2 prefix ?


October 04, 2007
Try printf and saving the file as a UTF-8 encoded text file...

--[b5.d]--
import std.stdio;

void main()
{
	printf("\&micro;\n");
	printf("\u00B5\n");
	printf("\xB5\n");  //doesn't output anything
	writefln("µ");
}

Using this source saved as b5.d as a UTF-8 encoded text file (IMPORTANT)  I can set my command prompt font to "Lucida Console" and execute the following commands:

E:\D\src\tmp>chcp 65001
Active code page: 65001

E:\D\src\tmp>dmd -run b5.d
µ
µ
µ

The 3rd printf doesn't output anything, not sure why, the others all output the same character.

chcp 65001 changes to UTF-8 code page :)

Regan
October 04, 2007
After searching back a bit further than before I see this was discussed in April and the answer was to use printf for the 8 bit string.

something like:

writef("elapsed time %.9f", elapsed_time);
printf(" \xB5S\n");

does work, but if anybody has a more elegant solution please let me know.
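One possible variation on the same idea, sketched under the assumption of a current Phobos (`std.encoding.transcode` and `File.rawWrite`), not the 2007 library: convert the text to a single-byte encoding and write the bytes unmodified. The helper name `toLatin1Bytes` is hypothetical, and Latin-1 is only an assumption here; it yields 0xB5 for µ, which suits a console using a Latin-1 compatible code page but not the OEM code pages.

```d
import std.stdio;
import std.encoding : transcode, Latin1String;

// Hypothetical helper: transcode UTF-8 text to Latin-1,
// one byte per character.
immutable(ubyte)[] toLatin1Bytes(string utf8)
{
    Latin1String latin1;
    transcode(utf8, latin1);
    return cast(immutable(ubyte)[]) latin1;
}

void main()
{
    double elapsedTime = 2.598202392;

    // Formatted part goes through the normal UTF-8 path.
    writef("elapsed time %.9f ", elapsedTime);

    // The unit is written as raw bytes, so 0xB5 is emitted
    // without the 0xC2 prefix.
    stdout.rawWrite(toLatin1Bytes("\u00B5S\n"));
}
```

This keeps everything on the same `stdout` object, which avoids mixing two separately buffered output paths.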

October 04, 2007
Regan Heath Wrote:

> Try printf and saving the file as a UTF-8 encoded text file...
> 
> --[b5.d]--
> import std.stdio;
> 
> void main()
> {
> 	printf("\&micro;\n");
> 	printf("\u00B5\n");
> 	printf("\xB5\n");  //doesn't output anything
> 	writefln("µ");
> }
> 
> Using this source saved as b5.d as a UTF-8 encoded text file (IMPORTANT)
>   I can set my command prompt font to "Lucida Console" and execute the
> following commands:
> 
> E:\D\src\tmp>chcp 65001
> Active code page: 65001
> 
> E:\D\src\tmp>dmd -run b5.d
> µ
> µ
> µ
> 
> The 3rd printf doesn't output anything, not sure why, the others all output the same character.
> 
> chcp 65001 changes to UTF-8 code page :)
> 
> Regan

Thanks. I was hoping for something more elegant, but if all char variables in Phobos have to be UTF-8, I guess this is the only way.
October 04, 2007
Graham wrote:
> After searching back a bit further than before I see this was discussed
> in April and the answer was to use printf for the 8 bit string.
>
> something like:
>
> writef("elapsed time %.9f", elapsed_time);
> printf(" \xB5S\n");
>
> does work, but if anybody has a more elegant solution please let me know.
>
Hi,

There's a better solution. You could switch to the Tango library, which uses WriteConsoleW() internally to correctly write Unicode characters on the Windows console.
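For readers who want to see the general shape of that approach without switching libraries, here is a rough sketch. It is not Tango's code; it assumes a current D compiler with the `core.sys.windows` bindings, and the function name `writeConsole` is made up for illustration.

```d
import std.stdio : write;
import std.utf : toUTF16;

version (Windows)
{
    import core.sys.windows.windows;

    // Hypothetical helper: write text through WriteConsoleW.
    // Returns false if the call fails, which happens for example
    // when stdout is redirected to a file or pipe.
    bool writeConsole(string text)
    {
        wstring wide = toUTF16(text);
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD written;
        return WriteConsoleW(h, wide.ptr, cast(DWORD) wide.length,
                             &written, null) != 0;
    }
}
else
{
    // On other systems the terminal is assumed to accept UTF-8 directly.
    bool writeConsole(string text)
    {
        write(text);
        return true;
    }
}

void main()
{
    immutable msg = "elapsed time 2.598202392 \u00B5S\n";

    // Fall back to the ordinary byte-oriented path if there is no console.
    if (!writeConsole(msg))
        write(msg);
}
```

Because WriteConsoleW takes UTF-16, the console code page does not come into it; the fallback matters because the call only works on a real console handle.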

Regards,
Aziz
October 05, 2007
"Regan Heath" <regan@netmail.co.nz> wrote in message news:fe2uf5$2gsa$1@digitalmars.com...
> Try printf and saving the file as a UTF-8 encoded text file...

Why, exactly, are you advocating going back to the printf abomination?

<snip>
> Using this source saved as b5.d as a UTF-8 encoded text file (IMPORTANT) I can set my command prompt font to "Lucida Console" and execute the following commands:
>
> E:\D\src\tmp>chcp 65001
> Active code page: 65001
<snip>

This misses the point slightly.  The user shouldn't have to change the codepage just to get someone else's application to work properly.

What you want is my utility library:
http://pr.stewartsplace.org.uk/d/sutil/

Stewart.

-- 
My e-mail address is valid but not my primary mailbox.  Please keep replies on the 'group where everybody may benefit. 

October 05, 2007
Stewart Gordon Wrote:
> 
> What you want is my utility library: http://pr.stewartsplace.org.uk/d/sutil/
> 
> Stewart.
> 
> -- 

Thanks, that's nice.

By the way, I spotted some minor errors on a couple of your documentation pages:

ConsoleOutput referring to ConsoleInput in second column on http://pr.stewartsplace.org.uk/d/sutil/ref/annotated.html

and the subtitle on http://pr.stewartsplace.org.uk/d/sutil/ref/classsmjg_1_1libs_1_1util_1_1console_1_1ConsoleOutput.html is ConsoleInput instead of ConsoleOutput

October 05, 2007
Stewart Gordon wrote:
> "Regan Heath" <regan@netmail.co.nz> wrote in message news:fe2uf5$2gsa$1@digitalmars.com...
>> Try printf and saving the file as a UTF-8 encoded text file...
> 
> Why, exactly, are you advocating going back to the printf abomination?

Well.. there were 2 ways to solve his problem:

1. avoid the valid utf-8 character check.
2. make the console display utf-8 correctly.

To achieve #1 you've gotta use printf, eg.
  printf("%c\n", 230);

To achieve #2 you use chcp and lucida console, eg.
  writefln("\u00B5");

or save the file as UTF-8 and use
  writefln("µ");

> <snip>
>> Using this source saved as b5.d as a UTF-8 encoded text file (IMPORTANT) I can set my command prompt font to "Lucida Console" and execute the following commands:
>>
>> E:\D\src\tmp>chcp 65001
>> Active code page: 65001
> <snip>
> 
> This misses the point slightly.  The user shouldn't have to change the codepage just to get someone else's application to work properly.

Sadly, if the application is outputting UTF-8 you don't have a choice.

> What you want is my utility library:
> http://pr.stewartsplace.org.uk/d/sutil/

Cool.  You're converting UTF-8 to the console code page I assume.

Regan
October 05, 2007
"Regan Heath" <regan@netmail.co.nz> wrote in message news:fe5d88$15l$1@digitalmars.com...
<snip>
> 1. avoid the valid utf-8 character check.
> 2. make the console display utf-8 correctly.
>
> To achieve #1 you've gotta use printf, eg.
>   printf("%c\n", 230);

No I gottan't.  I could use putchar, puts or OutputStream.writeString for example.

<snip>
>> This misses the point slightly.  The user shouldn't have to change the codepage just to get someone else's application to work properly.
>
> Sadly, if the application is outputting UTF-8 you don't have a choice.

But how many DOS or Windows console apps in the real world output UTF-8? Presumably not many, considering that no versions of DOS and only a few versions of Windows support it.  There's also a causal loop in that even modern Windows versions don't come with the console code page set to 65001 by default.  I don't know what is likely to break this loop, but I doubt that the restrictiveness of one language's standard library is going to do it.

>> What you want is my utility library:
>> http://pr.stewartsplace.org.uk/d/sutil/
>
> Cool.  You're converting UTF-8 to the console code page I assume.

Exactly.  (Well, as exactly as is possible under the constraints.)
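A rough sketch of what such a conversion can look like on Windows. This is not the code from the utility library linked above; it assumes a current D compiler with the `core.sys.windows` bindings, and the function name `toConsoleBytes` is hypothetical.

```d
import std.string : representation;

version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16;

    // Hypothetical helper: re-encode UTF-8 text in the console's
    // current output code page.
    immutable(ubyte)[] toConsoleBytes(string text)
    {
        wstring wide = toUTF16(text);
        if (wide.length == 0)
            return null;

        UINT cp = GetConsoleOutputCP();

        // First call asks how many bytes the converted text needs.
        int needed = WideCharToMultiByte(cp, 0, wide.ptr,
            cast(int) wide.length, null, 0, null, null);
        if (needed <= 0)
            return null;

        // Second call performs the conversion into the buffer.
        auto buf = new ubyte[needed];
        WideCharToMultiByte(cp, 0, wide.ptr, cast(int) wide.length,
            cast(char*) buf.ptr, needed, null, null);
        return buf.idup;
    }
}
else
{
    // Elsewhere the bytes are passed through unchanged.
    immutable(ubyte)[] toConsoleBytes(string text)
    {
        return text.representation;
    }
}

void main()
{
    import std.stdio : stdout;

    stdout.rawWrite(toConsoleBytes("elapsed time 2.598202392 \u00B5S\n"));
}
```

Characters that the active code page cannot represent are replaced by a substitute character, which is one reason the conversion can only be as exact as the code page allows.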

Stewart.

-- 
My e-mail address is valid but not my primary mailbox.  Please keep replies on the 'group where everybody may benefit. 

October 05, 2007
"Graham >" <GC <grahamc001uk@nospam-yahoo.co.uk> wrote in message news:fe5cp5$bp$1@digitalmars.com...
<snip>
> By the way, I spotted some minor errors on a couple of your documentation pages:
>
> ConsoleOutput referring to ConsoleInput in second column on
> http://pr.stewartsplace.org.uk/d/sutil/ref/annotated.html
>
> and the subtitle on
> http://pr.stewartsplace.org.uk/d/sutil/ref/classsmjg_1_1libs_1_1util_1_1console_1_1ConsoleOutput.html
> is ConsoleInput instead of ConsoleOutput

Good catch.  Also noticed quite a few cases where the automatic removal of words like "The ConsoleInput class" in the brief description hasn't worked.

Stewart.

-- 
My e-mail address is valid but not my primary mailbox.  Please keep replies on the 'group where everybody may benefit. 
