ANSI - output with phobos - D Programming Language Discussion Forum

ANSI - output with phobos

Apr 04, 2007

Derek Parnell

Apr 05, 2007

for(char c = 0; c < c.max; c++)
    writefln(c);

In a not too distant past the above code could produce the entire ANSI table; however, this is not the case today. Today it peters out at 127, and any code beyond that cannot be displayed. The error message produced is:

  Error: 4invalid UTF-8 sequence

Please provide some guidance on how to accomplish this in present D.

Thanks,
Drew

me Wrote:

> for(char c = 0; c < c.max; c++)
>     writefln(c);
>
> In a not too distant past the above code could produce the entire ANSI table; however, this is not the case today. Today it peters out at 127, and any code beyond that cannot be displayed. The error message produced is:
>
>   Error: 4invalid UTF-8 sequence
>
> Please provide some guidance on how to accomplish this in present D.
>
> Thanks,
> Drew

First let me apologize for the double post.

I am aware that printf() can still be used to achieve the desired result. However, I’m interested in accomplishing this through writef()/writefln().

Thanks again,
Drew

On Tue, 03 Apr 2007 20:16:06 -0400, me wrote:

> for(char c = 0; c < c.max; c++)
>     writefln(c);
>
> In a not too distant past the above code could produce the entire ANSI table; however, this is not the case today. Today it peters out at 127, and any code beyond that cannot be displayed. The error message produced is:
>
>   Error: 4invalid UTF-8 sequence
>
> Please provide some guidance on how to accomplish this in present D.
>

Characters whose numeric representation is above 127 and less than 256 are not UTF-8 characters, and the function 'writefln' expects 'char' values to be UTF-8. So, to do what you want, you must either not use writefln or not use 'char' types.

    import std.stdio;

    void main()
    {
        for(ubyte c = 0; c < c.max; c++)
        {
            // Only values up to 127 are shown as characters;
            // every value is also printed as a number.
            if (c <= 127)
                writef("'%s' ", cast(char)c);
            writefln(c);
        }
    }

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
4/04/2007 10:22:49 AM

On Tue, 03 Apr 2007 20:26:49 -0400, me wrote:

> me Wrote:
>
>> for(char c = 0; c < c.max; c++)
>>     writefln(c);
>>
>> In a not too distant past the above code could produce the entire ANSI table; however, this is not the case today. Today it peters out at 127, and any code beyond that cannot be displayed. The error message produced is:
>>
>>   Error: 4invalid UTF-8 sequence
>>
>> Please provide some guidance on how to accomplish this in present D.
>>
>> Thanks,
>> Drew
>

You seem to be wanting to display the characters of the console's current code-page.

> I am aware that printf() can still be used to achieve the desired result.

Yes, because it's a C routine and not D.

So I guess the issue you are trying to resolve is how to convert code-page characters into UTF-8 form. Character values 128-255 are displayed on the Windows console using the console's current code-page to select the appropriate glyph. To get the same glyph to display using Unicode (which is the only character set that D supports), you would have to set the console to a Unicode "code-page" and manually convert the character values from the code-page you were assuming to the equivalent Unicode values. Not a trivial task at all.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
4/04/2007 11:05:14 AM
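
To illustrate the code-page to Unicode conversion described above, here is a minimal sketch. It assumes the source bytes are in code page 437 (the classic DOS console set), which the thread does not state, and it lists only a handful of entries; a usable table would have to cover all 128 upper values. The helper name cp437ToUnicode is hypothetical, and the code targets a current D compiler rather than the 2007 one.

```d
import std.stdio;

// Hypothetical helper: map one byte of an ASSUMED source code page (CP437)
// to a Unicode code point. The table is deliberately partial.
dchar cp437ToUnicode(ubyte b)
{
    if (b < 0x80)
        return cast(dchar) b;      // the ASCII range is identical

    dchar result = '\uFFFD';       // replacement character for unmapped bytes
    switch (b)
    {
        case 0x80: result = '\u00C7'; break; // C with cedilla
        case 0x81: result = '\u00FC'; break; // u with diaeresis
        case 0x82: result = '\u00E9'; break; // e with acute
        case 0x84: result = '\u00E4'; break; // a with diaeresis
        case 0xE1: result = '\u00DF'; break; // sharp s
        default:   break;
    }
    return result;
}

void main()
{
    immutable ubyte[] samples = [0x41, 0x80, 0x81, 0x82, 0x84, 0xE1];
    foreach (b; samples)
        writeln(cp437ToUnicode(b)); // writeln encodes each dchar as UTF-8
}
```

Whether the glyphs then show up correctly still depends on the console being set to a Unicode-capable mode, as noted in the reply.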

The problem is that the 'char' type can only contain valid UTF-8 *characters*. A character in UTF-8 can be composed of 1 to 4 *bytes*, and not all of the values a byte can take are valid in UTF-8. In fact, most of the byte values above 127 are not valid. You have two options:

1) use the wchar type (the Latin 1/ISO8859-1 character set is very similar to ANSI and all of its characters are 2 byte-wide when mapped to the UTF-16 character set);
2) manually convert the 'ANSI' value into UTF-8.

For more information I suggest reading this:

http://en.wikipedia.org/wiki/Utf-8
http://en.wikipedia.org/wiki/Utf-16

me wrote:

> for(char c = 0; c < c.max; c++)
>     writefln(c);
>
> In a not too distant past the above code could produce the entire ANSI table; however, this is not the case today. Today it peters out at 127, and any code beyond that cannot be displayed. The error message produced is:
>
>   Error: 4invalid UTF-8 sequence
>
> Please provide some guidance on how to accomplish this in present D.
>
> Thanks,
> Drew
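
Option 2 in the reply above (manual conversion) can be sketched as follows. The sketch assumes the 'ANSI' values are really ISO 8859-1, where the byte value equals the Unicode code point; other code pages would need a lookup table first. The function name latin1ToUtf8 is made up for illustration, and the code targets a current D compiler. A byte above 127 never stands alone in valid UTF-8, so each such value becomes a two-byte sequence.

```d
import std.stdio;

// Encode one byte as UTF-8, ASSUMING the input is ISO 8859-1 (Latin-1),
// where the byte value is also the Unicode code point.
char[] latin1ToUtf8(ubyte b)
{
    if (b < 0x80)
        return [cast(char) b];              // 1 byte:       0xxxxxxx
    return [cast(char)(0xC0 | (b >> 6)),    // lead byte:    110000xx
            cast(char)(0x80 | (b & 0x3F))]; // continuation: 10xxxxxx
}

void main()
{
    // Print the printable upper half of Latin-1, 16 characters per row.
    foreach (i; 0xA0 .. 0x100)
        write(latin1ToUtf8(cast(ubyte) i), (i % 16 == 15) ? "\n" : " ");
}
```

Because the result is valid UTF-8, it can be passed to the writef family without triggering the invalid-sequence error.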

April 04, 2007

Re: ANSI - output with phobos

Posted by Daniel Keep
in reply to Juan Jose Comellas


Juan Jose Comellas wrote:
> The problem is that the 'char' type can only contain valid UTF-8 *characters*. A character in UTF-8 can be composed of 1 to 4 *bytes*, and not all of the values a byte can take are valid in UTF-8. In fact, most of the byte values above 127 are not valid. You have two options: 1) use the wchar type (the Latin 1/ISO8859-1 character set is very similar to ANSI and all of its characters are 2 byte-wide when mapped to the UTF-16 character set); 2) manually convert the 'ANSI' value into UTF-8.
> 
> For more information I suggest reading this:
> 
> http://en.wikipedia.org/wiki/Utf-8 http://en.wikipedia.org/wiki/Utf-16

Here's another one (shameless plug):

http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD

	-- Daniel

> me wrote:
> 
>> for(char c = 0; c < c.max; c++)
>>     writefln(c);
>>
>> In a not too distant past the above code could produce the entire ANSI table; however, this is not the case today. Today it peters out at 127, and any code beyond that cannot be displayed. The error message produced is:
>>
>>   Error: 4invalid UTF-8 sequence
>>
>> Please provide some guidance on how to accomplish this in present D.
>>
>> Thanks,
>> Drew
> 

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

me wrote:

>> for(char c = 0; c < c.max; c++)
>>     writefln(c);
>>
>> In a not too distant past the above code could produce the entire ANSI table; however, this is not the case today. Today it peters out at 127, and any code beyond that cannot be displayed. The error message produced is:
>>
>>   Error: 4invalid UTF-8 sequence
>>
>> Please provide some guidance on how to accomplish this in present D.
>>
>
> First let me apologize for the double post.
>
> I am aware that printf() can still be used to achieve the desired result. However, I’m interested in accomplishing this through writef()/writefln().
>

Not possible. Just use the C library, writing a wrapper around it if you don't want to worry about whether strings are zero-terminated all the time.

-- 
Remove ".doesnotlike.spam" from the mail address.
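
A wrapper of the kind suggested above might look like the sketch below. It is an assumption about what such a wrapper could be, not code from the thread: the name rawPrint is hypothetical, and the import path core.stdc.stdio is the one used by current D releases (the 2007 compiler exposed the C library under a different module name). It uses fwrite, which takes an explicit length, so zero termination does not matter.

```d
import core.stdc.stdio : fflush, fwrite, stdout;

// Hypothetical wrapper: hand a slice of raw bytes to C's stdout.
// Returns the number of bytes fwrite reports as written.
size_t rawPrint(const(ubyte)[] bytes)
{
    return fwrite(bytes.ptr, 1, bytes.length, stdout);
}

void main()
{
    // Build the upper half of the byte range and send it out unmodified.
    ubyte[] table;
    foreach (i; 0x80 .. 0x100)
        table ~= cast(ubyte) i;

    rawPrint(table);
    rawPrint(cast(const(ubyte)[]) "\n");
    fflush(stdout);
}
```

What appears on screen depends entirely on the console's current code page, since the bytes are not converted in any way.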

Juan Jose Comellas wrote:

> The problem is that the 'char' type can only contain valid UTF-8 *characters*. A character in UTF-8 can be composed of 1 to 4 *bytes*, and not all of the values a byte can take are valid in UTF-8. In fact, most of the byte values above 127 are not valid. You have two options: 1) use the wchar type (the Latin 1/ISO8859-1 character set is very similar to ANSI and all of its characters are 2 byte-wide when mapped to the UTF-16 character set); 2) manually convert the 'ANSI' value into UTF-8.

3) Use ubyte (or use char, but be careful about what functions you pass non-UTF-8 chars to), and print using the C standard library.

One might have to output a string without knowing its encoding, thus making it impossible to convert it to a UTF encoding reliably.

-- 
Remove ".doesnotlike.spam" from the mail address.

Don Clugston wrote:

> Deewiant wrote:
>> One might have to output a string without knowing its encoding, thus making it impossible to convert it to a UTF encoding reliably.
>
> Then how can you know how to output it?

Just pass the bytes to the console, and let the user worry about how it's displayed.

If you write "\xe4" to a file, you expect the file to contain the byte 0xE4. If you write it in a console, the console should display the character in the current character set which 0xE4 is mapped to.

-- 
Remove ".doesnotlike.spam" from the mail address.
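
For readers on a current compiler, the "just pass the bytes" approach can be sketched with std.stdio's File.rawWrite, which writes a buffer without UTF-8 validation. This assumes a present-day Phobos; it is not something the posters had in mind, and it may not have been available when the thread was written. The helper name writeBytes is hypothetical.

```d
import std.stdio;

// Hypothetical helper: write bytes to a File exactly as given.
void writeBytes(File f, const(ubyte)[] bytes)
{
    f.rawWrite(bytes); // no encoding check, no conversion
}

void main()
{
    // 0xE4 is not valid UTF-8 on its own; it is passed through untouched,
    // and the console decides which glyph (if any) to show for it.
    ubyte[] data = [0xE4];
    writeBytes(stdout, data);
    stdout.rawWrite("\n");
}
```

Written to a file instead of the console, the same call leaves the single byte 0xE4 in the file, which matches the expectation stated in the post.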
