writef crashes on international string output - D Programming Language Discussion Forum

January 28, 2005

writef crashes on international string output

Posted by Dr.Dizel

writef crashes when printing an international (Russian) string that is in a generic single-byte encoding rather than UTF.

January 28, 2005

Re: writef crashes on international string output

Posted by Thomas Kuehne
in reply to Dr.Dizel

Dr.Dizel wrote in news:ctea06$k6q$1@digitaldaemon.com...
> writef crashes when printing an international (Russian) string that is in a generic single-byte encoding rather than UTF.

platform?
OS?
compiler version?
sample string?
what shell?

Thomas

January 29, 2005

Ouch! It is a dmd parsing bug too.

Posted by Dr.Dizel
in reply to Thomas Kuehne

In article <cteamj$ku0$1@digitaldaemon.com>, Thomas Kuehne says...

>Dr.Dizel wrote in news:ctea06$k6q$1@digitaldaemon.com...
>> writef crashes when printing an international (Russian) string that is in a generic single-byte encoding rather than UTF.

Ouch! It is a dmd parsing bug.

I cannot write source files in my national language: not identifiers, just simple strings for output. If I do, dmd cannot parse them in any encoding (ANSI, OEM, KOI8-R, ...) except UTF-16. If I use UTF-16, dmd does strange code page conversions. However, I need to write and print my strings in Russian!

Examples with DOS codepage (866):
------------------------------------
import std.stdio;

int main(char[][] args)
{
char[] hello_on_russian	= "Ïðèâåò, ìèð!";

return 0;
}

C:\dmd\bin>dmd helloworld.d
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
helloworld.d(6): invalid UTF-8 sequence
--------------------------------------------------
import std.stdio;

int main(char[][] args)
{
char[] hello_on_russian	= `Ïðèâåò, ìèð!`;	// backquotes here
writef(hello_on_russian);

return 0;
}

C:\dmd\bin>dmd helloworld.d
C:\dmd\bin\..\..\dm\bin\link.exe helloworld,,,user32+kernel32/noi;

C:\dmd\bin>helloworld
Error: invalid UTF-8 sequence
------------------------------------
import std.stdio;

int main(char[][] args)
{
char[] hello_on_russian	= `Ïðèâåò, ìèð!`;	// backquotes here
printf(hello_on_russian);

return 0;
}

C:\dmd\bin>helloworld
Ïðèâåò, ìèð!

The old printf way works.

I think other parts of the dmd library have bugs in handling national-language strings too.

P.S. I use Windows XP, and the dmd version is 0.111.
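
For readers following along, here is a minimal sketch of why the writef example fails at run time while printf does not: writef decodes its argument as UTF-8, whereas printf passes the bytes through untouched. It assumes modern D (D2) with Phobos rather than dmd 0.111, the helper name isValidUtf8 is hypothetical, and the byte values are the CP1251 (ANSI) encoding of one Russian word chosen for illustration.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Returns true if `bytes` form well-formed UTF-8 (hypothetical helper).
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // validate() throws UTFException on the first malformed sequence.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "Привет" as saved by a CP1251 (ANSI) editor: one byte per letter.
    immutable ubyte[] ansi = [0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2];
    // The same word as saved by a UTF-8 editor: two bytes per letter.
    immutable ubyte[] utf8 = cast(immutable(ubyte)[]) "Привет";

    writeln(isValidUtf8(ansi)); // false
    writeln(isValidUtf8(utf8)); // true
}
```

The same check explains the compiler message: in an ordinary double-quoted literal the lexer rejects the bytes at compile time, and with backquotes they slip through to writef, which rejects them at run time.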

January 29, 2005

Re: Ouch! It is a dmd parsing bug too.

Posted by Anders F Björklund
in reply to Dr.Dizel

Dr.Dizel wrote:

> Ouch! It is a dmd parsing bug.

It's not a dmd bug, but a limitation by design...

> I cannot write source files in my national language: not identifiers, just
> simple strings for output. If I do, dmd cannot parse them in any encoding
> (ANSI, OEM, KOI8-R, ...) except UTF-16. If I use UTF-16, dmd does strange
> code page conversions. However, I need to write and print my strings in Russian!

D *only* supports Unicode (UTF-8, UTF-16, UTF-32)

This means:
1) Your source code must be in UTF-8
2) Your console input must be UTF-8
3) Your console output will be UTF-8

Otherwise you *will* get errors such as
"invalid UTF-8 sequence" or wrong output.

However, Unicode does have full support
for Russian / Cyrillic - and so does D.

This means that if you want to run D programs on an unsupported console,
you need to cast and change encoding on the char[] before input/output.

The input you get will be in ubyte[], in the local encoding, and can be
converted to wchar[] with a lookup table... Similarly, you can convert
your char[] to an ubyte[] for output by using the reverse of that table.
The lookup table, "wchar[256] mapping", is different for each encoding.

I can post some sample code, if wanted?

You can also use routines from the Windows API, to convert to and from
the current console code page. They should be somewhere in D, as well.

--anders

PS.
Lookup from codepage 866 (ubyte) to unicode (wchar) can be found at:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP866.TXT
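
The lookup-table idea described in this post can be sketched roughly as follows. This is not the sample code offered above, only an illustration in modern D (D2); the names makeCp866Table and cp866ToUtf8 are hypothetical, and only the ASCII and Cyrillic parts of code page 866 are filled in. The box-drawing and symbol positions listed in CP866.TXT are left out and map to U+FFFD.

```d
import std.stdio : writeln;
import std.utf : toUTF8;

/// Builds a 256-entry table for ASCII plus the Cyrillic part of CP866.
/// Positions not covered here map to U+FFFD (replacement character).
wchar[256] makeCp866Table()
{
    wchar[256] table;
    foreach (i; 0 .. 256)
    {
        if (i < 0x80)
            table[i] = cast(wchar) i;                    // ASCII
        else if (i <= 0xAF)
            table[i] = cast(wchar)(0x0410 + (i - 0x80)); // А..Я, а..п
        else if (i >= 0xE0 && i <= 0xEF)
            table[i] = cast(wchar)(0x0440 + (i - 0xE0)); // р..я
        else
            table[i] = '\uFFFD';
    }
    table[0xF0] = '\u0401'; // Ё
    table[0xF1] = '\u0451'; // ё
    return table;
}

/// Converts CP866 bytes (e.g. console input) to a UTF-8 string.
string cp866ToUtf8(const(ubyte)[] input)
{
    immutable table = makeCp866Table();
    wchar[] wide;
    wide.reserve(input.length);
    foreach (b; input)
        wide ~= table[b];
    return toUTF8(wide);
}

void main()
{
    // "Привет" as a DOS program would see it in code page 866.
    immutable ubyte[] dos = [0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2];
    writeln(cp866ToUtf8(dos)); // prints the word as UTF-8
}
```

A table for another 8-bit encoding would follow the same shape, with only the filled-in ranges changing.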

January 30, 2005

Re: Ouch! It is a dmd parsing bug too.

Posted by Dr.Dizel
in reply to Anders F Björklund

In article <ctgl26$4jh$1@digitaldaemon.com>, Anders F Björklund says...
>
>>Dr.Dizel wrote:
>> Ouch! It is a dmd parsing bug.
>
>It's not a dmd bug, but a limitation by design...
>

Then the backquotes in my example break this design.
Why can I use only English strings and no others? Is it the tyranny of the US? :-)

>> I cannot write source files in my national language: not identifiers, just simple strings for output. If I do, dmd cannot parse them in any encoding (ANSI, OEM, KOI8-R, ...) except UTF-16. If I use UTF-16, dmd does strange code page conversions. However, I need to write and print my strings in Russian!
>
>D *only* supports Unicode (UTF-8, UTF-16, UTF-32)

However backquotes ...

>This means:
>1) Your source code must be in UTF-8
>2) Your console input must be UTF-8
>3) Your console output will be UTF-8

Where did you see such a console? Which programs can use it? Is it a spherical horse in a vacuum? :-)
If module std.stdio has no input functions, how can I read input? Is it code page safe?
How can I read from and write to a non-UTF console?
Is it a big problem, or difficult, to use dmd for programs that run in a multilingual environment?

>Otherwise you *will* get errors such as
>"invalid UTF-8 sequence" or wrong output.
>
>However, Unicode does have full support
>for Russian / Cyrillic - and so does D.
>
>
>This means that if you want to run D programs on an unsupported console, you need to cast and change encoding on the char[] before input/output.

How can I do that? char[] can hold only UTF-8 chars, and writef cannot output other code pages (see my example).

>The input you get will be in ubyte[], in the local encoding, and can be converted to wchar[] with a lookup table... Similarly, you can convert your char[] to an ubyte[] for output by using the reverse of that table. The lookup table, "wchar[256] mapping", is different for each encoding.

How can I output ubyte[] with writef?

>I can post some sample code, if wanted ?

Yes.

In addition, the developers should rename char to utf8, because it is not a real char, and likewise wchar to utf16 and dchar to utf32. A char must be able to store any value from 0x00 to 0xFF.

January 30, 2005

Re: Ouch! It is a dmd parsing bug too.

Posted by Sebastian Beschke
in reply to Dr.Dizel

Dr.Dizel wrote:
> In article <ctgl26$4jh$1@digitaldaemon.com>,
> Anders F Björklund says...
> 
>>>Dr.Dizel wrote:
>>>Ouch! It is a dmd parsing bug.
>>
>>It's not a dmd bug, but a limitation by design...
>>
> 
> 
> Then the backquotes in my example break this design.
> Why can I use only English strings and no others? Is it the tyranny of the US? :-)

Funny that one gets accused of being a tyrant when using the most liberal and general encoding available... ;)

> 
> 
>>>I cannot write source files in my national language: not identifiers, just
>>>simple strings for output. If I do, dmd cannot parse them in any encoding
>>>(ANSI, OEM, KOI8-R, ...) except UTF-16. If I use UTF-16, dmd does strange
>>>code page conversions. However, I need to write and print my strings in Russian!
>>
>>D *only* supports Unicode (UTF-8, UTF-16, UTF-32)
> 
> 
> However backquotes ...

You oughta make sure your text editor saves the source code correctly. If you wish to use UTF-16 or UTF-32, be sure that there is a Byte Order Mark at the start of the file.

I use jEdit and save files in UTF-8, which works fine.
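
To check what an editor actually wrote, one can look at the first bytes of the saved file. The following is only a sketch in modern D (D2), with a hypothetical helper name detectBom; it recognises the standard Unicode byte order marks and reports "none" otherwise (a file without a BOM may still be perfectly valid UTF-8).

```d
import std.stdio : writeln;

/// Names the encoding implied by a leading byte order mark, or "none".
string detectBom(const(ubyte)[] data)
{
    static immutable ubyte[] utf32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] utf32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] utf8    = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] utf16le = [0xFF, 0xFE];
    static immutable ubyte[] utf16be = [0xFE, 0xFF];

    static bool hasPrefix(const(ubyte)[] d, const(ubyte)[] prefix)
    {
        return d.length >= prefix.length && d[0 .. prefix.length] == prefix;
    }

    // UTF-32LE is tested before UTF-16LE because both start with FF FE.
    if (hasPrefix(data, utf32le)) return "UTF-32LE";
    if (hasPrefix(data, utf32be)) return "UTF-32BE";
    if (hasPrefix(data, utf8))    return "UTF-8";
    if (hasPrefix(data, utf16le)) return "UTF-16LE";
    if (hasPrefix(data, utf16be)) return "UTF-16BE";
    return "none";
}

void main()
{
    // First bytes of a file saved as UTF-16LE with a BOM, then the letter 'i'.
    immutable ubyte[] header = [0xFF, 0xFE, 0x69, 0x00];
    writeln(detectBom(header)); // UTF-16LE
}
```

In practice the bytes would come from the source file itself, for example via std.file.read, which is not shown here.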

> 
> 
>>This means:
>>1) Your source code must be in UTF-8
>>2) Your console input must be UTF-8
>>3) Your console output will be UTF-8
> 
> 
> Where did you see such a console? Which programs can use it? Is it a spherical horse in a
> vacuum? :-)

I guess your best bet currently would be to not use the console, sad as that is. Alternatively, you might use something like iconv, but I have no idea if it's available for D.

How does Russian console input work, anyway? I'd be interested in that ^^

> 
> In addition, the developers should rename char to utf8, because it is not a real char, and
> likewise wchar to utf16 and dchar to utf32. A char must be able to store any value from 0x00 to 0xFF.

This has been up for discussion a lot of times, actually. IMHO, it doesn't really matter what you call them; the docs state clearly enough what they *are*.

-Sebastian

January 30, 2005

Re: Ouch! It is a dmd parsing bug too.

Posted by Anders F Björklund
in reply to Dr.Dizel

Dr.Dizel wrote:

> Why can I use only English strings and no others? Is it the tyranny of the US? :-)

On the contrary, you can now use a lot more than just Western languages.

>>This means:
>>1) Your source code must be in UTF-8

This implies that your text editor must also be able to handle UTF-8.

>>2) Your console input must be UTF-8
>>3) Your console output will be UTF-8
> 
> Where did you see such a console? Which programs can use it?

Linux has one. Mac OS X has one. I hope Windows XP can get one...

> If module std.stdio has no input functions, how can I read input? Is it code page safe?
> How can I read from and write to a non-UTF console?
> Is it a big problem, or difficult, to use dmd for programs
> that run in a multilingual environment?

Non-UTF consoles are unsupported, but it can still be done.

> How can I do that? char[] can hold only UTF-8 chars, and writef cannot output other
> code pages (see my example).

Yes.

> How can I output ubyte[] with writef?

That I am not 100% sure of, since I used printf instead.
writef works just fine for Unicode, but not for 8-bit...
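
As one possible answer to the ubyte[] question, here is a sketch that assumes modern D (D2) and Phobos, neither of which existed in this form at the time. It converts UTF-8 text to CP866 bytes by reversing the Cyrillic mapping, then uses File.rawWrite, which sends bytes to stdout without any UTF-8 validation. The name utf8ToCp866 is hypothetical, and characters outside ASCII and Cyrillic are simply replaced with '?'.

```d
import std.stdio : stdout;

/// Encodes text as CP866 bytes; unsupported characters become '?'.
ubyte[] utf8ToCp866(string text)
{
    ubyte[] result;
    foreach (dchar c; text) // iterating as dchar decodes the UTF-8
    {
        if (c < 0x80)
            result ~= cast(ubyte) c;                      // ASCII
        else if (c >= 0x0410 && c <= 0x043F)
            result ~= cast(ubyte)(0x80 + (c - 0x0410));   // А..Я, а..п
        else if (c >= 0x0440 && c <= 0x044F)
            result ~= cast(ubyte)(0xE0 + (c - 0x0440));   // р..я
        else if (c == 0x0401)
            result ~= cast(ubyte) 0xF0;                   // Ё
        else if (c == 0x0451)
            result ~= cast(ubyte) 0xF1;                   // ё
        else
            result ~= cast(ubyte) '?';
    }
    return result;
}

void main()
{
    // rawWrite passes the bytes through unchanged, so a CP866 console
    // should display them as Russian text.
    stdout.rawWrite(utf8ToCp866("Привет, мир!\n"));
}
```

The source file itself still has to be saved as UTF-8; only the bytes sent to the console are in the legacy code page.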

>>I can post some sample code, if wanted ?
> 
> Yes.

See http://www.algonet.se/~afb/d/mapping.zip
I haven't added CP866, but CP437 is there for reference.

Note: There are better versions of this, for Windows only.
(Maybe someone else can post a version using the Win32 API?)

> In addition, the developers should rename char to utf8, because it is not a real char, and
> likewise wchar to utf16 and dchar to utf32. A char must be able to store any value from 0x00 to 0xFF.

The "char" type in D is, by definition, a UTF-8 code unit. A single char holds
0x00-0x7F, and every other Unicode character takes a sequence of up to char[4]...

To store any so called character, from 0x00-0xFF, you *need* ubyte.
Note: The "real char", if we are talking C/C++, is called "byte" in D.
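
The difference between code units and characters can be seen directly from array lengths. This is a small sketch in modern D (D2), where the string, wstring and dstring aliases stand for immutable char[], wchar[] and dchar[]; the sample words are arbitrary.

```d
import std.stdio : writeln;
import std.range : walkLength;

void main()
{
    string  s8  = "Привет";  // UTF-8:  array of char
    wstring s16 = "Привет"w; // UTF-16: array of wchar
    dstring s32 = "Привет"d; // UTF-32: array of dchar

    writeln(s8.length);      // 12 (two chars per Cyrillic letter)
    writeln(s16.length);     // 6
    writeln(s32.length);     // 6
    writeln(s8.walkLength);  // 6 (counts decoded code points)

    // A character outside the Basic Multilingual Plane needs four chars.
    string clef = "\U0001D11E"; // MUSICAL SYMBOL G CLEF
    writeln(clef.length);       // 4
}
```

So .length on a char[] counts bytes of UTF-8, not letters, which is why a single Russian letter never fits in one char.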

--anders

January 30, 2005

Re: Ouch! It is a dmd parsing bug too.

Posted by Benjamin Herr
in reply to Anders F Björklund

Anders F Björklund wrote:
> Linux has one. Mac OS X has one. I hope Windows XP can get one...

Michael Walter has demonstrated that the WinXP console is indeed capable of UTF-8: <http://ilfirin.org/unicode.png>
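
For anyone who wants to try this, a rough sketch in modern D (D2) follows, assuming the Windows bindings shipped with druntime (core.sys.windows). It asks the console to interpret output as UTF-8 through the Win32 call SetConsoleOutputCP; the wrapper name enableUtf8Console is hypothetical, and whether the glyphs actually show up also depends on the console font, which the code does not control.

```d
import std.stdio : writeln;

version (Windows)
    import core.sys.windows.windows; // SetConsoleOutputCP, CP_UTF8

/// Tries to make the console interpret program output as UTF-8.
/// On non-Windows systems this does nothing and reports success.
bool enableUtf8Console()
{
    version (Windows)
        return SetConsoleOutputCP(CP_UTF8) != 0; // CP_UTF8 is code page 65001
    else
        return true;
}

void main()
{
    if (!enableUtf8Console())
        writeln("could not switch the console to UTF-8");
    writeln("Привет, мир!");
}
```

The code page change applies to the console window rather than just this program, so a careful version would remember the old value from GetConsoleOutputCP and restore it on exit.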

January 30, 2005

Re: Ouch! It is a dmd parsing bug too.

Posted by Sebastian Beschke
in reply to Benjamin Herr

Benjamin Herr wrote:
> <http://ilfirin.org/unicode.png>

OMG, don't open the homepage!

January 30, 2005

Re: Ouch! It is a dmd parsing bug too.

Posted by Benjamin Herr
in reply to Sebastian Beschke

Sebastian Beschke wrote:
> Benjamin Herr wrote:
> 
>> <http://ilfirin.org/unicode.png>
> 
> 
> OMG, don't open the homepage!

Sorry if I offended you :(

Copyright © 1999-2021 by the D Language Foundation