Error: 4invalid UTF-8 sequence
February 21, 2005
Greetings!

admire this complex piece of code: :-)

import std.stdio;
int main(char[][] args)
{
  printf("josé" ~ "\n");
  writefln("josé");
  return (0);
}

when I try to compile it, I get,

16:21:30.97>dmd name.d
name.d(4): invalid UTF-8 sequence
name.d(5): invalid UTF-8 sequence

The 50 cents question is, how can I get rid of it?  The real reason I ask is that I am downloading a bunch of XML and some of the names contain accented characters from different languages, and I am getting this error when I try to print (writefln) a variable with an accented name.  However, an interesting outcome is that when I use printf, the above problem is not encountered.  HUH!

Thanks much!

josé

:-)


February 21, 2005
DMD doesn't "understand" non-ASCII chars unless the source file is stored as UTF-8. Either there is a config setting in your editor that lets you do it, or you should change editor ASAP :) Note that converting non-UTF-8 files to UTF-8 might produce artifacts.
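
If the editor cannot do the conversion, a small script can. What follows is only a minimal Python sketch, not something posted in this thread, and it assumes the file really is Latin-1 (ISO-8859-1). The helper names latin1_to_utf8 and convert_file are made up for illustration. If the guess about the original encoding is wrong, you get exactly the artifacts mentioned above.

```python
# Hypothetical helpers: re-encode a source file from Latin-1 to UTF-8.
# Assumption: the input really is Latin-1 (ISO-8859-1).
from pathlib import Path


def latin1_to_utf8(data: bytes) -> bytes:
    """Decode bytes as Latin-1 and re-encode the text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write the UTF-8 version to dst."""
    Path(dst).write_bytes(latin1_to_utf8(Path(src).read_bytes()))


if __name__ == "__main__":
    # 'é' is the single byte 0xE9 in Latin-1 and the pair 0xC3 0xA9 in UTF-8.
    print(latin1_to_utf8(b"jos\xe9"))
```

Something like convert_file("name.d", "name_utf8.d") would then write a copy of the file that DMD should accept (the output file name is only an example).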

Lars Ivar Igesund

jicman wrote:
> Greetings!
> 
> admire this complex piece of code: :-)
> 
> import std.stdio;
> int main(char[][] args)
> {
> printf("josé" ~ "\n");
> writefln("josé");
> return (0);
> }
> 
> when I try to compile it, I get,
> 
> 16:21:30.97>dmd name.d
> name.d(4): invalid UTF-8 sequence
> name.d(5): invalid UTF-8 sequence
> 
> The 50 cents question is, how can I get rid of it?  The real reason is why I ask
> is that I am downloading a bunch of xml code and some of the names are accented
> by different languages and I am getting this error when I try to print
> (writefln) a variable with an accented name.  However, an interesting outcome is
> that when I use printf, the above problem is not encountered.  HUH!
> 
> Thanks much!
> 
> josé 
> 
> :-)
> 
> 
February 21, 2005
jicman wrote:

> admire this complex piece of code: :-)
> 
> import std.stdio;
> int main(char[][] args)
> {
> printf("josé" ~ "\n");
> writefln("josé");
> return (0);
> }
> 
> when I try to compile it, I get,
> 
> 16:21:30.97>dmd name.d
> name.d(4): invalid UTF-8 sequence
> name.d(5): invalid UTF-8 sequence

Works For Me:
josé
josé

> The 50 cents question is, how can I get rid of it?

Save your file as UTF-8, and use a UTF-8 console...

D *only* supports Unicode, not any legacy encodings.

--anders
February 21, 2005
On Mon, 21 Feb 2005 21:23:32 +0000 (UTC), jicman <jicman_member@pathlink.com> wrote:
> Greetings!
>
> admire this complex piece of code: :-)
>
> import std.stdio;
> int main(char[][] args)
> {
> printf("josé" ~ "\n");
> writefln("josé");
> return (0);
> }
>
> when I try to compile it, I get,
>
> 16:21:30.97>dmd name.d
> name.d(4): invalid UTF-8 sequence
> name.d(5): invalid UTF-8 sequence
>
> The 50 cents question is, how can I get rid of it?

The 50 cents answer is, ensure your editor is saving the source file as UTF-8, UTF-16 (with a BOM) or UTF-32 (also with a BOM).

> The real reason is why I ask
> is that I am downloading a bunch of xml code and some of the names are accented
> by different languages and I am getting this error when I try to print
> (writefln) a variable with an accented name.  However, an interesting outcome is
> that when I use printf, the above problem is not encountered.  HUH!

This is a somewhat complex area, and I'm not sure I have it 100% sorted myself, but I'll give this a go. I _know_ someone will set us both straight if I have it wrong.

Things to consider/know:

- D source files must be saved in UTF encoding.
- on Windows your console _might_ be in UTF, or it might be in something else, e.g. Latin-1.
- printf is a C function, it is oblivious to UTF etc.
- writef is a D function, it ensures you're writing in UTF.

So, what I suspect is happening to you is either:

1. You're reading these names from something which is not in UTF format.
2. Your source is not in UTF format.
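
To tell which of the two it is, you can check the raw bytes. This is a rough Python sketch (the function name first_invalid_utf8 is invented here, it is not part of any D tool): it reports the offset of the first byte that is not valid UTF-8, or None when the data decodes cleanly.

```python
def first_invalid_utf8(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first offending byte.
        return err.start
    return None


if __name__ == "__main__":
    # Latin-1 'é' is the lone byte 0xE9, which is not a complete UTF-8 sequence.
    print(first_invalid_utf8(b'writefln("jos\xe9");'))
    # The same line saved as UTF-8 stores 'é' as 0xC3 0xA9 and passes.
    print(first_invalid_utf8(b'writefln("jos\xc3\xa9");'))
```

Run it over the bytes of the source file and over the downloaded XML; whichever one reports an offset is the one that is not in UTF-8.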

You might see odd results once you get it working. This will be due to your console not being in UTF mode. I don't know how to change console modes; someone else will have to chip in here.

Regan
February 21, 2005
Regan Heath wrote:

> - D source files must be saved in UTF encoding.

One simple such UTF encoding is (escaped) ASCII:

> import std.stdio;
> int main(char[][] args)
> {
>   printf("jos\u00e9\n");
>   writefln("jos\u00e9");
>   return (0);
> }

This source code will "work", even in ISO-8859-*...
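
If there are many such strings, the escapes can be generated rather than typed. A small Python sketch, under the assumption that only the text inside the string literal is passed in (it does not touch existing backslashes or quotes); escape_non_ascii is a made-up name:

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # plain ASCII stays as it is
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)     # four hex digits
        else:
            out.append("\\U%08x" % cp)     # eight hex digits beyond the BMP
    return "".join(out)


if __name__ == "__main__":
    print(escape_non_ascii("josé"))
```

The result is pure ASCII, so it no longer matters which encoding the editor uses when saving the file.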


> - on windows your console _might_ be in UTF, it might be in something else  i.e. latin-1

On Linux and other platforms, the console might also be in e.g. Latin-1.
If you see something like "josé", then D does not like your console...
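
For the curious, that garbage is simply the two UTF-8 bytes of "é" being shown as two separate Latin-1 characters. A tiny Python sketch reproduces it (the function name is made up, and Latin-1 is assumed as the console encoding):

```python
def as_seen_on_latin1_console(text: str) -> str:
    """Encode text as UTF-8, then misread the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    # 'é' becomes the bytes 0xC3 0xA9, which Latin-1 shows as 'Ã' and '©'.
    print(as_seen_on_latin1_console("josé"))
```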

Other languages, like C and Java for instance, support other encodings.
But D only does Unicode, preferably in the form of the UTF-8 encoding.


On Linux and Mac OS X it is simple to set the console to UTF-8, and if
someone could detail the steps needed on Windows that would be great.

I've heard some rumors that the "chcp 65001" command works on Win 2K...
(although you might also have to change the default font being used?)

--anders
February 21, 2005
Anders F Björklund wrote:

> On Linux and Mac OS X it is simple to set the console to UTF-8, and if
> someone could detail the steps needed on Windows that would be great ?
> 
> I've heard some rumors that the "chcp 65001" command works on Win 2K...
> (although you might also have to change the default font being used ?)
> 
> --anders

Yep, code page 65001 is the one for UTF-8. In addition, the console font must be a Unicode-capable one. AFAIK, none of the raster fonts work, which leaves Lucida Console as the only feasible alternative on my computer.

Lars Ivar Igesund
February 21, 2005
Anders F Björklund says...

>> The 50 cents question is, how can I get rid of it?
>
>Save your file as UTF-8, and use an UTF-8 console...

Ok, this is interesting. Windows at its best!  I have to completely retype that whole program! :-)  Not a good thing.

Ok, thanks.



February 21, 2005
On Mon, 21 Feb 2005 22:27:04 +0000 (UTC), jicman <jicman_member@pathlink.com> wrote:
> Anders F Björklund says...
>
>>> The 50 cents question is, how can I get rid of it?
>>
>> Save your file as UTF-8, and use an UTF-8 console...
>
> Ok, this is interesting Windows at its best!  I have to completely retype that
> whole program! :-)  Not a good thing.

What? Why? Can't you open it, then do a save-as, or copy/paste into another editor then do a save-as?

Regan
February 21, 2005
In article <opsmkl5qje23k2f5@ally>, Regan Heath says...

>> Ok, this is interesting Windows at its best!  I have to completely
>> retype that
>> whole program! :-)  Not a good thing.
>
>What? Why? Can't you open it, then do a save-as, or copy/paste into another editor then do a save-as?

:-)  I know exactly how you said that. :-)

Yes, I tried that.  I even opened the same program with notepad (that's as Windows as Windows can get) and tried to compile it and got the same error. Somehow, my dual keyboard system does not like those accented vowels.  I am now searching for a new editor.  I love vim, but this is going too far.

Which freeware editors have D syntax highlighting?

I am downloading one called Zeus that a D lover had on his page.

thanks.


February 22, 2005
On Mon, 21 Feb 2005 23:39:37 +0000 (UTC), jicman <jicman_member@pathlink.com> wrote:
> In article <opsmkl5qje23k2f5@ally>, Regan Heath says...
>
>>> Ok, this is interesting Windows at its best!  I have to completely
>>> retype that
>>> whole program! :-)  Not a good thing.
>>
>> What? Why? Can't you open it, then do a save-as, or copy/paste into
>> another editor then do a save-as?
>
> :-)  I know exactly how you said that> :-)

:)

> Yes, I tried that.  I even opened the same program with notepad (that's as
> Windows as Windows can get) and tried to compile it and got the same error.
> Somehow, my dual keyboard system does not like those accented vowels.  I am now
> searching for a new editor.  I love vim, but this is going too far.

I have Windows XP SP2, and...

NotePad will save as:
  Unicode
  Unicode Big Endian
  UTF-8

(see "encoding" drop down in save-as dialog)

WordPad will save as a "unicode document". I'm guessing that means UTF-16, hopefully with a BOM. (see "save as type" drop down in save-as dialog)

> Which freeware editors have d syntax hightliting?
>
> I am downloading one called Zeus that a d lover had on his page.

Try:
http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport

Regan