February 21, 2005 Error: 4invalid UTF-8 sequence | ||||
---|---|---|---|---|
| ||||
Greetings!

admire this complex piece of code: :-)

import std.stdio;
int main(char[][] args)
{
    printf("josé" ~ "\n");
    writefln("josé");
    return (0);
}

when I try to compile it, I get,

16:21:30.97>dmd name.d
name.d(4): invalid UTF-8 sequence
name.d(5): invalid UTF-8 sequence

The 50 cents question is, how can I get rid of it? The real reason I ask is that I am downloading a bunch of xml code and some of the names are accented by different languages and I am getting this error when I try to print (writefln) a variable with an accented name. However, an interesting outcome is that when I use printf, the above problem is not encountered. HUH!

Thanks much!

josé

:-) |
February 21, 2005 Re: Error: 4invalid UTF-8 sequence | ||||
---|---|---|---|---|
| ||||
Posted in reply to jicman | DMD doesn't "understand" non-ASCII characters unless the source file is stored as UTF-8. Either there is a config setting in your editor that lets you do that, or you should change editor ASAP :) Note that converting non-UTF-8 files to UTF-8 might produce artifacts.
Lars Ivar Igesund
jicman wrote:
> Greetings!
>
> admire this complex piece of code: :-)
>
> import std.stdio;
> int main(char[][] args)
> {
> printf("josé" ~ "\n");
> writefln("josé");
> return (0);
> }
>
> when I try to compile it, I get,
>
> 16:21:30.97>dmd name.d
> name.d(4): invalid UTF-8 sequence
> name.d(5): invalid UTF-8 sequence
>
> The 50 cents question is, how can I get rid of it? The real reason I ask
> is that I am downloading a bunch of xml code and some of the names are accented
> by different languages and I am getting this error when I try to print
> (writefln) a variable with an accented name. However, an interesting outcome is
> that when I use printf, the above problem is not encountered. HUH!
>
> Thanks much!
>
> josé
>
> :-)
>
>
|
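To see why the compiler complains, it can help to look at the bytes. The following is a minimal sketch, assuming a current D compiler and Phobos rather than the 2005 toolchain used in this thread; the helper name `isValidUtf8` is made up for illustration. It checks the two possible byte encodings of "josé" with `std.utf.validate`.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `bytes` form a well-formed UTF-8 sequence.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "josé" as Latin-1 (ISO-8859-1): é is the single byte 0xE9.
    ubyte[] latin1 = [0x6A, 0x6F, 0x73, 0xE9];
    // "josé" as UTF-8: é is the two-byte sequence 0xC3 0xA9.
    ubyte[] utf8 = [0x6A, 0x6F, 0x73, 0xC3, 0xA9];

    writeln("Latin-1 bytes valid as UTF-8? ", isValidUtf8(latin1));
    writeln("UTF-8 bytes valid as UTF-8?   ", isValidUtf8(utf8));
}
```

A lone 0xE9 announces a three-byte sequence that never arrives, which is the kind of input that makes the compiler report an invalid UTF-8 sequence.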
February 21, 2005 Re: Error: 4invalid UTF-8 sequence | ||||
---|---|---|---|---|
| ||||
Posted in reply to jicman | jicman wrote:
> admire this complex piece of code: :-)
>
> import std.stdio;
> int main(char[][] args)
> {
> printf("josé" ~ "\n");
> writefln("josé");
> return (0);
> }
>
> when I try to compile it, I get,
>
> 16:21:30.97>dmd name.d
> name.d(4): invalid UTF-8 sequence
> name.d(5): invalid UTF-8 sequence
Works For Me:
josé
josé
> The 50 cents question is, how can I get rid of it?
Save your file as UTF-8, and use an UTF-8 console...
D *only* supports Unicode, not any legacy encodings.
--anders |
February 21, 2005 Re: Error: 4invalid UTF-8 sequence | ||||
---|---|---|---|---|
| ||||
Posted in reply to jicman | On Mon, 21 Feb 2005 21:23:32 +0000 (UTC), jicman <jicman_member@pathlink.com> wrote:
> Greetings!
>
> admire this complex piece of code: :-)
>
> import std.stdio;
> int main(char[][] args)
> {
> printf("josé" ~ "\n");
> writefln("josé");
> return (0);
> }
>
> when I try to compile it, I get,
>
> 16:21:30.97>dmd name.d
> name.d(4): invalid UTF-8 sequence
> name.d(5): invalid UTF-8 sequence
>
> The 50 cents question is, how can I get rid of it?
The 50 cents answer is, ensure your editor is saving the source file as UTF-8, UTF-16 (with a BOM) or UTF-32 (also with a BOM).
> The real reason I ask
> is that I am downloading a bunch of xml code and some of the names are accented
> by different languages and I am getting this error when I try to print
> (writefln) a variable with an accented name. However, an interesting outcome is
> that when I use printf, the above problem is not encountered. HUH!
This is a somewhat complex area, and I'm not sure I have it 100% sorted myself, but I'll give this a go. I _know_ someone will set us both straight if I have it wrong.
Things to consider/know:
- D source files must be saved in a UTF encoding.
- On Windows your console _might_ be in UTF; it might be in something else, e.g. Latin-1.
- printf is a C function; it is oblivious to UTF etc.
- writef is a D function; it ensures you're writing in UTF.
So, what I suspect is happening to you is either:
1. You're reading these names from something which is not in UTF format.
2. Your source is not in UTF format.
You might see odd results once you get it working. This will be due to your console not being in UTF mode. I don't know how to change console modes; someone else will have to chip in here.
Regan |
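For case 1 above (names read from a source that is not UTF), one possible workaround is to repair the data before handing it to writefln. This is only a sketch under stated assumptions: it is written for a current D compiler, the function name `toUtf8` is hypothetical, and treating every non-UTF-8 input as Latin-1 is a guess rather than a real detection.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Hypothetical helper: returns `bytes` as a UTF-8 string.
/// Input that is not valid UTF-8 is ASSUMED to be Latin-1 and converted.
string toUtf8(const(ubyte)[] bytes)
{
    try
    {
        auto text = cast(const(char)[]) bytes;
        validate(text);   // throws UTFException on malformed input
        return text.idup; // already UTF-8, just copy it
    }
    catch (UTFException)
    {
        string converted;
        // Each Latin-1 byte value equals its Unicode code point;
        // appending a dchar to a string encodes it as UTF-8.
        foreach (b; bytes)
            converted ~= cast(dchar) b;
        return converted;
    }
}

void main()
{
    ubyte[] fromXml = [0x6A, 0x6F, 0x73, 0xE9]; // "josé" as Latin-1
    string name = toUtf8(fromXml);
    writeln(name.length); // number of UTF-8 bytes after conversion
}
```

If the real data arrives in some other code page, the fallback branch would need a conversion table for that code page instead.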
February 21, 2005 Re: Error: 4invalid UTF-8 sequence | ||||
---|---|---|---|---|
| ||||
Posted in reply to Regan Heath | Regan Heath wrote:
> - D source files must be saved in UTF encoding.
One simple such UTF encoding is (escaped) ASCII:
> import std.stdio;
> int main(char[][] args)
> {
> printf("jos\u00e9\n");
> writefln("jos\u00e9");
> return (0);
> }
This source code will "work", even in ISO-8859-*...
> - on windows your console _might_ be in UTF, it might be in something else i.e. latin-1
On Linux and other platforms, the console might also be in e.g. Latin-1. If you see something like "josÃ©", then D does not like your console...
Other languages, like C and Java for instance, support other encodings. But D only does Unicode, preferably in the form of the UTF-8 encoding.
On Linux and Mac OS X it is simple to set the console to UTF-8, and if someone could detail the steps needed on Windows that would be great ?
I've heard some rumors that the "chcp 65001" command works on Win 2K... (although you might also have to change the default font being used ?)
--anders |
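The escape trick can be checked by dumping the bytes the compiler produces. A small sketch, assuming a current D compiler and Phobos; `hexDump` is a made-up helper name. The `\u00e9` escape ends up as the two UTF-8 bytes C3 A9, and a Latin-1 console shows those two bytes as two separate characters.

```d
import std.format : format;
import std.range : walkLength;
import std.stdio : writeln;

/// Hypothetical helper: the bytes of `s` as space-separated hex pairs.
string hexDump(string s)
{
    return format("%(%02X %)", cast(const(ubyte)[]) s);
}

void main()
{
    string escaped = "jos\u00e9";    // the source file stays pure ASCII
    string rawBytes = "jos\xC3\xA9"; // the same text spelled as UTF-8 bytes

    writeln(escaped == rawBytes); // both literals hold identical bytes
    writeln(escaped.length);      // UTF-8 code units (bytes)
    writeln(escaped.walkLength);  // code points
    writeln(hexDump(escaped));
}
```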
February 21, 2005 Re: Error: 4invalid UTF-8 sequence | ||||
---|---|---|---|---|
| ||||
Posted in reply to Anders F Björklund | Anders F Björklund wrote:
> On Linux and Mac OS X it is simple to set the console to UTF-8, and if
> someone could detail the steps needed on Windows that would be great ?
>
> I've heard some rumors that the "chcp 65001" command works on Win 2K...
> (although you might also have to change the default font being used ?)
>
> --anders
Yep, the 65001 cp is the one for UTF-8. In addition, the console font must support Unicode. AFAIK, none of the raster fonts work, which leaves Lucida Console as the only feasible alternative on my comp.
Lars Ivar Igesund
|
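The code page can also be switched from inside the program instead of typing chcp first. A rough sketch, assuming a current D compiler whose druntime ships the `core.sys.windows.windows` bindings; `GetConsoleOutputCP` and `SetConsoleOutputCP` are the Win32 console functions, while the constant name `utf8CodePage` is made up here. The font caveat above still applies.

```d
import std.stdio : writeln;

version (Windows)
{
    import core.sys.windows.windows;
}

/// Hypothetical constant: the Windows code page identifier for UTF-8,
/// the same number that "chcp 65001" uses.
enum uint utf8CodePage = 65001;

void main()
{
    version (Windows)
    {
        // Remember the current code page and put it back when main exits.
        immutable previous = GetConsoleOutputCP();
        SetConsoleOutputCP(utf8CodePage);
        scope (exit) SetConsoleOutputCP(previous);
    }

    writeln("jos\u00e9");
}
```

On other platforms the `version (Windows)` blocks compile to nothing and the program simply writes UTF-8 to whatever the terminal expects.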
February 21, 2005 Re: Error: 4invalid UTF-8 sequence | ||||
---|---|---|---|---|
| ||||
Posted in reply to Anders F Björklund | Anders F Björklund says...
>> The 50 cents question is, how can I get rid of it?
>
>Save your file as UTF-8, and use an UTF-8 console...
Ok, this is interesting. Windows at its best! I have to completely retype that whole program! :-) Not a good thing.
Ok, thanks.
|
February 21, 2005 Re: Error: 4invalid UTF-8 sequence | ||||
---|---|---|---|---|
| ||||
Posted in reply to jicman | On Mon, 21 Feb 2005 22:27:04 +0000 (UTC), jicman <jicman_member@pathlink.com> wrote:
> Anders F Björklund says...
>
>>> The 50 cents question is, how can I get rid of it?
>>
>> Save your file as UTF-8, and use an UTF-8 console...
>
> Ok, this is interesting Windows at its best! I have to completely retype that
> whole program! :-) Not a good thing.
What? Why? Can't you open it, then do a save-as, or copy/paste into another editor then do a save-as?
Regan
|
February 21, 2005 Re: Error: 4invalid UTF-8 sequence | ||||
---|---|---|---|---|
| ||||
Posted in reply to Regan Heath | In article <opsmkl5qje23k2f5@ally>, Regan Heath says...
>> Ok, this is interesting Windows at its best! I have to completely
>> retype that
>> whole program! :-) Not a good thing.
>
>What? Why? Can't you open it, then do a save-as, or copy/paste into another editor then do a save-as?
:-) I know exactly how you said that> :-)
Yes, I tried that. I even opened the same program with notepad (that's as Windows as Windows can get) and tried to compile it and got the same error. Somehow, my dual keyboard system does not like those accented vowels. I am now searching for a new editor. I love vim, but this is going too far.
Which freeware editors have D syntax highlighting?
I am downloading one called Zeus that a D lover had on his page.
thanks.
|
February 22, 2005 Re: Error: 4invalid UTF-8 sequence | ||||
---|---|---|---|---|
| ||||
Posted in reply to jicman | On Mon, 21 Feb 2005 23:39:37 +0000 (UTC), jicman <jicman_member@pathlink.com> wrote:
> In article <opsmkl5qje23k2f5@ally>, Regan Heath says...
>
>>> Ok, this is interesting Windows at its best! I have to completely
>>> retype that
>>> whole program! :-) Not a good thing.
>>
>> What? Why? Can't you open it, then do a save-as, or copy/paste into
>> another editor then do a save-as?
>
> :-) I know exactly how you said that> :-)
:)
> Yes, I tried that. I even opened the same program with notepad (that's as
> Windows as Windows can get) and tried to compile it and got the same error.
> Somehow, my dual keyboard system does not like those accented vowels. I am now
> searching for a new editor. I love vim, but this is going too far.
I have Windows XP SP2, and...
NotePad will save as:
- Unicode
- Unicode Big Endian
- UTF-8
(see "encoding" drop down in save-as dialog)
WordPad will save as a "unicode document". I'm guessing that means UTF-16, hopefully with a BOM.
(see "save as type" drop down in save-as dialog)
> Which freeware editors have D syntax highlighting?
>
> I am downloading one called Zeus that a D lover had on his page.
Try:
http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport
Regan |
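To check what an editor actually wrote, one can look for a byte order mark at the start of the file. This is a sketch only, assuming a current D compiler; the names `hasPrefix` and `detectBom` are made up for illustration. A file without a BOM may still be perfectly good UTF-8, so "none" here says nothing about validity.

```d
import std.stdio : writeln;

/// Hypothetical helper: true if `data` begins with the bytes in `prefix`.
bool hasPrefix(const(ubyte)[] data, const(ubyte)[] prefix)
{
    return data.length >= prefix.length && data[0 .. prefix.length] == prefix;
}

/// Hypothetical helper: names the encoding announced by a leading
/// byte order mark, or returns "none" if the data has no BOM.
string detectBom(const(ubyte)[] data)
{
    // UTF-32 LE must be tested before UTF-16 LE: both begin with FF FE.
    if (hasPrefix(data, [0xFF, 0xFE, 0x00, 0x00])) return "UTF-32 LE";
    if (hasPrefix(data, [0x00, 0x00, 0xFE, 0xFF])) return "UTF-32 BE";
    if (hasPrefix(data, [0xEF, 0xBB, 0xBF])) return "UTF-8";
    if (hasPrefix(data, [0xFF, 0xFE])) return "UTF-16 LE";
    if (hasPrefix(data, [0xFE, 0xFF])) return "UTF-16 BE";
    return "none";
}

void main()
{
    // "josé" saved as UTF-8 with a BOM, and as plain Latin-1 without one.
    ubyte[] utf8WithBom = [0xEF, 0xBB, 0xBF, 0x6A, 0x6F, 0x73, 0xC3, 0xA9];
    ubyte[] plainLatin1 = [0x6A, 0x6F, 0x73, 0xE9];

    writeln(detectBom(utf8WithBom));
    writeln(detectBom(plainLatin1));
}
```

For a real file, the bytes could come from `std.file.read("name.d")` cast to `const(ubyte)[]`.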
Copyright © 1999-2021 by the D Language Foundation