Thread overview
March 04, 2007 [Issue 1024] New: invalid UTF-8 sequence for \u00B6 (¶) in comment
http://d.puremagic.com/issues/show_bug.cgi?id=1024

Summary: invalid UTF-8 sequence for \u00B6 (¶) in comment
Product: D
Version: 1.007
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: DMD
AssignedTo: bugzilla@digitalmars.com
ReportedBy: benoit@tionex.de

Having \u00b6 in a single-line comment (//) gives the error message "invalid UTF-8 sequence". An editor correctly shows "¶".
March 04, 2007 [Issue 1024] invalid UTF-8 sequence for \u00B6 (¶) in comment
Posted in reply to d-bugmail

http://d.puremagic.com/issues/show_bug.cgi?id=1024

------- Comment #1 from fvbommel@wxs.nl 2007-03-04 15:41 -------
Was the encoding UTF-8? Did your file start with the appropriate BOM? (DMD requires a BOM to consider a file anything other than pure ASCII.)

Here's a test for you; try to reproduce this:
---
urxae@urxae:~/tmp$ cat utf.d
//¶
urxae@urxae:~/tmp$ hd utf.d
00000000 ef bb bf 2f 2f c2 b6 0a |...//...|
00000008
urxae@urxae:~/tmp$ dmd -c utf.d
urxae@urxae:~/tmp$
---
The first command shows the contents of the file (apparently cat doesn't handle BOMs; it just sends the BOM straight to the console, which is where the extra symbol comes from). The second shows the hexdump of the file. Note the 'ef bb bf' UTF-8 BOM and the 'c2 b6' encoding of the '¶'. The third command shows DMD compiling the file successfully.

See http://www.digitalmars.com/d/lex.html (under "Source Text") for details on the encodings accepted by DMD.
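A minimal D sketch, not taken from the thread, that checks the two byte patterns visible in the hexdump above: the three-byte UTF-8 BOM `EF BB BF` and the two-byte encoding `C2 B6` of U+00B6. The helper names `hasUtf8Bom` and `utf8Bytes` are hypothetical, and the sketch assumes a present-day D compiler with Phobos' `std.utf.encode`, not the DMD 1.007 toolchain from the report.

```d
import std.stdio : writefln;
import std.utf : encode;

// UTF-8 byte order mark (EF BB BF), the first three bytes in the hexdump.
immutable ubyte[] utf8Bom = [0xEF, 0xBB, 0xBF];

// Hypothetical helper: true if `data` starts with the UTF-8 BOM.
bool hasUtf8Bom(const(ubyte)[] data)
{
    return data.length >= utf8Bom.length
        && data[0 .. utf8Bom.length] == utf8Bom;
}

// Hypothetical helper: the UTF-8 encoding of a single code point.
ubyte[] utf8Bytes(dchar c)
{
    char[4] buf;
    const len = encode(buf, c); // number of bytes written into buf
    return cast(ubyte[]) buf[0 .. len].dup;
}

void main()
{
    // The eight bytes of utf.d from the hexdump: BOM, "//", U+00B6, newline.
    immutable ubyte[] file = [0xEF, 0xBB, 0xBF, 0x2F, 0x2F, 0xC2, 0xB6, 0x0A];
    writefln("BOM present: %s", hasUtf8Bom(file));
    writefln("U+00B6 as UTF-8: %(%02x %)", utf8Bytes('\u00b6'));
}
```

The BOM check only inspects the first three bytes; it says nothing about whether the rest of the file is well-formed UTF-8.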
March 04, 2007 [Issue 1024] invalid UTF-8 sequence for \u00B6 (¶) in comment
Posted in reply to d-bugmail

http://d.puremagic.com/issues/show_bug.cgi?id=1024

------- Comment #2 from fvbommel@wxs.nl 2007-03-04 15:45 -------
(In reply to comment #1)
[snip]
> //¶
[snip]
> (apparently cat doesn't handle
> BOMs, it just sends it straight to the console; that's where the extra symbol
> comes from).

And apparently somewhere along the line from bugzilla to the newsgroup message showing up in my Thunderbird, that character is stripped... (For anyone only reading this in the newsgroup and trying to figure out what I was talking about: look at the bugzilla page.)
March 04, 2007 [Issue 1024] invalid UTF-8 sequence for \u00B6 (¶) in comment
Posted in reply to d-bugmail

http://d.puremagic.com/issues/show_bug.cgi?id=1024

benoit@tionex.de changed:

What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|        |INVALID

------- Comment #3 from benoit@tionex.de 2007-03-04 16:15 -------
In my Thunderbird it shows up correctly, but I changed the default encoding to UTF-8. You are right about the BOM: I added a BOM to my file, and it compiles.
March 04, 2007 Re: [Issue 1024] invalid UTF-8 sequence for \u00B6 (¶) in comment
Posted in reply to d-bugmail

d-bugmail@puremagic.com wrote:
> ------- Comment #3 from benoit@tionex.de 2007-03-04 16:15 -------
> in my thunderbird it shows up correctly.
Just to be clear: I meant the extra character before the '//'.
March 04, 2007 [Issue 1024] invalid UTF-8 sequence for \u00B6 (¶) in comment
Posted in reply to d-bugmail

http://d.puremagic.com/issues/show_bug.cgi?id=1024

------- Comment #4 from benoit@tionex.de 2007-03-04 16:41 -------
Hm, I forgot to reactivate the code that produces the UTF chars...

Now I can say it also works without a BOM; my problem was a coding error when writing the file.

char[] line;
...
line ~= '\u00b6'; // This produced the corrupt file content
line ~= "\u00b6"; // With the string literal, the file is written correctly
// write the line into the file
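The difference between the two appends can be sketched in present-day D. This is an illustration, not the reporter's code: the helper names are hypothetical, and it assumes the corruption came from only the low byte of the code point being stored. `appendTruncated` forces that single-byte result with an explicit `cast(char)` instead of relying on the old compiler behaviour. A lone `0xB6` byte is a UTF-8 continuation byte with no lead byte, so Phobos' `std.utf.validate` rejects it, while the string literal contributes the full `C2 B6` sequence.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: appends U+00B6 as a string literal,
// so the array receives both UTF-8 bytes (C2 B6).
char[] appendAsString(char[] line)
{
    line ~= "\u00b6";
    return line;
}

// Hypothetical helper: simulates the corruption by storing only the
// low byte (0xB6) of the code point. The explicit cast keeps the sketch
// independent of how any particular compiler handles `line ~= '\u00b6'`.
char[] appendTruncated(char[] line)
{
    line ~= cast(char) '\u00b6';
    return line;
}

// True if `s` is well-formed UTF-8, using std.utf.validate.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    import std.stdio : writeln;

    char[] good = appendAsString("//".dup);
    char[] bad = appendTruncated("//".dup);
    writeln("string literal: ", good.length, " bytes, valid = ", isValidUtf8(good));
    writeln("truncated byte: ", bad.length, " bytes, valid = ", isValidUtf8(bad));
}
```

Writing `bad` to a source file and compiling it would reproduce the kind of error reported at the top of the thread, since the comment then contains an invalid UTF-8 sequence.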
March 04, 2007 Re: [Issue 1024] invalid UTF-8 sequence for \u00B6 (¶) in comment
Posted in reply to Frits van Bommel

Right, they are corrupted.
March 06, 2007 Re: [Issue 1024] invalid UTF-8 sequence for \u00B6 (¶) in comment
Posted in reply to d-bugmail

d-bugmail@puremagic.com wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=1024
>
> ------- Comment #4 from benoit@tionex.de 2007-03-04 16:41 -------
> hm, i did forget to reactivate the code that produces the utf chars....
>
> Now i can say, it also works without BOM, and my problem was a coding error in
> writing the file.
>
> char[] line;
> ...
> line ~= '\u00b6'; // This made the corrupt file content
> line ~= "\u00b6"; // So the file is written correctly
> // write the line into the file
>

THAT's a known bug: issue 111, http://d.puremagic.com/issues/show_bug.cgi?id=111
Copyright © 1999-2021 by the D Language Foundation