November 04, 2014 [Issue 13686] New: Reading unicode string with readf ("%s") produces a wrong string
https://issues.dlang.org/show_bug.cgi?id=13686

          Issue ID: 13686
           Summary: Reading unicode string with readf ("%s") produces a wrong string
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Windows
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: DMD
          Assignee: nobody@puremagic.com
          Reporter: gassa@mail.ru

The following code does not correctly handle Unicode strings.

-----
import std.stdio;

void main () {
    string s;
    readf ("%s", &s);
    writeln (s.length);
    write (s);
}
-----

Example input ("Test." in cyrillic):
-----
Тест.
-----
(hex: D0 A2 D0 B5 D1 81 D1 82 2E 0D 0A)
That is 11 bytes (with '\n'=CR/LF being two bytes on Windows).

Example output:
-----
18
ТеÑÑ.
-----
(hex: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E 0D 0A)
The second line is 19 bytes (again with '\n'=CR/LF being two bytes on Windows).

The reported length (18 counting '\n' as one character - instead of the expected length of 10) ensures that the problem is in reading, not in writing.

Here, the input bytes are handled separately: D0 -> C3 90, A2 -> C2 A2, etc.

On the bright side, reading the file with readln works properly.

Relevant discussion:
http://forum.dlang.org/thread/rblxsxrdhjtkmxugyvrf@forum.dlang.org
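The byte-by-byte mapping noted in the report (D0 -> C3 90, A2 -> C2 A2) is what one would see if every input byte were treated as a code point of its own and the result were then encoded as UTF-8. The following is a minimal sketch that reproduces those numbers. `widenBytes` is a hypothetical helper written only for this illustration; it models the observed symptom and is an assumption, not a statement about where the fault lies inside readf.

```d
import std.conv : to;
import std.stdio : writefln, writeln;

// Hypothetical helper (not part of Phobos): widens each byte to a code
// point on its own, then encodes the resulting text as UTF-8.
string widenBytes (const(ubyte)[] bytes)
{
    dstring widened;
    foreach (b; bytes)
        widened ~= cast (dchar) b;
    return widened.to!string;
}

void main ()
{
    // UTF-8 bytes of the example input, with the line ending already
    // translated to a single '\n' (10 bytes, the expected length).
    immutable ubyte[] input =
        [0xD0, 0xA2, 0xD0, 0xB5, 0xD1, 0x81, 0xD1, 0x82, 0x2E, 0x0A];

    string garbled = widenBytes (input);

    // Each of the 8 bytes >= 0x80 becomes two bytes; '.' and '\n' stay single.
    assert (garbled.length == 18);

    writeln (garbled.length);
    // Hex dump of the widened string, for comparison with the reported output.
    writefln ("%(%02X %)", cast (const(ubyte)[]) garbled);
}
```

The length of 18 matches the value printed by the program in the report, which is consistent with the conclusion that the damage is done while reading.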