May 16, 2007 [Issue 1235] New: std.string.tolower() fails on certain utf8 characters
http://d.puremagic.com/issues/show_bug.cgi?id=1235

           Summary: std.string.tolower() fails on certain utf8 characters
           Product: D
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: d@chqrlie.org

    import std.string;

    int main(char[][] args)
    {
        printf("tolower(\"\\u0130e\") -> \"%.*s\"\n", tolower("\u0130e"));
        return 0;
    }

produces incorrect output:

    tolower("\u0130e") -> "i e"

The bug comes from erroneous code in phobos/std/string.d, line 843:

    if (r.length != i + j)
        r = r[0 .. i + j];

Turkish dotted capital I (U+0130) is correctly converted to ASCII i (U+0069), but the converted character does not use the same number of bytes as the original character. The code above is therefore incorrect. As far as I understand the implementation, it could be removed completely.

A similar issue is present in toupper(), with the additional twist that conversion to uppercase should not be special-cased for the ASCII subset in the Turkish locale.

Additionally, the non-ASCII code path is triggered by if (c >= 0x7F) where it should be if (c > 0x7F).

--
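The length mismatch described in the report can be reproduced in isolation. The following is a minimal sketch in present-day D 2, not the Phobos code and not the fix that was applied: `lowerOne` and `lowerAll` are hypothetical helpers written for this illustration, and the only mapping assumed beyond ASCII is U+0130 to U+0069, as stated in the report. The result is built by appending each re-encoded code point, so its length is never derived from the length of the input.

```d
import std.stdio : writefln;
import std.utf : encode;

// Hypothetical helper, for illustration only: lowercases ASCII letters and
// maps U+0130 to 'i' as described in the report. Everything else is unchanged.
dchar lowerOne(dchar c)
{
    if (c >= 'A' && c <= 'Z')
        return cast(dchar)(c + ('a' - 'A'));
    if (c == '\u0130')
        return 'i';
    return c;
}

// Hypothetical helper: rebuilds the string one code point at a time.
string lowerAll(string s)
{
    char[] r;
    foreach (dchar c; s)                 // decodes UTF-8 into code points
    {
        char[4] buf;
        size_t n = encode(buf, lowerOne(c)); // number of UTF-8 bytes written
        r ~= buf[0 .. n];                // append exactly those bytes
    }
    return r.idup;
}

void main()
{
    string s = "\u0130e";                // U+0130 takes 2 bytes, 'e' takes 1
    string t = lowerAll(s);
    assert(s.length == 3);
    assert(t == "ie");                   // 'i' takes 1 byte, so 2 bytes total
    writefln("%s bytes in, %s bytes out: \"%s\"", s.length, t.length, t);
    // prints: 3 bytes in, 2 bytes out: "ie"
}
```

Because the input occupies 3 bytes and the output only 2, any slice or offset computed from the input length points at the wrong place in the result, which is consistent with the stray byte in the reported output.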
June 29, 2007 [Issue 1235] std.string.tolower() fails on certain utf8 characters
Posted in reply to d-bugmail

http://d.puremagic.com/issues/show_bug.cgi?id=1235

------- Comment #1 from bugzilla@digitalmars.com 2007-06-28 22:57 -------
I agree, with the exception that for UTF characters, there is no such thing as a locale. So toupper("i") cannot be set to \u0130.

--
July 01, 2007 [Issue 1235] std.string.tolower() fails on certain utf8 characters
Posted in reply to d-bugmail

http://d.puremagic.com/issues/show_bug.cgi?id=1235

bugzilla@digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from bugzilla@digitalmars.com 2007-07-01 14:03 -------
Fixed DMD 1.018 and DMD 2.002

--
Copyright © 1999-2021 by the D Language Foundation