December 09, 2011 [Issue 7085] New: std.algorithm.reverse() problem with Unicode dchar[]
http://d.puremagic.com/issues/show_bug.cgi?id=7085

           Summary: std.algorithm.reverse() problem with Unicode dchar[]
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc

--- Comment #0 from bearophile_hugs@eml.cc 2011-12-09 01:32:52 PST ---
This code compiles and runs raising no assert error, so reverse() is giving a wrong result on a dchar[]:

    import std.algorithm: reverse;
    void main() {
        dchar[] txt = "\U00000041\U00000308\U00000042"d.dup;
        txt.reverse();
        assert(txt == "\U00000042\U00000308\U00000041"d);
    }

txt contains LATIN CAPITAL LETTER A, COMBINING DIAERESIS, LATIN CAPITAL LETTER B.

See bug 7084 for more details.

A more correct output for reversing txt is (LATIN CAPITAL LETTER B, LATIN CAPITAL LETTER A, COMBINING DIAERESIS):

    "\U00000042\U00000041\U00000308"d

or even (LATIN CAPITAL LETTER B, LATIN CAPITAL LETTER A WITH DIAERESIS) (but this changes the array size and it's not necessary):

    "\U00000042\U000000C4"d

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
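The second alternative output in the report, the precomposed form, is what Unicode NFC normalization produces from the first one. A minimal sketch, assuming a later Phobos release in which std.uni provides normalize (it was not available when this issue was filed); composeNFC is a hypothetical helper name used only for illustration:

```d
import std.uni : NFC, normalize;

// Hypothetical helper: composes base letters with their combining marks (NFC).
// Assumes a Phobos version whose std.uni provides normalize.
dstring composeNFC(dstring text)
{
    return normalize!NFC(text);
}

void main()
{
    // LATIN CAPITAL LETTER B, LATIN CAPITAL LETTER A, COMBINING DIAERESIS
    dstring decomposed = "\U00000042\U00000041\U00000308"d;

    // A followed by COMBINING DIAERESIS composes into the single code point U+00C4.
    dstring composed = composeNFC(decomposed);
    assert(composed == "\U00000042\U000000C4"d);

    // As the report notes, the composed form is shorter than the input.
    assert(composed.length == 2);
}
```

Normalization only changes how the text is encoded. It does not reverse anything, so it would be applied in addition to whichever reversal is chosen.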
December 09, 2011 [Issue 7085] std.algorithm.reverse() problem with Unicode dchar[]
Posted in reply to bearophile_hugs@eml.cc

http://d.puremagic.com/issues/show_bug.cgi?id=7085

Jonathan M Davis <jmdavisProg@gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |jmdavisProg@gmx.com
         Resolution|                            |INVALID

--- Comment #1 from Jonathan M Davis <jmdavisProg@gmx.com> 2011-12-09 02:36:00 PST ---
No, this behavior is as-designed. You're misunderstanding dchars. A dchar is a UTF-32 code unit, which is then guaranteed to be a code point. When you reverse a range of dchar - be it a dchar[] or some other data structure - the code points are reversed. It doesn't take graphemes into account _at all_.

If you want to reverse a string based on graphemes, you need to have a range of graphemes, not a range of dchar. Phobos does not currently have support for a range of graphemes, which makes that quite a bit harder to do, but until then, all ranges of characters are ranges of dchar, and any function which operates on a range is going to treat them as ranges of dchar, not graphemes, so reverse is going to reverse code points, even if that's not what the programmer really wanted.
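The distinction drawn in the comment above can be sketched as follows. This assumes a later Phobos release in which std.uni provides byGrapheme and byCodePoint, which did not exist when the comment was written; reverseByGrapheme is a hypothetical helper name, not a Phobos function:

```d
import std.algorithm : reverse;
import std.array : array;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;

// Hypothetical helper: reverses text one grapheme cluster at a time,
// keeping each combining mark after its base character.
// Assumes a Phobos version whose std.uni provides byGrapheme/byCodePoint.
dchar[] reverseByGrapheme(const(dchar)[] text)
{
    return text.byGrapheme   // range of Grapheme
               .array        // retro needs a bidirectional range
               .retro        // reverse the order of the graphemes
               .byCodePoint  // flatten back into code points
               .array;
}

void main()
{
    // LATIN CAPITAL LETTER A, COMBINING DIAERESIS, LATIN CAPITAL LETTER B
    dchar[] txt = "\U00000041\U00000308\U00000042"d.dup;

    // std.algorithm.reverse works on code points, as designed:
    // the diaeresis ends up following B instead of A.
    auto codePoints = txt.dup;
    reverse(codePoints);
    assert(codePoints == "\U00000042\U00000308\U00000041"d);

    // Grapheme-aware reversal yields the order asked for in comment #0.
    assert(reverseByGrapheme(txt) == "\U00000042\U00000041\U00000308"d);
}
```

Unlike reverse, the helper allocates a new array rather than reversing in place. An in-place version is awkward because graphemes can span different numbers of code points.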