October 08, 2011 [Issue 6791] New: std.algorithm.splitter random indexes utf strings

http://d.puremagic.com/issues/show_bug.cgi?id=6791

Summary: std.algorithm.splitter random indexes utf strings
Product: D
Version: D2
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Phobos
AssignedTo: nobody@puremagic.com
ReportedBy: dawg@dawgfoto.de

--- Comment #0 from dawg@dawgfoto.de 2011-10-07 22:51:09 PDT ---
Throws a UTFException.

    string s = `là dove terminava quella valle`;
    foreach(word; std.array.splitter(s))
        writeln(word);

The second UTF-8 code unit of 'à' is 0xA0, for which isWhite is true.
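
The observation in comment #0 can be checked in isolation. The following is a minimal sketch, not code from Phobos or from the report: the helper names `hasWhiteCodeUnit` and `hasWhiteCodePoint` are hypothetical, and it assumes `std.uni.isWhite` treats U+00A0 (NO-BREAK SPACE) as whitespace. It contrasts testing raw UTF-8 code units against testing decoded code points.

```d
import std.uni : isWhite;

/// Hypothetical helper: true if any single UTF-8 code unit of `s`,
/// misread as if it were a whole code point, satisfies isWhite.
bool hasWhiteCodeUnit(string s)
{
    foreach (char unit; s)      // iterates code units, no decoding
        if (isWhite(unit))      // char widens implicitly to dchar
            return true;
    return false;
}

/// Hypothetical helper: true if any decoded code point of `s`
/// satisfies isWhite.
bool hasWhiteCodePoint(string s)
{
    foreach (dchar c; s)        // foreach with dchar decodes UTF-8
        if (isWhite(c))
            return true;
    return false;
}
```

For the single character 'à' (U+00E0, encoded as the two code units 0xC3 0xA0), the first helper should report whitespace and the second should not, which matches the split point landing in the middle of the character.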

August 19, 2013 [Issue 6791] std.algorithm.splitter random indexes utf strings

Posted in reply to dawg@dawgfoto.de

http://d.puremagic.com/issues/show_bug.cgi?id=6791

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh@quickfur.ath.cx

--- Comment #1 from hsteoh@quickfur.ath.cx 2013-08-18 22:22:41 PDT ---
This is caused by struct SplitterResult in std.algorithm using array slicing and array indexing to pass char (not dchar!) to the lambda. SplitterResult appears to have multiple issues: it uses array slicing without a proper signature constraint on hasSlicing, and it doesn't work properly for narrow strings because it uses indexing, which for narrow strings doesn't handle multibyte UTF-8 sequences properly.

It appears to be wanting a rewrite that uses only forward range primitives, or at least an overload for narrow strings that properly takes multibyte characters into account.
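
To illustrate what "only forward range primitives" could look like for a narrow string, here is a rough sketch. It is not the Phobos implementation and not the fix that was eventually made; `splitWhite` is a hypothetical name, and the function is eager (it returns an array) rather than a lazy range like splitter. It relies on `front` and `popFront` from `std.range.primitives`, which decode a whole code point at a time for `string`, so every slice boundary falls between characters.

```d
import std.range.primitives : empty, front, popFront;
import std.uni : isWhite;

/// Hypothetical helper: splits `s` at each whitespace code point.
/// Walks the string with empty/front/popFront only, so a multibyte
/// sequence is never examined one code unit at a time.
string[] splitWhite(string s)
{
    string[] words;
    while (!s.empty)
    {
        auto rest = s;
        // front yields a decoded dchar; popFront skips the full sequence
        while (!rest.empty && !isWhite(rest.front))
            rest.popFront();
        // rest is a suffix of s, so this slice ends on a code point boundary
        words ~= s[0 .. s.length - rest.length];
        if (!rest.empty)
            rest.popFront();    // drop the separator itself
        s = rest;
    }
    return words;
}
```

With this sketch, the string from comment #0 should split into its five words without a UTFException. Consecutive separators produce empty entries, and a trailing separator is dropped, which may differ from what splitter does.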

August 19, 2013 [Issue 6791] std.algorithm.splitter random indexes utf strings

Posted in reply to dawg@dawgfoto.de

http://d.puremagic.com/issues/show_bug.cgi?id=6791

monarchdodra@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |monarchdodra@gmail.com
         AssignedTo|nobody@puremagic.com        |monarchdodra@gmail.com

--- Comment #2 from monarchdodra@gmail.com 2013-08-18 23:25:05 PDT ---
(In reply to comment #1)
> This is caused by struct SplitterResult in std.algorithm using array slicing and array indexing to pass char (not dchar!) to the lambda. SplitterResult appears to have multiple issues: it uses array slicing without a proper signature constraint on hasSlicing, and it doesn't work properly for narrow strings because it uses indexing, which for narrow strings doesn't handle multibyte UTF-8 sequences properly.
>
> It appears to be wanting a rewrite that uses only forward range primitives, or at least an overload for narrow strings that properly takes multibyte characters into account.

I had submitted a correction for this about 1 year ago, but it ended up being too big in scope (*all* splitter flavors have bugs). It also ended up being messy due to (trying to avoid) code duplication.

It might be better to just fix things little by little though, rather than not at all.

I'll fix *just* "splitter!pred": it's the easiest to fix. We'll see where we go from there.