[Issue 7689] New: splitter() on invalid UTF-8 sequences

http://d.puremagic.com/issues/show_bug.cgi?id=7689

           Summary: splitter() on invalid UTF-8 sequences
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2012-03-11 14:07:42 PDT ---
Is this difference/inconsistency between split() and splitter() desired and
good?


import std.string, std.array, std.algorithm, std.range;
void main() {
    char[] s = cast(char[])[131, 64, 32, 251, 22];
    assert(std.string.split(s).length == 2); // no error
    assert(walkLength(std.array.splitter(s)) == 2); // Invalid UTF-8 sequence
    assert(walkLength(std.algorithm.splitter(s)) == 2); // Invalid UTF-8 sequence
}


Output, DMD 2.059head:

std.utf.UTFException@std\utf.d(645): Invalid UTF-8 sequence (at index 1)
----------------
...\dmd2\src\phobos\std\array.d(469): dchar std.array.front!(char[]).front(char[])
...\dmd2\src\phobos\std\algorithm.d(2110): D3std9algorithm47__T8splitterS28...
...\dmd2\src\phobos\std\range.d(971): D3std5range97__...
----------------

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

October 23, 2012

[Issue 7689] splitter() on invalid UTF-8 sequences

Posted by monarchdodra@gmail.com
in reply to bearophile_hugs@eml.cc


http://d.puremagic.com/issues/show_bug.cgi?id=7689


monarchdodra@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |monarchdodra@gmail.com
         AssignedTo|nobody@puremagic.com        |monarchdodra@gmail.com


--- Comment #1 from monarchdodra@gmail.com 2012-10-22 23:06:22 PDT ---
(In reply to comment #0)
> Is this difference/inconsistency between split() and splitter() desired and
> good?
> 
> 
> import std.string, std.array, std.algorithm, std.range;
> void main() {
>     char[] s = cast(char[])[131, 64, 32, 251, 22];
>     assert(std.string.split(s).length == 2); // no error
>     assert(walkLength(std.array.splitter(s)) == 2); // Invalid UTF-8 sequence
>     assert(walkLength(std.algorithm.splitter(s)) == 2); // Invalid UTF-8 sequence
> }
> 
> 
> Output, DMD 2.059head:
> 
> std.utf.UTFException@std\utf.d(645): Invalid UTF-8 sequence (at index 1)
> ----------------
> ...\dmd2\src\phobos\std\array.d(469): dchar std.array.front!(char[]).front(char[])
> ...\dmd2\src\phobos\std\algorithm.d(2110): D3std9algorithm47__T8splitterS28...
> ...\dmd2\src\phobos\std\range.d(971): D3std5range97__...
> ----------------

This is a bug in string.split (which is actually a public import of
array.split).

Currently array.split only recognizes ASCII whitespace and is oblivious to
multi-byte Unicode whitespace (although it otherwise works on Unicode strings).
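
To illustrate why a splitter that only looks for ASCII whitespace would not trip over the invalid input, here is a minimal sketch. The helper splitAsciiWhite is hypothetical and written only for this example; it is not the Phobos implementation of array.split. It assumes that scanning raw code units (no decoding) is what lets the input through, while anything that decodes, shown here with std.utf.validate, rejects it.

```d
import std.ascii : isWhite;
import std.utf : validate, UTFException;

// Hypothetical helper (not Phobos code): splits on ASCII whitespace by
// scanning raw UTF-8 code units, so it never decodes and never throws.
char[][] splitAsciiWhite(char[] s)
{
    char[][] result;
    size_t start = 0;
    bool inWord = false;
    foreach (i, char c; s) // char, not dchar: iterate code units
    {
        if (isWhite(c))
        {
            if (inWord)
            {
                result ~= s[start .. i];
                inWord = false;
            }
        }
        else if (!inWord)
        {
            start = i;
            inWord = true;
        }
    }
    if (inWord)
        result ~= s[start .. $];
    return result;
}

void main()
{
    // Same bytes as in comment #0: 131 is a stray continuation byte.
    char[] s = cast(char[])[131, 64, 32, 251, 22];

    // Code-unit scan: one split at the space (32), no exception.
    assert(splitAsciiWhite(s).length == 2);

    // Anything that decodes the input rejects it.
    bool threw = false;
    try
        validate(s);
    catch (UTFException e)
        threw = true;
    assert(threw);
}
```

The trade-off is the one described above: a code-unit scan like this cannot see whitespace that is encoded in more than one byte, so it is lenient on invalid input but incomplete on valid Unicode.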

