[Issue 5977] New: String splitting with empty separator
May 10, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5977

           Summary: String splitting with empty separator
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Keywords: patch
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2011-05-10 16:30:55 PDT ---
This D2 program seems to go into an infinite loop (dmd 2.053beta):


import std.string;
void main() {
    split("a test", "");
}

------------------------

My suggestion is to add code like this in std.array.split():

if (delim.length == 0)
    return split(s);

This means that an empty splitting string is treated like splitting on generic whitespace. This is useful in code like:

auto foo(string txt, string delim="") {
    return txt.split(delim);
}


This means that calling foo with only txt splits it on whitespace, while passing a delimiter splits on the given string. This allows using both forms of split in foo() without if conditions. Python does this too, with None used instead of an empty string.
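
The dispatch proposed above can be sketched without touching Phobos. The helper name splitOrWhitespace is hypothetical and exists only for illustration; the sketch assumes the two std.array.split overloads (whitespace and explicit separator) behave as documented.

```d
import std.array : split;

// Hypothetical helper, not part of Phobos: an empty delimiter
// means "split on whitespace", as the proposal suggests.
string[] splitOrWhitespace(string txt, string delim = "")
{
    if (delim.length == 0)
        return txt.split();   // whitespace overload, strings only
    return txt.split(delim);  // explicit-separator overload
}

unittest
{
    // Runs of whitespace are treated as one separator.
    assert(splitOrWhitespace("a  test") == ["a", "test"]);
    assert(splitOrWhitespace("a,test", ",") == ["a", "test"]);
}
```

The unittest block can be run with something like rdmd -main -unittest on the file.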


The modified split is something like this (there is an isSomeString!S2 test because strings are special: they aren't generic arrays, and splitting on whitespace is meaningful for strings only):


Unqual!(S1)[] split(S1, S2)(S1 s, S2 delim)
if (isForwardRange!(Unqual!S1) && isForwardRange!S2)
{
    // Test at compile time: the whitespace overload and .length
    // are only guaranteed to exist when both arguments are strings.
    static if (isSomeString!S1 && isSomeString!S2)
    {
        if (delim.length == 0)
            return split(s);
    }
    Unqual!S1 us = s;
    auto app = appender!(Unqual!(S1)[])();
    foreach (word; std.algorithm.splitter(us, delim))
    {
        app.put(word);
    }
    return app.data;
}


Besides this change, I presume std.algorithm.splitter() also needs to test for an
empty delim.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 25, 2011



--- Comment #1 from bearophile_hugs@eml.cc 2011-09-25 08:16:21 PDT ---
Alternative: throw an ArgumentError("delim argument is empty") exception if
delim is empty.
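
A minimal sketch of this alternative, written as a wrapper rather than a change to Phobos. The name splitNonEmpty is hypothetical. Since this sketch does not assume any ArgumentError type exists in Phobos, it uses std.exception.enforce, which throws a plain Exception.

```d
import std.array : split;
import std.exception : assertThrown, enforce;

// Hypothetical wrapper: reject an empty delimiter up front
// instead of passing it on to split.
string[] splitNonEmpty(string s, string delim)
{
    // enforce throws an Exception when the condition is false.
    enforce(delim.length != 0, "delim argument is empty");
    return s.split(delim);
}

unittest
{
    assert(splitNonEmpty("a test", " ") == ["a", "test"]);
    assertThrown(splitNonEmpty("a test", ""));
}
```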

October 22, 2012


monarchdodra@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |daniel350@bigpond.com


--- Comment #2 from monarchdodra@gmail.com 2012-10-22 02:42:42 PDT ---
*** Issue 8551 has been marked as a duplicate of this issue. ***

October 22, 2012


monarchdodra@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |monarchdodra@gmail.com
         AssignedTo|nobody@puremagic.com        |monarchdodra@gmail.com


--- Comment #3 from monarchdodra@gmail.com 2012-10-22 02:52:16 PDT ---
(In reply to comment #0)
> This D2 program seems to go into an infinite loop (dmd 2.053beta):
> 
> 
> import std.string;
> void main() {
>     split("a test", "");
> }
> 
> ------------------------
> 
> My suggestion is to add code like this in std.array.split():
> 
> if (delim.length == 0)
>     return split(s);
> 
> This means that an empty splitting string is treated like splitting on generic whitespace. This is useful in code like:
> 
> auto foo(string txt, string delim="") {
>     return txt.split(delim);
> }

I think it is a bad idea on two counts:

1. If the user wanted that behavior, he'd have written it as such. If the user actually passed a separator that is an empty range, he probably didn't mean for it to split on spaces.

2. I think it would also bring a deviation of behavior between strings and
non-strings. Supposing r is empty:
* "hello world".split(""); //Ok, split white
* [1, 2].split(r); //Derp.
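
The asymmetry in point 2 can be seen in how std.array.split is already overloaded. The sketch below assumes the documented overloads: a separator-free form constrained to strings, and a generic form that takes an explicit separator.

```d
import std.array : split;

unittest
{
    // Strings have a separator-free overload that splits on whitespace.
    assert("hello world".split() == ["hello", "world"]);

    // Generic arrays are split with an explicit, non-empty separator range.
    auto parts = split([1, 2, 3, 4, 5, 1, 2, 3, 4, 5], [2, 3]);
    assert(parts == [[1], [4, 5, 1], [4, 5]]);

    // "Whitespace" has no meaning for int[], so an empty separator
    // could not fall back to the string behavior here.
}
```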

(In reply to comment #1)
> Alternative: throw an ArgumentError("delim argument is empty") exception if
> delim is empty.

I *really* think that is a *much* saner approach. Splitting with an empty separator is just not logical. Trying to force a default behavior in that scenario is wishful thinking (IMO).

I think it should throw an error. I'll implement this.

January 04, 2013


hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh@quickfur.ath.cx


--- Comment #4 from hsteoh@quickfur.ath.cx 2013-01-03 20:28:42 PST ---
FWIW, in Perl, splitting on an empty string simply returns an array of characters. I think that better reflects the symmetry with join("", array).
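
For comparison, a rough D sketch of the Perl-style behavior, showing the round trip with join. The name splitChars is hypothetical, and the sketch splits by code point (what iterating a D string as a range yields), which is one possible reading of "characters".

```d
import std.algorithm : map;
import std.array : array, join;
import std.conv : to;

// Hypothetical helper: one single-character string per code point.
string[] splitChars(string s)
{
    // Iterating a string as a range yields dchar code points.
    return s.map!(c => c.to!string).array;
}

unittest
{
    auto parts = splitChars("a test");
    assert(parts == ["a", " ", "t", "e", "s", "t"]);
    // Joining with an empty separator restores the input.
    assert(parts.join("") == "a test");
}
```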
