inconsistent behavior of std.string.split
Aug 20, 2005
zwang
August 20, 2005
According to the documentation:
<spec>
char[][] split(char[] s)
    Split s[] into an array of words, using whitespace as the delimiter.

char[][] split(char[] s, char[] delim)
    Split s[] into an array of words, using delim[] as the delimiter.
</spec>

Intuitively, split(s) should be equivalent to split(s, " \t\f\r\n\v").
But the former function discards empty strings while the latter does not.
The following example demonstrates the difference.

<code>
import std.stdio;
import std.string;
void main(){
	writefln(std.string.split("0  3"," ")); //[0,,3]
	writefln(std.string.split("0  3"));     //[0,3]
	writefln(std.string.split("    "," ")); //[,,,,]
	writefln(std.string.split("    "));     //[]
}
</code>
August 20, 2005
"zwang" <nehzgnaw@gmail.com> wrote in message news:de7c7e$17au$1@digitaldaemon.com...
> According to the documentation:
> <spec>
> char[][] split(char[] s)
>     Split s[] into an array of words, using whitespace as the delimiter.
>
> char[][] split(char[] s, char[] delim)
>     Split s[] into an array of words, using delim[] as the delimiter.
> </spec>
>
> Intuitively, split(s) should be equivalent to split(s, " \t\f\r\n\v").
> But the former function discards empty lines while the latter does not.
> The following example demonstrates the difference.
>
> <code>
> import std.stdio;
> import std.string;
> void main(){
> writefln(std.string.split("0  3"," ")); //[0,,3]
> writefln(std.string.split("0  3"));     //[0,3]
> writefln(std.string.split("    "," ")); //[,,,,]
> writefln(std.string.split("    "));     //[]
> }
> </code>

Yeah, the version that takes a delimiter string should skip any zero-length strings between consecutive delimiters. The whitespace version keeps skipping characters until it hits a non-whitespace one, whereas the delimiter version starts a new string after every delimiter; it should instead keep reading delimiters until it hits a non-delimiter sequence.
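
For illustration only, here is a rough sketch of that "skip the empty pieces" behavior. It assumes present-day D and Phobos (std.array.split, std.algorithm.iteration.filter), not the 2005 library discussed in this thread, and splitSkipEmpty is a hypothetical helper name, not an existing Phobos function. It splits on the delimiter as usual and then drops the zero-length results.

```d
import std.algorithm.iteration : filter;
import std.array : array, split;
import std.stdio : writeln;

// Hypothetical helper (not in Phobos): split on delim, then discard
// the zero-length strings produced by consecutive delimiters.
string[] splitSkipEmpty(string s, string delim)
{
    return s.split(delim).filter!(w => w.length > 0).array;
}

void main()
{
    writeln(splitSkipEmpty("0  3", " ")); // ["0", "3"]
    writeln(splitSkipEmpty("    ", " ")); // []
}
```

Filtering after the split is the simplest way to get this result; a real implementation would more likely skip runs of delimiters while scanning, as described above.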


August 20, 2005
Jarrett Billingsley wrote:
> "zwang" <nehzgnaw@gmail.com> wrote in message news:de7c7e$17au$1@digitaldaemon.com...
> 
>>According to the documentation:
>><spec>
>>char[][] split(char[] s)
>>    Split s[] into an array of words, using whitespace as the delimiter.
>>
>>char[][] split(char[] s, char[] delim)
>>    Split s[] into an array of words, using delim[] as the delimiter.
>></spec>
>>
>>Intuitively, split(s) should be equivalent to split(s, " \t\f\r\n\v").
>>But the former function discards empty lines while the latter does not.
>>The following example demonstrates the difference.
>>
>><code>
>>import std.stdio;
>>import std.string;
>>void main(){
>>writefln(std.string.split("0  3"," ")); //[0,,3]
>>writefln(std.string.split("0  3"));     //[0,3]
>>writefln(std.string.split("    "," ")); //[,,,,]
>>writefln(std.string.split("    "));     //[]
>>}
>></code>
> 
> 
> Yeah, the one that takes a delimiter string should skip any zero-length strings in-between delimiters.  The whitespace one will keep skipping characters until it hits a non-whitespace one, but the delimiter one will create a new string after every delimiter, when it should just keep reading delimiters until it hits a non-delimiter sequence. 
> 
> 

Keeping zero-length strings is sometimes useful, for example when parsing a CSV or tab-delimited file. A better solution might be two versions of split that handle consecutive delimiters differently, or two additional overloads for the special case of whitespace delimiters.
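
A minimal sketch of what I have in mind, assuming present-day D and Phobos rather than the 2005 library; splitFields and its keepEmpty parameter are hypothetical names made up for this example. The caller decides whether empty fields survive, so CSV-style parsing and word splitting can share one function.

```d
import std.algorithm.iteration : filter;
import std.array : array, split;
import std.stdio : writeln;

// Hypothetical helper (not in Phobos): keepEmpty selects between the
// two behaviors discussed in this thread.
string[] splitFields(string s, string delim, bool keepEmpty = true)
{
    auto fields = s.split(delim);
    if (keepEmpty)
        return fields; // empty fields preserved, e.g. for CSV columns
    return fields.filter!(f => f.length > 0).array; // empty fields dropped
}

void main()
{
    writeln(splitFields("a,,c", ","));        // ["a", "", "c"]
    writeln(splitFields("a,,c", ",", false)); // ["a", "c"]
}
```

Defaulting keepEmpty to true keeps the column positions of a CSV line intact unless the caller explicitly asks otherwise.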
August 20, 2005
"zwang" <nehzgnaw@gmail.com> wrote in message news:de7e7l$18se$1@digitaldaemon.com...
> Keeping zero-length strings is sometimes useful, for example, when parsing a CSV or tab-delimited file. A better solution might be two versions of split that handle consecutive delimiters differently. Or another two overloaded split functions for the special case of whitespace delimiters.

Good point.