Thread overview
Why does std.string.splitLines return an array?
Oct 21, 2012
Chad J
Oct 21, 2012
bearophile
Oct 21, 2012
Jonathan M Davis
Oct 21, 2012
Chad J
October 21, 2012
std.string.splitLines returns an array, which is pretty grody.  Why not return a lazily-evaluated range struct so that we can avoid allocations on this simple but common operation?
October 21, 2012
Chad J:

> std.string.splitLines returns an array, which is pretty grody.  Why not return a lazily-evaluated range struct so that we can avoid allocations on this simple but common operation?

splitLines is probably modeled on Python's str.splitlines() string method, which returns a list (array) of strings (because Python was originally eager). Phobos has both split() and splitter(), which are eager and lazy respectively. So maybe you want a splitterLines().
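A minimal sketch of the eager/lazy distinction mentioned above, assuming a reasonably recent Phobos; the sample string and variable names are made up for illustration:

```d
import std.algorithm : equal, splitter;
import std.array : split;

void main()
{
    string text = "a,b,c";

    // Eager: split() allocates and returns a string[] right away.
    string[] eagerPieces = text.split(",");
    assert(eagerPieces == ["a", "b", "c"]);

    // Lazy: splitter() returns a range that yields slices of text
    // one at a time, without allocating an array for the result.
    auto lazyPieces = text.splitter(",");
    assert(lazyPieces.equal(["a", "b", "c"]));
}
```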

I have asked for a lazy splitLines, vote here:
http://d.puremagic.com/issues/show_bug.cgi?id=4764

But I have also suggested a different naming scheme:
http://d.puremagic.com/issues/show_bug.cgi?id=5838

See also:
http://d.puremagic.com/issues/show_bug.cgi?id=6730
http://d.puremagic.com/issues/show_bug.cgi?id=7689

And especially:
http://d.puremagic.com/issues/show_bug.cgi?id=8013

Bye,
bearophile
October 21, 2012
On Sun, 2012-10-21 at 18:00 -0400, Chad J wrote:
> std.string.splitLines returns an array, which is pretty grody.  Why not return a lazily-evaluated range struct so that we can avoid allocations on this simple but common operation?

If you want a lazy range, then use std.algorithm.splitter. std.string operates on and returns strings, not general ranges.

- Jonathan M Davis

October 21, 2012
On 10/21/2012 06:35 PM, Jonathan M Davis wrote:
> On Sun, 2012-10-21 at 18:00 -0400, Chad J wrote:
>> std.string.splitLines returns an array, which is pretty grody.  Why not
>> return a lazily-evaluated range struct so that we can avoid allocations
>> on this simple but common operation?
>
> If you want a lazy range, then use std.algorithm.splitter. std.string
> operates on and returns strings, not general ranges.
>
> - Jonathan M Davis
>

std.algorithm.splitter is simply not acceptable for this.  It doesn't have this kind of logic:

bool matchLineEnd( string text, size_t pos )
{
	if ( pos+1 < text.length
	  && text[pos] == '\r'
	  && text[pos+1] == '\n' )
		return true;
	else if ( pos < text.length
	  && (text[pos] == '\r' || text[pos] == '\n') )
		return true;
	else
		return false;
}

I've never gotten std.algorithm.splitter to work for line splitting, despite trying.  It has always been more effective to write my own.
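To illustrate the limitation described here, a small sketch (the sample text is invented): splitting on '\n' alone treats a Windows-style "\r\n" as a plain '\n', so the '\r' stays attached to the preceding piece.

```d
import std.algorithm : equal, splitter;

void main()
{
    // Mixed line endings: "\r\n" after "one", "\n" after "two".
    string text = "one\r\ntwo\nthree";

    // splitter() with a single separator knows nothing about "\r\n".
    auto pieces = text.splitter('\n');

    // The first piece still carries its trailing '\r'.
    assert(pieces.equal(["one\r", "two", "three"]));
}
```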

I'm with bearophile on this one:
http://d.puremagic.com/issues/show_bug.cgi?id=4764

I think his suggestions about naming also just make *sense*.  I'm not sure how practical some of those naming changes would be if a lot of D2 code in the wild already uses the current oddly named functions, which emphasize eager evaluation and extraneous allocations.  I'm also not sure it is necessary to even /have/ functions that return arrays when lazy versions exist: the result of a lazy function can always be passed to std.array.array(range).  Heh, UFCS now even takes care of the nested parentheses.
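As a rough sketch of that last point (sample data invented for illustration), the nested call and the UFCS chain below should produce the same array from a lazy range:

```d
import std.algorithm : splitter;
import std.array : array;

void main()
{
    string text = "a b c";

    // Nested call: read inside-out.
    string[] nested = array(splitter(text, ' '));

    // UFCS chain: read left to right, no nested parentheses.
    string[] chained = text.splitter(' ').array;

    assert(nested == ["a", "b", "c"]);
    assert(chained == nested);
}
```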

October 22, 2012
On 10/22/12 1:05 AM, Chad J wrote:
> On 10/21/2012 06:35 PM, Jonathan M Davis wrote:
>> On Sun, 2012-10-21 at 18:00 -0400, Chad J wrote:
>>> std.string.splitLines returns an array, which is pretty grody. Why not
>>> return a lazily-evaluated range struct so that we can avoid allocations
>>> on this simple but common operation?
>>
>> If you want a lazy range, then use std.algorithm.splitter. std.string
>> operates on and returns strings, not general ranges.
>>
>> - Jonathan M Davis
>>
>
> std.algorithm.splitter is simply not acceptable for this. It doesn't
> have this kind of logic:
>
> bool matchLineEnd( string text, size_t pos )
> {
> 	if ( pos+1 < text.length
> 	  && text[pos] == '\r'
> 	  && text[pos+1] == '\n' )
> 		return true;
> 	else if ( pos < text.length
> 	  && (text[pos] == '\r' || text[pos] == '\n') )
> 		return true;
> 	else
> 		return false;
> }

Agreed. We should add a splitter() overload that accepts a single argument of some string type. It would use the line-splitting logic above.

Could you please adapt your code to do this and package it in a pull request? Thanks!


Andrei
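
One way the lazy line splitter requested in this thread might look, as a sketch only and not the Phobos implementation. The names LineSplitter, lazyLines and lineEndLength are hypothetical; lineEndLength adapts the matchLineEnd logic quoted above so that it also reports how many characters the terminator occupies. Dropping a trailing terminator instead of yielding a final empty line is an assumption of this sketch.

```d
import std.algorithm : equal;
import std.range : isInputRange;

/// Length of the line terminator starting at text[pos], or 0 if none.
/// Same tests as matchLineEnd, but returns the terminator's length.
size_t lineEndLength(string text, size_t pos)
{
    if (pos + 1 < text.length && text[pos] == '\r' && text[pos + 1] == '\n')
        return 2;
    if (pos < text.length && (text[pos] == '\r' || text[pos] == '\n'))
        return 1;
    return 0;
}

/// Hypothetical lazy range over the lines of a string.
/// Each line is a slice of the input, so no array is allocated.
struct LineSplitter
{
    private string rest;    // text not yet consumed
    private string current; // current line, terminator excluded
    private bool done;

    this(string text)
    {
        rest = text;
        popFront(); // load the first line, if any
    }

    @property bool empty() const { return done; }
    @property string front() const { return current; }

    void popFront()
    {
        if (rest.length == 0)
        {
            done = true;
            return;
        }
        // Scan forward to the next terminator or the end of the text.
        size_t i = 0;
        while (i < rest.length && lineEndLength(rest, i) == 0)
            ++i;
        current = rest[0 .. i];
        rest = rest[i + lineEndLength(rest, i) .. $];
    }
}

static assert(isInputRange!LineSplitter);

/// Hypothetical convenience wrapper.
LineSplitter lazyLines(string text)
{
    return LineSplitter(text);
}

void main()
{
    auto lines = lazyLines("one\r\ntwo\rthree\nfour");
    assert(lines.equal(["one", "two", "three", "four"]));
}
```

Slicing at these indices is safe for UTF-8 input because '\r' and '\n' are single code units that never occur inside a multi-byte sequence.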