October 21, 2012
Why does std.string.splitLines return an array?
std.string.splitLines returns an array, which is pretty grody.  Why not 
return a lazily-evaluated range struct so that we can avoid allocations 
on this simple but common operation?
October 21, 2012
Re: Why does std.string.splitLines return an array?
Chad J:

> std.string.splitLines returns an array, which is pretty grody.  
> Why not return a lazily-evaluated range struct so that we can 
> avoid allocations on this simple but common operation?

splitLines is probably modeled on Python's str.splitlines() string 
method, which returns a list (array) of strings (because Python was 
originally eager). Phobos has both split() and splitter(), which are 
eager and lazy respectively. So maybe you want a 
splitterLines().
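
To illustrate the eager/lazy distinction, here is a minimal sketch assuming a current Phobos, where split lives in std.array and splitter in std.algorithm. The sample text and variable names are made up for the example.

```d
import std.algorithm : equal, splitter;
import std.array : split;

void main()
{
    string text = "alpha,beta,gamma";

    // Eager: split builds and returns a string[] up front.
    string[] eager = text.split(",");
    assert(eager == ["alpha", "beta", "gamma"]);

    // Lazy: splitter returns a range that yields slices of the input
    // on demand, without allocating an array to hold the results.
    auto lazyParts = text.splitter(",");
    assert(lazyParts.equal(["alpha", "beta", "gamma"]));
}
```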

I have asked for a lazy splitLines, vote here:
http://d.puremagic.com/issues/show_bug.cgi?id=4764

But I have suggested a different naming scheme:
http://d.puremagic.com/issues/show_bug.cgi?id=5838

See also:
http://d.puremagic.com/issues/show_bug.cgi?id=6730
http://d.puremagic.com/issues/show_bug.cgi?id=7689

And especially:
http://d.puremagic.com/issues/show_bug.cgi?id=8013

Bye,
bearophile
October 21, 2012
Re: Why does std.string.splitLines return an array?
On Sun, 2012-10-21 at 18:00 -0400, Chad J wrote:
> std.string.splitLines returns an array, which is pretty grody.  Why not 
> return a lazily-evaluated range struct so that we can avoid allocations 
> on this simple but common operation?

If you want a lazy range, then use std.algorithm.splitter. std.string
operates on and returns strings, not general ranges.

- Jonathan M Davis
October 21, 2012
Re: Why does std.string.splitLines return an array?
On 10/21/2012 06:35 PM, Jonathan M Davis wrote:
> On Sun, 2012-10-21 at 18:00 -0400, Chad J wrote:
>> std.string.splitLines returns an array, which is pretty grody.  Why not
>> return a lazily-evaluated range struct so that we can avoid allocations
>> on this simple but common operation?
>
> If you want a lazy range, then use std.algorithm.splitter. std.string
> operates on and returns strings, not general ranges.
>
> - Jonathan M Davis
>

std.algorithm.splitter is simply not acceptable for this.  It doesn't 
have this kind of logic:

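// Returns true if a line terminator ("\r\n", "\r", or "\n") starts at pos.
bool matchLineEnd( string text, size_t pos )
{
	// Two-character Windows-style terminator.
	if ( pos+1 < text.length
	  && text[pos] == '\r'
	  && text[pos+1] == '\n' )
		return true;
	// Single-character terminator: bare '\r' or bare '\n'.
	else if ( pos < text.length
	  && (text[pos] == '\r' || text[pos] == '\n') )
		return true;
	else
		return false;
}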

I've never succeeded in using std.algorithm.splitter for line splitting, 
despite trying.  It's always more effective to write your own.
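
A small sketch of the problem, assuming input with mixed line endings (the sample text is made up): a single fixed separator either leaves a stray '\r' behind or misses the bare '\n'.

```d
import std.algorithm : equal, splitter;

void main()
{
    // One Windows-style ending and one Unix-style ending.
    string text = "one\r\ntwo\nthree";

    // Splitting on "\n" leaves the '\r' attached to the first line.
    assert(text.splitter("\n").equal(["one\r", "two", "three"]));

    // Splitting on "\r\n" fails to split at the bare '\n'.
    assert(text.splitter("\r\n").equal(["one", "two\nthree"]));
}
```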

I'm with bearophile on this one:
http://d.puremagic.com/issues/show_bug.cgi?id=4764

I think his suggestions about naming also just make *sense*.  I'm not 
sure how practical some of those naming changes would be if there is a 
lot of D2 code in the wild that uses the current weirdly-named stuff that 
emphasizes eager evaluation and extraneous allocations.  I'm not sure 
how necessary it is to even /have/ functions that return arrays when 
there are lazy versions: the result of a lazy function can always be fed 
to std.array.array(range).  Heh, even parentheses nesting is nicely 
handled by UFCS now.
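
For instance, a minimal sketch of turning a lazy range into an array only when one is actually needed (sample text and variable names are made up):

```d
import std.algorithm : splitter;
import std.array : array;

void main()
{
    string text = "a b c";

    // Nested call form.
    string[] nested = array(splitter(text, " "));

    // The same pipeline written with UFCS, read left to right.
    string[] chained = text.splitter(" ").array;

    assert(nested == chained);
    assert(chained == ["a", "b", "c"]);
}
```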
October 22, 2012
Re: Why does std.string.splitLines return an array?
On 10/22/12 1:05 AM, Chad J wrote:
> On 10/21/2012 06:35 PM, Jonathan M Davis wrote:
>> On Sun, 2012-10-21 at 18:00 -0400, Chad J wrote:
>>> std.string.splitLines returns an array, which is pretty grody. Why not
>>> return a lazily-evaluated range struct so that we can avoid allocations
>>> on this simple but common operation?
>>
>> If you want a lazy range, then use std.algorithm.splitter. std.string
>> operates on and returns strings, not general ranges.
>>
>> - Jonathan M Davis
>>
>
> std.algorithm.splitter is simply not acceptable for this. It doesn't
> have this kind of logic:
>
> bool matchLineEnd( string text, size_t pos )
> {
>     if ( pos+1 < text.length
>       && text[pos] == '\r'
>       && text[pos+1] == '\n' )
>         return true;
>     else if ( pos < text.length
>       && (text[pos] == '\r' || text[pos] == '\n') )
>         return true;
>     else
>         return false;
> }

Agreed. We should add splitter() accepting only one argument of some 
string type. It would use the line splitting logic above.
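
A rough sketch of what such a lazy line range might look like, not an actual Phobos implementation. The name LineSplitter is hypothetical, it only accepts string, and it recognizes only the "\r\n", "\r", and "\n" terminators from the function quoted above.

```d
import std.algorithm : equal;

// Hypothetical lazy line range; LineSplitter is an illustrative name and
// is not part of Phobos.
struct LineSplitter
{
    private string rest;    // text not yet consumed
    private string current; // line currently exposed by front
    private bool done;      // true once every line has been popped

    this(string text)
    {
        rest = text;
        done = text.length == 0;
        if (!done)
            advance();
    }

    @property bool empty() { return done; }
    @property string front() { return current; }
    @property LineSplitter save() { return this; }

    void popFront()
    {
        if (rest.length == 0)
            done = true;
        else
            advance();
    }

    // Slices the next line out of rest and skips its terminator.
    // No allocation happens: current is a slice of the original text.
    private void advance()
    {
        size_t i = 0;
        while (i < rest.length && rest[i] != '\r' && rest[i] != '\n')
            ++i;
        current = rest[0 .. i];

        // Same terminator logic as matchLineEnd: "\r\n" counts as one.
        if (i + 1 < rest.length && rest[i] == '\r' && rest[i + 1] == '\n')
            i += 2;
        else if (i < rest.length)
            i += 1;
        rest = rest[i .. $];
    }
}

void main()
{
    auto lines = LineSplitter("one\r\ntwo\nthree\rfour");
    assert(lines.equal(["one", "two", "three", "four"]));
}
```

In this sketch a trailing terminator does not produce an extra empty line, while an empty line in the middle of the text is yielded as "".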

Could you please adapt your code to do this and package it in a pull 
request? Thanks!


Andrei