November 04, 2012
On 11/4/12 9:49 AM, deadalnix wrote:
> You are explaining here that is reasonable to expect so, not that this
> is the most obvious choice, the most simple, or whatever.

Agreed, it would be simpler to just create a new string for each line.

> The conclusion exceed the argument here.
>
> I'd also argue that if people that wrote phobos get that wrong (I think
> it is fair to consider people writing phobos code as D specialists), it
> is safe to assume that most D programmer will get it wrong.

Nah, actually what happened is that much phobos code has been written while the concepts were in flux, so probably it would be hasty to draw conclusions from there.

>> Requiring anything more would put undue pressure on any actual input
>> range implementations and would essentially make the entire range
>> infrastructure unusable efficiently with streaming.
>>
>
> This is already doomed to phobos implementation of range stuffs. That
> may sound nice on paper, but practice have shown this isn't a realistic
> approach.

Again, that would be a rushed conclusion to draw.

>> For ByLine we should add another range that automatically produces
>> strings - either ByLineStrings or ByLine!string.
>>
>
> If we are modifying things like that, better proposal have been made in
> this thread and previous.

That's an addition, not a modification. Big difference.


Andrei
November 04, 2012
On 11/4/12 9:36 AM, deadalnix wrote:
> Let's put back relevant elements of the solution I propose :
> 1/ range preserve .front by default .
> 2/ Ranges that would get performance boost from invalidating .front can
> propose a property (we called it .fast in the thread) that return a new
> range that invalidate .front .

IMHO this property is deducible from the others:

fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R)

I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms). Then we have additional scalability issues because all derived ranges would need to propagate .fast etc. Then we need to cater for odd cases such as random-access ranges being .fast.

The simpler way would be to simply state that input ranges can't guarantee the persistence of .front after calling .popFront. This is quite expected for input ranges, and no different for any object that gives a char[]-based API; subsequently changing the object may change the contents of the char[] returned.

The algorithms that need to worry about .front's transience are only the ones that work with input ranges (as opposed to forward ranges or stronger). This approach puts the burden in the right place - on the implementers of specific algorithms supporting one-pass streaming.


Andrei
November 04, 2012
On Sun, Nov 04, 2012 at 08:24:55AM -0500, Andrei Alexandrescu wrote:
> On 11/3/12 7:21 PM, H. S. Teoh wrote:
[...]
> >I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary.
> 
> I think at this point we should streamline and simplify ranges while addressing fundamental concepts (such as unbuffered ranges). Delving into defining increasingly niche attributes for ranges is important work, but to the extent we can make things simpler that would be a gain.
> 
> In my view, transiency of .front has a lot to do with input ranges. An input range means that .front is evanescent upon calling .popBack; it has no lasting presence in memory, and invoking popBack means the next item will replace the current one.
> 
> Fetching lines should be solved by using types; trafficking in char[] does not offer guarantees about the preservation of the content. In contrast, trafficking in string formalizes a binding agreement between the range and its user that the content is immutable.
[...]

I think that this may be oversimplifying the issue. What about the example of a range of in-place permutations over an array? That range can be made a forward range, or even bidirectional (by reversing the sense of the comparison used by nextPermutation). But it is inherently a transient range.

And this is merely one example. I have some non-trivial code that evaluates an expression tree, each node of which may generate multiple elements; one may take a range over all possible values that the tree may have. For efficiency reasons, each output value (which is vector-valued) reuses the same buffer. It is a forward range with a transient .front (it has to be a forward range, because otherwise one can't range over all combinations of multiple values that subtrees may produce).

While I'm perfectly fine with the decision that such ranges shouldn't be considered proper ranges at all, it would greatly limit the applicability of std.algorithm in my code. Either I would have to artificially limit the range to an input range (which is a bit silly, as it amounts to renaming .save to something else just so std.algorithm wouldn't recognize it as a forward range), or I wouldn't be able to use std.algorithm to manipulate these ranges, but would have to roll my own. Of course, D makes it so that rolling my own isn't all that hard, but it would be a bit disappointing that the supposedly-generic algorithms in the standard library aren't that generic after all.


T

-- 
I see that you JS got Bach.
November 04, 2012
11/4/2012 7:16 PM, Andrei Alexandrescu пишет:
> On 11/4/12 9:36 AM, deadalnix wrote:
>> Let's put back relevant elements of the solution I propose :
>> 1/ range preserve .front by default .
>> 2/ Ranges that would get performance boost from invalidating .front can
>> propose a property (we called it .fast in the thread) that return a new
>> range that invalidate .front .
>
> IMHO this property is deducible from the others:
>
> fast!R == isInputRange!R && !isForwardRange!R &&
> hasMutableIndirections!(ElementType!R)
>

Or rather onePass!R == ...

but I seriously doubt that ability to save iteration state and the fact of reusing single slot/memory chunk for .front are always correlated.


> I think it would be vastly better to avoid defining a new property of
> ranges (that subsequently would need to be cross-checked by many
> algorithms). Then we have additional scalability issues because all
> derived ranges would need to propagate .fast etc. Then we need to cater
> for odd cases such as random-access ranges being .fast.
>

Neither .fast nor separate transient flag strike me as good solutions as placing a lot of burden on the range implementer (and introducing yet another convention).

> The simpler way would be to simply state that input ranges can't
> guarantee the persistence of .front after calling .popFront. This is
> quite expected for input ranges, and no different for any object that
> gives a char[]-based API; subsequently changing the object may change
> the contents of the char[] returned.
>

> The algorithms that need to worry about .front's transience are only the
> ones that work with input ranges (as opposed to forward ranges or
> stronger).

The fact that algorithm doesn't save iteration state != it counts on transient .front.

A simple example of std.array.array will do - it iterates range once and pumps elements into appender/builder "sink". So it doesn't require forward range but just that elements are not _aliased_ under the hood.

Another example is dirEntries - it returns entries with immutable strings, but it can't save iteration state. It does work with array because of no aliasing of .front values.

>This approach puts the burden in the right place - on the
> implementers of specific algorithms supporting one-pass streaming.

Aside from mixing separate concepts into one* I like the approach.

*InputRange vs ForwardRange has nothing to do with mutable indirections of .front.
-- 
Dmitry Olshansky
November 04, 2012
Le 04/11/2012 16:16, Andrei Alexandrescu a écrit :
> On 11/4/12 9:36 AM, deadalnix wrote:
>> Let's put back relevant elements of the solution I propose :
>> 1/ range preserve .front by default .
>> 2/ Ranges that would get performance boost from invalidating .front can
>> propose a property (we called it .fast in the thread) that return a new
>> range that invalidate .front .
>
> IMHO this property is deducible from the others:
>
> fast!R == isInputRange!R && !isForwardRange!R &&
> hasMutableIndirections!(ElementType!R)
>
> I think it would be vastly better to avoid defining a new property of
> ranges (that subsequently would need to be cross-checked by many
> algorithms).

You can stop here. This shows that you didn't understood the proposal.

The whole point of the proposal is to avoid the cross checking part. See code sample posted earlier in the discussion for full understanding.

> Then we have additional scalability issues because all
> derived ranges would need to propagate .fast etc. Then we need to cater
> for odd cases such as random-access ranges being .fast.
>

The beauty of the proposal is that nothing NEED to be .fast .

> The simpler way would be to simply state that input ranges can't
> guarantee the persistence of .front after calling .popFront. This is
> quite expected for input ranges, and no different for any object that
> gives a char[]-based API; subsequently changing the object may change
> the contents of the char[] returned.
>
> The algorithms that need to worry about .front's transience are only the
> ones that work with input ranges (as opposed to forward ranges or
> stronger). This approach puts the burden in the right place - on the
> implementers of specific algorithms supporting one-pass streaming.
>

November 04, 2012
On 11/4/12 11:40 AM, deadalnix wrote:
> Le 04/11/2012 16:16, Andrei Alexandrescu a écrit :
>> On 11/4/12 9:36 AM, deadalnix wrote:
>>> Let's put back relevant elements of the solution I propose :
>>> 1/ range preserve .front by default .
>>> 2/ Ranges that would get performance boost from invalidating .front can
>>> propose a property (we called it .fast in the thread) that return a new
>>> range that invalidate .front .
>>
>> IMHO this property is deducible from the others:
>>
>> fast!R == isInputRange!R && !isForwardRange!R &&
>> hasMutableIndirections!(ElementType!R)
>>
>> I think it would be vastly better to avoid defining a new property of
>> ranges (that subsequently would need to be cross-checked by many
>> algorithms).
>
> You can stop here. This shows that you didn't understood the proposal.
>
> The whole point of the proposal is to avoid the cross checking part. See
> code sample posted earlier in the discussion for full understanding.

Indeed I'd misunderstood. So I went back and my current understanding is that it's all about defining this:

auto transient(R r) if (isInputRange!R)
{
    return r;
}

Then certain ranges would implement a property .transient if there's an opportunity for e.g. reusing buffers (a la ByLine and ByChunk). At the end of the day, simply invoking r.transient for any range r would either get the optimized range or would vacuously return that same range.

As far as I can tell from the subsequent discussion, there are quite a few issues related to propagating transient by ranges that compose on top of other ranges. By and large, adding an optional attribute of ranges is difficult to make free or minimal in cost.


Andrei
November 04, 2012
On 11/4/12 11:36 AM, Dmitry Olshansky wrote:
>
> The fact that algorithm doesn't save iteration state != it counts on
> transient .front.

I didn't claim that. My strongest claim was: input-range-having-front-with-mutable-indirection IMPLIES no-transience-guarantee.

Note the IMPLIES (not EQUIVALENT) and no guarantee as opposed to guaranteed change of contents.

> A simple example of std.array.array will do - it iterates range once and
> pumps elements into appender/builder "sink". So it doesn't require
> forward range but just that elements are not _aliased_ under the hood.

No surprise here.

> Another example is dirEntries - it returns entries with immutable
> strings, but it can't save iteration state. It does work with array
> because of no aliasing of .front values.

This example is actually nice because it shows that indeed an input range with NO mutable indirection on ElementType guarantees non-transience.


Andrei
November 04, 2012
Le 04/11/2012 17:57, Andrei Alexandrescu a écrit :
> Indeed I'd misunderstood. So I went back and my current understanding is
> that it's all about defining this:
>
> auto transient(R r) if (isInputRange!R)
> {
> return r;
> }
>
> Then certain ranges would implement a property .transient if there's an
> opportunity for e.g. reusing buffers (a la ByLine and ByChunk). At the
> end of the day, simply invoking r.transient for any range r would either
> get the optimized range or would vacuously return that same range.
>

Now we are on the same page. (and I like transient much better than .fast).

> As far as I can tell from the subsequent discussion, there are quite a
> few issues related to propagating transient by ranges that compose on
> top of other ranges. By and large, adding an optional attribute of
> ranges is difficult to make free or minimal in cost.
>

To sum it up, a transformation range should implement .transient if it can handle the fact that its source can be transient. The good news is that if it doesn't, then the program is still 100% correct. It is just slower.

So this approach promote program correctness, but still allow crazy bearded programmer to optimize the code where needed.

I think it fit nicely D's phylosophy, in the sense it does provide a safe, easy to use interface while providing a backdoor when this isn't enough.
November 04, 2012
On 11/4/12 12:26 PM, deadalnix wrote:
> I think it fit nicely D's phylosophy, in the sense it does provide a
> safe, easy to use interface while providing a backdoor when this isn't
> enough.

It doesn't fit the (admittedly difficult to fulfill) desideratum that the obvious code is safe and fast. And the obvious code uses byLine, not byLine.transient.

Back to a simpler solution: what's wrong with adding alternative APIs for certain input ranges? We have byLine, byChunk, byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync and let the documentation explain the differences.


Andrei
November 04, 2012
11/4/2012 9:02 PM, Andrei Alexandrescu пишет:
> On 11/4/12 11:36 AM, Dmitry Olshansky wrote:
>>
>> The fact that algorithm doesn't save iteration state != it counts on
>> transient .front.
>
> I didn't claim that. My strongest claim was:
> input-range-having-front-with-mutable-indirection IMPLIES
> no-transience-guarantee.

Indeed, but it wasn't your claim that I meant. It's fine with me but it's hardly moving us forward.

I was investigating your simplified proposal and its consequences for e.g. std.array.array. And apparently forgot to make the central point.

I was mostly going by:

> The simpler way would be to simply state that input ranges can't guarantee the persistence of .front after calling .popFront.

and

>The algorithms that need to worry about .front's transience are only the ones that work with input ranges (as opposed to forward ranges or stronger).

And conclude that applying this simple rule to std.array.array* means it has to assume non-transient elements and somehow copy/dup'em. Problem is we don't have generic function to make 100% guaranteed unaliased copy (it could omit doing any work on non-aliased objects).

Also that it misses forward ranges that have aliased .front, assuming they are non-transient.


*It has to work with input ranges as I noted below.

>> A simple example of std.array.array will do - it iterates range once and
>> pumps elements into appender/builder "sink". So it doesn't require
>> forward range but just that elements are not _aliased_ under the hood.
>
> No surprise here.
>
>> Another example is dirEntries - it returns entries with immutable
>> strings, but it can't save iteration state. It does work with array
>> because of no aliasing of .front values.
>
> This example is actually nice because it shows that indeed an input
> range with NO mutable indirection on ElementType guarantees non-transience.
>


-- 
Dmitry Olshansky