View mode: basic / threaded / horizontal-split · Log in · Help
November 04, 2012
Re: Transience of .front in input vs. forward ranges
On 11/4/12 9:49 AM, deadalnix wrote:
> You are explaining here that is reasonable to expect so, not that this
> is the most obvious choice, the most simple, or whatever.

Agreed, it would be simpler to just create a new string for each line.

> The conclusion exceed the argument here.
>
> I'd also argue that if people that wrote phobos get that wrong (I think
> it is fair to consider people writing phobos code as D specialists), it
> is safe to assume that most D programmer will get it wrong.

Nah, actually what happened is that much phobos code has been written 
while the concepts were in flux, so probably it would be hasty to draw 
conclusions from there.

>> Requiring anything more would put undue pressure on any actual input
>> range implementations and would essentially make the entire range
>> infrastructure unusable efficiently with streaming.
>>
>
> This is already doomed to phobos implementation of range stuffs. That
> may sound nice on paper, but practice have shown this isn't a realistic
> approach.

Again, that would be a rushed conclusion to draw.

>> For ByLine we should add another range that automatically produces
>> strings - either ByLineStrings or ByLine!string.
>>
>
> If we are modifying things like that, better proposal have been made in
> this thread and previous.

That's an addition, not a modification. Big difference.


Andrei
November 04, 2012
Re: Transience of .front in input vs. forward ranges
On 11/4/12 9:36 AM, deadalnix wrote:
> Let's put back relevant elements of the solution I propose :
> 1/ range preserve .front by default .
> 2/ Ranges that would get performance boost from invalidating .front can
> propose a property (we called it .fast in the thread) that return a new
> range that invalidate .front .

IMHO this property is deducible from the others:

fast!R == isInputRange!R && !isForwardRange!R && 
hasMutableIndirections!(ElementType!R)

I think it would be vastly better to avoid defining a new property of 
ranges (that subsequently would need to be cross-checked by many 
algorithms). Then we have additional scalability issues because all 
derived ranges would need to propagate .fast etc. Then we need to cater 
for odd cases such as random-access ranges being .fast.

The simpler way would be to simply state that input ranges can't 
guarantee the persistence of .front after calling .popFront. This is 
quite expected for input ranges, and no different for any object that 
gives a char[]-based API; subsequently changing the object may change 
the contents of the char[] returned.

The algorithms that need to worry about .front's transience are only the 
ones that work with input ranges (as opposed to forward ranges or 
stronger). This approach puts the burden in the right place - on the 
implementers of specific algorithms supporting one-pass streaming.


Andrei
November 04, 2012
Re: Transience of .front in input vs. forward ranges
On Sun, Nov 04, 2012 at 08:24:55AM -0500, Andrei Alexandrescu wrote:
> On 11/3/12 7:21 PM, H. S. Teoh wrote:
[...]
> >I wish Andrei would give some input as to how we should proceed with
> >this. I do consider this a major issue with ranges, because for
> >efficiency reasons I often write ranges that have transient .front
> >values, and they can lead to subtle bugs with the current
> >implementation of std.algorithm. It would be good to settle on this
> >issue one way or another. I'd be happy even if the decision is to say
> >that transient ranges are invalid and shouldn't be considered "real"
> >ranges. Anything is better than the current nebulous state of things
> >which only leads to subtle bugs for the unwary.
> 
> I think at this point we should streamline and simplify ranges while
> addressing fundamental concepts (such as unbuffered ranges). Delving
> into defining increasingly niche attributes for ranges is important
> work, but to the extent we can make things simpler that would be a
> gain.
> 
> In my view, transiency of .front has a lot to do with input ranges.
> An input range means that .front is evanescent upon calling
> .popBack; it has no lasting presence in memory, and invoking popBack
> means the next item will replace the current one.
> 
> Fetching lines should be solved by using types; trafficking in
> char[] does not offer guarantees about the preservation of the
> content. In contrast, trafficking in string formalizes a binding
> agreement between the range and its user that the content is
> immutable.
[...]

I think that this may be oversimplifying the issue. What about the
example of a range of in-place permutations over an array? That range
can be made a forward range, or even bidirectional (by reversing the
sense of the comparison used by nextPermutation). But it is inherently a
transient range.

And this is merely one example. I have some non-trivial code that
evaluates an expression tree, each node of which may generate multiple
elements; one may take a range over all possible values that the tree
may have. For efficiency reasons, each output value (which is
vector-valued) reuses the same buffer. It is a forward range with a
transient .front (it has to be a forward range, because otherwise one
can't range over all combinations of multiple values that subtrees may
produce).

While I'm perfectly fine with the decision that such ranges shouldn't be
considered proper ranges at all, it would greatly limit the
applicability of std.algorithm in my code. Either I would have to
artificially limit the range to an input range (which is a bit silly, as
it amounts to renaming .save to something else just so std.algorithm
wouldn't recognize it as a forward range), or I wouldn't be able to use
std.algorithm to manipulate these ranges, but would have to roll my own.
Of course, D makes it so that rolling my own isn't all that hard, but it
would be a bit disappointing that the supposedly-generic algorithms in
the standard library aren't that generic after all.


T

-- 
I see that you JS got Bach.
November 04, 2012
Re: Transience of .front in input vs. forward ranges
11/4/2012 7:16 PM, Andrei Alexandrescu пишет:
> On 11/4/12 9:36 AM, deadalnix wrote:
>> Let's put back relevant elements of the solution I propose :
>> 1/ range preserve .front by default .
>> 2/ Ranges that would get performance boost from invalidating .front can
>> propose a property (we called it .fast in the thread) that return a new
>> range that invalidate .front .
>
> IMHO this property is deducible from the others:
>
> fast!R == isInputRange!R && !isForwardRange!R &&
> hasMutableIndirections!(ElementType!R)
>

Or rather onePass!R == ...

but I seriously doubt that ability to save iteration state and the fact 
of reusing single slot/memory chunk for .front are always correlated.


> I think it would be vastly better to avoid defining a new property of
> ranges (that subsequently would need to be cross-checked by many
> algorithms). Then we have additional scalability issues because all
> derived ranges would need to propagate .fast etc. Then we need to cater
> for odd cases such as random-access ranges being .fast.
>

Neither .fast nor separate transient flag strike me as good solutions as 
placing a lot of burden on the range implementer (and introducing yet 
another convention).

> The simpler way would be to simply state that input ranges can't
> guarantee the persistence of .front after calling .popFront. This is
> quite expected for input ranges, and no different for any object that
> gives a char[]-based API; subsequently changing the object may change
> the contents of the char[] returned.
>

> The algorithms that need to worry about .front's transience are only the
> ones that work with input ranges (as opposed to forward ranges or
> stronger).

The fact that algorithm doesn't save iteration state != it counts on 
transient .front.

A simple example of std.array.array will do - it iterates range once and 
pumps elements into appender/builder "sink". So it doesn't require 
forward range but just that elements are not _aliased_ under the hood.

Another example is dirEntries - it returns entries with immutable 
strings, but it can't save iteration state. It does work with array 
because of no aliasing of .front values.

>This approach puts the burden in the right place - on the
> implementers of specific algorithms supporting one-pass streaming.

Aside from mixing separate concepts into one* I like the approach.

*InputRange vs ForwardRange has nothing to do with mutable indirections 
of .front.
-- 
Dmitry Olshansky
November 04, 2012
Re: Transience of .front in input vs. forward ranges
Le 04/11/2012 16:16, Andrei Alexandrescu a écrit :
> On 11/4/12 9:36 AM, deadalnix wrote:
>> Let's put back relevant elements of the solution I propose :
>> 1/ range preserve .front by default .
>> 2/ Ranges that would get performance boost from invalidating .front can
>> propose a property (we called it .fast in the thread) that return a new
>> range that invalidate .front .
>
> IMHO this property is deducible from the others:
>
> fast!R == isInputRange!R && !isForwardRange!R &&
> hasMutableIndirections!(ElementType!R)
>
> I think it would be vastly better to avoid defining a new property of
> ranges (that subsequently would need to be cross-checked by many
> algorithms).

You can stop here. This shows that you didn't understood the proposal.

The whole point of the proposal is to avoid the cross checking part. See 
code sample posted earlier in the discussion for full understanding.

> Then we have additional scalability issues because all
> derived ranges would need to propagate .fast etc. Then we need to cater
> for odd cases such as random-access ranges being .fast.
>

The beauty of the proposal is that nothing NEED to be .fast .

> The simpler way would be to simply state that input ranges can't
> guarantee the persistence of .front after calling .popFront. This is
> quite expected for input ranges, and no different for any object that
> gives a char[]-based API; subsequently changing the object may change
> the contents of the char[] returned.
>
> The algorithms that need to worry about .front's transience are only the
> ones that work with input ranges (as opposed to forward ranges or
> stronger). This approach puts the burden in the right place - on the
> implementers of specific algorithms supporting one-pass streaming.
>
November 04, 2012
Re: Transience of .front in input vs. forward ranges
On 11/4/12 11:40 AM, deadalnix wrote:
> Le 04/11/2012 16:16, Andrei Alexandrescu a écrit :
>> On 11/4/12 9:36 AM, deadalnix wrote:
>>> Let's put back relevant elements of the solution I propose :
>>> 1/ range preserve .front by default .
>>> 2/ Ranges that would get performance boost from invalidating .front can
>>> propose a property (we called it .fast in the thread) that return a new
>>> range that invalidate .front .
>>
>> IMHO this property is deducible from the others:
>>
>> fast!R == isInputRange!R && !isForwardRange!R &&
>> hasMutableIndirections!(ElementType!R)
>>
>> I think it would be vastly better to avoid defining a new property of
>> ranges (that subsequently would need to be cross-checked by many
>> algorithms).
>
> You can stop here. This shows that you didn't understood the proposal.
>
> The whole point of the proposal is to avoid the cross checking part. See
> code sample posted earlier in the discussion for full understanding.

Indeed I'd misunderstood. So I went back and my current understanding is 
that it's all about defining this:

auto transient(R r) if (isInputRange!R)
{
    return r;
}

Then certain ranges would implement a property .transient if there's an 
opportunity for e.g. reusing buffers (a la ByLine and ByChunk). At the 
end of the day, simply invoking r.transient for any range r would either 
get the optimized range or would vacuously return that same range.

As far as I can tell from the subsequent discussion, there are quite a 
few issues related to propagating transient by ranges that compose on 
top of other ranges. By and large, adding an optional attribute of 
ranges is difficult to make free or minimal in cost.


Andrei
November 04, 2012
Re: Transience of .front in input vs. forward ranges
On 11/4/12 11:36 AM, Dmitry Olshansky wrote:
>
> The fact that algorithm doesn't save iteration state != it counts on
> transient .front.

I didn't claim that. My strongest claim was: 
input-range-having-front-with-mutable-indirection IMPLIES 
no-transience-guarantee.

Note the IMPLIES (not EQUIVALENT) and no guarantee as opposed to 
guaranteed change of contents.

> A simple example of std.array.array will do - it iterates range once and
> pumps elements into appender/builder "sink". So it doesn't require
> forward range but just that elements are not _aliased_ under the hood.

No surprise here.

> Another example is dirEntries - it returns entries with immutable
> strings, but it can't save iteration state. It does work with array
> because of no aliasing of .front values.

This example is actually nice because it shows that indeed an input 
range with NO mutable indirection on ElementType guarantees non-transience.


Andrei
November 04, 2012
Re: Transience of .front in input vs. forward ranges
Le 04/11/2012 17:57, Andrei Alexandrescu a écrit :
> Indeed I'd misunderstood. So I went back and my current understanding is
> that it's all about defining this:
>
> auto transient(R r) if (isInputRange!R)
> {
> return r;
> }
>
> Then certain ranges would implement a property .transient if there's an
> opportunity for e.g. reusing buffers (a la ByLine and ByChunk). At the
> end of the day, simply invoking r.transient for any range r would either
> get the optimized range or would vacuously return that same range.
>

Now we are on the same page. (and I like transient much better than .fast).

> As far as I can tell from the subsequent discussion, there are quite a
> few issues related to propagating transient by ranges that compose on
> top of other ranges. By and large, adding an optional attribute of
> ranges is difficult to make free or minimal in cost.
>

To sum it up, a transformation range should implement .transient if it 
can handle the fact that its source can be transient. The good news is 
that if it doesn't, then the program is still 100% correct. It is just 
slower.

So this approach promote program correctness, but still allow crazy 
bearded programmer to optimize the code where needed.

I think it fit nicely D's phylosophy, in the sense it does provide a 
safe, easy to use interface while providing a backdoor when this isn't 
enough.
November 04, 2012
Re: Transience of .front in input vs. forward ranges
On 11/4/12 12:26 PM, deadalnix wrote:
> I think it fit nicely D's phylosophy, in the sense it does provide a
> safe, easy to use interface while providing a backdoor when this isn't
> enough.

It doesn't fit the (admittedly difficult to fulfill) desideratum that 
the obvious code is safe and fast. And the obvious code uses byLine, not 
byLine.transient.

Back to a simpler solution: what's wrong with adding alternative APIs 
for certain input ranges? We have byLine, byChunk, byChunkAsync. We may 
as well add eachLine, eachChunk, eachChunkAsync and let the 
documentation explain the differences.


Andrei
November 04, 2012
Re: Transience of .front in input vs. forward ranges
11/4/2012 9:02 PM, Andrei Alexandrescu пишет:
> On 11/4/12 11:36 AM, Dmitry Olshansky wrote:
>>
>> The fact that algorithm doesn't save iteration state != it counts on
>> transient .front.
>
> I didn't claim that. My strongest claim was:
> input-range-having-front-with-mutable-indirection IMPLIES
> no-transience-guarantee.

Indeed, but it wasn't your claim that I meant. It's fine with me but 
it's hardly moving us forward.

I was investigating your simplified proposal and its consequences for 
e.g. std.array.array. And apparently forgot to make the central point.

I was mostly going by:

> The simpler way would be to simply state that input ranges can't 
guarantee the persistence of .front after calling .popFront.

and

>The algorithms that need to worry about .front's transience are only 
the ones that work with input ranges (as opposed to forward ranges or 
stronger).

And conclude that applying this simple rule to std.array.array* means it 
has to assume non-transient elements and somehow copy/dup'em. Problem is 
we don't have generic function to make 100% guaranteed unaliased copy 
(it could omit doing any work on non-aliased objects).

Also that it misses forward ranges that have aliased .front, assuming 
they are non-transient.


*It has to work with input ranges as I noted below.

>> A simple example of std.array.array will do - it iterates range once and
>> pumps elements into appender/builder "sink". So it doesn't require
>> forward range but just that elements are not _aliased_ under the hood.
>
> No surprise here.
>
>> Another example is dirEntries - it returns entries with immutable
>> strings, but it can't save iteration state. It does work with array
>> because of no aliasing of .front values.
>
> This example is actually nice because it shows that indeed an input
> range with NO mutable indirection on ElementType guarantees non-transience.
>


-- 
Dmitry Olshansky
1 2 3 4 5 6 7
Top | Discussion index | About this forum | D home