November 05, 2012
On Mon, Nov 05, 2012 at 01:42:47PM -0500, Jonathan M Davis wrote:
> On Monday, November 05, 2012 15:48:41 deadalnix wrote:
> > HS Teoh explained it nicely.
> > 
> > The responsibility of using .transient or not belong to the consumer. Only the consumer can know if a transient range is suitable.
> > 
> > So you don't wrap a transient range. You wrap a non transient range, and, if the consumer is able to handle the transcient version, it call .transient on its source, that call it on its source, etc . . . up to the real source.
> > 
> > If one transformer is unable to handle transient range, the it don't pass .transient to its source.
> 
> You still need to wrap it in every wrapper range which could possibly support transience. So, this affects every single range which could possibly support transience.

You *can* wrap it if the wrapper range can support transience. It doesn't *have* to be wrapped. That's what I like about deadalnix's proposal -- do nothing, and you get *correct* behaviour. Do something, and you make it faster. Sounds like a good deal to me.

Besides, almost *all* of those wrapper ranges are currently _broken_ for transient ranges. You get undefined behaviour when you inadvertently pass a transient range to them. No matter what you do, this has to be fixed, one way or another. By fixing *all* ranges to be non-transient by default (and, as you say, there are only a scant few of them -- byLine is the only Phobos example I can think of), we fix all of this broken behaviour instantly. The rest is optional optimization.


> At least with Andrei's proposal, transience is explicitly restricted to input ranges, which seriously reduces the problems caused by them, particularly since so few functions can really function with just an input range.
[...]

With Andrei's proposal, all code that assumes transient .front with input ranges are broken by definition. I've looked over std.algorithm -- there are a *lot* of places with this assumption. Instead of getting correct default behaviour, you get a whole bunch of broken code with buggy behaviour, unless you first rewrite a lot of code in std.algorithm (and probably std.range as well).

Furthermore, afterwards there is still no safety net: accidentally create a transient forward range? You're screwed, std.algorithm breaks all over the place. You said that ranges are supposed to be easy for users to create. Well, with Andrei's scheme, all the user has to do is to write a .save method in his input range, and all of a sudden everything breaks. The problem is that input ranges allow transient .front to begin with. Now users have to remember, once I add .save, I have to make .front non-transient. This is very counterintuitive to me. I'm extending a range to make it usable as a forward range, and now suddenly I can't reuse my buffers anymore? In my mind, forward ranges should be a refinement of input ranges. So I should be able to just add more stuff to my input range, and it should become a valid forward range.  But now this is not true anymore, and the reason? We arbitrarily decided to make forward ranges non-transient.

By making *all* ranges non-transient by default, we avoid this problem. Users don't have to remember the strange interaction between .front and .save. You only get transient ranges when you explicitly ask for it.  So code is safe by default, and if you know what you're doing, you can make it faster. And std.array.array actually works correctly 100% of the time with no further changes.

The current situation is, code is *not* safe by default. Pass a transient range to std.algorithm, and sometimes it works, sometimes it breaks. Passing it to std.array.array sometimes works, sometimes doesn't. Subtle bugs arise everywhere.

With Andrei's proposal, you still have the same problem: forget to update a Phobos algorithm that expects an input range, and it breaks. You have to pretty much rewrite every single algorithm in Phobos that expects an input range, AND complete this rewrite BEFORE code starts working correctly.  Then after that, every time you write code that takes an input range, you have to watch out for transient ranges, otherwise things will break. And std.array.array can no longer take an input range because it can't copy .front correctly. IOW, input ranges become almost completely useless.

How is this any better than deadalnix's solution?  From what I can tell, it's a lot worse, for all of the above reasons.


T

-- 
"Hi." "'Lo."
November 05, 2012
On 11/5/12 9:45 PM, H. S. Teoh wrote:
> Besides, almost *all* of those wrapper ranges are currently _broken_ for
> transient ranges. You get undefined behaviour when you inadvertently
> pass a transient range to them.

It's not undefined behavior, it's just surprising behavior. UB would be a fair amount more worrisome.

> With Andrei's proposal, all code that assumes transient .front with
> input ranges are broken by definition.

I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.

> I've looked over std.algorithm --
> there are a *lot* of places with this assumption. Instead of getting
> correct default behaviour, you get a whole bunch of broken code with
> buggy behaviour, unless you first rewrite a lot of code in
> std.algorithm (and probably std.range as well).

Could you please put together a list of these algorithms so we have it? Thanks.

> Furthermore, afterwards there is still no safety net: accidentally
> create a transient forward range?

But that goes for any incorrect range.

> How is this any better than deadalnix's solution?  From what I can tell,
> it's a lot worse, for all of the above reasons.

I think we should start from here: the .transient proposal will not be accepted because it is too complex. Consider it a baseline that other proposals would be evaluated against. Then let's see how to devise a robust, simple solution.


Andrei
November 05, 2012
Le 05/11/2012 22:51, Andrei Alexandrescu a écrit :
> On 11/5/12 9:45 PM, H. S. Teoh wrote:
>> Besides, almost *all* of those wrapper ranges are currently _broken_ for
>> transient ranges. You get undefined behaviour when you inadvertently
>> pass a transient range to them.
>
> It's not undefined behavior, it's just surprising behavior. UB would be
> a fair amount more worrisome.
>

Unless you know the internal implementation of the range, it is undefined (nothing in the spec specify what the range does in this case, which IS undefined behavior).

>> With Andrei's proposal, all code that assumes transient .front with
>> input ranges are broken by definition.
>
> I think this should be: all code that DOES NOT assume transient .front
> with input ranges is broken.
>

You never addressed the std.array.array() problem. Neither the fact that forward ranges may require to be transient (I have that in my rubik's cube solver, and H. S. Teoh seems to needs that too).

>> How is this any better than deadalnix's solution? From what I can tell,
>> it's a lot worse, for all of the above reasons.
>
> I think we should start from here: the .transient proposal will not be
> accepted because it is too complex. Consider it a baseline that other
> proposals would be evaluated against. Then let's see how to devise a
> robust, simple solution.
>

I'd like to see a proposal discarded in favor of a better proposal. I'm certain I can say that we don't have one. Let me explain what is wrong in your proposal (=> forward range = non transient / input range = transient).

First two concepts are packed into one. I can safely say that both are not related, and uses cases has already been presented in all possible combination of input/forward transient or not. This usually seems like a win, but ends up creating a more complex situation at the ends. I've made that mistake quite a lot of time myself, surely enough to be sure that this is a bad idea.

See for instance how the private method made final automagicaly (which sounds nice at first) create inconsistent behavior when it come to NVI as explained in TDPL. This is typical of how different concept packed into one tends to explode at some point.

Additionally, the proposed solution put a responsibility on the developer : he/she must ensure that all its code with input ranges is compatible with transient range.

The fact is that transient stuff are highly unusual in most programming languages (even when possible, it is usually avoided like plague), and that all input ranges will not be transient. This WILL result in a lot of code in the field using input range but not supporting when it is transient.

Putting the responsibility on the programmer never worked since programmer exists, and it is not going to improve soon.
November 06, 2012
On 11/6/12 1:26 AM, deadalnix wrote:
> Le 05/11/2012 22:51, Andrei Alexandrescu a écrit :
>> On 11/5/12 9:45 PM, H. S. Teoh wrote:
>>> Besides, almost *all* of those wrapper ranges are currently _broken_ for
>>> transient ranges. You get undefined behaviour when you inadvertently
>>> pass a transient range to them.
>>
>> It's not undefined behavior, it's just surprising behavior. UB would be
>> a fair amount more worrisome.
>>
>
> Unless you know the internal implementation of the range, it is
> undefined (nothing in the spec specify what the range does in this case,
> which IS undefined behavior).

That would be unspecified behavior.

>>> With Andrei's proposal, all code that assumes transient .front with
>>> input ranges are broken by definition.
>>
>> I think this should be: all code that DOES NOT assume transient .front
>> with input ranges is broken.
>>
>
> You never addressed the std.array.array() problem. Neither the fact that
> forward ranges may require to be transient (I have that in my rubik's
> cube solver, and H. S. Teoh seems to needs that too).
>
>>> How is this any better than deadalnix's solution? From what I can tell,
>>> it's a lot worse, for all of the above reasons.
>>
>> I think we should start from here: the .transient proposal will not be
>> accepted because it is too complex. Consider it a baseline that other
>> proposals would be evaluated against. Then let's see how to devise a
>> robust, simple solution.
>>
>
> I'd like to see a proposal discarded in favor of a better proposal. I'm
> certain I can say that we don't have one.

Good solutions are found in the minds of people. Starting from the idea that .transient is unacceptably complicated, work from that angle. You can't claim you have reached perfection in design can you?

> Let me explain what is wrong
> in your proposal (=> forward range = non transient / input range =
> transient).

I am well aware of the relative tradeoffs. Arguing for .transient is futile at this point. What we need to do is find something better.


Andrei
November 06, 2012
Le 06/11/2012 01:21, Andrei Alexandrescu a écrit :
>> I'd like to see a proposal discarded in favor of a better proposal. I'm
>> certain I can say that we don't have one.
>
> Good solutions are found in the minds of people. Starting from the idea
> that .transient is unacceptably complicated, work from that angle. You
> can't claim you have reached perfection in design can you?
>

I certainly don't ! I'd be happy to switch to another solution granted it is superior. I just don't see any right now, which obviously don't mean none exist.

>> Let me explain what is wrong
>> in your proposal (=> forward range = non transient / input range =
>> transient).
>
> I am well aware of the relative tradeoffs. Arguing for .transient is
> futile at this point. What we need to do is find something better.
>

To be honest, my biggest fear isn't that this proposal is rejected, but that we fallback as default on the input range = transient / forward range = non transient scheme, because we fail to come up with something better, or that the status quo is choosen (as both seems to me worse than the .transient proposal).

Now, as previously said, if someone come up with something better, I'd be happy to drop it completely. I'm not pushing this proposal because it is mine.
November 06, 2012
On Mon, Nov 05, 2012 at 11:51:35PM +0200, Andrei Alexandrescu wrote: [...]
> >With Andrei's proposal, all code that assumes transient .front with input ranges are broken by definition.
> 
> I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.

Then std.array.array is broken by definition, and cannot be implemented with anything less than a forward range. This will very likely break a lot of existing code.


> >I've looked over std.algorithm -- there are a *lot* of places with this assumption. Instead of getting correct default behaviour, you get a whole bunch of broken code with buggy behaviour, unless you first rewrite a lot of code in std.algorithm (and probably std.range as well).
> 
> Could you please put together a list of these algorithms so we have it? Thanks.

This is probably an incomplete list, but it's a start:

std.algorithm.reduce - when no seed is given.
std.algorithm.joiner - both variants (I have a fix for the second variant)
std.algorithm.group
std.algorithm.minCount
(std.algorithm.minPos - but takes forwardRange, so probably OK)
(std.algorithm.Levenshtein - takes forwardRange, probably OK)
(std.algorithm.makeIndex - takes forwardRange, probably OK)
std.algorithm.splitter - tries to take slices without checking if range is sliceable
std.algorithm.uniq
std.algorithm.topNCopy - copies .front of input range.
std.algorithm.NWayUnion - copies .front of (possibly input) outer range into a heap
(std.algorithm.largestPartialIntersection - uses NWayUnion)
std.array.array - tries to copy input range
std.array.insertInPlace - tries to copy input range
std.array.join - tries to copy input range
std.stdio.writeln - there's too much code here so I didn't narrow it
	down, but testing with a transient range shows that it doesn't
	work correctly. This probably implicates std.format somewhere.

I didn't check the other Phobos modules, but basically any code that currently takes an input range needs to be audited for this issue.


[...]
> >How is this any better than deadalnix's solution?  From what I can tell, it's a lot worse, for all of the above reasons.
> 
> I think we should start from here: the .transient proposal will not be accepted because it is too complex. Consider it a baseline that other proposals would be evaluated against. Then let's see how to devise a robust, simple solution.
[...]

I still think forcing input ranges to be transient is oversimplifying the issue. Whether a range is transient is orthogonal to whether you can .save the current position or whether you can traverse it in two directions. Trying to conflate the two only leads to leaky abstractions.

The only real solution IMO is to address the issue head-on: either recognize transience as an inherent property of ranges, or (re)define ranges to exclude all transience.

If we recognize transience as an inherent property of ranges, then no matter what we do, there's going to be complications, because you'll always have to deal with two cases. Either an algorithm can handle transience, or it can't. If it can't, there needs to be a way to make it so that it will reject transient ranges somehow. With deadalnix's proposal, the overhead of doing this is more or less minimized, but it is still there.

If we exclude all transience, then there is no additional complication. We just have to fix the current code and make it crystal clear in the documentation that .front must always be persistent no matter what. The disadvantage here is that some algorithms, like find(), don't *need* the persistence of .front, so you lose generality. Any code that works with transient ranges will have to stick with foreach, or reinvent std.algorithm. With .transient proposal, this is the default behaviour, except that we have the *option* of using transient ranges if both the range type and the algorithm can handle it.

So the .transient proposal is pretty close to a comfortable balance between the two extremes. If it is still considered too complicated, then I'd like to hear what other proposals will address all the underlying issues in just as nice a way (or an even nicer way), without compromising this balance between the two extremes.


T

-- 
In order to understand recursion you must first understand recursion.
November 06, 2012
On Monday, November 05, 2012 16:49:42 H. S. Teoh wrote:
> On Mon, Nov 05, 2012 at 11:51:35PM +0200, Andrei Alexandrescu wrote: [...]
> 
> > >With Andrei's proposal, all code that assumes transient .front with input ranges are broken by definition.
> > 
> > I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.
> 
> Then std.array.array is broken by definition, and cannot be implemented with anything less than a forward range. This will very likely break a lot of existing code.

We can create an hasTransientFront trait for checking whether front is transient or not. It can't be 100% accurate, but it would fall on the side of declaring something transient when it wasn't, so it wouldn't declare something transient to be non-transient. Then any range for which hasTransientFront was false could work with std.array.array. For the rest, they can do something like

auto arr = std.array.array(map!"a.dup"(file.byLine()));

But it's certainly true that fixing std.array.array will break code, which is annoying. But as long as front can be transient, there's no way around that. byLine cannot possibly work with std.array.array as it stands. The only solutions that I see at this point are

1. Exclude transience completely.

2. Mark ranges as transient in some way and force algorithms to take this new trait into account.

3. Insist that input ranges have transient fronts (save perhaps when it can be statically verified that they aren't based on front's type) and that anything requiring a non-transient front require forward ranges.

4. Make all ranges non-transient but provide a way to get at a transient version of it and force algorithms to take this new property into account.

All 4 solutions have problems. They just have different problems.

> I still think forcing input ranges to be transient is oversimplifying the issue. Whether a range is transient is orthogonal to whether you can .save the current position or whether you can traverse it in two directions. Trying to conflate the two only leads to leaky abstractions.

I completly agree that transience and whether something is an input range are orthogonal, but we need to simplify.

> The only real solution IMO is to address the issue head-on: either recognize transience as an inherent property of ranges, or (re)define ranges to exclude all transience.

Personally, I'm all for excluding transience completely and insisting that anything which would be transient use opApply, but Andrei doesn't like that idea. It would certainly be the simplest solution though, and it probably wouldn't really cause much of a problem for ranges like ByLine, because in most cases what you do is use them with a foreach loop. std.array.array would be the main loss there, but if opApply is used for foreach rather than the range functions (I don't know which currently gets precedence), then ByLine can become a normal range when used with range functions (i.e. no transience) and just reuse its buffer with opApply. Then it would work with range-based functions but still avoid extra allocations when simply iterating over it. That would be my ideal solution at this point, but clearly not everyone agrees.

- Jonathan M Davis
November 06, 2012
On Tue, Nov 06, 2012 at 01:44:28AM +0100, deadalnix wrote:
> Le 06/11/2012 01:21, Andrei Alexandrescu a écrit :
[...]
> >Good solutions are found in the minds of people. Starting from the idea that .transient is unacceptably complicated, work from that angle. You can't claim you have reached perfection in design can you?

I'd like to hear a counterproposal to the .transient idea. Let's not get into a battle of words with nothing on the table to discuss.


[...]
> To be honest, my biggest fear isn't that this proposal is rejected, but that we fallback as default on the input range = transient / forward range = non transient scheme, because we fail to come up with something better, or that the status quo is choosen (as both seems to me worse than the .transient proposal).
[...]

Yeah, that is my fear also. Which is why I'm making noise about this issue.

The status quo is definitely out of the question, as it leads to undefined/unspecified behaviour with basically no warning.

Conflating transience with input range is not workable IMO, because it leads to leaky abstractions like transient forward ranges being considered non-forward ranges because we arbitrarily decided that forward ranges cannot be transient. I've already given a number of non-trivial examples of transient forward ranges, and apparently deadalnix has also encountered the same issue. Denying these cases just for the sake of holding on to an idealized abstraction is only further proof that the abstraction is leaky. We all know from experience that leaky abstractions will only lead to trouble down the road.

The only other remaining proposal so far is Jonathan's, that is to redefine all ranges to be non-transient. Then we fix byLine() to be non-transient, and everything else will work with no further trouble. We also don't get to use std.algorithm with any transient ranges, and will in all likelihood reinvent std.algorithm everywhere a transient range pops up. I don't see this as a viable solution either.

So the .transient idea currently seems to be the best we have. I'd love to hear a superior proposal, if there is one.


T

-- 
I don't trust computers, I've spent too long programming to think that they can get anything right. -- James Miller
November 06, 2012
On 11/6/12 3:06 AM, H. S. Teoh wrote:
> I've already given a number of
> non-trivial examples of transient forward ranges, and apparently
> deadalnix has also encountered the same issue.

I'd like to build more of this argument. I pointed out that your range is actually defining .dup, not .save. I think an argument can be made that forward ranges with transient .front do not qualify as forward ranges, seeing as a forward range is modeled after a list that actually has data sitting in memory.


Andrei
November 06, 2012
On 11/6/12 2:44 AM, deadalnix wrote:
> To be honest, my biggest fear isn't that this proposal is rejected, but
> that we fallback as default on the input range = transient / forward
> range = non transient scheme, because we fail to come up with something
> better, or that the status quo is choosen (as both seems to me worse
> than the .transient proposal).

I think the simplification of input range = transient and forward range = not transient has a lot going for it. It is simple, easy to explain and understand, and builds on simple real-life examples (buffered input and singly-linked lists). Clearly adding a new notion to the soup makes for more expressiveness, but it also makes for more complexity and subtlety in support for niche ranges. This should not be neglected.

Andrei