November 05, 2012
On Sunday, November 04, 2012 21:07:46 H. S. Teoh wrote:
> Stop right there. You're contradicting yourself. You kept saying that transient ranges are rare, abnormal, etc., and now you say they are a potentially large number of range types? I'm sorry, you've managed to utterly confuse me.  Are they rare, or are they not?
> 
> If they are rare, then this thin wrapper only needs to be written in those few rare cases. *No other range type needs to be changed.*
> 
> If they are common, then they aren't abnormal, and we should be fixing Phobos to deal with them. Everywhere. In a pervasive, invasive, large changeset, because almost all current code is broken w.r.t. to these common cases -- if indeed they are common.

Base ranges with transient fronts are rare, but as soon as you have to take them into account with wrapper ranges, then they potentially affect almost every range-based function in Phobos. So, you're taking a very rare case and forcing _everything_ to account for it. _That_ is why it affects a large number of range types.

> > It's yet one more thing that has to been maintained and tested. We need to be simplifying ranges, not complicating them yet further. They're already seriously pushing it in terms of how complicated they are. How many people outside of this newsgroup actually, properly understand ranges as it is?
> 
> [...]
> 
> As I said, by having a UFCS version of .transient that simply returns the original range, you don't even need to know what .transient does. Don't even include it in your range, and it just behaves as a normal non-transient range. How is that any more complicated?

You can't just return the original range from a wrapper range and have it work. Take filter for instance, if you got at the range that it wrapped, you'd be completely bypassing filter, and the range wouldn't be filtered at all. Any and all wrapper ranges would have to take transient into account. If they couldn't do transient ranges, then they just wouldn't propagate or use transient. But if they could, then they'd have to create their own transient property which returned an appropriate range type which functioned properly with transience. Yes, it would use the transient property of the range that it was wrapping, but it would have to create its own transient property which functioned properly with whatever it was doing.

The _only_ way that this doesn't affect a lot of ranges in Phobos is if we just don't bother to propagate the transient property, and if we don't bother to propagate it, then there was no point in having it in the first place. Such ranges might as well have their own functions for iterating or just use opApply if no wrapper ranges are going to propagate their extra properties or if no range-based functions take those extra properties into account.

- Jonathan M Davis
November 05, 2012
Le 05/11/2012 04:11, Jonathan M Davis a écrit :
> On Sunday, November 04, 2012 15:40:18 deadalnix wrote:
>> Le 04/11/2012 03:26, Jonathan M Davis a écrit :
>>> 3. Make it so that ranges which can be transient are non-transient by
>>> default but provide a function to get at a transient version for speed
>>> (which was the fastRange proposal in this thread). The main problem here
>>> is that when the fast range gets wrapped, it's transient, and so anything
>>> using the wrapped range will be forced to use the transient version
>>> rather than using the non- transient version and only using the transient
>>> version when it's asked for. So, I don't think that this is particularly
>>> viable.
>>
>> Can you elaborate on that. I definitively don't see a problem that big here.
>
> The problem is that you can't opt out of the fast range once you've got it. If
> you use fastRange and then the result ends up getting wrapped by another
> range, that range's front is transient and has no way of indicating it. So,
> you're right back in the boat that we're in right now. In order to avoid that,
> virtually all range-based functions would have to not use fastRange, because
> they have no idea whether you're going to use their result to pass to another
> function or not.
>
> - Jonathan M Davis

HS Teoh explained it nicely.

The responsibility of using .transient or not belong to the consumer. Only the consumer can know if a transient range is suitable.

So you don't wrap a transient range. You wrap a non transient range, and, if the consumer is able to handle the transcient version, it call .transient on its source, that call it on its source, etc . . . up to the real source.

If one transformer is unable to handle transient range, the it don't pass .transient to its source.
November 05, 2012
Le 04/11/2012 23:10, H. S. Teoh a écrit :
> On Sun, Nov 04, 2012 at 12:38:06PM -0500, Andrei Alexandrescu wrote:
>> On 11/4/12 12:26 PM, deadalnix wrote:
>>> I think it fit nicely D's phylosophy, in the sense it does provide a
>>> safe, easy to use interface while providing a backdoor when this
>>> isn't enough.
>>
>> It doesn't fit the (admittedly difficult to fulfill) desideratum that
>> the obvious code is safe and fast. And the obvious code uses byLine,
>> not byLine.transient.
>
> Actually, it does. The proposal was to modify byLine so that it returns
> strings instead of a reused char[] buffer. The current version of byLine
> that has transient front will be moved into a .transient property. This
> has the following results:
>
> - All existing code doesn't break: it just becomes a tad slower from
>    duplicating each line. Correctness is not broken.
>
> - Code that wants to be fast can call .transient at the end of the
>    construction, e.g., stdin.byLine().map!myFunc.transient. Assuming
>    that map propagates .transient (which also implies it's safe to use
>    with transient ranges), this will return an optimized range that
>    reuses the transient buffer safely.
>
> - Code that calls .transient on non-transient ranges get a no-op:
>    [1,2,3].map!myFunc.transient is equivalent to [1,2,3].map!myFunc,
>    because non-transient ranges just alias .transient to themselves (this
>    can be done via UFCS so no actual change needs to be made to existing
>    non-transient ranges).
>
> IOW, code is always 100% safe by default. You can optionally use
> .transient with certain ranges if you'd like for it to be faster. Using
> .transient where it isn't supported simply defaults to the usual safe
> behaviour (no performance increase, but no safety bugs either).
>
> Now, this can be taken one step further. User code need not even know
> what .transient is. As deadalnix said, the insight is that it is only
> invoked by range *consumers*. For example, one could in theory make
> canFind() transient-safe, so the user would write this:
>
> 	auto n = stdin.byLine().map!myFunc.canFind(myNeedle);
>
> and canFind uses the .transient range to do the finding. Assuming
> map!myFunc correctly supports .transient, this request for the faster
> range automatically propagates back to byLine(), and so the code is fast
> *without the user even asking for .transient*.
>
> And if the range being scanned is non-transient, then it behaves
> normally as it does today. For example, if map!myFunc does *not* support
> .transient, then when canFind asks for .transient, UFCS takes over and
> it just gets the (non-transient) mapped range back. Which in turn asks
> for the *non-transient* version of byLine(), so in no case will there be
> any safety breakage.
>
> I think this solution is most elegant so far, in the sense that:
>
> (1) Existing code doesn't break;
> (2) Existing code doesn't even need to change, or can be slowly
> optimized to use .transient on a case-by-case basis without any code
> breakage;
> (3) The user doesn't even need to know what .transient is to reap the
> benefits, where it's supported;
> (4) The algorithm writer decides whether or not .transient is supported,
> guaranteeing that there will be no hidden bugs caused by assuming .front
> is persistent and then getting a transient range passed to it. So it's
> the algorithm that declares whether or not it supports .transient.
>
>
>> Back to a simpler solution: what's wrong with adding alternative
>> APIs for certain input ranges? We have byLine, byChunk,
>> byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync
>> and let the documentation explain the differences.
> [...]
>
> This is less elegant than deadalnix's proposal, because there is a
> certain risk associated with using .transient ranges. Now the onus is on
> the user to make sure he doesn't pass a transient range to an algorithm
> that can't handle it.
>
> With deadalnix's proposal, it's the *algorithm* that decides whether or
> not it knows how to handle .transient properly. The user doesn't have to
> know, and can still reap the benefits. An algorithm that doesn't know
> how to deal with transient ranges simply does nothing, and UFCS takes
> over to provide the default .transient, which just returns the original
> non-transient range.
>

I think I'll hire you when I need to promote my ideas :D This is much better explained than what I did myself and is exactly what I had in mind.
November 05, 2012
Le 05/11/2012 04:21, Jonathan M Davis a écrit :
> However, that means that we need to be very clear about the fact that input
> ranges can have transient fronts and that algorithms cannot assume that their
> fronts are not transient (something that only you appear to have thought was
> clear).

Note that this break inputRange.array(); Which seems like a really big problem to me.

> We then either have to change a bunch of algorithms so that they
> require forward ranges, or we need to create a trait which can determine
> whether front is transient in it least some cases (it would still be claiming
> that front is transient in some cases where it isn't, because it can't know
> for sure, but we'd at least be able to figure it out in some cases), and make
> algorithms use that in their constraints when they deal with input ranges.
> std.array.array would be a prime case where an algorithm would have to be
> changed, because it can't function with a transient front, and it's a prime
> example of a function that you'd normally expect to work with an input range.
> And this _will_ break code. It'll allow us to fix the ByLine problem though.
>
> - Jonathan M Davis

November 05, 2012
On Sun, Nov 04, 2012 at 11:58:18PM -0800, Jonathan M Davis wrote:
> On Sunday, November 04, 2012 21:07:46 H. S. Teoh wrote:
> > Stop right there. You're contradicting yourself. You kept saying that transient ranges are rare, abnormal, etc., and now you say they are a potentially large number of range types? I'm sorry, you've managed to utterly confuse me.  Are they rare, or are they not?
> > 
> > If they are rare, then this thin wrapper only needs to be written in those few rare cases. *No other range type needs to be changed.*
> > 
> > If they are common, then they aren't abnormal, and we should be fixing Phobos to deal with them. Everywhere. In a pervasive, invasive, large changeset, because almost all current code is broken w.r.t. to these common cases -- if indeed they are common.
> 
> Base ranges with transient fronts are rare, but as soon as you have to take them into account with wrapper ranges, then they potentially affect almost every range-based function in Phobos. So, you're taking a very rare case and forcing _everything_ to account for it. _That_ is why it affects a large number of range types.

Only wrapper ranges that explicit support transience need to propagate it. Wrapper ranges that don't support transience, by definition, should *not* propagate it, because that would be wrong (they don't support transience because they don't handle transient .front correctly, so they should not propagate it at all, otherwise they will exhibit buggy behaviour).


[...]
> > As I said, by having a UFCS version of .transient that simply returns the original range, you don't even need to know what .transient does.  Don't even include it in your range, and it just behaves as a normal non-transient range. How is that any more complicated?
> 
> You can't just return the original range from a wrapper range and have it work. Take filter for instance, if you got at the range that it wrapped, you'd be completely bypassing filter, and the range wouldn't be filtered at all.

No, you're misunderstanding how it works. Say I have this range:

	struct MyRange {
		@property auto front() { ... }
		@property bool empty() { ... }
		void popFront() { ... }
		auto transient() { return MyTransientRange(this); }
	}

Now, say filter doesn't support transience. In which case we simply do this:

	auto filter(alias pred, R)(R range) {
		struct Filter(alias pred) {
			@property auto front() { /* filtered front */ }
			@property bool empty() { ... }
			void popFront() { ... }
			// NB: No .transient is defined
		}
		return Filter!pred(range);
	}

Now we call filter on our range:

	MyRange r;
	auto filtered = filter!"a<10"(r);

What happens? Filter doesn't define .transient, so UFCS kicks in with the default function that simply returns Filter!pred (NOT r). This is the correct behaviour, because filter() doesn't support transience. It would be wrong to add this line to struct Filter:

	auto transient() { return _filterImpl(range.transient); }

because the filter implementation does NOT support transience. So in this case, .transient shouldn't be propagated.


> Any and all wrapper ranges would have to take transient into account. If they couldn't do transient ranges, then they just wouldn't propagate or use transient. But if they could, then they'd have to create their own transient property which returned an appropriate range type which functioned properly with transience. Yes, it would use the transient property of the range that it was wrapping, but it would have to create its own transient property which functioned properly with whatever it was doing.

Which is only a small cost, as I demonstrated with the thin wrapper.


> The _only_ way that this doesn't affect a lot of ranges in Phobos is if we just don't bother to propagate the transient property, and if we don't bother to propagate it, then there was no point in having it in the first place. Such ranges might as well have their own functions for iterating or just use opApply if no wrapper ranges are going to propagate their extra properties or if no range-based functions take those extra properties into account.
[...]

Like I said, it only affects Phobos ranges that explicitly support transience. Those are the ranges that *should* be modified in the first place. Those that don't support .transient, simply don't propagate it (and they *shouldn't* propagate it, because that would be a bug).  No changes are required in that case.

And of those ranges that should support transience, well, they are already failing today to work with transient ranges, so they need to be fixed anyway. With this proposal, we even get to delay the fix because we make all ranges non-transient by default, so the broken code will actually start working correctly, even *before* we fix it(!).


T

-- 
People tell me that I'm paranoid, but they're just out to get me.
November 05, 2012
Le 05/11/2012 04:47, Jonathan M Davis a écrit :
> On Sunday, November 04, 2012 19:41:39 H. S. Teoh wrote:
>> On Sun, Nov 04, 2012 at 07:11:22PM -0800, Jonathan M Davis wrote:
>>> On Sunday, November 04, 2012 15:40:18 deadalnix wrote:
>>>> Le 04/11/2012 03:26, Jonathan M Davis a écrit :
>>>>> 3. Make it so that ranges which can be transient are non-transient
>>>>> by default but provide a function to get at a transient version
>>>>> for speed (which was the fastRange proposal in this thread). The
>>>>> main problem here is that when the fast range gets wrapped, it's
>>>>> transient, and so anything using the wrapped range will be forced
>>>>> to use the transient version rather than using the non- transient
>>>>> version and only using the transient version when it's asked for.
>>>>> So, I don't think that this is particularly viable.
>>>>
>>>> Can you elaborate on that. I definitively don't see a problem that
>>>> big here.
>>>
>>> The problem is that you can't opt out of the fast range once you've
>>> got it. If you use fastRange and then the result ends up getting
>>> wrapped by another range, that range's front is transient and has no
>>> way of indicating it. So, you're right back in the boat that we're in
>>> right now. In order to avoid that, virtually all range-based functions
>>> would have to not use fastRange, because they have no idea whether
>>> you're going to use their result to pass to another function or not.
>>
>> [...]
>>
>> The crux of what deadalnix proposed is that range transformers simply
>> pass .transient along, whilst range consumers decide whether or not to
>> use .transient. So this isn't even an issue until you are in a range
>> consumer, which can decide whether or not the transient range should be
>> used.
>>
>> IOW, the choice between .transient or not isn't even made until the end.
>> The range consumer is the one in the position to decide whether or not
>> .transient should be used.
>
> And how on earth would it decide that when it's been wrapped by who-knows how
> many wrapper ranges? It would be far too late at that point. Unless you expect
> that every one of those ranges is going to duplicate itself such that when you
> call fastRange or transient or whatever you can get a different range type
> which is transient? That's worse than inTransient IMHO. The only advantage
> that it has is that if you fail to take it into account, you end up with the
> non-transient range, and your code doesn't end up doing crazy stuff. But in
> either case, all of the range types have to take it into account, complicating
> their implementations that much further, and fastRange would complicate those
> implementations far more than isTransient would, because it would require code
> duplication rather than just statically disallowing the transient range when
> it wouldn't work.
>
> - Jonathan M Davis

That is not true. As shown by sample code before the following happens :

Transient source ranges can be wrapped into the non transient one, using alias this and implementing only .front to dup. .transient then just unwrap. Code duplication is avoided most of the time.

Transformer ranges are usually transient or not according to their source. Joiner proposal by H. S. Teoh is done quickly, but in fact, you notice that joiner can have just one implementation, and that its transientness will depends on source transcientness. In this case, code duplication is avoided too.
November 05, 2012
Le 05/11/2012 05:42, Jonathan M Davis a écrit :
> On Sunday, November 04, 2012 20:20:58 H. S. Teoh wrote:
>> What's not to like about this proposal?
>
> It creates massive code duplication all over the place, and it's yet one more
> range concept that everyone has to remember and keep in mind. Remember that
> ranges are supposed to be perfectly writable by the average D user, not just
> Phobos devs. And this complicates them yet further. If anything, we need to
> _simplify_ ranges, not make them more complex (e.g. getting rid of the move*
> functions). And this proposal not only makes them more complex, it ends up
> requiring that a lot more code be written.
>
> Sure, the fact that ranges are non-transient by default reduces the problem,
> but that just means that it's that much less likely to actually be used,
> meaning that we're complicating ranges even further for something that most
> people are going to forget about completely.
>

That is probably true.

But, whatever the solution, this will be true. So it is better, considering this will be true, that the user get correct code.

> I honestly think that you're pushing for an incredibly abnormal use case here,
> and that it's just not worth the extra complication that it requires.
>

It isn't the first time the subject pop's, and I have already encountered the issue in my own code too (in a rubik's cube solver and in compiler code to be precise). We may not have the same coding style or code on apps that require different approach, but the problem seems real to me.

> It's bad enough to be stuck trying to support transient fronts with input
> ranges, but input ranges are so useless in general that it may actually end up
> affecting very few functions in reality. Anything like isTransient or transient
> or fastRange affects _everything_, and just because it can be added in piece by
> piece instead of all at once doesn't change that fact. It just allows us to
> spread out the work.
>

Dealing with transient range require care anyway. Bunch of code have to be touched, whatever the solution choosen. Most phobos act like .front isn't transient and currently just expose undefined behavior.
November 05, 2012
Le 05/11/2012 08:58, Jonathan M Davis a écrit :
> On Sunday, November 04, 2012 21:07:46 H. S. Teoh wrote:
>> Stop right there. You're contradicting yourself. You kept saying that
>> transient ranges are rare, abnormal, etc., and now you say they are a
>> potentially large number of range types? I'm sorry, you've managed to
>> utterly confuse me.  Are they rare, or are they not?
>>
>> If they are rare, then this thin wrapper only needs to be written in
>> those few rare cases. *No other range type needs to be changed.*
>>
>> If they are common, then they aren't abnormal, and we should be fixing
>> Phobos to deal with them. Everywhere. In a pervasive, invasive, large
>> changeset, because almost all current code is broken w.r.t. to these
>> common cases -- if indeed they are common.
>
> Base ranges with transient fronts are rare, but as soon as you have to take
> them into account with wrapper ranges, then they potentially affect almost
> every range-based function in Phobos. So, you're taking a very rare case and
> forcing _everything_ to account for it. _That_ is why it affects a large number
> of range types.
>

Don't you understand that every other solution proposed here, including Andrei's, have the same very drawback ?

I'll add that this drawback is much worse, because with the ;transient solution, unmodified code will work correctly. With all other proposed behaviors, non updated code will expose undefined behavior when facing transient ranges, which is much worse.
November 05, 2012
On Monday, November 05, 2012 15:48:41 deadalnix wrote:
> HS Teoh explained it nicely.
> 
> The responsibility of using .transient or not belong to the consumer. Only the consumer can know if a transient range is suitable.
> 
> So you don't wrap a transient range. You wrap a non transient range, and, if the consumer is able to handle the transcient version, it call .transient on its source, that call it on its source, etc . . . up to the real source.
> 
> If one transformer is unable to handle transient range, the it don't pass .transient to its source.

You still need to wrap it in every wrapper range which could possibly support transience. So, this affects every single range which could possibly support transience.

At least with Andrei's proposal, transience is explicitly restricted to input ranges, which seriously reduces the problems caused by them, particularly since so few functions can really function with just an input range.

- Jonathan M Davis
November 05, 2012
Le 05/11/2012 19:42, Jonathan M Davis a écrit :
> On Monday, November 05, 2012 15:48:41 deadalnix wrote:
>> HS Teoh explained it nicely.
>>
>> The responsibility of using .transient or not belong to the consumer.
>> Only the consumer can know if a transient range is suitable.
>>
>> So you don't wrap a transient range. You wrap a non transient range,
>> and, if the consumer is able to handle the transcient version, it call
>> .transient on its source, that call it on its source, etc . . . up to
>> the real source.
>>
>> If one transformer is unable to handle transient range, the it don't
>> pass .transient to its source.
>
> You still need to wrap it in every wrapper range which could possibly support
> transience. So, this affects every single range which could possibly support
> transience.
>

As shown before, most wrapper range wouldn't have to do a thing except rewraping with .transient . The typical situation is that a wrapper range get its transientness from the wraped range. So it boils down to womething like :

struct Wrapper(Wrapped) {
    Wrapped e;

    // Range implementation

    @property auto transient() {
        return Wrapper!(typeof(e.transient))(e.transient);
    }
}

Considering that doing a wrapper that is compatible with transient ranges is already some work and require the writer to be aware of such a concept (you can expect most programmer to simply ignore that fact).

The extra work for a wrapper range is really small, and add the benefice to ensure that the developer that programmed the wrapper took special care to ensure this will work with transient.

> At least with Andrei's proposal, transience is explicitly restricted to input
> ranges, which seriously reduces the problems caused by them, particularly
> since so few functions can really function with just an input range.
>

This proposal is simplistic. array() is not even implementable with such an approach. It is also slower for several use cases shown in this thread.

And finally, it is likely that many non transient compatible stuff will proliferate, leading to undefined behavior all over the place.

Simply telling to people to follow some guideline is not going to work. Many lead devs have experienced that with small teams, and we are talking here about every single D user.