November 05, 2012
On Sun, Nov 04, 2012 at 07:11:22PM -0800, Jonathan M Davis wrote:
> On Sunday, November 04, 2012 15:40:18 deadalnix wrote:
> > Le 04/11/2012 03:26, Jonathan M Davis a écrit :
> > > 3. Make it so that ranges which can be transient are non-transient by default but provide a function to get at a transient version for speed (which was the fastRange proposal in this thread). The main problem here is that when the fast range gets wrapped, it's transient, and so anything using the wrapped range will be forced to use the transient version rather than using the non- transient version and only using the transient version when it's asked for. So, I don't think that this is particularly viable.
> > 
> > Can you elaborate on that. I definitively don't see a problem that big here.
> 
> The problem is that you can't opt out of the fast range once you've got it. If you use fastRange and then the result ends up getting wrapped by another range, that range's front is transient and has no way of indicating it. So, you're right back in the boat that we're in right now. In order to avoid that, virtually all range-based functions would have to not use fastRange, because they have no idea whether you're going to use their result to pass to another function or not.
[...]

The crux of what deadalnix proposed is that range transformers simply pass .transient along, whilst range consumers decide whether or not to use .transient. So this isn't even an issue until you are in a range consumer, which can decide whether or not the transient range should be used.

IOW, the choice between .transient or not isn't even made until the end. The range consumer is the one in the position to decide whether or not .transient should be used.


T

-- 
A bend in the road is not the end of the road unless you fail to make the turn. -- Brian White
November 05, 2012
On Sunday, November 04, 2012 19:41:39 H. S. Teoh wrote:
> On Sun, Nov 04, 2012 at 07:11:22PM -0800, Jonathan M Davis wrote:
> > On Sunday, November 04, 2012 15:40:18 deadalnix wrote:
> > > Le 04/11/2012 03:26, Jonathan M Davis a écrit :
> > > > 3. Make it so that ranges which can be transient are non-transient by default but provide a function to get at a transient version for speed (which was the fastRange proposal in this thread). The main problem here is that when the fast range gets wrapped, it's transient, and so anything using the wrapped range will be forced to use the transient version rather than using the non- transient version and only using the transient version when it's asked for. So, I don't think that this is particularly viable.
> > > 
> > > Can you elaborate on that. I definitively don't see a problem that big here.
> > 
> > The problem is that you can't opt out of the fast range once you've got it. If you use fastRange and then the result ends up getting wrapped by another range, that range's front is transient and has no way of indicating it. So, you're right back in the boat that we're in right now. In order to avoid that, virtually all range-based functions would have to not use fastRange, because they have no idea whether you're going to use their result to pass to another function or not.
> 
> [...]
> 
> The crux of what deadalnix proposed is that range transformers simply pass .transient along, whilst range consumers decide whether or not to use .transient. So this isn't even an issue until you are in a range consumer, which can decide whether or not the transient range should be used.
> 
> IOW, the choice between .transient or not isn't even made until the end. The range consumer is the one in the position to decide whether or not .transient should be used.

And how on earth would it decide that when it's been wrapped by who-knows how many wrapper ranges? It would be far too late at that point. Unless you expect that every one of those ranges is going to duplicate itself such that when you call fastRange or transient or whatever you can get a different range type which is transient? That's worse than inTransient IMHO. The only advantage that it has is that if you fail to take it into account, you end up with the non-transient range, and your code doesn't end up doing crazy stuff. But in either case, all of the range types have to take it into account, complicating their implementations that much further, and fastRange would complicate those implementations far more than isTransient would, because it would require code duplication rather than just statically disallowing the transient range when it wouldn't work.

- Jonathan M Davis
November 05, 2012
On Sun, Nov 04, 2012 at 07:21:13PM -0800, Jonathan M Davis wrote: [...]
> Now, if we say that we need transient fronts for some input ranges but don't want to add an isTransient trait or something similar, and so we're going to say that transient fronts are only acceptable for input ranges and that algorithms have to assume that front can be transient for input ranges, then fine. I don't really like it, but I'm definitely in favor of simplifying things here.

I still think deadalnix's proposal is the simplest and most elegant so far. To summarize:

- We define a .transient property for ranges that default (via UFCS) to
  simply return the range:

	@property R transient(R)(R range) { return range; }

- We modify File.byLine() to be non-transient, and provide the current
  transient version as a .transient property:

	struct ByLine(C) {
		...
		C[] front() { return buffer.dup; }
		auto transient() {
			struct TransientByLine(C) {
				// current implementation of ByLine
			}
			return TransientByLine!C();
		}
	}

- Range transformations that cannot handle transient ranges do not need
  to be changed. UFCS will cause .transient on the result ranges to just
  alias the (non-transient) range itself. This is the default situation.

- Range transformations that *can* handle transient ranges will return
  a range that has a .transient property. For example, if we make joiner
  transient-capable:

	auto joiner(R)(R ror) {
		struct Joiner {
			...
			auto front() {
				// non-transient implementation
			}

			auto transient() {
				struct TransientJoiner {
					auto front() {
						// transient implementation
					}
				}
			}
		}
		return Joiner(ror);
	}

  The important thing to note here, is that joiner returns a *non*
  transient range by default. You do NOT get a transient range out of it
  unless you specifically ask for .transient on the resulting range.

- Range consumers decide whether or not they can handle transience. If
  they can't, nothing needs to be done: the current implementation
  already works. If they can, then they simply ask for .transient from
  the range passed in. For example, canFind() is such a candidate:

	bool canFind(R)(R r1) {
		// This line asks for a transient range. Remember, if r1
		// isn't transient, this automatically just becomes r1
		// itself.
		auto r = range.transient;

		while (!r.empty) {
			... // implementation here
		}
		return result;
	}

  The important thing to note here is that if canFind gets a
  non-transient range, then nothing different happens, it just runs as
  usual. It only gets a transient range if r1 is capable of transience.


Why do I think this is simpler than the other proposals (including my
own)? Because:

(1) We don't even have to change any existing code for everything to work correctly (besides making byLine non-transient and implementing the default .transient via UFCS). Everything by default is non-transient, and all existing code continues to work. They may run a bit slower due to byLine .dup'ing the internal buffer, but they will run *correctly*.

(2) Existing algorithms can be updated one-by-one to take advantage of transience. If a transient-capable algorithm isn't updated, then it just defaults to using non-transient ranges: it will not run faster, but it will run *correctly*.

Once an algorithm is updated, then transient ranges can be used where supported, so you get speedups where the algorithm has specifically been designed to deal with transient ranges.

In short, nothing needs to be done immediately. We can take our time to update existing algorithms one by one. In the meantime, correctness and safety is not compromised. Just a little speed sacrifice is all.

(3) New algorithms that are written without transience in mind will automatically work correctly -- because all ranges are non-transient by default. So new code doesn't even need to worry about transience. They can be updated later to take advantage of transience, if we decide to do so.

(4) The *user* doesn't have to care about transient ranges. It is the range consumers that decide whether or not they can consume a transient range, and they are the ones that decide whether a transient range is actually used. User-defined ranges will be non-transient by default; if they know what they're doing, then they can define .transient to allow reusing buffers. But if not, everything falls back to a safe (if a tad slower) default.


IOW, this change is non-intrusive, doesn't demand massive code changes, can be done incrementally, and gets rid of cognitive burden on user code to handle transient ranges carefully. The user doesn't even need to know there's such a thing as a transient range and everything will still work correctly. If they use transient-aware Phobos range consumers, they get a speed boost without even needing to know transient ranges were involved.

What's not to like about this proposal?


> However, that means that we need to be very clear about the fact that input ranges can have transient fronts and that algorithms cannot assume that their fronts are not transient (something that only you appear to have thought was clear). We then either have to change a bunch of algorithms so that they require forward ranges, or we need to create a trait which can determine whether front is transient in it least some cases (it would still be claiming that front is transient in some cases where it isn't, because it can't know for sure, but we'd at least be able to figure it out in some cases), and make algorithms use that in their constraints when they deal with input ranges. std.array.array would be a prime case where an algorithm would have to be changed, because it can't function with a transient front, and it's a prime example of a function that you'd normally expect to work with an input range.  And this _will_ break code. It'll allow us to fix the ByLine problem though.
[...]

Using deadalnix's proposal, std.array.array would be a non-issue. Since ranges will be non-transient by default, array() will Just Work with no further effort.

No existing algorithms need to be updated. If they aren't written to use .transient, then they will just get the default non-transient range, and work correctly as-is.

No existing code will break. They may just run slightly slower due to byLine() being non-transient now, but in no case will there be broken code. We can just fix byLine() to be non-transient right now, without doing anything more, and everything will Just Work.

We will just have to update the docs to state that, by default, ranges must be non-transient.

Only when we want to optimize certain algorithms, will we even need to worry about .transient. Unoptimized algorithms continue to work with no change.

The .transient property doesn't even need to be propagated: if it is not propagated, then everything just defaults to the non-transient version of the range. The optimization only kicks in when we specifically implement .transient for that range.

This is much less intrusive than any other alternative I've seen so far.


T

-- 
"I'm not childish; I'm just in touch with the child within!" - RL
November 05, 2012
On Sun, Nov 04, 2012 at 07:47:49PM -0800, Jonathan M Davis wrote:
> On Sunday, November 04, 2012 19:41:39 H. S. Teoh wrote:
[...]
> > The crux of what deadalnix proposed is that range transformers simply pass .transient along, whilst range consumers decide whether or not to use .transient. So this isn't even an issue until you are in a range consumer, which can decide whether or not the transient range should be used.
> > 
> > IOW, the choice between .transient or not isn't even made until the end.  The range consumer is the one in the position to decide whether or not .transient should be used.
> 
> And how on earth would it decide that when it's been wrapped by who-knows how many wrapper ranges? It would be far too late at that point.

I think you're missing the point here. The range consumer simply calls .transient on the range that gets passed in. If the range doesn't define .transient, then it falls back via UFCS to the default .transient, which just returns the range itself.

Only if the range defines .transient, will an actual transient range get returned.


> Unless you expect that every one of those ranges is going to duplicate itself such that when you call fastRange or transient or whatever you can get a different range type which is transient?

If a range does not support transience, then it needs not do anything more. UFCS kicks in, and the range itself gets returned.

Only when the range can support transience, will it return a transient version of itself. All ranges will be non-transient by default.


> That's worse than inTransient IMHO. The only advantage that it has is that if you fail to take it into account, you end up with the non-transient range, and your code doesn't end up doing crazy stuff. But in either case, all of the range types have to take it into account, complicating their implementations that much further, and fastRange would complicate those implementations far more than isTransient would, because it would require code duplication rather than just statically disallowing the transient range when it wouldn't work.
[...]

No, all existing ranges will work as-is (except for the currently transient ranges, which will be modified to become non-transient). They will not define .transient, UFCS kicks in, and any range consumer that calls .transient on them simply gets the original range back.

Only when the range supports .transient, *and* the range consumer asks for it, will the transient range be used. IOW, only those algorithms that are specifically transience-capable need to be updated to call .transient (and even if they aren't updated, they will still work, just slower). Only the ranges that can be made transient need to implement .transient. Nothing else needs to be touched.

Most user code won't even *see* transient ranges, so they will be safe by default.  (The only time user code needs to worry about .transient is when they want to write a range consumer that wants to take advantage of the speed up. In all other cases, transient ranges aren't even used, so the code is safe by default.)


T

-- 
Just because you survived after you did it, doesn't mean it wasn't stupid!
November 05, 2012
On Sun, Nov 04, 2012 at 07:47:49PM -0800, Jonathan M Davis wrote: [...]
> But in either case, all of the range types have to take it into account, complicating their implementations that much further, and fastRange would complicate those implementations far more than isTransient would, because it would require code duplication rather than just statically disallowing the transient range when it wouldn't work.
[...]

No code duplication is actually necessary, even when the range is transient-capable. For example:

	// New version of byLine
	struct ByLine {
		char[] buffer;

		@property bool empty() {
			...
		}

		@property char[] front() {
			// NOTE: non-transient by default
			return buffer.dup;
		}

		void popFront() {
			...
		}

		auto transient() {
			struct TransientByLine {
				ByLine innerRange;

				// This is the only code that needs to
				// be "duplicated". Because it's
				// actually different!
				@property char[] front() {
					return innerRange.buffer;
				}

				// empty, popFront, etc., just "inherit"
				// from the inner range. No code
				// duplication needed.
				alias innerRange this;
			}
			return TransientByLine(this);
		}
	}


T

-- 
Tech-savvy: euphemism for nerdy.
November 05, 2012
On Sunday, November 04, 2012 20:20:58 H. S. Teoh wrote:
> What's not to like about this proposal?

It creates massive code duplication all over the place, and it's yet one more range concept that everyone has to remember and keep in mind. Remember that ranges are supposed to be perfectly writable by the average D user, not just Phobos devs. And this complicates them yet further. If anything, we need to _simplify_ ranges, not make them more complex (e.g. getting rid of the move* functions). And this proposal not only makes them more complex, it ends up requiring that a lot more code be written.

Sure, the fact that ranges are non-transient by default reduces the problem, but that just means that it's that much less likely to actually be used, meaning that we're complicating ranges even further for something that most people are going to forget about completely.

I honestly think that you're pushing for an incredibly abnormal use case here, and that it's just not worth the extra complication that it requires.

It's bad enough to be stuck trying to support transient fronts with input ranges, but input ranges are so useless in general that it may actually end up affecting very few functions in reality. Anything like isTransient or transient or fastRange affects _everything_, and just because it can be added in piece by piece instead of all at once doesn't change that fact. It just allows us to spread out the work.

- Jonathan M Davis
November 05, 2012
On Sunday, November 04, 2012 20:36:32 H. S. Teoh wrote:
> No code duplication is actually necessary, even when the range is transient-capable. For example:
> 
> 	// New version of byLine
> 	struct ByLine {
> 		char[] buffer;
> 
> 		@property bool empty() {
> 			...
> 		}
> 
> 		@property char[] front() {
> 			// NOTE: non-transient by default
> 			return buffer.dup;
> 		}
> 
> 		void popFront() {
> 			...
> 		}
> 
> 		auto transient() {
> 			struct TransientByLine {
> 				ByLine innerRange;
> 
> 				// This is the only code that needs to
> 				// be "duplicated". Because it's
> 				// actually different!
> 				@property char[] front() {
> 					return innerRange.buffer;
> 				}
> 
> 				// empty, popFront, etc., just "inherit"
> 				// from the inner range. No code
> 				// duplication needed.
> 				alias innerRange this;
> 			}
> 			return TransientByLine(this);
> 		}
> 	}

How is that not duplicated? You have to create a second range type to support the transience. Sure, it's a thin wrapper, but it's yet more code that has to be put in a potentially large number of range types. It's yet one more thing that has to been maintained and tested. We need to be simplifying ranges, not complicating them yet further. They're already seriously pushing it in terms of how complicated they are. How many people outside of this newsgroup actually, properly understand ranges as it is?

The fact that we lack good articles and tutorials on ranges is obviously a major factor in making it so that many D users or potential D users don't understand ranges and std.algorithm. But if they're too complicated, even explaining them won't get us there. How many people do you know who fully understand how to properly use iterators in C++? A lot of C++ programmers understand them just well enough to get by and end up in trouble pretty fast if they try and do anything complicated with them. And ranges are arguably more complicated than iterators. We need them to be usable and understandable by the average programmer as much as possible, and that means trying to simplify them, not complicate them, especially when that complication is for an uncommon use case.

- Jonathan M Davis
November 05, 2012
On Sun, Nov 04, 2012 at 08:42:42PM -0800, Jonathan M Davis wrote:
> On Sunday, November 04, 2012 20:20:58 H. S. Teoh wrote:
> > What's not to like about this proposal?
> 
> It creates massive code duplication all over the place, and it's yet one more range concept that everyone has to remember and keep in mind.

It doesn't require code duplication at all. Users don't have to know what .transient is. They just write a non-transient range and everything works.


> Remember that ranges are supposed to be perfectly writable by the average D user, not just Phobos devs. And this complicates them yet further. If anything, we need to _simplify_ ranges, not make them more complex (e.g. getting rid of the move* functions). And this proposal not only makes them more complex, it ends up requiring that a lot more code be written.

It only requires more code where more code is needed (see my other post for an example of how the new version of ByLine can implement .transient with hardly any code duplication).


> Sure, the fact that ranges are non-transient by default reduces the problem, but that just means that it's that much less likely to actually be used, meaning that we're complicating ranges even further for something that most people are going to forget about completely.

Which is kinda the point -- most people who don't care about the difference between transient and non-transient ranges don't have to do a thing, and everything Just Works.

People who *do* care simply write code that's aware of .transient, and they get to enjoy better performance. If some Phobos code is made transient-aware, users get to reap the benefit without needing to know what .transient is.


> I honestly think that you're pushing for an incredibly abnormal use case here, and that it's just not worth the extra complication that it requires.

It's not abnormal at all to me. It's perfectly natural to want to reuse buffers for the sake of better performance. Being *required* to call .dup on everything, or having to rewrite std.algorithm because I can't use it with transient ranges, is what's abnormal.

The whole idea behind ranges is sequential access -- there's nothing inherent about sequential access that says returned elements must be persistent. That's just limiting the scope of the concept's applicability by arbitrary assumption rather than any actual conceptual need.


> It's bad enough to be stuck trying to support transient fronts with input ranges, but input ranges are so useless in general that it may actually end up affecting very few functions in reality.

Exactly, so only those few functions actually need to be changed. Everything else stays the same.


> Anything like isTransient or transient or fastRange affects _everything_,

It doesn't. If the code isn't changed, it just defaults to non-transient ranges, which is what you were pushing for in the first place. And just to be clear, input ranges will *also* be non-transient by default. So no massive code change is needed.


> and just because it can be added in piece by piece instead of all at once doesn't change that fact. It just allows us to spread out the work.
[...]

It lets us add code only where transience matters, which is with things like byLine, and a few algorithms that actually care to take advantage of transient ranges. Nothing else is affected. It certainly doesn't affect "everything" -- if indeed, as you say, transient ranges are rare occurrences.


T

-- 
Right now I'm having amnesia and deja vu at the same time. I think I've forgotten this before.
November 05, 2012
On Sun, Nov 04, 2012 at 08:49:53PM -0800, Jonathan M Davis wrote:
> On Sunday, November 04, 2012 20:36:32 H. S. Teoh wrote:
> > No code duplication is actually necessary, even when the range is transient-capable. For example:
> > 
> > 	// New version of byLine
> > 	struct ByLine {
> > 		char[] buffer;
> > 
> > 		@property bool empty() {
> > 			...
> > 		}
> > 
> > 		@property char[] front() {
> > 			// NOTE: non-transient by default
> > 			return buffer.dup;
> > 		}
> > 
> > 		void popFront() {
> > 			...
> > 		}
> > 
> > 		auto transient() {
> > 			struct TransientByLine {
> > 				ByLine innerRange;
> > 
> > 				// This is the only code that needs to
> > 				// be "duplicated". Because it's
> > 				// actually different!
> > 				@property char[] front() {
> > 					return innerRange.buffer;
> > 				}
> > 
> > 				// empty, popFront, etc., just "inherit"
> > 				// from the inner range. No code
> > 				// duplication needed.
> > 				alias innerRange this;
> > 			}
> > 			return TransientByLine(this);
> > 		}
> > 	}
> 
> How is that not duplicated? You have to create a second range type to support the transience. Sure, it's a thin wrapper, but it's yet more code that has to be put in a potentially large number of range types.

Stop right there. You're contradicting yourself. You kept saying that transient ranges are rare, abnormal, etc., and now you say they are a potentially large number of range types? I'm sorry, you've managed to utterly confuse me.  Are they rare, or are they not?

If they are rare, then this thin wrapper only needs to be written in those few rare cases. *No other range type needs to be changed.*

If they are common, then they aren't abnormal, and we should be fixing Phobos to deal with them. Everywhere. In a pervasive, invasive, large changeset, because almost all current code is broken w.r.t. to these common cases -- if indeed they are common.


> It's yet one more thing that has to been maintained and tested. We need to be simplifying ranges, not complicating them yet further. They're already seriously pushing it in terms of how complicated they are. How many people outside of this newsgroup actually, properly understand ranges as it is?
[...]

As I said, by having a UFCS version of .transient that simply returns the original range, you don't even need to know what .transient does. Don't even include it in your range, and it just behaves as a normal non-transient range. How is that any more complicated?


T

-- 
Bare foot: (n.) A device for locating thumb tacks on the floor.
November 05, 2012
On Sunday, November 04, 2012 20:59:16 H. S. Teoh wrote:
> It lets us add code only where transience matters, which is with things like byLine, and a few algorithms that actually care to take advantage of transient ranges. Nothing else is affected. It certainly doesn't affect "everything" -- if indeed, as you say, transient ranges are rare occurrences.

The problem is that every single range-based function in Phobos which could possibly work with a transient range would have to take such ranges into account in the ranges that they create. Otherwise, the transience is lost, making transient absolutely useless. So, it affects a _lot_ more than just ranges like ByLine and ByChunk, and regardless of how rare such ranges are, the entirety of Phobos would have to take them into account whenever a wrapper range is created.

If you don't care about wrapper ranges propagating the transient property when they can, then there's no point in having it in the first place. You might as well just use opApply. And if you _do_ care, then it affects everything, even if base ranges with transient fronts are rare. I don't see how you can possibly think that this isn't a highly intrusive change to ranges.

- Jonathan M Davis