Transience of .front in input vs. forward ranges

October 30, 2012

Posted by H. S. Teoh

Permalink

H. S. Teoh

Permalink

Now that Andrei is back (I think?), I want to bring up this discussion again, because I think it's important.

Recently, in another thread, it was found that std.algorithm.joiner doesn't work properly with input ranges whose .front value is invalidated by popFront(). Andrei stated that for input ranges .front should not be assumed to return a persistent value, whereas for forward ranges, .front can be assumed to be persistent. However, Jonathan believes that .front should never be transient.

Obviously, both cannot be the case simultaneously. So we need to decide
exactly what it should be, because the current situation is subtly
broken, and this subtle brokenness is pervasive. For example, I recently
rewrote joiner to eliminate the assumption that .front is persistent,
only to discover that in the unittests, I can't use array() or equal()
(or, for that matter, writefln()), because they apparently all make this
assumption at some point (I didn't bother to find out exactly where).

In other words, right now input ranges really only work with arrays and array-like objects. Not the generic ranges that Andrei has in mind in his article "On Iteration". Many input ranges will subtly break, the prime whipping boy example being byLine (which I hate to bring up because it does not represent the full scope of such transient ranges), a range that returns in-place permutations of an array, or anything that reuses a buffer, really.

This situation isn't as simple as input ranges being transient and forward ranges not, though. I want to bring another example besides the dead horse byLine() to the spotlight. Let's say you have a range R that spans all permutations of some starting array A. For efficiency reasons, we don't want to allocate a new array every time we return a permutation; so we have an internal buffer in R that holds the current permutation, which is what .front returns. Then popFront() simply permutes this buffer in-place. Something like this:

	struct AllPermutations(T) {
		T[] front, first;
		bool done;

		this(T[] initialBuf) {
			first = initialBuf;
			front = first.dup;
		}
		void popFront() {
			nextPermutation(current);
			if (equal(front, first))
				done = true;
		}
		@property bool empty() {
			return !done;
		}
	}

This is an input range, according to Andrei's definition. The value of .front is transient, since popFront() modifies it in-place. According to Jonathan's definition, however, this isn't a valid range for that very reason.

Now consider what happens if we add this member:

	auto save() {
		AllPermutations!T copy;
		copy.front = this.front;
		copy.first = this.first;
		copy.done = this.done;
		return copy;
	}

This returns a separate instance of the same range, starting with the current permutation, and ending with the original permutation, as before. I submit that this makes it a forward range. However, this fails to be a forward range under Andrei's definition, because forward ranges require .front to be persistent. So we'd have to modify the range to be something like this:

	struct AllPermutations(T) {
		T[] current, first;
		bool done;

		this(T[] initialBuf) {
			first = initialBuf;
			current = first.dup;
		}
		void popFront() {
			nextPermutation(current);
			if (equal(current, first))
				done = true;
		}
		@property bool empty() {
			return !done;
		}
		@property T[] front() {
			return current.dup; // <--- note this line
		}
		auto save() {
			AllPermutations!T copy;
			copy.front = this.front;
			copy.first = this.first;
			copy.done = this.done;
			return copy;
		}

	}

Note that now we have to duplicate the output array every time .front is accessed. So whatever gains we may have had by using nextPermutation to modify the array in-place is lost, just so that we can conform to an arbitrary standard of what a forward range is.

Under Jonathan's definition, we'd have to incur this cost regardless of whether we had save() or not, since .front is *always* required to be persistent.

But I propose that the correct solution is to recognize that whether or
not .front is transient is orthogonal to whether a range is an input
range or a forward range. Many algorithms actually don't care if .front
is persistent or not (at least in theory), including equal(), find(),
map(), count(), reduce(), joiner(). Maybe they *currently* assume that
.front is persistent, but they are easily implementable *without* this
assumption (I have an implementation of joiner that doesn't make this
assumption).

Not allowing forward ranges to have transient .front values, or not allowing transient .front values at all, will introduce the arbitrary need for lots of implicit copying, which makes D code inefficient. It will also limit the applicability of std.algorithm and std.range (if said the cost of said copying is unacceptable for a particular application).

The problem, of course, is how to check of a range has transient .front or not. I proposed adding a .isTransient member to the range, but this depends on the range implementor to remember to mark the range with .isTransient=true, and we know that coding by convention is not scalable. So here's another idea:

	template isTransient(R) if (isInputRange!R)
	{
		static if (is(typeof(R.front) : immutable(typeof(R.front))))
			enum isTransient = false;
		else
			enum isTransient = true;
	}

The idea is that value types are implicitly convertible to immutable, and value types are non-transient. Reference types, if they can be made immutable, ensures that calling popFront() will never invalidate them (by definition of immutable).

To use byLine as an example, if we want to make it non-transient, we simply have .front return string instead of char[]. Then we can safely use it with any of the existing algorithms that assume that .front won't mutate under them when popFront() is called.

Algorithms that *don't* need .front to be persistent can just take input/forward/whatever ranges as usual.

In any case, whether we decide to do this or not, we NEED to set down a clear, unambiguous definition of exactly what an input range is, and what a forward range is, and what kind of assumptions may be safely made regarding the value of .front. The current ambiguous situation is a hiding place for subtle bugs, and is unacceptable.


T

-- 
Long, long ago, the ancient Chinese invented a device that lets them see through walls. It was called the "window".

Le 30/10/2012 23:29, H. S. Teoh a écrit :
> Now that Andrei is back (I think?), I want to bring up this discussion
> again, because I think it's important.
>

Interesting post.

It appears to me that invalidating .front is done for performances reasons most of the time. As this is the unsafe face of the coin, I strongly think that this shouldn't be the default behavior.

To me the best solution is to provide a function or a property on ranges that provide the unsafe version of the range.

It is easy to provide a default implementation as UFCS that is a NOOP so no code is being broken.

With that design, it is possible to provide an always safe interface while allowing algorithm that can handle non persistent .front to run at full speed.

As it is always safe to provide a range that persist .front when a range that do not persist it is expected, no code needs to be broken. Obviously, the performance boost will not show up in this case :D

byLine and alike will have to be modified to follow this policy, but hopefully, no code should be broken in the process.

If I use your permuting example, it should be done as follow :

struct InvalidatingAllPermutations(T) {
	T[] current, first;
	bool done;

	this(T[] initialBuf) {
		first = initialBuf;
		current = first.dup;
	}

	void popFront() {
		nextPermutation(current);
		if (equal(current, first))
			done = true;
	}

	@property bool empty() {
		return !done;
	}

	@property T[] front() {
		return current;
	}

	auto save() {
		AllPermutations!T copy;
		copy.front = this.front;
		copy.first = this.first;
		copy.done = this.done;
		return copy;
	}
}

struct AllPermutations(T) {
	InvalidatingAllPermutations!T innerRange;
	alias innerRange this;

	T[] current;

	this(T[] initialBuf) {
		innerRange = InvalidatingAllPermutations!T(initialBuf);
		current = innerRange.front.dup;
	}

	@property T[] front() {
		return current;
	}

	void popFront() {
		innerRange.popFront();
		current = innerRange.front.dup;
	}
}

I do think this is the way that conciliate safe as default but still allow to be fast when needed, without adding much on most code.

Forums