Range returning an array

April 09, 2013

Re: Range returning an array

Posted by H. S. Teoh

Permalink

H. S. Teoh

Permalink

On Tue, Apr 09, 2013 at 10:03:01PM +0200, Joseph Rushton Wakeling wrote:
> Hello all,
> 
> I have a situation in a program I'm writing where it's convenient to define a range whose return value is an array.  Something like:

Congratulations! You've created your first transient range. :)

[...]
> Now, the problem with a design like this is that it's unsafe, in the sense that, let's say I do something like,
> 
> 	T[] output;
> 	foreach(opvar; mySim)
> 		output = opvar;
> 
> ... then, at the end of this loop, output will not be the same as it was set to with the last assignment, because popFront() will have been called one last time prior to the "empty" condition being set to true.

Yes, this is the typical behaviour of transient ranges.

> Now, there are a number of ways I can think of to avoid this.  One is to return var.dup rather than var from front(), but this is undesirable because it will hit performance in a big way.

Right, this is to make it a non-transient range, which has the associated performance hit.

> Another might be to calculate the updated value of var in a temporary array, and update var itself only if the diff is greater than the convergence criterion.

Personally, my preference is to let the user code call .dup to copy the value instead of aliasing it. The crux of the issue is that returning an array is by reference, so it's equivalent to returning a reference.  If you were calling a ref function, you'd be careful about assuming whether its value would change afterwards. The problem is that the range API conflates by-ref and by-value .front's, and there's currently no way to tell them apart aside from knowing the specifics of the range you're dealing with.

> However, I'm curious enough about this problem that I thought I'd throw it open to everyone else to see if anyone has a better idea than one of these.  The challenge here is to ensure really good performance while not risking bad or incorrect results leaking outside of the range.
[...]

Some time ago on the main D mailing list, somebody (deadalnix?) suggested extending the range interface to have a .transient function that returns a transient version of the range. The idea goes something like this:

	struct MyRange {
		auto front() {
			// Always .dup by default so there are no
			// surprises
			return frontImpl.dup;
		}
		auto transient() {
			struct TransientWrapper {
				MyRange r;

				// User specifically asks for high
				// performance transient range, so no
				// .dup here, they are responsible for
				// dealing with the aliasing effect
				// themselves
				auto front() {
					return r.frontImpl;
				}
			}
			return TransientWrapper(this);
		}
		... // other range functions
	}

Here's how you use it:

	void slowButSafeCode(MyRange r) {
		T[] x;

		// Use the range directly: everything is .dup'd by
		// default, it's slow but always safe.
		foreach (e; r) {
			x = e;	// this will be correct by default
		}
	}

	void fastTransientCode1(MyRange r) {
		// Specifically ask for transient range: we take
		// responsibility to call .dup if necessary
		auto tr = r.transient;

		foreach (e; tr) {
			// We're not storing the transient data, so no .dup
			doSomething(e);

			// Here we selectively .dup the elements we need
			// to cache.
			if (complexCondition(e))
				cache(e.dup);
		}
	}

The idea is that the range should be non-transient by default (so that user code ignorant of transience will always be correct by default), but if the user wants the speed, they can explicitly ask for a transient range and take responsibility for using it correctly.

T

-- 
Fact is stranger than fiction.

On 04/09/2013 10:37 PM, H. S. Teoh wrote:
> Congratulations! You've created your first transient range. :)

:-)

Well, the problem isn't transience per se.  It's acceptable that popFront() overwrite previous returned values -- the problem is strictly that in deciding that the range is empty, the 'last' value is overwritten.

> Personally, my preference is to let the user code call .dup to copy the value instead of aliasing it. The crux of the issue is that returning an array is by reference, so it's equivalent to returning a reference.  If you were calling a ref function, you'd be careful about assuming whether its value would change afterwards. The problem is that the range API conflates by-ref and by-value .front's, and there's currently no way to tell them apart aside from knowing the specifics of the range you're dealing with.

I'll give the take-by-dup rather than return-by-dup option a go just to see if it can work.  Unfortunately in practice it's a bit more complex because some of my use cases might want to return a more complex data structure than just an array (e.g. a struct containing one or two arrays and some values).

Thanks for the thoughts and especially for the in-depth explanation on transience! :-)

Forums