Jump to page: 1 2
Thread overview
More tricky range semantics
Jan 15, 2015
H. S. Teoh
Jan 16, 2015
H. S. Teoh
Jan 16, 2015
Dicebot
Jan 16, 2015
Dicebot
Jan 16, 2015
Tobias Pankrath
Jan 16, 2015
Tobias Pankrath
Jan 16, 2015
Tobias Pankrath
Jan 16, 2015
H. S. Teoh
Jan 16, 2015
H. S. Teoh
Jan 16, 2015
Tobias Pankrath
January 15, 2015
Just noticed this little gotcha today: suppose we have a forward range R, and an algorithm that wraps around R that looks something like this:

	struct WrapperRange {
		R src;
		bool stoppingCondition;
		this(R _src) {
			src = _src;
			stoppingCondition = someCondition(src.front);
		}
		@property bool empty() { return stoppingCondition; }
		@property auto front() { return src.front; }
		void popFront() {
			src.popFront();
			if (!src.empty && someCondition(src.front))
				stoppingCondition = true;
		}
		@property auto save() {
			typeof(this) copy = this;
			copy.src = src.save;
			return copy;
		}
	}

Basically, the wrapper iterates over R with some stopping condition that may cause it to end earlier than the actual end of R.

Now consider this innocuous-looking code:

	unittest {
		R r = ... /* get an instance of R */;
		auto wrapper = WrapperRange(r);

		// check that the contents of wrapper is what we expect
		assert(wrapper.equal(expectedData));

		// erhm... check the contents of wrapper again, just to
		// be doubly sure?
		assert(wrapper.equal(expectedData)); // <-- line 10
	}

You may laugh at line 10, marked above... but don't be so quick to laugh just yet. Consider now if R is a reference type. What would happen? Let's trace the steps:

- The "R r = ..." line is nothing unexpected: since R is a by-reference
  type, r simply stores a reference to the actual instance of R. Nothing
  surprising.

- The "auto wrapper" line creates an instance of WrapperRange -- which
  is by-value, btw, this will become important later -- and assigns to
  .src the reference to that instance of R.

- The next line calls equal() on wrapper, to verify that its contents
  are what we're expecting. This passes a copy of wrapper to equal(),
  because wrapper is a by-value type. But since R is a reference type,
  as equal() iterates over the copy of wrapper, the underlying R range
  is being popped.

  This has 2 consequences:

  1) When equal() returns, the original copy of wrapper no longer points
  to the same place in r as before, because equal() has popped the
  underlying instance of R.

  2) Furthermore, since equal() iterates over the entire wrapped range,
  .stoppingCondition is now true. However, this only affected the local
  copy of wrapper in equal(). This means that after equal() returns, the
  unittest's original copy of wrapper is now in an inconsistent state:
  r has reached an element for which someCondition() is true, but this
  is not reflected in wrapper, which still thinks it's in the original
  position in r as before the call to equal().

As a result, the assert on line 10 will fail.

This situation is quite counterintuitive. One would expect that either:

(i) wrapper has value semantics, meaning that passing it to equal()
would not consume it; or,

(ii) wrapper has reference semantics, meaning that passing it to equal()
would empty it.

However, what actually happens is that wrapper is left in an inconsistent state upon returning from equal() that is neither (i) nor (ii). The kicker is that this resulted from code that followed the range API to the letter. No hidden loophole in the range API was exploited. No tricky hacks were used that might have caused problems.

So what's the moral of the story?

a) At least as things currently stand, passing (wrapper) ranges around may exhibit "undefined" behaviour, like the above. Passing a range to a function may invalidate it unless you use .save.  Therefore, one should *always* use .save. (If we had passed wrapper.save to equal() instead, this problem would not have happened.) This applies even if the wrapper range is a by-value type. Or should we say, *especially* when it's a by-value type?

b) One may argue that WrapperRange ought to .save the underlying range in its postblit... but what if the only thing we wanted to do was to call equal() on wrapper and nothing else? In that case, we'd be incurring needless overhead of .save'ing the range when we didn't care whether it got consumed.

c) This issue is already latent in almost *all* Phobos algorithms.  We only haven't discovered it yet because most people just use arrays for ranges. But all it takes is for somebody to start using std.range.inputRangeObject (and there are cases where this is necessary), and this problem will surface.  Anytime you mix by-value and by-reference ranges, be ready for problems of this sort to arise.


Yep, range semantics just got even trickier. (And we thought transient
ranges were bad...)


T

-- 
"Computer Science is no more about computers than astronomy is about telescopes." -- E.W. Dijkstra
January 15, 2015
On 1/15/15 12:53 PM, H. S. Teoh via Digitalmars-d wrote:
> Passing a range to a
> function may invalidate it unless you use .save.  Therefore, one should
> *always*  use .save.

That's right. To simplify the problem space we might decree that forward (or better) ranges with reference semantics are not allowed. -- Andrei

January 15, 2015
On 1/15/15 12:53 PM, H. S. Teoh via Digitalmars-d wrote:
> This issue is already latent in almost*all*  Phobos algorithms.  We
> only haven't discovered it yet because most people just use arrays for
> ranges. But all it takes is for somebody to start using
> std.range.inputRangeObject (and there are cases where this is
> necessary), and this problem will surface.

There's a distinction here. Input non-forward ranges can be considered "reference" because popFront()ing any copy is tantamount to popFront()int any other copy. -- Andrei
January 16, 2015
On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote:
> That's right. To simplify the problem space we might decree that forward (or
> better) ranges with reference semantics are not allowed. -- Andrei

That would seem to place significant restrictions on the ability to define effective random number generators and related functionality ... ?

January 16, 2015
On Fri, Jan 16, 2015 at 01:58:15AM +0100, Joseph Rushton Wakeling via Digitalmars-d wrote:
> On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote:
> >That's right. To simplify the problem space we might decree that forward (or better) ranges with reference semantics are not allowed. -- Andrei
> 
> That would seem to place significant restrictions on the ability to define effective random number generators and related functionality ... ?

Not to mention, I just realized that while the example I used relied on the interaction between reference and value types, a similar problem exists with just value types alone. Using the same wrapper range I posted, suppose now that R is not a reference type range, but an input (non-forward) range, and that .save is not implemented. Then consider this code:

	unittest {
		auto r = R(...);
		auto wrapped = WrapperRange(r);

		assert(wrapped.equal(expectedData));
		bool b = wrapped.empty;
	}

Quiz: what's the value of b?

.

.

.

Go on, guess. :-)

.

.

.

Yup, b == false. Now guess what happens if you then access wrapped.front?

Well, that depends. If someCondition became true before r was exhausted, then wrapped.front would return a value that shouldn't be in the wrapped range, because it breaks the invariant that WrapperRange is supposed to stop when someCondition becomes true. However, if someCondition didn't become true and r was exhausted, accessing wrapped.front will call .front on an empty input range, which is UB. If you're lucky, you'll hit an assert; otherwise, anything could happen, like dereferencing a dangling pointer, reading invalid memory, etc..

So actually, my previous statement was not broad enough: this problem happens not just with reference type ranges; it happens with non-reference input ranges too!


T

-- 
If it breaks, you get to keep both pieces. -- Software disclaimer notice
January 16, 2015
On 1/15/15 4:58 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
> On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote:
>> That's right. To simplify the problem space we might decree that
>> forward (or
>> better) ranges with reference semantics are not allowed. -- Andrei
>
> That would seem to place significant restrictions on the ability to
> define effective random number generators and related functionality ... ?

Agreed. Pseudo-random generators are an interesting kind of range because they are forward yet do not iterate a "real" container. -- Andrei
January 16, 2015
On Thu, Jan 15, 2015 at 05:52:45PM -0800, Andrei Alexandrescu via Digitalmars-d wrote:
> On 1/15/15 4:58 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
> >On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote:
> >>That's right. To simplify the problem space we might decree that forward (or better) ranges with reference semantics are not allowed. -- Andrei
> >
> >That would seem to place significant restrictions on the ability to define effective random number generators and related functionality ... ?
> 
> Agreed. Pseudo-random generators are an interesting kind of range because they are forward yet do not iterate a "real" container. -- Andrei

I've been wondering about that. Must ranges have an "underlying" container? Or are they allowed to be more abstract entities that basically function as generators, producing data on demand? (This distinction is also somewhat related to transient ranges, in that the difference between providing a "view" of some static underlying data vs. computing something on-the-fly in a buffer that gets reused, could serve as a deciding factor on how to deal with transient ranges.)

My feeling is that allowing the latter is much more powerful, and more encompassing. Constructs like iota() and recurrence() belong to the latter category, for example, and they do provide forward range like functionality -- or they *could*, if they don't already, as it would be trivial to implement. Allowing generating functions to be ranges also allows one to plug in programmatic data generators as data sources in UFCS chains, without needing to deal with an artificial distinction between "view of underlying data" ranges and "generated-on-the-fly" ranges.

In this sense, an RNG could be thought of as a complex variant of
recurrence(), except with a far more complicated generating expression
than what one would normally use with recurrence(). If recurrence()
qualifies as a range (and a forward range, no less), then an RNG ought
to qualify too.


T

-- 
Those who don't understand Unix are condemned to reinvent it, poorly.
January 16, 2015
>
> Yep, range semantics just got even trickier. (And we thought transient
> ranges were bad...)
>
>
> T

This is related: https://issues.dlang.org/show_bug.cgi?id=11951
January 16, 2015
On Friday, 16 January 2015 at 00:58:34 UTC, Joseph Rushton Wakeling via Digitalmars-d wrote:
> On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote:
>> That's right. To simplify the problem space we might decree that forward (or
>> better) ranges with reference semantics are not allowed. -- Andrei
>
> That would seem to place significant restrictions on the ability to define effective random number generators and related functionality ... ?

Why RNG would be a forward range as opposed to just input range?
January 16, 2015
On Friday, 16 January 2015 at 14:58:09 UTC, Dicebot wrote:
> On Friday, 16 January 2015 at 00:58:34 UTC, Joseph Rushton Wakeling via Digitalmars-d wrote:
>> On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote:
>>> That's right. To simplify the problem space we might decree that forward (or
>>> better) ranges with reference semantics are not allowed. -- Andrei
>>
>> That would seem to place significant restrictions on the ability to define effective random number generators and related functionality ... ?
>
> Why RNG would be a forward range as opposed to just input range?

To specify : the way I see it you either want PRNG to be a forward range and that fits with value semantics. Or you want reference semantics and it naturally becomes input range.
« First   ‹ Prev
1 2