| Thread overview | |||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
January 15, 2015 More tricky range semantics | ||||
|---|---|---|---|---|
| ||||
Just noticed this little gotcha today: suppose we have a forward range R, and an algorithm that wraps around R that looks something like this:
struct WrapperRange {
R src;
bool stoppingCondition;
this(R _src) {
src = _src;
stoppingCondition = someCondition(src.front);
}
@property bool empty() { return stoppingCondition; }
@property auto front() { return src.front; }
void popFront() {
src.popFront();
if (!src.empty && someCondition(src.front))
stoppingCondition = true;
}
@property auto save() {
typeof(this) copy = this;
copy.src = src.save;
return copy;
}
}
Basically, the wrapper iterates over R with some stopping condition that may cause it to end earlier than the actual end of R.
Now consider this innocuous-looking code:
unittest {
R r = ... /* get an instance of R */;
auto wrapper = WrapperRange(r);
// check that the contents of wrapper is what we expect
assert(wrapper.equal(expectedData));
// erhm... check the contents of wrapper again, just to
// be doubly sure?
assert(wrapper.equal(expectedData)); // <-- line 10
}
You may laugh at line 10, marked above... but don't be so quick to laugh just yet. Consider now if R is a reference type. What would happen? Let's trace the steps:
- The "R r = ..." line is nothing unexpected: since R is a by-reference
type, r simply stores a reference to the actual instance of R. Nothing
surprising.
- The "auto wrapper" line creates an instance of WrapperRange -- which
is by-value, btw, this will become important later -- and assigns to
.src the reference to that instance of R.
- The next line calls equal() on wrapper, to verify that its contents
are what we're expecting. This passes a copy of wrapper to equal(),
because wrapper is a by-value type. But since R is a reference type,
as equal() iterates over the copy of wrapper, the underlying R range
is being popped.
This has 2 consequences:
1) When equal() returns, the original copy of wrapper no longer points
to the same place in r as before, because equal() has popped the
underlying instance of R.
2) Furthermore, since equal() iterates over the entire wrapped range,
.stoppingCondition is now true. However, this only affected the local
copy of wrapper in equal(). This means that after equal() returns, the
unittest's original copy of wrapper is now in an inconsistent state:
r has reached an element for which someCondition() is true, but this
is not reflected in wrapper, which still thinks it's in the original
position in r as before the call to equal().
As a result, the assert on line 10 will fail.
This situation is quite counterintuitive. One would expect that either:
(i) wrapper has value semantics, meaning that passing it to equal()
would not consume it; or,
(ii) wrapper has reference semantics, meaning that passing it to equal()
would empty it.
However, what actually happens is that wrapper is left in an inconsistent state upon returning from equal() that is neither (i) nor (ii). The kicker is that this resulted from code that followed the range API to the letter. No hidden loophole in the range API was exploited. No tricky hacks were used that might have caused problems.
So what's the moral of the story?
a) At least as things currently stand, passing (wrapper) ranges around may exhibit "undefined" behaviour, like the above. Passing a range to a function may invalidate it unless you use .save. Therefore, one should *always* use .save. (If we had passed wrapper.save to equal() instead, this problem would not have happened.) This applies even if the wrapper range is a by-value type. Or should we say, *especially* when it's a by-value type?
b) One may argue that WrapperRange ought to .save the underlying range in its postblit... but what if the only thing we wanted to do was to call equal() on wrapper and nothing else? In that case, we'd be incurring needless overhead of .save'ing the range when we didn't care whether it got consumed.
c) This issue is already latent in almost *all* Phobos algorithms. We only haven't discovered it yet because most people just use arrays for ranges. But all it takes is for somebody to start using std.range.inputRangeObject (and there are cases where this is necessary), and this problem will surface. Anytime you mix by-value and by-reference ranges, be ready for problems of this sort to arise.
Yep, range semantics just got even trickier. (And we thought transient
ranges were bad...)
T
--
"Computer Science is no more about computers than astronomy is about telescopes." -- E.W. Dijkstra
| ||||
January 15, 2015 Re: More tricky range semantics | ||||
|---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On 1/15/15 12:53 PM, H. S. Teoh via Digitalmars-d wrote:
> Passing a range to a
> function may invalidate it unless you use .save. Therefore, one should
> *always* use .save.
That's right. To simplify the problem space we might decree that forward (or better) ranges with reference semantics are not allowed. -- Andrei
| |||
January 15, 2015 Re: More tricky range semantics | ||||
|---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On 1/15/15 12:53 PM, H. S. Teoh via Digitalmars-d wrote:
> This issue is already latent in almost*all* Phobos algorithms. We
> only haven't discovered it yet because most people just use arrays for
> ranges. But all it takes is for somebody to start using
> std.range.inputRangeObject (and there are cases where this is
> necessary), and this problem will surface.
There's a distinction here. Input non-forward ranges can be considered "reference" because popFront()ing any copy is tantamount to popFront()int any other copy. -- Andrei
| |||
January 16, 2015 Re: More tricky range semantics | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote:
> That's right. To simplify the problem space we might decree that forward (or
> better) ranges with reference semantics are not allowed. -- Andrei
That would seem to place significant restrictions on the ability to define effective random number generators and related functionality ... ?
| |||
January 16, 2015 Re: More tricky range semantics | ||||
|---|---|---|---|---|
| ||||
On Fri, Jan 16, 2015 at 01:58:15AM +0100, Joseph Rushton Wakeling via Digitalmars-d wrote: > On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote: > >That's right. To simplify the problem space we might decree that forward (or better) ranges with reference semantics are not allowed. -- Andrei > > That would seem to place significant restrictions on the ability to define effective random number generators and related functionality ... ? Not to mention, I just realized that while the example I used relied on the interaction between reference and value types, a similar problem exists with just value types alone. Using the same wrapper range I posted, suppose now that R is not a reference type range, but an input (non-forward) range, and that .save is not implemented. Then consider this code: unittest { auto r = R(...); auto wrapped = WrapperRange(r); assert(wrapped.equal(expectedData)); bool b = wrapped.empty; } Quiz: what's the value of b? . . . Go on, guess. :-) . . . Yup, b == false. Now guess what happens if you then access wrapped.front? Well, that depends. If someCondition became true before r was exhausted, then wrapped.front would return a value that shouldn't be in the wrapped range, because it breaks the invariant that WrapperRange is supposed to stop when someCondition becomes true. However, if someCondition didn't become true and r was exhausted, accessing wrapped.front will call .front on an empty input range, which is UB. If you're lucky, you'll hit an assert; otherwise, anything could happen, like dereferencing a dangling pointer, reading invalid memory, etc.. So actually, my previous statement was not broad enough: this problem happens not just with reference type ranges; it happens with non-reference input ranges too! T -- If it breaks, you get to keep both pieces. -- Software disclaimer notice | ||||
January 16, 2015 Re: More tricky range semantics | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | On 1/15/15 4:58 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
> On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote:
>> That's right. To simplify the problem space we might decree that
>> forward (or
>> better) ranges with reference semantics are not allowed. -- Andrei
>
> That would seem to place significant restrictions on the ability to
> define effective random number generators and related functionality ... ?
Agreed. Pseudo-random generators are an interesting kind of range because they are forward yet do not iterate a "real" container. -- Andrei
| |||
January 16, 2015 Re: More tricky range semantics | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Thu, Jan 15, 2015 at 05:52:45PM -0800, Andrei Alexandrescu via Digitalmars-d wrote: > On 1/15/15 4:58 PM, Joseph Rushton Wakeling via Digitalmars-d wrote: > >On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote: > >>That's right. To simplify the problem space we might decree that forward (or better) ranges with reference semantics are not allowed. -- Andrei > > > >That would seem to place significant restrictions on the ability to define effective random number generators and related functionality ... ? > > Agreed. Pseudo-random generators are an interesting kind of range because they are forward yet do not iterate a "real" container. -- Andrei I've been wondering about that. Must ranges have an "underlying" container? Or are they allowed to be more abstract entities that basically function as generators, producing data on demand? (This distinction is also somewhat related to transient ranges, in that the difference between providing a "view" of some static underlying data vs. computing something on-the-fly in a buffer that gets reused, could serve as a deciding factor on how to deal with transient ranges.) My feeling is that allowing the latter is much more powerful, and more encompassing. Constructs like iota() and recurrence() belong to the latter category, for example, and they do provide forward range like functionality -- or they *could*, if they don't already, as it would be trivial to implement. Allowing generating functions to be ranges also allows one to plug in programmatic data generators as data sources in UFCS chains, without needing to deal with an artificial distinction between "view of underlying data" ranges and "generated-on-the-fly" ranges. In this sense, an RNG could be thought of as a complex variant of recurrence(), except with a far more complicated generating expression than what one would normally use with recurrence(). If recurrence() qualifies as a range (and a forward range, no less), then an RNG ought to qualify too. T -- Those who don't understand Unix are condemned to reinvent it, poorly. | |||
January 16, 2015 Re: More tricky range semantics | ||||
|---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | > > Yep, range semantics just got even trickier. (And we thought transient > ranges were bad...) > > > T This is related: https://issues.dlang.org/show_bug.cgi?id=11951 | |||
January 16, 2015 Re: More tricky range semantics | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | On Friday, 16 January 2015 at 00:58:34 UTC, Joseph Rushton Wakeling via Digitalmars-d wrote:
> On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote:
>> That's right. To simplify the problem space we might decree that forward (or
>> better) ranges with reference semantics are not allowed. -- Andrei
>
> That would seem to place significant restrictions on the ability to define effective random number generators and related functionality ... ?
Why RNG would be a forward range as opposed to just input range?
| |||
January 16, 2015 Re: More tricky range semantics | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Dicebot | On Friday, 16 January 2015 at 14:58:09 UTC, Dicebot wrote:
> On Friday, 16 January 2015 at 00:58:34 UTC, Joseph Rushton Wakeling via Digitalmars-d wrote:
>> On 16/01/15 00:24, Andrei Alexandrescu via Digitalmars-d wrote:
>>> That's right. To simplify the problem space we might decree that forward (or
>>> better) ranges with reference semantics are not allowed. -- Andrei
>>
>> That would seem to place significant restrictions on the ability to define effective random number generators and related functionality ... ?
>
> Why RNG would be a forward range as opposed to just input range?
To specify : the way I see it you either want PRNG to be a forward range and that fits with value semantics. Or you want reference semantics and it naturally becomes input range.
| |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply