range.save (page 3)

On Friday, 27 November 2015 at 11:31:14 UTC, Jonathan M Davis wrote: > Another piece of this puzzle to consider is that unless a range is a value type (or at least acts like a value type as long as you don't mutate its elements) or has save called on it, then it fundamentally doesn't work with lazy ranges. So, at minimum, we need to consider making it so that lazy ranges require forward ranges (and then, assuming that we continue to have save, the lazy ranges need to always call save on the range that they're given). Ah, interesting you should bring that up, as it's exactly the challenge of doing random number generation via a range interface ;-) I'm looking at this problem from a slightly different angle, which is that for a non-deterministic range (which is a subset of possible InputRanges) to be lazy, it matters that the value of .front is not evaluated until the first call to .front; and this "not yet determined" property needs to be restored after .popFront() is called. Basically, you require _true_ laziness rather than the kind of pseudo-laziness that most Phobos ranges display, where the initial value of .front is determined in the constructor, and .popFront() determines the next value of .front "eagerly". (N.B. "Non-deterministic" here includes pseudo-non-deterministic ranges, such as pseudo-RNGs.)

November 27, 2015

Re: range.save

Posted by Jonathan M Davis
in reply to Joseph Rushton Wakeling

Permalink

Jonathan M Davis

Posted in reply to Joseph Rushton Wakeling

Permalink

On Friday, 27 November 2015 at 11:45:38 UTC, Joseph Rushton Wakeling wrote:
> On Friday, 27 November 2015 at 11:31:14 UTC, Jonathan M Davis wrote:
>> Another piece of this puzzle to consider is that unless a range is a value type (or at least acts like a value type as long as you don't mutate its elements) or has save called on it, then it fundamentally doesn't work with lazy ranges. So, at minimum, we need to consider making it so that lazy ranges require forward ranges (and then, assuming that we continue to have save, the lazy ranges need to always call save on the range that they're given).
>
> Ah, interesting you should bring that up, as it's exactly the challenge of doing random number generation via a range interface ;-)
>
> I'm looking at this problem from a slightly different angle, which is that for a non-deterministic range (which is a subset of possible InputRanges) to be lazy, it matters that the value of .front is not evaluated until the first call to .front; and this "not yet determined" property needs to be restored after .popFront() is called.
>
> Basically, you require _true_ laziness rather than the kind of pseudo-laziness that most Phobos ranges display, where the initial value of .front is determined in the constructor, and .popFront() determines the next value of .front "eagerly".
>
> (N.B. "Non-deterministic" here includes pseudo-non-deterministic ranges, such as pseudo-RNGs.)

Well, you can have a pure input range which is lazy, but what you can't do is wrap it in another lazy range. A prime example would be something like

    auto first5 = range.take(5);
    range.popFront();
    range.popFront();
    // first5 now refers to elements 2 through 6 rather than 0 through 4

Either take needs to actually get a separate copy of the range (i.e. use save), or the range can't be used after take has been called. So, wrapping the range in a lazy range does still work on some level - but only as long as you don't use the range for anything else other than through that lazy range, and I don't know of any way to restrict that except by either disallowing pure input ranges with lazy range wrappers (which is arguably over restrictive) or by simply telling people not to use a pure input range after passing it to a lazy range (which is obviously error-prone, because it's not enforced in any way).

Whether the original range was lazy or not doesn't really matter. It's the fact that it's not a value type that screws things.

- Jonathan M Davis

On Friday, 27 November 2015 at 11:57:37 UTC, Jonathan M Davis wrote: > Well, you can have a pure input range which is lazy, but what you can't do is wrap it in another lazy range. A prime example would be something like > > auto first5 = range.take(5); > range.popFront(); > range.popFront(); > // first5 now refers to elements 2 through 6 rather than 0 through 4 Hmm well, I think it depends on how you approach the question of what is "correct" there. If range is a RNG then that behaviour could arguably be OK; the 5 numbers extracted from the RNG are evaluated as you consume them, and that's all right. This is where I'm wishing I knew Haskell better, because I'm increasingly suspecting that InputRanges ought to be thought of in much the same way as Haskell considers IO.

On Friday, 27 November 2015 at 12:06:02 UTC, Joseph Rushton Wakeling wrote: > On Friday, 27 November 2015 at 11:57:37 UTC, Jonathan M Davis wrote: >> Well, you can have a pure input range which is lazy, but what you can't do is wrap it in another lazy range. A prime example would be something like >> >> auto first5 = range.take(5); >> range.popFront(); >> range.popFront(); >> // first5 now refers to elements 2 through 6 rather than 0 through 4 > > Hmm well, I think it depends on how you approach the question of what is "correct" there. If range is a RNG then that behaviour could arguably be OK; the 5 numbers extracted from the RNG are evaluated as you consume them, and that's all right. Well, if you're dealing with a pseudo-random generator with a specific seed, I'm not sure that it's okay, though obviously for a fully random number generator, it wouldn't matter. The real problem is that this affects _any_ input range that's a reference type, not just random number generators, and the order very much matters for most range types. The problem can be fixed for forward ranges via save, but not for pure input ranges. And it's even worse with pure input ranges if they're not required to be full-on reference types, since then who knows what the state of the range is after it's copied to be passed into take. Right now, generic code can't use any range that's passed to take (or any function like it) after it's been passed in, because it's undefined behavior given the varying, possible semantics when copying a range, though calling save first obviously gets around that problem for forward ranges. But it's pretty certain that there's lots of code out there that actually depends on that undefined behavior acting like it does with dynamic arrays, because that's what most ranges do. > This is where I'm wishing I knew Haskell better, because I'm increasingly suspecting that InputRanges ought to be thought of in much the same way as Haskell considers IO. Possibly, but because almost everything in Haskell is both functional and lazy, you don't really get the problem of popFront being called after the call to take. - Jonathan M Davis

On Friday, 27 November 2015 at 12:16:34 UTC, Jonathan M Davis wrote: > Well, if you're dealing with a pseudo-random generator with a specific seed, I'm not sure that it's okay, though obviously for a fully random number generator, it wouldn't matter. I think the distinction here is artificial. If `range` is a random number generator, then `take` has no right to expect its output to be deterministic or consistent in any way. When we come to input ranges in general, that's surely a factor we need to take into account. Can the InputRange actually be relied on as a deterministic iteration, but one where we can't save our state? Or should it be treated as simply a source of data, about which one can make no assumptions regarding its consistency? I think answering that question -- and maybe distinguishing the two cases if possible -- feeds into how to address the subsequent problems you describe. > Possibly, but because almost everything in Haskell is both functional and lazy, you don't really get the problem of popFront being called after the call to take. I'm not sure that's really true. I'm reasonably sure that I can (given some source of IO) do something like (pseudo-code because my Haskell is rusty), do lazyData = something_that_lazily_reads_5_entries_from_the_IO_entity read_some_data_from_IO read_some_data_from_IO do_something_with_lazyData If the source of IO is the standard random number generator of Haskell, for example, it'll behave like the input range/popFront example.

On Friday, 27 November 2015 at 13:46:37 UTC, Joseph Rushton Wakeling wrote: > I think the distinction here is artificial. If `range` is a random number generator ... should read, "If `range` is a random number generator (even a pseudo-random one)".

Forums