June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 06/18/2013 12:30 PM, Andrei Alexandrescu wrote:
> I'm not sure I understand what you are proposing. If a true random range is to be an input range with additional constraints, one simple thing is once .empty returns true there's no more extracting elements from it.
Sorry, that very brief email was sent in error -- I accidentally clicked the "send" button before I finished writing. The longer follow-up has more detail, though I think monarch_dodra's feedback has convinced me that my proposal is not necessary.
|
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
On Tue, Jun 18, 2013 at 11:15:51AM +0100, Joseph Rushton Wakeling wrote: > On 06/18/2013 10:30 AM, Joseph Rushton Wakeling wrote: > > I don't come to that conclusion because I _want_ random ranges to be un-.save-able, but because I think without that design choice, there will simply be too many ways to unknowingly generate unwanted correlations in random-number-using programs. > > > > I'll follow up on that point later today. > > Just as a simple example -- and this involving purely pseudo-random number generation, not "random ranges" as I've conceived them [*] -- consider the following: > > auto t1 = rndGen.take(5); > writeln(t1); > > auto t2 = rndGen.take(5); > writeln(t2); > > I'd expect that to produce two distinct sequences. In fact it produces two identical sequences. > > I think this must be because, passed a forward range like Mt19937, std.range.Take uses this constructor: > > this(R input) { _original = input; _current = input.save; } > > ... and so when rndGen.take(5) is consumed, rndGen itself is not > iterated forward. No, I just looked at the source code, and there is no call to .save in the ctor. This is just another symptom of RNGs being passed by value: t1 and t2 are operating on two copies of rndGen passed by value. Were they passed by reference, this wouldn't have been a problem. I say again that RNGs being passed by value is a major BUG. The above situation is a prime example of this problem. We *need* to make RNGs passed by reference. For situations where you *want* to duplicate a pseudo random sequence, an explicit method should be provided to clone the RNG. Duplication of RNGs should never be implicit. [...] > I'd be very happy to see a way in which this issue can be resolved while preserving the opportunity to .save, but I don't personally see one that doesn't rely on a heavy amount of user virtue to avoid these statistical pitfalls. [...] Just make RNGs passed by reference, and the above problem will not happen. T -- Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't. |
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
On Tue, Jun 18, 2013 at 12:06:41AM -0700, Jonathan M Davis wrote: > On Tuesday, June 18, 2013 00:16:24 Joseph Rushton Wakeling wrote: > > On 06/18/2013 12:10 AM, H. S. Teoh wrote: > > > What were the reasons it was not accepted? > > > > I think mostly that, as a patch to std.random, it was a breaking change. There was a proposal to make a std.random2 but that would have been a rather more major piece of work :-( > > Yeah. Changing std.random to use reference types would be a breaking change unless you just created new classes for everything. It would just be cleaner to do std.random2 instead. [...] The name "std.random2" still makes me cringe. I wish there was a good alternative, but the only names I can think of are along the lines of std.rnd or std.rand, which are inferior names we wouldn't want to be stuck with. :-( T -- Why waste time learning, when ignorance is instantaneous? -- Hobbes, from Calvin & Hobbes |
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
On 06/18/2013 03:16 PM, H. S. Teoh wrote: > No, I just looked at the source code, and there is no call to .save in the ctor. This is just another symptom of RNGs being passed by value: t1 and t2 are operating on two copies of rndGen passed by value. Were they passed by reference, this wouldn't have been a problem. You're right, it was my mistake -- I was scanning/searching the source code too quickly and a search for 'this(' took me into another struct without me realizing. > Just make RNGs passed by reference, and the above problem will not happen. That was the conclusion I also came to after monarch_dodra's examples. However, I'm still concerned about circumstances where an algorithm might choose to operate on a .save copy rather than on the original range -- this could result in unnoticed statistical correlations. Can I reasonably assume that no D language/runtime/standard library algorithm would do that? And if not, what are the examples where a .save is consumed instead of the original range? |
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
On Tue, Jun 18, 2013 at 11:27:03AM +0100, Joseph Rushton Wakeling wrote: > On 06/18/2013 11:00 AM, monarch_dodra wrote: > > The last issue is input vs forward. IMO, PRNG's should be input by *default*, and forward (when possible) on demand. +1. [...] > > In the above example with dual call to "array", one can *hope* to have two different ranges... but there is nothing preventing array from calling "save" just to troll the user. fill does (used to) do it :D > > > > I was also able to implement this pretty easily: Just give the saveable ranges the "dup" primitive. Users of the PRNG can then explicitly dup if they want to very safely and explicitly. If they really want a ForwardPRNG, then a simple adaptor that adds save in terms of dup is provided. +1. We should follow D's dictum of being safe/correct by default, but allow the user to get under the hood if they need to. Copying a PRNG state is a potentially unsafe (in the probabilistic sense) operation, so it should be made explicit -- call .dup if the user really wants a copy of the PRNG sequence, pass by reference otherwise. > I think in principle one could have a .forward() method/property that would return a forward-range wrapper of the range in question. > > So, with this concept, one would expect, > > auto t1 = rndGen.take(5); > writeln(t1); > > auto t2 = rndGen.take(5); > writeln(t2); > > ... to produce _different_ sequences (as rndGen would be merely an input range, and not .save-able), whereas > > auto gen = rndGen.forward; > > auto t1 = gen.take(5); > writeln(t1); > > auto t2 = gen.take(5); > writeln(t2); > > .... would produce identical sequences as before. [...] Actually, take doesn't (and shouldn't!) call .save, so both cases should return different sequences. The cause of the original problem was that RNGs are being passed by value, so you're actually working on copies of the PRNG state. Also, the above code is obscure; someone reading the code may have missed the call to .forward and thus misunderstand the intent of the code. I much rather provide .dup and force the user to write this: auto gen = rndGen; auto t1 = gen.dup.take(5); // Explicit call to .dup auto t2 = gen.take(5); This states up-front that we're expecting two identical sequences from the (P)RNG, so someone reading the code wouldn't be misled to think two different sequences are being generated. Self-documenting code is always better than code whose semantics rely on obscure contextual information that is easily missed. T -- Debian GNU/Linux: Cray on your desktop. |
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On 6/18/13 10:16 AM, H. S. Teoh wrote: > I say again that RNGs being passed by value is a major BUG. The above > situation is a prime example of this problem. We *need* to make RNGs > passed by reference. For situations where you *want* to duplicate a > pseudo random sequence, an explicit method should be provided to clone > the RNG. Duplication of RNGs should never be implicit. Occasionally copying a RNG's state is useful, but I agree most of the time you want to just take a reference to it. I think a good way toward a solution is http://d.puremagic.com/issues/show_bug.cgi?id=10404. Andrei |
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
On Tue, Jun 18, 2013 at 03:32:02PM +0100, Joseph Rushton Wakeling wrote: > On 06/18/2013 03:16 PM, H. S. Teoh wrote: > > No, I just looked at the source code, and there is no call to .save in the ctor. This is just another symptom of RNGs being passed by value: t1 and t2 are operating on two copies of rndGen passed by value. Were they passed by reference, this wouldn't have been a problem. > > You're right, it was my mistake -- I was scanning/searching the source code too quickly and a search for 'this(' took me into another struct without me realizing. > > > Just make RNGs passed by reference, and the above problem will not happen. > > That was the conclusion I also came to after monarch_dodra's examples. However, I'm still concerned about circumstances where an algorithm might choose to operate on a .save copy rather than on the original range -- this could result in unnoticed statistical correlations. > > Can I reasonably assume that no D language/runtime/standard library algorithm would do that? And if not, what are the examples where a .save is consumed instead of the original range? Well, .save *is* used in a few places; the few that I know of include cartesianProduct (at least one of the ranges need to be iterated over repeatedly in order to compute the product), and find(), I believe, where the needle needs to be at least a forward range so that multiple candidate subranges can be checked repeatedly. And joiner forwards .save to its wrapper range, though it doesn't actually call it as part of its operation. There may be a few other places where .save is used. W.r.t. (P)RNGs, the places where .save is used are where random ranges wouldn't really make sense anyway -- if I *really* wanted to search a range for a random needle, I'd make an array out of prng.take(n) first, and then pass that to find(); it'd be more efficient, if nothing else. And I can't think of any useful application of taking a cartesian product of a random range. T -- "Hi." "'Lo." |
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On Tuesday, 18 June 2013 at 14:18:19 UTC, H. S. Teoh wrote:
> [...]
>
> Just make RNGs passed by reference, and the above problem will not
> happen.
>
>
> T
Well... it's more than *just* pass-by-reference, its actual store-by-reference (eg "reference semantic").
In the case of "take", take store a copy of the passed range, so a pass-by-reference wouldn't really change much.
I'm not sure if that's what you meant, but just in case. We don't want any ambiguity on the words we are using.
|
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Attachments: | On 06/18/2013 03:38 PM, H. S. Teoh wrote: > W.r.t. (P)RNGs, the places where .save is used are where random ranges wouldn't really make sense anyway -- if I *really* wanted to search a range for a random needle, I'd make an array out of prng.take(n) first, and then pass that to find(); it'd be more efficient, if nothing else. And I can't think of any useful application of taking a cartesian product of a random range. I'll have a look through these things and see if there are any ways in which they might reasonably offer a way to come unstuck statistics-wise. I've attached a little example. Here we call: auto gen = new MtClass19937(unpredictableSeed); auto r = simpleRandomRange(0.0L, 1.0L, gen); auto t1 = r.take(5); auto s = r.save; auto t2 = s.take(5); writeln(t1); writeln(t2); auto t3 = r.take(5); auto t4 = s.take(5); writeln(t3); writeln(t4); Note that the first pair and last pair produce the same output, but the successive takes from r (t1 and t3) are different from each other, as are the successive takes from the save s (t2 and t4). This is what you'd expect, but an algorithm might assume that repeatedly consuming r.save would result in the same output again and again and again. |
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Tuesday, 18 June 2013 at 14:39:38 UTC, Andrei Alexandrescu wrote: > On 6/18/13 10:16 AM, H. S. Teoh wrote: >> I say again that RNGs being passed by value is a major BUG. The above >> situation is a prime example of this problem. We *need* to make RNGs >> passed by reference. For situations where you *want* to duplicate a >> pseudo random sequence, an explicit method should be provided to clone >> the RNG. Duplication of RNGs should never be implicit. > > Occasionally copying a RNG's state is useful, but I agree most of the time you want to just take a reference to it. > > I think a good way toward a solution is http://d.puremagic.com/issues/show_bug.cgi?id=10404. > > > Andrei That sounds nice, but is it possible? The only way to truly forward everything is via an alias this. However, while all the calls are properly forwarded, it tends to fail for "is" traits. EG: struct S { int s; alias INT = int; void foo() {writeln("foo()");} void foo(int) {writeln("foo(int)");} void foo2(Args)(int) {writeln("foo!" ~ Args.stringof ~ "(int)");} @property bool empty(){return true;} @property int front(){return 1;} void popFront(){} } class Class(S) { private S __s; this(Args...)(auto ref Args args){__s = S(args);} alias __s this; } void main() { auto c = new Class!S(5); c.foo(); c.foo(5); c.foo2!short(5); c.s = 5; //it's a trap! Class!S.INT a = 5; assert(isInputRange!S); assert(isInputRange!(Class!S)); } Here, this compiles, but fails at the last line... Or maybe I'm not creative enough... ? |
Copyright © 1999-2021 by the D Language Foundation