June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
On Tuesday, June 18, 2013 10:27:11 Joseph Rushton Wakeling wrote:
> On 06/18/2013 08:06 AM, Jonathan M Davis wrote:
> > It would probably be a pretty easy sell though, since it can probably
> > stay
> > mostly the same aside from the struct -> class change (though at that
> > point, we might as well take the opportunity to make sure that anything
> > else that should be redesigned about it gets redesigned appropriately).
>
> Yea, this is also my feeling, which is part of why I'm pushing this concept of "random ranges" -- I want to ensure that the related issues are properly understood and discussed and some well-thought-out design patterns are prepared in order to ensure good and statistically reliable functionality in std.random2.
>
> One small note -- I'd have thought that a struct with an internal pointer-to-payload (allocated using manual memory management, not GC) would have been a superior design for pseudo-random number generators compared to making them final classes. The latter is just the easiest thing to do for simple tests of PRNG-as-reference-type.
An internal payload might make more sense, but it probably needs to be done in a way that it can be controlled (which may require custom allocators), because even allocating with malloc all the time might be a problem for the guys who are really picky about memory stuff - though maybe that level of customization could be added later. I don't know. I also don't work in that kind of environment, so I don't know exactly what such programmers find acceptable.
- Jonathan M Davis
|
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
On 06/18/2013 10:32 AM, Jonathan M Davis wrote:
> An internal payload might make more sense, but it probably needs to be done in a way that it can be controlled (which may require custom allocators), because even allocating with malloc all the time might be a problem for the guys who are really picky about memory stuff - though maybe that level of customization could be added later. I don't know. I also don't work in that kind of environment, so I don't know exactly what such programmers find acceptable.
Ahh, very good point. I guess we should probably invite Manu and Adam Ruppe to comment on this ...
|
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Tuesday, 18 June 2013 at 09:32:35 UTC, Jonathan M Davis wrote:
> On Tuesday, June 18, 2013 10:27:11 Joseph Rushton Wakeling wrote:
>> On 06/18/2013 08:06 AM, Jonathan M Davis wrote:
>> > It would probably be a pretty easy sell though, since it can probably
>> > stay
>> > mostly the same aside from the struct -> class change (though at that
>> > point, we might as well take the opportunity to make sure that anything
>> > else that should be redesigned about it gets redesigned appropriately).
>>
>> Yea, this is also my feeling, which is part of why I'm pushing this concept
>> of "random ranges" -- I want to ensure that the related issues are properly
>> understood and discussed and some well-thought-out design patterns are
>> prepared in order to ensure good and statistically reliable functionality
>> in std.random2.
>>
>> One small note -- I'd have thought that a struct with an internal
>> pointer-to-payload (allocated using manual memory management, not GC) would
>> have been a superior design for pseudo-random number generators compared to
>> making them final classes. The latter is just the easiest thing to do for
>> simple tests of PRNG-as-reference-type.
>
> An internal payload might make more sense, but it probably needs to be done in
> a way that it can be controlled (which may require custom allocators), because
> even allocating with malloc all the time might be a problem for the guys who
> are really picky about memory stuff - though maybe that level of customization
> could be added later. I don't know. I also don't work in that kind of
> environment, so I don't know exactly what such programmers find acceptable.
>
> - Jonathan M Davis
Memory wise, classes or pointer to payload will both have a memory allocation cost anyways. classes will be on the GC, whereas structs *may* use malloc + ref count.
The advantage of classes is that you can default initialize them, which is a real + given that some random ranges may need priming. (or we could bring up yet again the subject of allowing structs to have explicitly callable constructors with no arguments).
What I remember, is that there were some people who did numerics that didn't want to have reference types, because of the overhead to access the payload. They *really* wanted to be able to declare their types on stack.
What I did find though is that if you implement all of the PRNGs as value types, you can then easily wrap them all inside a reference types or class. This:
1: Makes implementation easier
2: Leaves a door open for users that want explicitly value types (at their own responsibility).
The last issue is input vs forward. IMO, PRNG's should be input by *default*, and forward (when possible) on demand. Making a PRNG forward by default means you can't prevent an algorithm from saving your range. That means that *even* if you have a reference type, you could still have duplicate series. In the above example with dual call to "array", one can *hope* to have two different ranges... but there is nothing preventing array from calling "save" just to troll the user. fill does (used to) do it :D
I was also able to implement this pretty easily: Just give the saveable ranges the "dup" primitive. Users of the PRNG can then explicitly dup if they want to very safely and explicitly. If they really want a ForwardPRNG, then a simple adaptor that adds save in terms of dup is provided.
|
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
On 06/18/2013 10:30 AM, Joseph Rushton Wakeling wrote:
> I don't come to that conclusion because I _want_ random ranges to be un-.save-able, but because I think without that design choice, there will simply be too many ways to unknowingly generate unwanted correlations in random-number-using programs.
>
> I'll follow up on that point later today.
Just as a simple example -- and this involving purely pseudo-random number generation, not "random ranges" as I've conceived them [*] -- consider the following:
auto t1 = rndGen.take(5);
writeln(t1);
auto t2 = rndGen.take(5);
writeln(t2);
I'd expect that to produce two distinct sequences. In fact it produces two identical sequences.
I think this must be because, passed a forward range like Mt19937, std.range.Take uses this constructor:
this(R input) { _original = input; _current = input.save; }
... and so when rndGen.take(5) is consumed, rndGen itself is not iterated forward.
To me that's a very serious potential source of statistical errors in a program, and it's particularly bad because it's innocuous -- absent the writeln of the results, it would be easy to never realize what's happening here.
Now imagine the potential consequences for D's use in scientific simulation -- I don't want to see D get raked over the coals because some researcher's published simulation results turned out to be spurious ... :-)
I'd be very happy to see a way in which this issue can be resolved while preserving the opportunity to .save, but I don't personally see one that doesn't rely on a heavy amount of user virtue to avoid these statistical pitfalls.
[* Actually, rndGen.take(n) is arguably a random range according to my
definition, because rndGen.take(n).popFront() will request a (pseudo-)random
number.]
|
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | On 06/18/2013 11:00 AM, monarch_dodra wrote: > The last issue is input vs forward. IMO, PRNG's should be input by *default*, and forward (when possible) on demand. I think that's potentially a very good design concept, and might offer a friendly way round the issues I mentioned with rndGen.take() in my email of a few minutes ago. > Making a PRNG forward by default means you can't prevent an algorithm from saving your range. That means that *even* if you have a reference type, you could still have duplicate series. Indeed -- I was about to follow up on my rndGen.take example with one involving a final class version of MersenneTwisterEngine, but I think it's obvious without the example :-) > In the above example with dual call to "array", one can *hope* to have two different > ranges... but there is nothing preventing array from calling "save" just to > troll the user. fill does (used to) do it :D > > I was also able to implement this pretty easily: Just give the saveable ranges the "dup" primitive. Users of the PRNG can then explicitly dup if they want to very safely and explicitly. If they really want a ForwardPRNG, then a simple adaptor that adds save in terms of dup is provided. I think in principle one could have a .forward() method/property that would return a forward-range wrapper of the range in question. So, with this concept, one would expect, auto t1 = rndGen.take(5); writeln(t1); auto t2 = rndGen.take(5); writeln(t2); ... to produce _different_ sequences (as rndGen would be merely an input range, and not .save-able), whereas auto gen = rndGen.forward; auto t1 = gen.take(5); writeln(t1); auto t2 = gen.take(5); writeln(t2); .... would produce identical sequences as before. It could in turn be a good design principle that pseudo-random sequences (and random ranges deriving from pseudo-random sequences) should define the .forward property, whereas "true" random sequences and their derivatives would lack it. |
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | On Tuesday, 18 June 2013 at 10:16:02 UTC, Joseph Rushton Wakeling wrote:
> std.range.Take uses this constructor:
>
> this(R input) { _original = input; _current = input.save; }
It shouldn't. It is the caller who should chose to (or not to) save.
|
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | On 06/18/2013 11:31 AM, monarch_dodra wrote:
> On Tuesday, 18 June 2013 at 10:16:02 UTC, Joseph Rushton Wakeling wrote:
>> std.range.Take uses this constructor:
>>
>> this(R input) { _original = input; _current = input.save; }
>
> It shouldn't. It is the caller who should chose to (or not to) save.
Ack, you're quite right. Take doesn't even have a constructor -- that's a constructor for Cycle.
I agree with your principle about it being the caller's decision whether or not to .save, but can that be assumed in general? I thought there were Phobos algorithms or ranges that deliberately took saved copies for internal use.
|
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Attachments: | On 06/18/2013 11:54 AM, Joseph Rushton Wakeling wrote: > On 06/18/2013 11:31 AM, monarch_dodra wrote: >> On Tuesday, 18 June 2013 at 10:16:02 UTC, Joseph Rushton Wakeling wrote: >>> std.range.Take uses this constructor: >>> >>> this(R input) { _original = input; _current = input.save; } >> >> It shouldn't. It is the caller who should chose to (or not to) save. > > Ack, you're quite right. Take doesn't even have a constructor -- that's a constructor for Cycle. Proof that I was wrong -- please find attached a test module that implements a final class (hence, reference type) version of Mersenne Twister. If we then do, auto gen = new MtClass19937(unpredictableSeed); auto t1 = gen.take(5); auto t2 = gen.take(5); writeln(t1); writeln(t2); ... we get different sequences. However, the .save as implemented needs rewriting (I haven't touched it yet) as currently it simply returns a reference copy. I'll get onto that. :-) Anyway, _assuming_ that we can rely on algorithms and ranges to not surreptitiously .save the random number generator, this does seem to offer a genuine alternative to my more problematic proposal. I'm very happy about this as I had feared my own idea was "least bad" rather than "best", never a nice position to be in. |
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | On 6/18/13 4:57 AM, Joseph Rushton Wakeling wrote:
> On 06/18/2013 05:00 AM, Andrei Alexandrescu wrote:
>> Once you consume an input range, there's no way to consume it again. It's done.
>
> Sure, but what I'm proposing is a stronger constraint than "random ranges should
> be input ranges".
I'm not sure I understand what you are proposing. If a true random range is to be an input range with additional constraints, one simple thing is once .empty returns true there's no more extracting elements from it.
Andrei
|
June 18, 2013 Re: Ranges and random numbers -- again | ||||
---|---|---|---|---|
| ||||
Attachments: | On 06/18/2013 12:09 PM, Joseph Rushton Wakeling wrote: > ... we get different sequences. However, the .save as implemented needs rewriting (I haven't touched it yet) as currently it simply returns a reference copy. I'll get onto that. :-) Tweaked version attached. This adds a new constructor which copies a PRNG of the same type: this(typeof(this) source) { this.mt[] = source.mt[]; this.mti = source.mti; this._y = source._y; } ... and .save then makes use of this to return a duplicate: @property typeof(this) save() { return new typeof(this)(this); } With this in hand, we can do, auto gen = new MtClass19937(unpredictableSeed); auto t1 = gen.take(5); auto t2 = gen.take(5); writeln(t1); writeln(t2); ... and get two independent sequences, whereas, auto t3 = gen.take(5); auto t4 = gen.save.take(5); writeln(t3); writeln(t4); ... produces two identical sequences. This still requires some intelligence/virtue on behalf of the programmer, though: if we do, auto t5 = gen.save.take(5); writeln(t5); ... and don't also take 5 from gen itself, then subsequent uses of gen will be correlated with t5, probably unintentionally. However, I think that's a fair tradeoff for the ability to .save at all. It's also now trivial to rewrite things like my SimpleRandomRange such that, auto gen = new MtClass19937(unpredictableSeed); auto r = simpleRandomRange(0.0L, 1.0L, gen); auto t1 = r5.take(5); auto t2 = r5.take(5); writeln(t1); writeln(t2); ... produces different sequences, whereas, auto t3 = r5.take(5); auto t4 = r5.save.take(5); writeln(t3); writeln(t4); ... produce the same. Ditto for "real" random ranges, such as std.random.RandomSample. So we have a solution to the first set of problems identified, that is much nicer than the one I had in mind. Very good! However, at least one more niggle remains -- to be followed up on. |
Copyright © 1999-2021 by the D Language Foundation