June 18, 2013
On Tuesday, June 18, 2013 10:27:11 Joseph Rushton Wakeling wrote:
> On 06/18/2013 08:06 AM, Jonathan M Davis wrote:
> > It would probably be a pretty easy sell  though, since it can probably
> > stay
> > mostly the same aside from the struct -> class change (though at that
> > point, we might as well take the opportunity to  make sure that anything
> > else that should be redesigned about it gets  redesigned appropriately).
> 
> Yea, this is also my feeling, which is part of why I'm pushing this concept of "random ranges" -- I want to ensure that the related issues are properly understood and discussed and some well-thought-out design patterns are prepared in order to ensure good and statistically reliable functionality in std.random2.
> 
> One small note -- I'd have thought that a struct with an internal pointer-to-payload (allocated using manual memory management, not GC) would have been a superior design for pseudo-random number generators compared to making them final classes.  The latter is just the easiest thing to do for simple tests of PRNG-as-reference-type.

An internal payload might make more sense, but it probably needs to be done in a way that it can be controlled (which may require custom allocators), because even allocating with malloc all the time might be a problem for the guys who are really picky about memory stuff - though maybe that level of customization could be added later. I don't know. I also don't work in that kind of environment, so I don't know exactly what such programmers find acceptable.

- Jonathan M Davis
June 18, 2013
On 06/18/2013 10:32 AM, Jonathan M Davis wrote:
> An internal payload might make more sense, but it probably needs to be done in a way that it can be controlled (which may require custom allocators), because even allocating with malloc all the time might be a problem for the guys who are really picky about memory stuff - though maybe that level of customization could be added later. I don't know. I also don't work in that kind of environment, so I don't know exactly what such programmers find acceptable.

Ahh, very good point.  I guess we should probably invite Manu and Adam Ruppe to comment on this ...

June 18, 2013
On Tuesday, 18 June 2013 at 09:32:35 UTC, Jonathan M Davis wrote:
> On Tuesday, June 18, 2013 10:27:11 Joseph Rushton Wakeling wrote:
>> On 06/18/2013 08:06 AM, Jonathan M Davis wrote:
>> > It would probably be a pretty easy sell  though, since it can probably
>> > stay
>> > mostly the same aside from the struct -> class change (though at that
>> > point, we might as well take the opportunity to  make sure that anything
>> > else that should be redesigned about it gets  redesigned appropriately).
>> 
>> Yea, this is also my feeling, which is part of why I'm pushing this concept
>> of "random ranges" -- I want to ensure that the related issues are properly
>> understood and discussed and some well-thought-out design patterns are
>> prepared in order to ensure good and statistically reliable functionality
>> in std.random2.
>> 
>> One small note -- I'd have thought that a struct with an internal
>> pointer-to-payload (allocated using manual memory management, not GC) would
>> have been a superior design for pseudo-random number generators compared to
>> making them final classes.  The latter is just the easiest thing to do for
>> simple tests of PRNG-as-reference-type.
>
> An internal payload might make more sense, but it probably needs to be done in
> a way that it can be controlled (which may require custom allocators), because
> even allocating with malloc all the time might be a problem for the guys who
> are really picky about memory stuff - though maybe that level of customization
> could be added later. I don't know. I also don't work in that kind of
> environment, so I don't know exactly what such programmers find acceptable.
>
> - Jonathan M Davis

Memory wise, classes or pointer to payload will both have a memory allocation cost anyways. classes will be on the GC, whereas structs *may* use malloc + ref count.

The advantage of classes is that you can default initialize them, which is a real + given that some random ranges may need priming. (or we could bring up yet again the subject of allowing structs to have explicitly callable constructors with no arguments).

What I remember, is that there were some people who did numerics that didn't want to have reference types, because of the overhead to access the payload. They *really* wanted to be able to declare their types on stack.

What I did find though is that if you implement all of the PRNGs as value types, you can then easily wrap them all inside a reference types or class. This:
1: Makes implementation easier
2: Leaves a door open for users that want explicitly value types (at their own responsibility).

The last issue is input vs forward. IMO, PRNG's should be input by *default*, and forward (when possible) on demand. Making a PRNG forward by default means you can't prevent an algorithm from saving your range. That means that *even* if you have a reference type, you could still have duplicate series. In the above example with dual call to "array", one can *hope* to have two different ranges... but there is nothing preventing array from calling "save" just to troll the user. fill does (used to) do it :D

I was also able to implement this pretty easily: Just give the saveable ranges the "dup" primitive. Users of the PRNG can then explicitly dup if they want to very safely and explicitly. If they really want a ForwardPRNG, then a simple adaptor that adds save in terms of dup is provided.
June 18, 2013
On 06/18/2013 10:30 AM, Joseph Rushton Wakeling wrote:
> I don't come to that conclusion because I _want_ random ranges to be un-.save-able, but because I think without that design choice, there will simply be too many ways to unknowingly generate unwanted correlations in random-number-using programs.
> 
> I'll follow up on that point later today.

Just as a simple example -- and this involving purely pseudo-random number generation, not "random ranges" as I've conceived them [*] -- consider the following:

    auto t1 = rndGen.take(5);
    writeln(t1);

    auto t2 = rndGen.take(5);
    writeln(t2);

I'd expect that to produce two distinct sequences.  In fact it produces two identical sequences.

I think this must be because, passed a forward range like Mt19937, std.range.Take uses this constructor:

    this(R input) { _original = input; _current = input.save; }

... and so when rndGen.take(5) is consumed, rndGen itself is not iterated forward.

To me that's a very serious potential source of statistical errors in a program, and it's particularly bad because it's innocuous -- absent the writeln of the results, it would be easy to never realize what's happening here.

Now imagine the potential consequences for D's use in scientific simulation -- I don't want to see D get raked over the coals because some researcher's published simulation results turned out to be spurious ... :-)

I'd be very happy to see a way in which this issue can be resolved while preserving the opportunity to .save, but I don't personally see one that doesn't rely on a heavy amount of user virtue to avoid these statistical pitfalls.


[* Actually, rndGen.take(n) is arguably a random range according to my
definition, because rndGen.take(n).popFront() will request a (pseudo-)random
number.]
June 18, 2013
On 06/18/2013 11:00 AM, monarch_dodra wrote:
> The last issue is input vs forward. IMO, PRNG's should be input by *default*, and forward (when possible) on demand.

I think that's potentially a very good design concept, and might offer a friendly way round the issues I mentioned with rndGen.take() in my email of a few minutes ago.

> Making a PRNG forward by default means you can't prevent an algorithm from saving your range. That means that *even* if you have a reference type, you could still have duplicate series.

Indeed -- I was about to follow up on my rndGen.take example with one involving a final class version of MersenneTwisterEngine, but I think it's obvious without the example :-)

> In the above example with dual call to "array", one can *hope* to have two different
> ranges... but there is nothing preventing array from calling "save" just to
> troll the user. fill does (used to) do it :D
> 
> I was also able to implement this pretty easily: Just give the saveable ranges the "dup" primitive. Users of the PRNG can then explicitly dup if they want to very safely and explicitly. If they really want a ForwardPRNG, then a simple adaptor that adds save in terms of dup is provided.

I think in principle one could have a .forward() method/property that would return a forward-range wrapper of the range in question.

So, with this concept, one would expect,

    auto t1 = rndGen.take(5);
    writeln(t1);

    auto t2 = rndGen.take(5);
    writeln(t2);

... to produce _different_ sequences (as rndGen would be merely an input range, and not .save-able), whereas

    auto gen = rndGen.forward;

    auto t1 = gen.take(5);
    writeln(t1);

    auto t2 = gen.take(5);
    writeln(t2);

.... would produce identical sequences as before.

It could in turn be a good design principle that pseudo-random sequences (and random ranges deriving from pseudo-random sequences) should define the .forward property, whereas "true" random sequences and their derivatives would lack it.
June 18, 2013
On Tuesday, 18 June 2013 at 10:16:02 UTC, Joseph Rushton Wakeling wrote:
> std.range.Take uses this constructor:
>
>     this(R input) { _original = input; _current = input.save; }

It shouldn't. It is the caller who should chose to (or not to) save.
June 18, 2013
On 06/18/2013 11:31 AM, monarch_dodra wrote:
> On Tuesday, 18 June 2013 at 10:16:02 UTC, Joseph Rushton Wakeling wrote:
>> std.range.Take uses this constructor:
>>
>>     this(R input) { _original = input; _current = input.save; }
> 
> It shouldn't. It is the caller who should chose to (or not to) save.

Ack, you're quite right.  Take doesn't even have a constructor -- that's a constructor for Cycle.

I agree with your principle about it being the caller's decision whether or not to .save, but can that be assumed in general?  I thought there were Phobos algorithms or ranges that deliberately took saved copies for internal use.
June 18, 2013
On 06/18/2013 11:54 AM, Joseph Rushton Wakeling wrote:
> On 06/18/2013 11:31 AM, monarch_dodra wrote:
>> On Tuesday, 18 June 2013 at 10:16:02 UTC, Joseph Rushton Wakeling wrote:
>>> std.range.Take uses this constructor:
>>>
>>>     this(R input) { _original = input; _current = input.save; }
>>
>> It shouldn't. It is the caller who should chose to (or not to) save.
> 
> Ack, you're quite right.  Take doesn't even have a constructor -- that's a constructor for Cycle.

Proof that I was wrong -- please find attached a test module that implements a final class (hence, reference type) version of Mersenne Twister.

If we then do,

    auto gen = new MtClass19937(unpredictableSeed);

    auto t1 = gen.take(5);
    auto t2 = gen.take(5);
    writeln(t1);
    writeln(t2);

... we get different sequences.  However, the .save as implemented needs rewriting (I haven't touched it yet) as currently it simply returns a reference copy.  I'll get onto that. :-)

Anyway, _assuming_ that we can rely on algorithms and ranges to not surreptitiously .save the random number generator, this does seem to offer a genuine alternative to my more problematic proposal.  I'm very happy about this as I had feared my own idea was "least bad" rather than "best", never a nice position to be in.


June 18, 2013
On 6/18/13 4:57 AM, Joseph Rushton Wakeling wrote:
> On 06/18/2013 05:00 AM, Andrei Alexandrescu wrote:
>> Once you consume an input range, there's no way to consume it again. It's done.
>
> Sure, but what I'm proposing is a stronger constraint than "random ranges should
> be input ranges".

I'm not sure I understand what you are proposing. If a true random range is to be an input range with additional constraints, one simple thing is once .empty returns true there's no more extracting elements from it.

Andrei
June 18, 2013
On 06/18/2013 12:09 PM, Joseph Rushton Wakeling wrote:
> ... we get different sequences.  However, the .save as implemented needs rewriting (I haven't touched it yet) as currently it simply returns a reference copy.  I'll get onto that. :-)

Tweaked version attached.  This adds a new constructor which copies a PRNG of the same type:

    this(typeof(this) source)
    {
        this.mt[] = source.mt[];
        this.mti = source.mti;
        this._y = source._y;
    }

... and .save then makes use of this to return a duplicate:

    @property typeof(this) save()
    {
        return new typeof(this)(this);
    }

With this in hand, we can do,

    auto gen = new MtClass19937(unpredictableSeed);

    auto t1 = gen.take(5);
    auto t2 = gen.take(5);
    writeln(t1);
    writeln(t2);

... and get two independent sequences, whereas,

    auto t3 = gen.take(5);
    auto t4 = gen.save.take(5);
    writeln(t3);
    writeln(t4);

... produces two identical sequences.

This still requires some intelligence/virtue on behalf of the programmer, though: if we do,

    auto t5 = gen.save.take(5);
    writeln(t5);

... and don't also take 5 from gen itself, then subsequent uses of gen will be correlated with t5, probably unintentionally.  However, I think that's a fair tradeoff for the ability to .save at all.

It's also now trivial to rewrite things like my SimpleRandomRange such that,

    auto gen = new MtClass19937(unpredictableSeed);
    auto r = simpleRandomRange(0.0L, 1.0L, gen);

    auto t1 = r5.take(5);
    auto t2 = r5.take(5);
    writeln(t1);
    writeln(t2);

... produces different sequences, whereas,

    auto t3 = r5.take(5);
    auto t4 = r5.save.take(5);
    writeln(t3);
    writeln(t4);

... produce the same.  Ditto for "real" random ranges, such as std.random.RandomSample.

So we have a solution to the first set of problems identified, that is much nicer than the one I had in mind.  Very good!

However, at least one more niggle remains -- to be followed up on.