Ranges and random numbers -- again (page 2)

On 06/17/2013 09:36 PM, monarch_dodra wrote:
> Being able to *save* a random range (which your proposal would prevent) can have useful applications too. It means you can iterate on the same random number sequence several times, lazily, without having to store the results in a buffer.

One further remark on this. I agree that it would be nice to be able to .save where possible -- that is, if one has a pseudo-random number sequence one should be able to save, and only if the source of randomness is "truly" random should the adapter range be an InputRange rather than ForwardRange.

I concluded that this wasn't feasible more out of despair than desire, because I felt that deterministic, .save-able behaviour had its own traps that were potentially severe, because they would involve generating unintended statistical correlations that the user probably wouldn't notice.

On Mon, Jun 17, 2013 at 11:18:36PM +0100, Joseph Rushton Wakeling wrote:
> On 06/17/2013 09:36 PM, monarch_dodra wrote:
> > Good analysis but (sorry) I think you are on the wrong track.
> >
> > One of the major problems in std.random is that the ranges use value semantics. This means they are *saved* whenever they are copied, as opposed to just referenced. This creates the problems you have mentioned, and even more.
>
> I agree that the fact that pseudo-random number generators use value semantics is a serious problem, and I was thinking of our previous discussions in preparing these remarks.

Yeah we need to change RNGs to have reference semantics. I consider that a major design flaw in std.random.

[...]
> > I have tried to fix it before: http://forum.dlang.org/thread/oiczxzkzxketxitncghl@forum.dlang.org
> > FWIW, I gave up on the project, because it was too complex for me to handle an entire module. But there were no reasons for it to not work.
>
> I remember your work and was sad to see that it was not accepted -- actually one reason to start this discussion was to try and push awareness back to your contributions :-)
[...]

What were the reasons it was not accepted?

T

--
If it breaks, you get to keep both pieces. -- Software disclaimer notice
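As an illustration of the value-semantics problem described in this post, here is a minimal sketch. It assumes that `Mt19937` in std.random is a struct, as it was when this thread was written. The helpers `drawByValue` and `drawByReference` are hypothetical names introduced only for this example, and `refRange` is just one possible workaround, not the redesign discussed in the thread.

```d
import std.array : array;
import std.random : Mt19937;
import std.range : refRange, take;

// Hypothetical helper: `take` receives a copy of the generator struct,
// so the caller's generator is never advanced.
uint[] drawByValue(ref Mt19937 gen, size_t n)
{
    return gen.take(n).array;
}

// Hypothetical helper: `refRange` wraps a pointer to the generator,
// so every draw advances the caller's generator.
uint[] drawByReference(ref Mt19937 gen, size_t n)
{
    return refRange(&gen).take(n).array;
}

void main()
{
    auto gen = Mt19937(42);

    // Two draws that look independent are in fact the same numbers.
    auto a = drawByValue(gen, 5);
    auto b = drawByValue(gen, 5);
    assert(a == b);

    // Through a reference the generator state moves on between draws.
    auto c = drawByReference(gen, 5);
    auto d = drawByReference(gen, 5);
    assert(c == a); // gen had not advanced before this point
    assert(c != d);
}
```

The first pair of draws is the kind of silent duplication the posters are worried about: nothing in the calling code hints that `a` and `b` are perfectly correlated.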

On 06/18/2013 12:10 AM, H. S. Teoh wrote:
> What were the reasons it was not accepted?

I think mostly that, as a patch to std.random, it was a breaking change. There was a proposal to make a std.random2 but that would have been a rather more major piece of work :-(

On 06/17/2013 11:36 PM, monarch_dodra wrote:
> I could even add:
>
>     SomeRandomForwardRange r;
>     auto arr1 = array(r.save);
>     auto arr2 = array(r);
>     assert(arr1 == arr2);  // This time both are the same

Try:

    auto gen2 = new MtClass19937(unpredictableSeed);
    /* the above is just MersenneTwister tweaked to be
     * a final class, hence reference type */
    auto r5 = simpleRandomRange(0.0L, 1.0L, gen2);
    auto arr1 = array(r5.save.take(5));
    auto arr2 = array(r5.take(5));
    writeln(arr1);
    writeln(arr2);

... and you get two different sequences of numbers.
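The effect described in this post can be reproduced with a self-contained sketch. `SharedGen` and `NaiveRandomRange` below are hypothetical stand-ins written for this example; they are not the `MtClass19937` and `simpleRandomRange` used in the post, whose source is not shown in the thread. The sketch assumes an adapter that caches one value in `front` and whose `save` copies only the adapter.

```d
import std.array : array;
import std.random : Mt19937;
import std.range : take;

// Hypothetical reference-type generator: a final class wrapping Mt19937.
final class SharedGen
{
    private Mt19937 impl;

    this(uint seed) { impl = Mt19937(seed); }

    // Return the next variate and advance the shared state.
    uint next()
    {
        immutable v = impl.front;
        impl.popFront();
        return v;
    }
}

// Hypothetical adapter range that caches one value for `front`.
struct NaiveRandomRange
{
    private SharedGen gen;
    private uint current;

    this(SharedGen g) { gen = g; current = gen.next(); }

    enum bool empty = false;
    @property uint front() const { return current; }
    void popFront() { current = gen.next(); }

    // Copies the adapter only; the generator object is still shared.
    @property NaiveRandomRange save() { return this; }
}

void main()
{
    auto r = NaiveRandomRange(new SharedGen(1));
    auto arr1 = r.save.take(5).array;
    auto arr2 = r.take(5).array;

    // The cached front is copied by save, so the first elements match...
    assert(arr1[0] == arr2[0]);
    // ...but the rest comes from the shared, already advanced generator.
    assert(arr1 != arr2);
}
```

In this sketch `save` neither reproduces the sequence nor yields a fully independent one: the two arrays share their first element and then diverge.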

On 6/17/13 6:29 PM, Joseph Rushton Wakeling wrote:
> On 06/17/2013 11:18 PM, Joseph Rushton Wakeling wrote:
>>> A random range should be viewed (IMO) as nothing more than a range that "was"
>>> (conceptually) simply filled with random numbers. Calling front on the same
>>> range twice in a row *should* produce the same result: No call to popFront => no
>>> change to the range. If it did change, it'd be a blatant violation of the range
>>> concept. It also means you can't have safe/nothrow/pure/const "front".
>>
>> Completely agree, and I don't think this is in contradiction with what I've
>> proposed. My proposed "rule" might be better stated to clarify this.
>
> Perhaps this would be a better statement:
>
>      ************************************************************************
>      * Iterating fully over a given random range should produce a different *
>      * sequence for each such complete iteration.                           *
>      ************************************************************************
>
> So, if you do,
>
>      SomeRandomRange r;
>      x = r.front;
>      y = r.front;
>      assert(x == y);  // Passes!!
>
> But
>
>      SomeRandomRange r;
>      arr1 = array(r);
>      arr2 = array(r);
>      assert(arr1 != arr2);  // the two arrays are filled with different sequences.

Once you consume an input range, there's no way to consume it again. It's done.

Andrei

On Tuesday, June 18, 2013 00:16:24 Joseph Rushton Wakeling wrote:
> On 06/18/2013 12:10 AM, H. S. Teoh wrote:
> > What were the reasons it was not accepted?
>
> I think mostly that, as a patch to std.random, it was a breaking change. There was a proposal to make a std.random2 but that would have been a rather more major piece of work :-(

Yeah. Changing std.random to use reference types would be a breaking change unless you just created new classes for everything. It would just be cleaner to do std.random2 instead. But that means that someone has to take the time to prepare it for the review queue. It would probably be a pretty easy sell though, since it can probably stay mostly the same aside from the struct -> class change (though at that point, we might as well take the opportunity to make sure that anything else that should be redesigned about it gets redesigned appropriately).

- Jonathan M Davis

On 06/18/2013 05:00 AM, Andrei Alexandrescu wrote:
> Once you consume an input range, there's no way to consume it again. It's done.

Sure, but what I'm proposing is a stronger constraint than "random ranges should be input ranges".

On 06/18/2013 08:06 AM, Jonathan M Davis wrote:
> It would probably be a pretty easy sell though, since it can probably stay mostly the same aside from the struct -> class change (though at that point, we might as well take the opportunity to make sure that anything else that should be redesigned about it gets redesigned appropriately).

Yea, this is also my feeling, which is part of why I'm pushing this concept of "random ranges" -- I want to ensure that the related issues are properly understood and discussed and some well-thought-out design patterns are prepared in order to ensure good and statistically reliable functionality in std.random2.

One small note -- I'd have thought that a struct with an internal pointer-to-payload (allocated using manual memory management, not GC) would have been a superior design for pseudo-random number generators compared to making them final classes. The latter is just the easiest thing to do for simple tests of PRNG-as-reference-type.
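The struct-with-payload idea in this post might look roughly like the sketch below. It is only one possible reading: `SharedMt` is a hypothetical name, and `std.typecons.RefCounted` is used here as a convenient stand-in for manual memory management (to my understanding it allocates the payload with malloc rather than on the GC heap). Nothing in the thread says that a std.random2 would be built this way.

```d
import std.random : Mt19937;
import std.typecons : RefCounted;

// Hypothetical generator handle: a struct whose copies all point at one
// reference-counted Mt19937 payload.
struct SharedMt
{
    private RefCounted!Mt19937 state;

    this(uint seed) { state = RefCounted!Mt19937(seed); }

    enum bool empty = false;
    @property uint front() { return state.refCountedPayload.front; }
    void popFront() { state.refCountedPayload.popFront(); }
}

void main()
{
    auto a = SharedMt(7);
    auto b = a;      // copies the handle, not the generator state
    b.popFront();    // advances the state that `a` also sees

    // A plain Mt19937 with the same seed, advanced once, agrees with `a`.
    auto reference = Mt19937(7);
    reference.popFront();
    assert(a.front == reference.front);
}
```

Compared with a final class, the handle stays a struct, so it can be passed by value like the existing generators while still behaving as a reference to a single stream of numbers.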

June 18, 2013

Re: Ranges and random numbers -- again

Posted by Joseph Rushton Wakeling
in reply to Andrei Alexandrescu

On 06/18/2013 05:00 AM, Andrei Alexandrescu wrote:
> On 6/17/13 6:29 PM, Joseph Rushton Wakeling wrote:
>> On 06/17/2013 11:18 PM, Joseph Rushton Wakeling wrote:
>>>> A random range should be viewed (IMO) as nothing more than a range that "was"
>>>> (conceptually) simply filled with random numbers. Calling front on the same
>>>> range twice in a row *should* produce the same result: No call to popFront => no
>>>> change to the range. If it did change, it'd be a blatant violation of the range
>>>> concept. It also means you can't have safe/nothrow/pure/const "front".
>>>
>>> Completely agree, and I don't think this is in contradiction with what I've proposed.  My proposed "rule" might be better stated to clarify this.
>>
>> Perhaps this would be a better statement:
>>
>>      ************************************************************************
>>      * Iterating fully over a given random range should produce a different *
>>      * sequence for each such complete iteration.                           *
>>      ************************************************************************
>>
>> So, if you do,
>>
>>      SomeRandomRange r;
>>      x = r.front;
>>      y = r.front;
>>      assert(x == y);  // Passes!!
>>
>> But
>>
>>      SomeRandomRange r;
>>      arr1 = array(r);
>>      arr2 = array(r);
>>      assert(arr1 != arr2);  // the two arrays are filled with different sequences.
> 
> Once you consume an input range, there's no way to consume it again. It's done.

Let me be certain I understand what you're saying.  Is it that something like this:

    SomeInputRange r;
    arr1 = array(r);
    arr2 = array(r);

... is not legit?  Or are you saying that the properties I've described are simply the properties of input ranges?

If the latter, then note that what I'm proposing is something stronger than just, "Random ranges should be input ranges."  Once again I should rephrase: "Iterating fully over a given random range should produce a different and statistically independent sequence for each such complete iteration."

I don't come to that conclusion because I _want_ random ranges to be un-.save-able, but because I think without that design choice, there will simply be too many ways to unknowingly generate unwanted correlations in random-number-using programs.

I'll follow up on that point later today.
