Thread overview
std.random2
Jan 08, 2013
ixid
Jan 08, 2013
H. S. Teoh
Jan 08, 2013
monarch_dodra
Jan 08, 2013
Dmitry Olshansky
Jan 09, 2013
Fawzi Mohamed
January 08, 2013
Hello all,

Following discussion on the pull request adding normal random number generation to std.random [ https://github.com/D-Programming-Language/phobos/pull/1029 ], some issues were raised which are best discussed with the whole community.

The heart of this is the design of pseudo-random number generators (PRNGs) in Phobos.  Currently these are implemented as value types, which has a number of unpleasant effects:

   -- They're expensive to pass around (some PRNGs have a size of a MB or more)

   -- Passing by value is statistically unsafe, as it can result in identical
      random sequences being generated in different parts of the code.  This
      already affects at least one part of std.random itself: see,
      http://d.puremagic.com/issues/show_bug.cgi?id=8247
      http://forum.dlang.org/thread/oiczxzkzxketxitncghl@forum.dlang.org

   -- Simply passing by reference isn't an adequate solution, as there will be
      cases (as with RandomSample, detailed in bug 8247) where you have to
      store the RNG.  Storing e.g. a pointer or reference would be unsafe; the
      only adequate solution is that PRNGs be (safe) reference types.

monarch_dodra did some work on this which was set aside in the short term because it would (unavoidably) be a breaking change [ https://github.com/D-Programming-Language/phobos/pull/893 ].  To avoid this, the proposed solution is to create a std.random2.

However, these issues seem to me to have broader implications for the design of random-number functionality in Phobos, so if we're going to do a std.random2, it's worth mapping them out so as to get them right.

The most obvious (to me) is that these issues which apply to PRNGs apply equally well to random number distributions.  For example the Ziggurat algorithm requires storing several hundred constants, so passing by value is expensive; and several different algorithms generate and store multiple random variates at a time, so copying/passing by value will result in unintended correlations in sequences of variates.

This has further implications if for example we want to create a VariateGenerator (or perhaps in D-ish terms, a VariateRange) which couples a random distribution with a PRNG -- this is unlikely to work unless both the random distribution and the PRNG are reference types.

Finally, there are more general issues about how new functionality should be implemented.  C++11 is given as a model in the std.random documentation, but this is clearly a guide rather than something to copy blindly -- existing functions and structs already deviate from it in ways that reflect D's preference for ranges and its superior generics.  We need a clear picture of how to do this for functionality that has not yet been implemented.

For example: in the case of random distributions, the current example of uniform() offers only a function interface and therefore little guidance about how to create struct implementations for more complex algorithms which require persistent storage (e.g. Ziggurat or Box-Muller).  Should they follow C++11/Boost.Random in returning variates via opCall, or should they be coupled with a PRNG at construction time (as with RandomSample) and implement a range of variates?

My inclination here is to take some time to map out the different interface/design options and to present the choices to the community for review as a precursor to creating a std.random2.  It seems the only really sensible choice to ensure that we get a good future-proof design.

What does everyone think?

Best wishes,

    -- Joe
January 08, 2013
On Tuesday, 8 January 2013 at 17:52:24 UTC, Joseph Rushton Wakeling wrote:
> Hello all,
>
> Following discussion on the pull request adding normal random number generation to std.random [ https://github.com/D-Programming-Language/phobos/pull/1029 ], some issues were raised which are best discussed with the whole community.
>
> The heart of this is the design of pseudo-random number generators (PRNGs) in Phobos.  Currently these are implemented as value types, which has a number of unpleasant effects:
>
>    -- They're expensive to pass around (some PRNGs have a size of a MB or more)
>
>    -- Passing by value is statistically unsafe, as it can result in identical
>       random sequences being generated in different parts of the code.  This
>       already affects at least one part of std.random itself: see,
>       http://d.puremagic.com/issues/show_bug.cgi?id=8247
>       http://forum.dlang.org/thread/oiczxzkzxketxitncghl@forum.dlang.org
>
>    -- Simply passing by reference isn't an adequate solution, as there will be
>       cases (as with RandomSample, detailed in bug 8247) where you have to
>       store the RNG.  Storing e.g. a pointer or reference would be unsafe; the
>       only adequate solution is that PRNGs be (safe) reference types.
>
> monarch_dodra did some work on this which was set aside in the short term because it would (unavoidably) be a breaking change [ https://github.com/D-Programming-Language/phobos/pull/893 ].  To avoid this, the proposed solution is to create a std.random2.
>
> However, these issues seem to me to have broader implications for the design of random-number functionality in Phobos, so if we're going to do a std.random2, it's worth mapping them out so as to get them right.
>
> The most obvious (to me) is that these issues which apply to PRNGs apply equally well to random number distributions.  For example the Ziggurat algorithm requires storing several hundred constants, so passing by value is expensive; and several different algorithms generate and store multiple random variates at a time, so copying/passing by value will result in unintended correlations in sequences of variates.
>
> This has further implications if for example we want to create a VariateGenerator (or perhaps in D-ish terms, a VariateRange) which couples a random distribution with a PRNG -- this is unlikely to work unless both the random distribution and the PRNG are reference types.
>
> Finally, there are more general issues about how new functionality should be implemented.  C++11 is given as a model in the std.random documentation, but this is clearly a guide rather than something to copy blindly -- existing functions and structs already deviate from it in ways that reflect D's preference for ranges and its superior generics.  We need a clear picture of how to do this for functionality that has not yet been implemented.
>
> For example: in the case of random distributions, the current example of uniform() offers only a function interface and therefore little guidance about how to create struct implementations for more complex algorithms which require persistent storage (e.g. Ziggurat or Box-Muller).  Should they follow C++11/Boost.Random in returning variates via opCall, or should they be coupled with a PRNG at construction time (as with RandomSample) and implement a range of variates?
>
> My inclination here is to take some time to map out the different interface/design options and to present the choices to the community for review as a precursor to creating a std.random2.  It seems the only really sensible choice to ensure that we get a good future-proof design.
>
> What does everyone think?
>
> Best wishes,
>
>     -- Joe

I imagine there has been some detailed discussion of the std.nameX idea of libraries so forgive me if this has been discussed. Using this as an approach to essentially replacing libraries instead of the depreciation route would seem to potentially lead to a situation where you have std.name1, 2, 3, 4 ad infinitum which may not have exactly the same feature set or use cases and would lead to people hunting around multiple libraries which on the face of it are for the same thing and using them in projects. Would it not be better to bite the bullet and depreciate? random2 simply sounds like random done properly, with random and random2 people would often use random, and so still suffer the consequences of its issues, and possibly overlook random2. You're depreciating because it's broken, not because people want to mess around with trivialities.
January 08, 2013
On 01/08/2013 07:38 PM, ixid wrote:
> I imagine there has been some detailed discussion of the std.nameX idea of
> libraries so forgive me if this has been discussed.

I appreciate your concern on this point, but I don't think it's the right thing to focus on in this specific discussion.

What I really want address is: how do we get the design of std.random _right_?

How we go about incorporating that new design into Phobos with minimal hassle for users is a different issue and one we can face when the time comes.

January 08, 2013
On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
> On 01/08/2013 07:38 PM, ixid wrote:
> >I imagine there has been some detailed discussion of the std.nameX idea of libraries so forgive me if this has been discussed.
> 
> I appreciate your concern on this point, but I don't think it's the right thing to focus on in this specific discussion.
> 
> What I really want address is: how do we get the design of std.random _right_?
> 
> How we go about incorporating that new design into Phobos with minimal hassle for users is a different issue and one we can face when the time comes.

For one thing, I'd say definitely make RNGs have reference semantics. Passing by value just doesn't make sense; for large generators it consumes too much stack space and risks stack overflow, and in any case copying RNGs unintentionally causes duplicated sequences, which is very bad.

For those cases where you *want* to copy an RNG, it should be made into a forward range and then you use .save explicitly.

I wonder if it even makes sense to force RNGs to inherit from a common base class, so that reference semantics are enforced even for user-defined types. But this may be a bit too heavy-handed (and it will alienate the no-GC crowd).


T

-- 
The peace of mind---from knowing that viruses which exploit Microsoft system vulnerabilities cannot touch Linux---is priceless. -- Frustrated system administrator.
January 08, 2013
On Tuesday, 8 January 2013 at 20:40:00 UTC, H. S. Teoh wrote:
> On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
>> On 01/08/2013 07:38 PM, ixid wrote:
>> >I imagine there has been some detailed discussion of the std.nameX
>> >idea of libraries so forgive me if this has been discussed.
>> 
>> I appreciate your concern on this point, but I don't think it's the
>> right thing to focus on in this specific discussion.
>> 
>> What I really want address is: how do we get the design of std.random
>> _right_?
>> 
>> How we go about incorporating that new design into Phobos with
>> minimal hassle for users is a different issue and one we can face
>> when the time comes.
>
> For one thing, I'd say definitely make RNGs have reference semantics.
> Passing by value just doesn't make sense; for large generators it
> consumes too much stack space and risks stack overflow, and in any case
> copying RNGs unintentionally causes duplicated sequences, which is very
> bad.
>
> For those cases where you *want* to copy an RNG, it should be made into
> a forward range and then you use .save explicitly.

I'd argue PRNG's should even be forward ranges. For one, the actual saving would be duplicating the PRNG payload itself (very costly). Further more, it would mean the danger of duplicate sequence still exist: a lot of algorithms, such a "fill", start by saving their input, and then iterating on the copy...

I'm all for them having "dup" though.

Worst case scenario, I had written an adaptor that turns a InputRangePRNG into a ForwardRangePRNG, but this is at the *explicit* request of the *user*, and not by any underlying algorithm.

And even then, I'd *still* advise against using this adaptor. If you *must* store and iterate on random values several time, just copy them in an array doing prng.take(N).array(). Clean, efficient, safe.

@jmdavis was against making them input ranges, saying input ranges just aren't useful compared to a forward range. I think that:
1. A PRNG naturally models input, and not forward.
2. Algorithms that operate on actual random sequenes shouldn't rely on the PRNG to be saveable anyways: A *true* RNG simply cannot provide save.

> I wonder if it even makes sense to force RNGs to inherit from a common
> base class, so that reference semantics are enforced even for
> user-defined types. But this may be a bit too heavy-handed (and it will
> alienate the no-GC crowd).

Force: No. However, we *can* provide tools to help with the operation though.

Since the recommended implementation of a PRNG is "pointer to payload", I had written an adapter template which took a "Stack payload implementation" and turned it into a reference semantic type. This meant I pretty much transformed all our PRNGs into reference ranges, without even touching any of their implementations.

If a user doesn't want it, fine.
January 08, 2013
09-Jan-2013 00:38, H. S. Teoh пишет:
> On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
>> On 01/08/2013 07:38 PM, ixid wrote:
>>> I imagine there has been some detailed discussion of the std.nameX
>>> idea of libraries so forgive me if this has been discussed.
>>
>> I appreciate your concern on this point, but I don't think it's the
>> right thing to focus on in this specific discussion.
>>
>> What I really want address is: how do we get the design of std.random
>> _right_?
>>
>> How we go about incorporating that new design into Phobos with
>> minimal hassle for users is a different issue and one we can face
>> when the time comes.
>
> For one thing, I'd say definitely make RNGs have reference semantics.
> Passing by value just doesn't make sense; for large generators it
> consumes too much stack space and risks stack overflow, and in any case
> copying RNGs unintentionally causes duplicated sequences, which is very
> bad.
>
> For those cases where you *want* to copy an RNG, it should be made into
> a forward range and then you use .save explicitly.
>
> I wonder if it even makes sense to force RNGs to inherit from a common
> base class, so that reference semantics are enforced even for
> user-defined types. But this may be a bit too heavy-handed (and it will
> alienate the no-GC crowd).

I'd push a well working (from my POV) design of polymorphic adapters. The idea is that wrapping is doable but buing the speed back is unsolved problem. BTW there is one for input ranges - InputRangeObject.

Then each generator/distribution defines specific structs and there is one templated polymorphic wrapper that has a common base class.

IMO this gives the best of both worlds (this is how std.digest was designed) at no implementation cost.

The no-GC, performance and "just give me this one little PRGN" needs are served with specific structs.

Then polymorphic behavior with hot-swappable PRNG, "I'm coming from Java/Ruby/..." etc. needs are served with a wrapper + base class or interface. It may even give a per struct aliases of the respective wrapper to be more user-friendly.


-- 
Dmitry Olshansky
January 09, 2013
On 01/08/2013 10:05 PM, monarch_dodra wrote:
> I'd argue PRNG's should even be forward ranges. For one, the actual saving would
> be duplicating the PRNG payload itself (very costly). Further more, it would
> mean the danger of duplicate sequence still exist: a lot of algorithms, such a
> "fill", start by saving their input, and then iterating on the copy...

That's a good point; I'd not really thought about the implications of .save in that respect.

> I'm all for them having "dup" though.

Got to be able to have the option to save the random state to stop the player reloading to get a different random outcome ... :-)

> @jmdavis was against making them input ranges, saying input ranges just aren't
> useful compared to a forward range. I think that:
> 1. A PRNG naturally models input, and not forward.
> 2. Algorithms that operate on actual random sequenes shouldn't rely on the PRNG
> to be saveable anyways: A *true* RNG simply cannot provide save.

I would say that it's not useful if a PRNG makes it easy for you to shoot yourself in the foot with flawed statistics.

Now, that said, the problem with something like a take(N) approach is that some algorithms might not take a predictable number of random variates.  RandomSample is a good example of that, although fortunately in this case the ForwardRange characteristics are not required.
January 09, 2013
On 01/08/13 22:23, Dmitry Olshansky wrote:
> 09-Jan-2013 00:38, H. S. Teoh пишет:
>> On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
>>> On 01/08/2013 07:38 PM, ixid wrote:
>>>> I imagine there has been some detailed discussion of the std.nameX
>>>> idea of libraries so forgive me if this has been discussed.
>>>
>>> I appreciate your concern on this point, but I don't think it's the
>>> right thing to focus on in this specific discussion.
>>>
>>> What I really want address is: how do we get the design of std.random
>>> _right_?
>>>
>>> How we go about incorporating that new design into Phobos with
>>> minimal hassle for users is a different issue and one we can face
>>> when the time comes.
>>
>> For one thing, I'd say definitely make RNGs have reference semantics.
>> Passing by value just doesn't make sense; for large generators it
>> consumes too much stack space and risks stack overflow, and in any case
>> copying RNGs unintentionally causes duplicated sequences, which is very
>> bad.
>>
>> For those cases where you *want* to copy an RNG, it should be made into
>> a forward range and then you use .save explicitly.
>>
>> I wonder if it even makes sense to force RNGs to inherit from a common
>> base class, so that reference semantics are enforced even for
>> user-defined types. But this may be a bit too heavy-handed (and it will
>> alienate the no-GC crowd).
>
> I'd push a well working (from my POV) design of polymorphic adapters. The idea is that wrapping is doable but buing the speed back is unsolved problem. BTW there is one for input ranges - InputRangeObject.
>
> Then each generator/distribution defines specific structs and there is one templated polymorphic wrapper that has a common base class.
>
> IMO this gives the best of both worlds (this is how std.digest was designed) at no implementation cost.
>
> The no-GC, performance and "just give me this one little PRGN" needs are served with specific structs.
>
> Then polymorphic behavior with hot-swappable PRNG, "I'm coming from Java/Ruby/..." etc. needs are served with a wrapper + base class or interface. It may even give a per struct aliases of the respective wrapper to be more user-friendly.

I think that is the correct approach, and was what I wanted to write yestarday, but I had no internet... a good idea is bound to crop up again :).
>
>