View mode: basic / threaded / horizontal-split · Log in · Help
January 08, 2013
std.random2
Hello all,

Following discussion on the pull request adding normal random number generation 
to std.random [ https://github.com/D-Programming-Language/phobos/pull/1029 ], 
some issues were raised which are best discussed with the whole community.

The heart of this is the design of pseudo-random number generators (PRNGs) in 
Phobos.  Currently these are implemented as value types, which has a number of 
unpleasant effects:

   -- They're expensive to pass around (some PRNGs have a size of a MB or more)

   -- Passing by value is statistically unsafe, as it can result in identical
      random sequences being generated in different parts of the code.  This
      already affects at least one part of std.random itself: see,
      http://d.puremagic.com/issues/show_bug.cgi?id=8247
      http://forum.dlang.org/thread/oiczxzkzxketxitncghl@forum.dlang.org

   -- Simply passing by reference isn't an adequate solution, as there will be
      cases (as with RandomSample, detailed in bug 8247) where you have to
      store the RNG.  Storing e.g. a pointer or reference would be unsafe; the
      only adequate solution is that PRNGs be (safe) reference types.

monarch_dodra did some work on this which was set aside in the short term 
because it would (unavoidably) be a breaking change [ 
https://github.com/D-Programming-Language/phobos/pull/893 ].  To avoid this, the 
proposed solution is to create a std.random2.

However, these issues seem to me to have broader implications for the design of 
random-number functionality in Phobos, so if we're going to do a std.random2, 
it's worth mapping them out so as to get them right.

The most obvious (to me) is that these issues which apply to PRNGs apply equally 
well to random number distributions.  For example the Ziggurat algorithm 
requires storing several hundred constants, so passing by value is expensive; 
and several different algorithms generate and store multiple random variates at 
a time, so copying/passing by value will result in unintended correlations in 
sequences of variates.

This has further implications if for example we want to create a 
VariateGenerator (or perhaps in D-ish terms, a VariateRange) which couples a 
random distribution with a PRNG -- this is unlikely to work unless both the 
random distribution and the PRNG are reference types.

Finally, there are more general issues about how new functionality should be 
implemented.  C++11 is given as a model in the std.random documentation, but 
this is clearly a guide rather than something to copy blindly -- existing 
functions and structs already deviate from it in ways that reflect D's 
preference for ranges and its superior generics.  We need a clear picture of how 
to do this for functionality that has not yet been implemented.

For example: in the case of random distributions, the current example of 
uniform() offers only a function interface and therefore little guidance about 
how to create struct implementations for more complex algorithms which require 
persistent storage (e.g. Ziggurat or Box-Muller).  Should they follow 
C++11/Boost.Random in returning variates via opCall, or should they be coupled 
with a PRNG at construction time (as with RandomSample) and implement a range of 
variates?

My inclination here is to take some time to map out the different 
interface/design options and to present the choices to the community for review 
as a precursor to creating a std.random2.  It seems the only really sensible 
choice to ensure that we get a good future-proof design.

What does everyone think?

Best wishes,

    -- Joe
January 08, 2013
Re: std.random2
On Tuesday, 8 January 2013 at 17:52:24 UTC, Joseph Rushton 
Wakeling wrote:
> Hello all,
>
> Following discussion on the pull request adding normal random 
> number generation to std.random [ 
> https://github.com/D-Programming-Language/phobos/pull/1029 ], 
> some issues were raised which are best discussed with the whole 
> community.
>
> The heart of this is the design of pseudo-random number 
> generators (PRNGs) in Phobos.  Currently these are implemented 
> as value types, which has a number of unpleasant effects:
>
>    -- They're expensive to pass around (some PRNGs have a size 
> of a MB or more)
>
>    -- Passing by value is statistically unsafe, as it can 
> result in identical
>       random sequences being generated in different parts of 
> the code.  This
>       already affects at least one part of std.random itself: 
> see,
>       http://d.puremagic.com/issues/show_bug.cgi?id=8247
>       
> http://forum.dlang.org/thread/oiczxzkzxketxitncghl@forum.dlang.org
>
>    -- Simply passing by reference isn't an adequate solution, 
> as there will be
>       cases (as with RandomSample, detailed in bug 8247) where 
> you have to
>       store the RNG.  Storing e.g. a pointer or reference would 
> be unsafe; the
>       only adequate solution is that PRNGs be (safe) reference 
> types.
>
> monarch_dodra did some work on this which was set aside in the 
> short term because it would (unavoidably) be a breaking change 
> [ https://github.com/D-Programming-Language/phobos/pull/893 ].  
> To avoid this, the proposed solution is to create a std.random2.
>
> However, these issues seem to me to have broader implications 
> for the design of random-number functionality in Phobos, so if 
> we're going to do a std.random2, it's worth mapping them out so 
> as to get them right.
>
> The most obvious (to me) is that these issues which apply to 
> PRNGs apply equally well to random number distributions.  For 
> example the Ziggurat algorithm requires storing several hundred 
> constants, so passing by value is expensive; and several 
> different algorithms generate and store multiple random 
> variates at a time, so copying/passing by value will result in 
> unintended correlations in sequences of variates.
>
> This has further implications if for example we want to create 
> a VariateGenerator (or perhaps in D-ish terms, a VariateRange) 
> which couples a random distribution with a PRNG -- this is 
> unlikely to work unless both the random distribution and the 
> PRNG are reference types.
>
> Finally, there are more general issues about how new 
> functionality should be implemented.  C++11 is given as a model 
> in the std.random documentation, but this is clearly a guide 
> rather than something to copy blindly -- existing functions and 
> structs already deviate from it in ways that reflect D's 
> preference for ranges and its superior generics.  We need a 
> clear picture of how to do this for functionality that has not 
> yet been implemented.
>
> For example: in the case of random distributions, the current 
> example of uniform() offers only a function interface and 
> therefore little guidance about how to create struct 
> implementations for more complex algorithms which require 
> persistent storage (e.g. Ziggurat or Box-Muller).  Should they 
> follow C++11/Boost.Random in returning variates via opCall, or 
> should they be coupled with a PRNG at construction time (as 
> with RandomSample) and implement a range of variates?
>
> My inclination here is to take some time to map out the 
> different interface/design options and to present the choices 
> to the community for review as a precursor to creating a 
> std.random2.  It seems the only really sensible choice to 
> ensure that we get a good future-proof design.
>
> What does everyone think?
>
> Best wishes,
>
>     -- Joe

I imagine there has been some detailed discussion of the 
std.nameX idea of libraries so forgive me if this has been 
discussed. Using this as an approach to essentially replacing 
libraries instead of the depreciation route would seem to 
potentially lead to a situation where you have std.name1, 2, 3, 4 
ad infinitum which may not have exactly the same feature set or 
use cases and would lead to people hunting around multiple 
libraries which on the face of it are for the same thing and 
using them in projects. Would it not be better to bite the bullet 
and depreciate? random2 simply sounds like random done properly, 
with random and random2 people would often use random, and so 
still suffer the consequences of its issues, and possibly 
overlook random2. You're depreciating because it's broken, not 
because people want to mess around with trivialities.
January 08, 2013
Re: std.random2
On 01/08/2013 07:38 PM, ixid wrote:
> I imagine there has been some detailed discussion of the std.nameX idea of
> libraries so forgive me if this has been discussed.

I appreciate your concern on this point, but I don't think it's the right thing 
to focus on in this specific discussion.

What I really want address is: how do we get the design of std.random _right_?

How we go about incorporating that new design into Phobos with minimal hassle 
for users is a different issue and one we can face when the time comes.
January 08, 2013
Re: std.random2
On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
> On 01/08/2013 07:38 PM, ixid wrote:
> >I imagine there has been some detailed discussion of the std.nameX
> >idea of libraries so forgive me if this has been discussed.
> 
> I appreciate your concern on this point, but I don't think it's the
> right thing to focus on in this specific discussion.
> 
> What I really want address is: how do we get the design of std.random
> _right_?
> 
> How we go about incorporating that new design into Phobos with
> minimal hassle for users is a different issue and one we can face
> when the time comes.

For one thing, I'd say definitely make RNGs have reference semantics.
Passing by value just doesn't make sense; for large generators it
consumes too much stack space and risks stack overflow, and in any case
copying RNGs unintentionally causes duplicated sequences, which is very
bad.

For those cases where you *want* to copy an RNG, it should be made into
a forward range and then you use .save explicitly.

I wonder if it even makes sense to force RNGs to inherit from a common
base class, so that reference semantics are enforced even for
user-defined types. But this may be a bit too heavy-handed (and it will
alienate the no-GC crowd).


T

-- 
The peace of mind---from knowing that viruses which exploit Microsoft
system vulnerabilities cannot touch Linux---is priceless. -- Frustrated
system administrator.
January 08, 2013
Re: std.random2
On Tuesday, 8 January 2013 at 20:40:00 UTC, H. S. Teoh wrote:
> On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton 
> Wakeling wrote:
>> On 01/08/2013 07:38 PM, ixid wrote:
>> >I imagine there has been some detailed discussion of the 
>> >std.nameX
>> >idea of libraries so forgive me if this has been discussed.
>> 
>> I appreciate your concern on this point, but I don't think 
>> it's the
>> right thing to focus on in this specific discussion.
>> 
>> What I really want address is: how do we get the design of 
>> std.random
>> _right_?
>> 
>> How we go about incorporating that new design into Phobos with
>> minimal hassle for users is a different issue and one we can 
>> face
>> when the time comes.
>
> For one thing, I'd say definitely make RNGs have reference 
> semantics.
> Passing by value just doesn't make sense; for large generators 
> it
> consumes too much stack space and risks stack overflow, and in 
> any case
> copying RNGs unintentionally causes duplicated sequences, which 
> is very
> bad.
>
> For those cases where you *want* to copy an RNG, it should be 
> made into
> a forward range and then you use .save explicitly.

I'd argue PRNG's should even be forward ranges. For one, the 
actual saving would be duplicating the PRNG payload itself (very 
costly). Further more, it would mean the danger of duplicate 
sequence still exist: a lot of algorithms, such a "fill", start 
by saving their input, and then iterating on the copy...

I'm all for them having "dup" though.

Worst case scenario, I had written an adaptor that turns a 
InputRangePRNG into a ForwardRangePRNG, but this is at the 
*explicit* request of the *user*, and not by any underlying 
algorithm.

And even then, I'd *still* advise against using this adaptor. If 
you *must* store and iterate on random values several time, just 
copy them in an array doing prng.take(N).array(). Clean, 
efficient, safe.

@jmdavis was against making them input ranges, saying input 
ranges just aren't useful compared to a forward range. I think 
that:
1. A PRNG naturally models input, and not forward.
2. Algorithms that operate on actual random sequenes shouldn't 
rely on the PRNG to be saveable anyways: A *true* RNG simply 
cannot provide save.

> I wonder if it even makes sense to force RNGs to inherit from a 
> common
> base class, so that reference semantics are enforced even for
> user-defined types. But this may be a bit too heavy-handed (and 
> it will
> alienate the no-GC crowd).

Force: No. However, we *can* provide tools to help with the 
operation though.

Since the recommended implementation of a PRNG is "pointer to 
payload", I had written an adapter template which took a "Stack 
payload implementation" and turned it into a reference semantic 
type. This meant I pretty much transformed all our PRNGs into 
reference ranges, without even touching any of their 
implementations.

If a user doesn't want it, fine.
January 08, 2013
Re: std.random2
09-Jan-2013 00:38, H. S. Teoh пишет:
> On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
>> On 01/08/2013 07:38 PM, ixid wrote:
>>> I imagine there has been some detailed discussion of the std.nameX
>>> idea of libraries so forgive me if this has been discussed.
>>
>> I appreciate your concern on this point, but I don't think it's the
>> right thing to focus on in this specific discussion.
>>
>> What I really want address is: how do we get the design of std.random
>> _right_?
>>
>> How we go about incorporating that new design into Phobos with
>> minimal hassle for users is a different issue and one we can face
>> when the time comes.
>
> For one thing, I'd say definitely make RNGs have reference semantics.
> Passing by value just doesn't make sense; for large generators it
> consumes too much stack space and risks stack overflow, and in any case
> copying RNGs unintentionally causes duplicated sequences, which is very
> bad.
>
> For those cases where you *want* to copy an RNG, it should be made into
> a forward range and then you use .save explicitly.
>
> I wonder if it even makes sense to force RNGs to inherit from a common
> base class, so that reference semantics are enforced even for
> user-defined types. But this may be a bit too heavy-handed (and it will
> alienate the no-GC crowd).

I'd push a well working (from my POV) design of polymorphic adapters. 
The idea is that wrapping is doable but buing the speed back is unsolved 
problem. BTW there is one for input ranges - InputRangeObject.

Then each generator/distribution defines specific structs and there is 
one templated polymorphic wrapper that has a common base class.

IMO this gives the best of both worlds (this is how std.digest was 
designed) at no implementation cost.

The no-GC, performance and "just give me this one little PRGN" needs are 
served with specific structs.

Then polymorphic behavior with hot-swappable PRNG, "I'm coming from 
Java/Ruby/..." etc. needs are served with a wrapper + base class or 
interface. It may even give a per struct aliases of the respective 
wrapper to be more user-friendly.


-- 
Dmitry Olshansky
January 09, 2013
Re: std.random2
On 01/08/2013 10:05 PM, monarch_dodra wrote:
> I'd argue PRNG's should even be forward ranges. For one, the actual saving would
> be duplicating the PRNG payload itself (very costly). Further more, it would
> mean the danger of duplicate sequence still exist: a lot of algorithms, such a
> "fill", start by saving their input, and then iterating on the copy...

That's a good point; I'd not really thought about the implications of .save in 
that respect.

> I'm all for them having "dup" though.

Got to be able to have the option to save the random state to stop the player 
reloading to get a different random outcome ... :-)

> @jmdavis was against making them input ranges, saying input ranges just aren't
> useful compared to a forward range. I think that:
> 1. A PRNG naturally models input, and not forward.
> 2. Algorithms that operate on actual random sequenes shouldn't rely on the PRNG
> to be saveable anyways: A *true* RNG simply cannot provide save.

I would say that it's not useful if a PRNG makes it easy for you to shoot 
yourself in the foot with flawed statistics.

Now, that said, the problem with something like a take(N) approach is that some 
algorithms might not take a predictable number of random variates.  RandomSample 
is a good example of that, although fortunately in this case the ForwardRange 
characteristics are not required.
January 09, 2013
Re: std.random2
On 01/08/13 22:23, Dmitry Olshansky wrote:
> 09-Jan-2013 00:38, H. S. Teoh пишет:
>> On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
>>> On 01/08/2013 07:38 PM, ixid wrote:
>>>> I imagine there has been some detailed discussion of the std.nameX
>>>> idea of libraries so forgive me if this has been discussed.
>>>
>>> I appreciate your concern on this point, but I don't think it's the
>>> right thing to focus on in this specific discussion.
>>>
>>> What I really want address is: how do we get the design of std.random
>>> _right_?
>>>
>>> How we go about incorporating that new design into Phobos with
>>> minimal hassle for users is a different issue and one we can face
>>> when the time comes.
>>
>> For one thing, I'd say definitely make RNGs have reference semantics.
>> Passing by value just doesn't make sense; for large generators it
>> consumes too much stack space and risks stack overflow, and in any case
>> copying RNGs unintentionally causes duplicated sequences, which is very
>> bad.
>>
>> For those cases where you *want* to copy an RNG, it should be made into
>> a forward range and then you use .save explicitly.
>>
>> I wonder if it even makes sense to force RNGs to inherit from a common
>> base class, so that reference semantics are enforced even for
>> user-defined types. But this may be a bit too heavy-handed (and it will
>> alienate the no-GC crowd).
>
> I'd push a well working (from my POV) design of polymorphic adapters. 
> The idea is that wrapping is doable but buing the speed back is 
> unsolved problem. BTW there is one for input ranges - InputRangeObject.
>
> Then each generator/distribution defines specific structs and there is 
> one templated polymorphic wrapper that has a common base class.
>
> IMO this gives the best of both worlds (this is how std.digest was 
> designed) at no implementation cost.
>
> The no-GC, performance and "just give me this one little PRGN" needs 
> are served with specific structs.
>
> Then polymorphic behavior with hot-swappable PRNG, "I'm coming from 
> Java/Ruby/..." etc. needs are served with a wrapper + base class or 
> interface. It may even give a per struct aliases of the respective 
> wrapper to be more user-friendly.

I think that is the correct approach, and was what I wanted to write 
yestarday, but I had no internet... a good idea is bound to crop up 
again :).
>
>
Top | Discussion index | About this forum | D home