March 20, 2014
Joseph Rushton Wakeling:

> Thanks for pointing me to the bug report; I'd forgotten that this was an open issue :-)

In Bugzilla probably there are many bug reports/enhancement requests about std.random, so I suggest you to read them. Some of them can be useful, while other are probably already addressed in the current (or planned) std.random2.

Another random one that was just commented by Infiltrator:
https://d.puremagic.com/issues/show_bug.cgi?id=5901

Bye,
bearophile
March 20, 2014
On Wednesday, 19 March 2014 at 23:49:41 UTC, Joseph Rushton Wakeling wrote:
> Hello all,
>
> As some of you may already know, monarch_dodra and I have spent quite a lot of time over the last year discussing the state of std.random.  To cut a long story short, there are significant problems that arise because the current RNGs are value types rather than reference types.

Any chance that you could describe them? I was about to resume porting the dcrypt library into Phobos, and had intended to flip the classes into structs, to match what the rest of the library was doing.
March 20, 2014
On Thursday, 20 March 2014 at 01:32:41 UTC, Chris Williams wrote:
> On Wednesday, 19 March 2014 at 23:49:41 UTC, Joseph Rushton Wakeling wrote:
>> Hello all,
>>
>> As some of you may already know, monarch_dodra and I have spent quite a lot of time over the last year discussing the state of std.random.  To cut a long story short, there are significant problems that arise because the current RNGs are value types rather than reference types.
>
> Any chance that you could describe them? I was about to resume porting the dcrypt library into Phobos, and had intended to flip the classes into structs, to match what the rest of the library was doing.

The issue isn't class vs struct, but rather "value semantic" vs "reference semantic" (classes are always ref, but structs can be either). Basically, if you make a copy, and modify the copy, will the original range be modified?

The problem with "value semantics" is that it always un-expected duplication of the range, which is a critical blocker problem as far as random goes.

The tell-tale usecase is:

//----
    auto g = rndGen();
    g.take(10).writeln();
    g.take(10).writeln();
//----

This will write the same sequence... TWICE!
March 20, 2014
On Wednesday, 19 March 2014 at 23:49:41 UTC, Joseph Rushton Wakeling wrote:
>    * std.random2.distribution, random distributions such as uniform,
>      normal, etc.;

Related: please consider using parts of SimpleRNG the excellent work of John D. Cook which provides many random distributions in a compact and documented way.

https://github.com/p0nce/gfm/blob/master/math/gfm/math/simplerng.d (here a port)
March 20, 2014
On Thursday, 20 March 2014 at 00:09:51 UTC, bearophile wrote:
> Joseph Rushton Wakeling:
>>   * std.random2.adaptor, random "adaptors" such as randomShuffle,
>>     randomSample, etc.
>
> Please don't use stuttering names like "std.random2.randomShuffle". "std.random2.shuffle" is enough.

Agreed.

`randomShuffle` can be made a deprecated alias: This way, "random2" should still be mostly "drop in replacement", but we won't drag along the bad names.

>> My own feeling is that ultimately it is a responsibility of the language to offer nice ways to allocate classes without necessarily relying on new or the GC.
>
> I don't think the language is yet there. So I think currently this is not a good idea.

I think there is 0 doubt that reference semantics is the way to go. An advantage of using class is that it is still *possible* to place them on the stack with Scoped, or with some future language mechanic. On the other hand, if we implement as reference structs, then that's that.

Furthermore, even in terms of performance, I think a heap allocated PRNG will still flat-out beat the value based one, if only because of the size of the damn thing.

That said, being able to allocate them on the malloc heap, and not the GC heap, would be (IMO) also a valid design.

A simple and dumb design might be to still implement them with value semantic but:
1. Disable postblit.
2. Make .save() return a "Random*"
This would mean
1. No dangers of accidental copy.
2. Range* is a ForwardRange.
3. Trivially allows GC/malloc/stack allocation.
With good aliases ("alias Random = RadomImpl*;"), and a "make!" template we could make the "default useage" transparent to this mechanism yet make it easy to get our hands under the hood.

But at this point, we are really beating around the bush on this issue. There are two things for sure:
1. Reference semantics by default.
2. There comes a point where we have to move forward.

I didn't check the code yet, but a "middle ground" could be to make all constructors private, and disable T.init. Then, we force construction through a make! template.

This might not be what's most convenient, but it would allow us to potentially change the design at a later stage, without breaking user code.

> Do you have a simple but very fast function that generates uniforms in [0.0, 1.0]? :-)

AFAIK, the allocation issue is only for ranges? "uniform" is just a function, I don't think it affected by the issue. Even if you are operating on a "passed range", either ranges are reference semantics, and you take by value, or they are value semantic, and you take by ref. Either way, you have to pay for the indirection.
March 20, 2014
monarch_dodra:

> I think there is 0 doubt that reference semantics is the way to go.

I agree.


> Furthermore, even in terms of performance, I think a heap allocated PRNG will still flat-out beat the value based one, if only because of the size of the damn thing.

OK.


>> Do you have a simple but very fast function that generates uniforms in [0.0, 1.0]? :-)
>
> AFAIK, the allocation issue is only for ranges?

Here I was not talking about allocations:
https://d.puremagic.com/issues/show_bug.cgi?id=5240

Bye,
bearophile
March 20, 2014
On Thursday, 20 March 2014 at 08:22:37 UTC, monarch_dodra wrote:
> The issue isn't class vs struct, but rather "value semantic" vs "reference semantic" (classes are always ref, but structs can be either).

That's only completely true if structs are referred to by pointer. ref parameters/returns aren't quite sufficient to keep a struct acting as a reference for all purposes.

But good example. I'll have to consider that when I port the cryptographic prngs.
March 20, 2014
On Wednesday, 19 March 2014 at 23:49:41 UTC, Joseph Rushton Wakeling wrote:
> Hello all,
>
> As some of you may already know, monarch_dodra and I have spent quite a lot of time over the last year discussing the state of std.random.  To cut a long story short, there are significant problems that arise because the current RNGs are value types rather than reference types.  We had quite a lot of back and forth on different design ideas, with a lot of helpful input from others in the community, but at the end of the day there are really only two broad approaches: create structs that implement reference semantics internally, or use classes.  So, as an exercise, I decided to create a class-based std.random.
>
> The preliminary (but comprehensive) results of this are now available here:
> https://github.com/WebDrake/std.random2
>
> Besides re-implementing random number generators as classes rather than structs, the new code splits std.random2 into a package of several different modules:
>
>    * std.random2.generator, pseudo-random number generators;
>
>    * std.random2.device, non-deterministic random sources;
>
>    * std.random2.distribution, random distributions such as uniform,
>      normal, etc.;
>
>    * std.random2.adaptor, random "adaptors" such as randomShuffle,
>      randomSample, etc.
>
>    * std.random2.traits, RNG-specific traits such as isUniformRNG
>      and isSeedable.
>
> A package.d file groups them together so one can still import all together via "import std.random2".  I've also taken the liberty of following the new guideline to place import statements as locally as possible; it was striking how easy and clean this made things, and it should be easy to port that particular change back to std.random.
>
> The new package implements all of the functions, templates and range objects from std.random except for the old std.random.uniformDistribution, whose name I have cannibalized for better purposes.  Some have been updated: the MersenneTwisterEngine has been tweaked to match the corresponding code from Boost.Random, and this in turn has allowed the definition of a 64-bit Mersenne Twister (Mt19937_64) and an alternative 32-bit one (Mt11213b).
>
> There are also a number of entirely new entries.  std.random2.distribution contains not just existing functions such as dice and uniform, but also range-based random distribution classes UniformDistribution, NormalDistribution and DiscreteDistribution; the last of these is effectively a range-based version of dice, and is based on Chris Cain's excellent work here: https://github.com/D-Programming-Language/phobos/pull/1702
>
> The principal weak point in terms of functionality is std.random2.device, where the implemented random devices (based on Posix' /std/random and /std/urandom) are really very primitive and just there to illustrate the principle.  However, since their API is pretty simple (they're just input ranges with min and max defined) there should be plenty of opportunity to improve and extend the internals in future.  Advice and patches are welcome for everything, but particularly here :-)
>
> What's become quite apparent in the course of writing this package is how much more natural it is for ranges implementing randomness to be class objects.  The basic fact that another range can store a copy of an RNG internally without creating a copy-by-value is merely the start: for example, in the case of the class implementation of RandomSample, we no longer need to have complications like,
>
>     @property auto ref front()
>     {
>         assert(!empty);
>         // The first sample point must be determined here to avoid
>         // having it always correspond to the first element of the
>         // input.  The rest of the sample points are determined each
>         // time we call popFront().
>         if (_skip == Skip.None)
>         {
>             initializeFront();
>         }
>         return _input.front;
>     }
>
> that were necessary to avoid bugs like https://d.puremagic.com/issues/show_bug.cgi?id=7936; because the class-based implementation copies by reference, we can just initialize everything in the constructor.  Similarly, issues like https://d.puremagic.com/issues/show_bug.cgi?id=7067 and https://d.puremagic.com/issues/show_bug.cgi?id=8247 just vanish.
>
> Obvious caveats about the approach include the fact that classes need to be new'd, and questions over whether allocation on the heap might create speed issues.  The benchmarks I've run (code available in the repo) seem to suggest that at least the latter is not a worry, but these are obviously things that need to be considered.  My own feeling is that ultimately it is a responsibility of the language to offer nice ways to allocate classes without necessarily relying on new or the GC.
>
> A few remarks on design and other factors:
>
>    * The new range objects have been implemented as final classes for
>      speed purposes.  However, I tried another approach where the RNG
>      class templates were abstract classes, and the individual
>      parameterizations were final-class subclasses of those rather
>      than aliases.  This was noticeably slower.  My OO-fu is not really
>      sufficient to explain this, so if anybody can offer a reason, I'd
>      be happy to learn it.
>
>    * A design question I considered but have not yet pursued: since at
>      least two functions require passing the RNG as the first parameter
>      (dice and discreteDistribution), perhaps this should be made a
>      general design pattern for everything?  It would make it harder to
>      adapt code using the existing std.random but would create a useful
>      uniformity.
>
>    * I would have liked to ensure that every random distribution had
>      both a range- and function-based version.  However, I came to the
>      conclusion that solely function-based versions should be avoided
>      if either (i) the function would need to maintain internal state
>      between calls, or (ii) the function would need to allocate memory
>      per call.  The first is why for example NormalDistribution exists
>      only as a class/range.  The second might in principle raise some
>      objections to dice, but as dice seems to be a reasonably standard
>      function, I kept it in.
>
>    * It might be good to implement helper functions for the individual
>      RNGs (e.g. just as RandomSample has a randomSample helper function
>      to deliver instances, so Mt19937 could have a corresponding
>      mt19937 helper function returning Mt19937 instances seeded in line
>      with helper function parameters).
>
>    * Those with long memories may recall that when I originally wrote
>      up my NormalDistribution code, it was written to allow various
>      "normal engines" to be plugged in; mine was Box-Muller, but jerro
>      also contributed a Ziggurat-based engine.  This could still be
>      provided here, although my own inclination is that it's probably
>      best for Phobos to provide one single good-for-general-purpose-use
>      implementation.
>
> Known issues:
>
>    * While every bugfix I've made in the course of implementing this
>      package has been propagated back to std.random where possible,
>      this package is missing some of the more recent improvements to
>      std.random by other people (e.g. I think it's missing Chris Cain's
>      update to integer-based uniform()).
>
>    * The unittest coverage is overall pretty damn good, but there are
>      weak spots in std.random.distribution and std.random2.device.
>      Some of the "unittests" in these cases are no more than basic
>      developer sanity checks that print results to console, and need
>      to be replaced by well-defined, silent-unless-failed alternatives.
>
>    * Some of the .save functions are implemented with the help of rather
>      odd private constructors; it would probably be much better to redo
>      these in terms of public this(typeof(this)) constructors.
>
>    * The random devices _really_ need to be better.  Consider the current
>      versions as placeholders ... :-)
>
> Finally, a note on authorship: since this is still based very substantially on std.random, I've made an effort to check git logs and ensure that authors and copyright records (and dates of contribution) are correct.  My general principle here has been that listed authors should only include those who've made a substantial contribution (i.e. whole functions, large numbers of unittests, ...), not just various 1-line tweaks.  But if anyone has any objection to any of the names, dates or other credits given, or if anybody would like their name removed (!), just let me know.
>
> I owe a great debt of gratitude to many people here on the forums, and monarch_dodra in particular, for a huge amount of useful discussion, advice and feedback that has made its way into the current code.  Thank you all for your time, thoughts, ideas and patience.
>
> Anyway, please feel free to review, destroy and otherwise do fun stuff with this module.  I hope that some of you will find it immediately useful, but please note that feedback and advice may result in breaking changes -- this is intended to wind up in Phobos, so it really needs to be perfect when it does so.  Let's review it really well and make it happen!
>
> Thanks and best wishes,
>
>     -- Joe

It should be std.pseudorandom (except for /dev/random) :)

Still no cmwc rng... IMO cmwc should replace mt as default RNG. Faster. Looooonger period. More passed tests (if i'm right MT didn't pass testu01). And it is parametric to get faster result or longer period.

http://en.wikipedia.org/wiki/Multiply-with-carry#Complementary-multiply-with-carry_generators
March 20, 2014
On Thursday, 20 March 2014 at 19:04:01 UTC, Andrea Fontana wrote:
> Still no cmwc rng... IMO cmwc should replace mt as default RNG. Faster. Looooonger period. More passed tests (if i'm right MT didn't pass testu01). And it is parametric to get faster result or longer period.
>
> http://en.wikipedia.org/wiki/Multiply-with-carry#Complementary-multiply-with-carry_generators

Would a Lagged Fibonacci generator instead fit your needs? I wrote one, but it was held of until `random` was updated.

It's goal, first, is to replace the old module. It'll add new stuff once it has achieved that goal.
March 20, 2014
On Thursday, 20 March 2014 at 01:07:54 UTC, bearophile wrote:
> In Bugzilla probably there are many bug reports/enhancement requests about std.random, so I suggest you to read them. Some of them can be useful, while other are probably already addressed in the current (or planned) std.random2.

Yes, indeed.  Quite a few of them _are_ addressed, I think, but now that I've got the essentials of the design laid out, I should be systematic and go through them.

> Another random one that was just commented by Infiltrator:
> https://d.puremagic.com/issues/show_bug.cgi?id=5901

Well, you already have the NormalDistribution in std.random2.distribution ;-)  I clearly can also implement function-only Box-Muller variant that spends 2 random variates to generate a single normally-distributed value, as this doesn't have the problem of needing to store state or allocate memory, so I will add that at some stage.

I'm reluctant to add a specific fastRandom because I think here the better option is a really nice range-based algorithm that can generate high quality variates at speed (e.g. the Ziggurat algorithm is a good candidate here).  There's a quite good review of different algorithms here:
http://www.cse.cuhk.edu.hk/~phwl/mt/public/archives/papers/grng_acmcs07.pdf

But of course I'm open to arguments here :-)