Jump to page: 1 25  
Page
Thread overview
1st draft of complete class-based std.random successor
Mar 19, 2014
Rikki Cattermole
Mar 20, 2014
Rikki Cattermole
Mar 20, 2014
bearophile
Mar 20, 2014
bearophile
Mar 20, 2014
bearophile
Mar 21, 2014
bearophile
Mar 22, 2014
bearophile
Mar 22, 2014
bearophile
Mar 22, 2014
bearophile
Mar 23, 2014
bearophile
Mar 23, 2014
bearophile
Mar 23, 2014
Philippe Sigaud
Mar 20, 2014
monarch_dodra
Mar 20, 2014
bearophile
Mar 25, 2014
bearophile
Mar 20, 2014
Chris Williams
Mar 20, 2014
monarch_dodra
Mar 20, 2014
Chris Williams
Mar 20, 2014
Chris Williams
Mar 20, 2014
ponce
Mar 21, 2014
ponce
Mar 21, 2014
Andrea Fontana
Mar 20, 2014
Andrea Fontana
Mar 20, 2014
monarch_dodra
March 19, 2014
Hello all,

As some of you may already know, monarch_dodra and I have spent quite a lot of time over the last year discussing the state of std.random.  To cut a long story short, there are significant problems that arise because the current RNGs are value types rather than reference types.  We had quite a lot of back and forth on different design ideas, with a lot of helpful input from others in the community, but at the end of the day there are really only two broad approaches: create structs that implement reference semantics internally, or use classes.  So, as an exercise, I decided to create a class-based std.random.

The preliminary (but comprehensive) results of this are now available here:
https://github.com/WebDrake/std.random2

Besides re-implementing random number generators as classes rather than structs, the new code splits std.random2 into a package of several different modules:

   * std.random2.generator, pseudo-random number generators;

   * std.random2.device, non-deterministic random sources;

   * std.random2.distribution, random distributions such as uniform,
     normal, etc.;

   * std.random2.adaptor, random "adaptors" such as randomShuffle,
     randomSample, etc.

   * std.random2.traits, RNG-specific traits such as isUniformRNG
     and isSeedable.

A package.d file groups them together so one can still import all together via "import std.random2".  I've also taken the liberty of following the new guideline to place import statements as locally as possible; it was striking how easy and clean this made things, and it should be easy to port that particular change back to std.random.

The new package implements all of the functions, templates and range objects from std.random except for the old std.random.uniformDistribution, whose name I have cannibalized for better purposes.  Some have been updated: the MersenneTwisterEngine has been tweaked to match the corresponding code from Boost.Random, and this in turn has allowed the definition of a 64-bit Mersenne Twister (Mt19937_64) and an alternative 32-bit one (Mt11213b).

There are also a number of entirely new entries.  std.random2.distribution contains not just existing functions such as dice and uniform, but also range-based random distribution classes UniformDistribution, NormalDistribution and DiscreteDistribution; the last of these is effectively a range-based version of dice, and is based on Chris Cain's excellent work here: https://github.com/D-Programming-Language/phobos/pull/1702

The principal weak point in terms of functionality is std.random2.device, where the implemented random devices (based on Posix' /std/random and /std/urandom) are really very primitive and just there to illustrate the principle.  However, since their API is pretty simple (they're just input ranges with min and max defined) there should be plenty of opportunity to improve and extend the internals in future.  Advice and patches are welcome for everything, but particularly here :-)

What's become quite apparent in the course of writing this package is how much more natural it is for ranges implementing randomness to be class objects.  The basic fact that another range can store a copy of an RNG internally without creating a copy-by-value is merely the start: for example, in the case of the class implementation of RandomSample, we no longer need to have complications like,

    @property auto ref front()
    {
        assert(!empty);
        // The first sample point must be determined here to avoid
        // having it always correspond to the first element of the
        // input.  The rest of the sample points are determined each
        // time we call popFront().
        if (_skip == Skip.None)
        {
            initializeFront();
        }
        return _input.front;
    }

that were necessary to avoid bugs like https://d.puremagic.com/issues/show_bug.cgi?id=7936; because the class-based implementation copies by reference, we can just initialize everything in the constructor.  Similarly, issues like https://d.puremagic.com/issues/show_bug.cgi?id=7067 and https://d.puremagic.com/issues/show_bug.cgi?id=8247 just vanish.

Obvious caveats about the approach include the fact that classes need to be new'd, and questions over whether allocation on the heap might create speed issues.  The benchmarks I've run (code available in the repo) seem to suggest that at least the latter is not a worry, but these are obviously things that need to be considered.  My own feeling is that ultimately it is a responsibility of the language to offer nice ways to allocate classes without necessarily relying on new or the GC.

A few remarks on design and other factors:

   * The new range objects have been implemented as final classes for
     speed purposes.  However, I tried another approach where the RNG
     class templates were abstract classes, and the individual
     parameterizations were final-class subclasses of those rather
     than aliases.  This was noticeably slower.  My OO-fu is not really
     sufficient to explain this, so if anybody can offer a reason, I'd
     be happy to learn it.

   * A design question I considered but have not yet pursued: since at
     least two functions require passing the RNG as the first parameter
     (dice and discreteDistribution), perhaps this should be made a
     general design pattern for everything?  It would make it harder to
     adapt code using the existing std.random but would create a useful
     uniformity.

   * I would have liked to ensure that every random distribution had
     both a range- and function-based version.  However, I came to the
     conclusion that solely function-based versions should be avoided
     if either (i) the function would need to maintain internal state
     between calls, or (ii) the function would need to allocate memory
     per call.  The first is why for example NormalDistribution exists
     only as a class/range.  The second might in principle raise some
     objections to dice, but as dice seems to be a reasonably standard
     function, I kept it in.

   * It might be good to implement helper functions for the individual
     RNGs (e.g. just as RandomSample has a randomSample helper function
     to deliver instances, so Mt19937 could have a corresponding
     mt19937 helper function returning Mt19937 instances seeded in line
     with helper function parameters).

   * Those with long memories may recall that when I originally wrote
     up my NormalDistribution code, it was written to allow various
     "normal engines" to be plugged in; mine was Box-Muller, but jerro
     also contributed a Ziggurat-based engine.  This could still be
     provided here, although my own inclination is that it's probably
     best for Phobos to provide one single good-for-general-purpose-use
     implementation.

Known issues:

   * While every bugfix I've made in the course of implementing this
     package has been propagated back to std.random where possible,
     this package is missing some of the more recent improvements to
     std.random by other people (e.g. I think it's missing Chris Cain's
     update to integer-based uniform()).

   * The unittest coverage is overall pretty damn good, but there are
     weak spots in std.random.distribution and std.random2.device.
     Some of the "unittests" in these cases are no more than basic
     developer sanity checks that print results to console, and need
     to be replaced by well-defined, silent-unless-failed alternatives.

   * Some of the .save functions are implemented with the help of rather
     odd private constructors; it would probably be much better to redo
     these in terms of public this(typeof(this)) constructors.

   * The random devices _really_ need to be better.  Consider the current
     versions as placeholders ... :-)

Finally, a note on authorship: since this is still based very substantially on std.random, I've made an effort to check git logs and ensure that authors and copyright records (and dates of contribution) are correct.  My general principle here has been that listed authors should only include those who've made a substantial contribution (i.e. whole functions, large numbers of unittests, ...), not just various 1-line tweaks.  But if anyone has any objection to any of the names, dates or other credits given, or if anybody would like their name removed (!), just let me know.

I owe a great debt of gratitude to many people here on the forums, and monarch_dodra in particular, for a huge amount of useful discussion, advice and feedback that has made its way into the current code.  Thank you all for your time, thoughts, ideas and patience.

Anyway, please feel free to review, destroy and otherwise do fun stuff with this module.  I hope that some of you will find it immediately useful, but please note that feedback and advice may result in breaking changes -- this is intended to wind up in Phobos, so it really needs to be perfect when it does so.  Let's review it really well and make it happen!

Thanks and best wishes,

    -- Joe
March 19, 2014
Out of interest but, shouldn't in the device module have a static assert(0, "Not implemented yet") type of deal with the version(Posix) block?
March 20, 2014
On Wednesday, 19 March 2014 at 23:58:36 UTC, Rikki Cattermole wrote:
> Out of interest but, shouldn't in the device module have a static assert(0, "Not implemented yet") type of deal with the version(Posix) block?

Not really.  There's still usable functionality in there for all architectures (although I'm not sure how practically useful).
March 20, 2014
Joseph Rushton Wakeling:

Few first comments:

>    * std.random2.adaptor, random "adaptors" such as randomShuffle,
>      randomSample, etc.

Please don't use stuttering names like "std.random2.randomShuffle". "std.random2.shuffle" is enough.


> My own feeling is that ultimately it is a responsibility of the language to offer nice ways to allocate classes without necessarily relying on new or the GC.

I don't think the language is yet there. So I think currently this is not a good idea.

Do you have a simple but very fast function that generates uniforms in [0.0, 1.0]? :-)

Bye,
bearophile
March 20, 2014
On Thursday, 20 March 2014 at 00:05:20 UTC, Joseph Rushton Wakeling wrote:
> Not really.  There's still usable functionality in there for all architectures (although I'm not sure how practically useful).

Just to expand on that remark: my impression is that individual random devices are inevitably going to be architecture-dependent.  /dev/random and /dev/urandom are Posix devices; Windows AFAIK has its own alternative.  So the broad idea is that you'd have as much generic functionality as possible available to all architectures (mostly related to what sources you read from; a file, a socket, something else?), and then individual architecture-dependent aliases would map this to particular random sources available to them.

Then, finally, you'd have some default alias RandomDevice that would point to an appropriate architectural default; so e.g.

    version (Posix)
    {
        alias RandomDevice = DevURandom!uint;
    }
    else version (Windows)
    {
        alias RandomDevice = ...
    }
    // etc.

... so, unless you were quite specific about your requirements, 90% of the time you could just use RandomDevice and expect it to Just Work whatever your platform.

But as random devices are not my strongest area of expertise, I'll happily take advice here.
March 20, 2014
On Thursday, 20 March 2014 at 00:09:51 UTC, bearophile wrote:
> Do you have a simple but very fast function that generates uniforms in [0.0, 1.0]? :-)

No, but it's planned.  Jerro wrote quite a nice one in the course of his work on the Ziggurat algorithm, and I'm sure he'd be happy for me to adapt it accordingly.
March 20, 2014
On Thursday, 20 March 2014 at 00:09:51 UTC, bearophile wrote:
> Please don't use stuttering names like "std.random2.randomShuffle". "std.random2.shuffle" is enough.

I don't object to rewriting the names if there's a valid case for it, but it does seem to me to be important to try and match as much as possible the names that are already out there in std.random.  The idea is to minimize the amount of rewriting anyone will have to do to adapt their code, and as far as I can tell where the contents of std.random2.adaptor are concerned (randomShuffle, randomCover, randomSample) it should require no rewriting at all.

Besides, while std.random2.adaptor.randomShuffle may be the fully-qualified name, in practice, no one will write all that out, so the redundancy is less bad; and in any case, as any magician will tell you, a shuffle is not necessarily random ;-)

> I don't think the language is yet there. So I think currently this is not a good idea.

If the aim were to overwrite std.random, I would agree with you, but there is no need to do that.  It's named std.random2 for a reason :-)

However, I do think that merging it into Phobos (assuming all other factors are OK) may have to be conditional on improvements in the available allocation strategies.
March 20, 2014
Joseph Rushton Wakeling:

> No, but it's planned.  Jerro wrote quite a nice one in the course of his work on the Ziggurat algorithm, and I'm sure he'd be happy for me to adapt it accordingly.

Note: I meant a simple but very fast function that generates just one value in [0.0, 1.0] (not a range).


> I don't object to rewriting the names if there's a valid case for it, but it does seem to me to be important to try and match as much as possible the names that are already out there in std.random.

It's the best chance to improve naming, so do not throw it away for nothing:
https://d.puremagic.com/issues/show_bug.cgi?id=9106


> The idea is to minimize the amount of rewriting anyone will have to do to adapt their code,

If you want you can keep a deprecated randomShuffle alias name for some time in std.random2.


> Besides, while std.random2.adaptor.randomShuffle may be the fully-qualified name, in practice, no one will write all that out, so the redundancy is less bad;

I agree. But better to improve names when you have a (the only) chance.


> However, I do think that merging it into Phobos (assuming all other factors are OK) may have to be conditional on improvements in the available allocation strategies.

We will probably have the nice Andrei's allocators in Phobos, but not in a short time. So I suggest to not rely on them for std.random2.

Bye,
bearophile
March 20, 2014
On Thursday, 20 March 2014 at 00:39:43 UTC, bearophile wrote:
> Note: I meant a simple but very fast function that generates just one value in [0.0, 1.0] (not a range).

There will be both. :-)

Off the top of my head I'm not sure whether the interval will be [0.0, 1.0], [0.0, 1.0) or whether it might be possible to make it work with arbitrary boundaries.  If I recall right, most uniform01 implementations are [0.0, 1.0) ... ?

> It's the best chance to improve naming, so do not throw it away for nothing:
> https://d.puremagic.com/issues/show_bug.cgi?id=9106
>
> If you want you can keep a deprecated randomShuffle alias name for some time in std.random2.

Yes, in that case, I'd be happy to make the change (and maintain the old names via aliases).  Thanks for pointing me to the bug report; I'd forgotten that this was an open issue :-)

> We will probably have the nice Andrei's allocators in Phobos, but not in a short time. So I suggest to not rely on them for std.random2.

No, I don't intend to rely on them arriving soon.  But while of course a random3 is always possible too, I'd rather not be faced with the situation of needing breaking changes to handle support for alternative allocation strategies.  So if necessary, I'd rather maintain std.random2 outside of Phobos for a while and get things right when it finally lands, than push it in too early and need to make breaking changes.
March 20, 2014
On Thursday, 20 March 2014 at 00:15:22 UTC, Joseph Rushton Wakeling wrote:
> On Thursday, 20 March 2014 at 00:05:20 UTC, Joseph Rushton Wakeling wrote:
>> Not really.  There's still usable functionality in there for all architectures (although I'm not sure how practically useful).
>
> Just to expand on that remark: my impression is that individual random devices are inevitably going to be architecture-dependent.
>  /dev/random and /dev/urandom are Posix devices; Windows AFAIK has its own alternative.  So the broad idea is that you'd have as much generic functionality as possible available to all architectures (mostly related to what sources you read from; a file, a socket, something else?), and then individual architecture-dependent aliases would map this to particular random sources available to them.
>
> Then, finally, you'd have some default alias RandomDevice that would point to an appropriate architectural default; so e.g.
>
>     version (Posix)
>     {
>         alias RandomDevice = DevURandom!uint;
>     }
>     else version (Windows)
>     {
>         alias RandomDevice = ...
>     }
>     // etc.
>
> ... so, unless you were quite specific about your requirements, 90% of the time you could just use RandomDevice and expect it to Just Work whatever your platform.
>
> But as random devices are not my strongest area of expertise, I'll happily take advice here.

For version blocks of code, I try to make sure they implement the same interfaces like this one. To limit the possible issues.
It just makes things a little more cleaner for API users.

In the case that this isn't production ready the static assert can be used as a TODO type of thing. Forcibly telling you it isn't done yet. Its better than silently going into production and finding out that a main platform isn't ready.

But this is just my preference.
« First   ‹ Prev
1 2 3 4 5