June 10, 2014
On Tuesday, 10 June 2014 at 10:37:17 UTC, Kagamin wrote:
> Pass it by reference, I see no reason why MT can't be pure.

For what it's worth, the Mersenne Twister in hap.random is already weakly pure (.front and .popFront are both pure methods).
June 10, 2014
On Tuesday, 10 June 2014 at 11:32:54 UTC, bearophile wrote:
> So can you can generate random values in strongly pure functions with this? You can allocate the RNG class inside the function... If that's right, then is this simple strongly pure random generator worth adding to std.random2?

Forgive me if I'm missing something obvious, but as it stands I don't see how the R250/521 algorithm you pointed me to can be strongly pure.  As it's defined in the link you pointed me to, it's accessing (and updating) global mutable state.

It would surely be possible to define it to take as input constant buffers, and to return constant buffers, which ought to allow purity -- but wouldn't that be a memory allocation nightmare?

Can you clarify what you're thinking of here it terms of D's strong purity?
June 10, 2014
Joseph Rushton Wakeling:

> Forgive me if I'm missing something obvious, but as it stands I don't see how the R250/521 algorithm you pointed me to can be strongly pure.

Sorry, the R250/521 idea and the strongly pure idea are unrelated to each other.


> but wouldn't that be a memory allocation nightmare?

For the strongly pure random generator we should choose a generator with a small internal state (let's say less than 5 CPU words, they get passed by immutable value).

Bye,
bearophile
June 10, 2014
On Tuesday, 10 June 2014 at 21:02:54 UTC, bearophile wrote:
> Sorry, the R250/521 idea and the strongly pure idea are unrelated to each other.

Ah, good.  That makes things simpler.  I'll implement R250/521 for you, though.

> For the strongly pure random generator we should choose a generator with a small internal state (let's say less than 5 CPU words, they get passed by immutable value).

We might be able to rework the Xorshift generators in this way -- they all rely on a very small internal state.  It'd be interesting to see if this has any speed implications.
June 10, 2014
Hey again Joe,

I had an opportunity to give the entire code a good once over read and I have a few comments.

1. Biggest thing about the new hap.random is how much nicer it is to actually READ. The first few times I went through the current std.random, I remember basically running out of breath. hap.random was almost a refreshing read, in contrast. I'm guessing it has a lot to do with breaking it down into smaller, more manageable pieces. Regardless, good work on that. I suspect it'll make it easier to contribute to in the future.
2. Something I'd really like to see is for the seed-by-range functions to take the range by reference instead of by value to ensure that the seed values used are less likely to be used in another RNG inadvertently later. Basically, I envision a similar problem with seedRanges as we currently have with RNGs where we have to make sure people are careful with what they do with the ranges in the end. This should cover use cases where users do things like `blah.seed(myEntropyRange.take(3))` as well, so that might take some investigation to figure out how realistic it would be to support.
3. I'd also REALLY like to see seed support ranges/values giving ANY type of integer and guarantee that few bytes are wasted (so, if it supplies 64-bit ints and the generator's internal state array only accepts 32-bit ints, it should spread the 64-bit int across two cells in the array). I have working code in another language that does this, and I wouldn't mind porting it to D for the standard library. I think this would greatly simplify the seeding process in user code (since they wouldn't have to care what the internal representation of the Random state is, then).
4. I'd just like to say the idea of using ranges for seeds gets me giddy because I could totally see a range that queries https://random.org for true random bits to seed with, wrapped by a range that zeroes out the memory on popFront. Convenient and safe (possibly? Needs review before I get excited, obviously) for crypto purposes!
5. Another possible improvement would be something akin to a "remix" function. It should work identically to reseeding, but instead of setting the internal state to match the seed (as I see in https://github.com/WebDrake/hap/blob/master/source/hap/random/generator.d#L485), remixing should probably be XOR'd into the current state. That way if you have a state based on some real entropy, you can slowly, over time, drip in more entropy into the state.
6. I'd like to see about supporting xorshift1024 as well (described here: http://xorshift.di.unimi.it/ and it's public domain code, so very convenient to port ... I'd do it too, of course, if that seems like an okay idea). This is a really small thing because xorshift1024 isn't really much better than xorshift128 (but some people might like the idea of it having significantly longer period).

> Why not write to the paper's author and ask about it?

Done :) ... if I get a response, I'll make sure to incorporate everything said.
June 10, 2014
Joseph Rushton Wakeling:

> I'll implement R250/521 for you, though.

Please stop, I am not worth that, and I don't even know how much good that generator is. So for you it's better to focus on more important matters of the new random module. Extra generators can be added later if needed.


>It'd be interesting to see if this has any speed implications.

Passing several cpu words by value for each generated value seems not very efficient. But this generator is for special situations, so a certain performance loss could be acceptable. And if the compiler is able to inline the functions, the data transfer overhead is removed, and most of the performance loss is restored (but I don't know if non-templated Phobos functions get inlined).

Bye,
bearophile
June 11, 2014
On Tuesday, 10 June 2014 at 23:08:33 UTC, Chris Cain wrote:
> I had an opportunity to give the entire code a good once over read and I have a few comments.

Thanks! :-)

> 1. Biggest thing about the new hap.random is how much nicer it is to actually READ. The first few times I went through the current std.random, I remember basically running out of breath. hap.random was almost a refreshing read, in contrast. I'm guessing it has a lot to do with breaking it down into smaller, more manageable pieces. Regardless, good work on that. I suspect it'll make it easier to contribute to in the future.

That's great to hear, as it was a design goal.  I think there will probably at some point be a need to separate things further (e.g. std.random.generator will probably have to be separated as will std.random.distribution) but always keeping the principle of implementing packages to make it possible to just "import hap.random" (or "import hap.random.generator", or whatever).

> 2. Something I'd really like to see is for the seed-by-range functions to take the range by reference instead of by value to ensure that the seed values used are less likely to be used in another RNG inadvertently later. Basically, I envision a similar problem with seedRanges as we currently have with RNGs where we have to make sure people are careful with what they do with the ranges in the end. This should cover use cases where users do things like `blah.seed(myEntropyRange.take(3))` as well, so that might take some investigation to figure out how realistic it would be to support.

Yea, that's an interesting point.  I mean, you'd hope that myEntropyRange would be a reference type anyway, but every little helps :-)

> 3. I'd also REALLY like to see seed support ranges/values giving ANY type of integer and guarantee that few bytes are wasted (so, if it supplies 64-bit ints and the generator's internal state array only accepts 32-bit ints, it should spread the 64-bit int across two cells in the array). I have working code in another language that does this, and I wouldn't mind porting it to D for the standard library. I think this would greatly simplify the seeding process in user code (since they wouldn't have to care what the internal representation of the Random state is, then).

That would be very cool.  Can you point me at your code examples?

> 4. I'd just like to say the idea of using ranges for seeds gets me giddy because I could totally see a range that queries https://random.org for true random bits to seed with, wrapped by a range that zeroes out the memory on popFront. Convenient and safe (possibly? Needs review before I get excited, obviously) for crypto purposes!

The paranoiac in me feels that anything that involves getting random data via HTTPS is probably insecure crypto-wise :-)  However, I think sourcing random.org is a perfect case for an entry in hap.random.device.  I think the best thing to do would probably be to offer a RandomOrgClient (which offers a very thin API around the random.org HTTP API) and then to wrap that in a range type that uses the client internally to generate random numbers with particular properties.

> 5. Another possible improvement would be something akin to a "remix" function. It should work identically to reseeding, but instead of setting the internal state to match the seed (as I see in https://github.com/WebDrake/hap/blob/master/source/hap/random/generator.d#L485), remixing should probably be XOR'd into the current state. That way if you have a state based on some real entropy, you can slowly, over time, drip in more entropy into the state.

Also a very interesting suggestion.  Is there a standard name for this kind of procedure?

> 6. I'd like to see about supporting xorshift1024 as well (described here: http://xorshift.di.unimi.it/ and it's public domain code, so very convenient to port ... I'd do it too, of course, if that seems like an okay idea). This is a really small thing because xorshift1024 isn't really much better than xorshift128 (but some people might like the idea of it having significantly longer period).

Fantastic, I will see about implementing those.  I wasn't previously aware of that work, but I _was_ aware that the standard Xorshift generators have some statistical flaws, so it's great to have some alternatives.  It should be straightforward to implement things like XorshiftP128 or XorshiftS1024 and XorshiftS4096 (using P and S in place of + and *).

With these in place we might even be able to deprecate the old Xorshift generators.

Just for clarity, here's how I see things rolling out for the future:

  * First goal is to ensure the existing codebase "plays nice" with
    people's programs and that it works OK with dub, rdmd, etc. and
    doesn't have any serious architectural or other bugs.  The 1.0.0
    release will not have any new functionality compared to what is
    in place now.

  * Once it seems to be reasonably stable then work can begin on a
    1.x release series that brings in successive pieces of new
    functionality.

> Done :) ... if I get a response, I'll make sure to incorporate everything said.

Great, let me know how that goes. :-)
June 11, 2014
On Tuesday, 10 June 2014 at 23:48:09 UTC, bearophile wrote:
> Please stop, I am not worth that, and I don't even know how much good that generator is. So for you it's better to focus on more important matters of the new random module. Extra generators can be added later if needed.

After all the advice and help you've given me (and the rest of this community) over the course of years, it's really a pleasure to be able to offer you a small favour like this.  But of course it could be fun to first run things through e.g. the TestU01 suite ...

> Passing several cpu words by value for each generated value seems not very efficient. But this generator is for special situations, so a certain performance loss could be acceptable. And if the compiler is able to inline the functions, the data transfer overhead is removed, and most of the performance loss is restored (but I don't know if non-templated Phobos functions get inlined).

Well, I think it's worth experimenting with.  For clarity, I wasn't suggesting modifying the existing Xorshift code, but creating a separate implementation in strongly pure style, and seeing how that differs performance-wise from what already exists.

I guess I might also consider finally getting my head round monads, and relating that to RNG design ... :-)
June 11, 2014
On Monday, 9 June 2014 at 18:09:21 UTC, Joseph Rushton Wakeling wrote:
> Hello all,

Incidentally, would it be a good idea to post a link to the blog post on r/programming?  Haven't done so yet, as generally I prefer to leave decisions about D publicity to others, but can do so if people would like.
June 11, 2014
On 6/10/2014 7:08 PM, Chris Cain wrote:
>
> 3. I'd also REALLY like to see seed support ranges/values giving ANY
> type of integer and guarantee that few bytes are wasted (so, if it
> supplies 64-bit ints and the generator's internal state array only
> accepts 32-bit ints, it should spread the 64-bit int across two cells in
> the array). I have working code in another language that does this, and
> I wouldn't mind porting it to D for the standard library. I think this
> would greatly simplify the seeding process in user code (since they
> wouldn't have to care what the internal representation of the Random
> state is, then).

Joseph and I have recently had some discussion on the idea of random streams which could work as you describe (The full discussion was in the digitalmars.D thread titled "isUniformRNG"). A finalized design would be dependent on Phobos's redesign of streams. But an unofficial design does exist, as it was needed for a crypto RNG I wrote[1][2]. An "RNG -> RNG stream" adapter could easily be written.

[1] Original version:
https://github.com/Abscissa/DAuth/blob/master/src/dauth/hashdrbg.d

[2] Phobos submission:
https://github.com/D-Programming-Language/phobos/pull/2208

> 4. I'd just like to say the idea of using ranges for seeds gets me giddy
> because I could totally see a range that queries https://random.org for
> true random bits to seed with, wrapped by a range that zeroes out the
> memory on popFront. Convenient and safe (possibly? Needs review before I
> get excited, obviously) for crypto purposes!

Personally, I wouldn't trust an internet-hosted RNG for crypto purposes as there's too many ways it could go wrong on either end. However, *mixing* it in as an additional source of entropy (together with a local source of non-determinism and a proper crypto-grade PRNG such as Hash_DRBG) sounds promising to me. Although I'm not a cryptography expert.

> 5. Another possible improvement would be something akin to a "remix"
> function. It should work identically to reseeding, but instead of
> setting the internal state to match the seed (as I see in
> https://github.com/WebDrake/hap/blob/master/source/hap/random/generator.d#L485),
> remixing should probably be XOR'd into the current state. That way if
> you have a state based on some real entropy, you can slowly, over time,
> drip in more entropy into the state.

Interesting that you mention that. Hash_DRBG does pretty much that (although it's a little more complicated than an simple XOR). While I'm not particularly familiar with any others, I'd imagine that's probably a typical behavior among cryptographic PRNGs in general.