November 23, 2016
On Wednesday, 23 November 2016 at 00:44:26 UTC, Joseph Rushton Wakeling wrote:
> On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote:
>>  - 64-bit Mt19937 is default for 64-bit targets
>
> This means that seemingly identical code will produce different results depending on whether it's compiled for 64-bit or 32-bit.  Is that really worth it, when anyone who cares about the difference between 32-bit vs. 64-bit random words is quite capable of specifying the RNG they want to use and not just relying on the default?
>
> Having a different default RNG would make sense for targets where there are serious performance issues at stake (e.g. minimal memory available for RNG state) but just for the sake of 32- vs. 64-bit Mersenne Twister seems an unnecessary complication.
>
> These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNG, so if the default is going to be changed, it might as well be to something significantly better (rather than worrying about numbers of bits).

Mir Random is going to be a library with saturated uniform FP RNGs and almost saturated exponential FP RNGs. Unlike all other libraries (in any language), its basic uniform FP numbers will be generated in the interval (-1, +1) and will contain _all_ possible values, including all subnormal numbers. 64-bit generators are twice as fast for this task if you need to generate a 64-bit floating-point number. An explanation of the technique will be in my post/article. --Ilya
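To see why a 64-bit generator can be about twice as fast here, consider the common (non-saturated) way of building a `double` in [0, 1): take 53 random bits and scale them by 2^-53. A 64-bit engine supplies those bits in one draw, while a 32-bit engine needs two. This is only a sketch of that standard technique, not Mir's saturated algorithm (which also covers subnormals). The helper names are hypothetical, and the sketch assumes Phobos's `std.random.Mt19937` and `Mt19937_64`.

```d
import std.random : Mt19937, Mt19937_64;

/// Hypothetical helper: one 64-bit draw yields a double in [0, 1).
double uniform01From64(ref Mt19937_64 gen)
{
    immutable ulong bits = gen.front;
    gen.popFront();
    // Keep the top 53 bits (the precision of a double) and scale by 2^-53.
    return (bits >> 11) * 0x1.0p-53;
}

/// Hypothetical helper: the same construction needs two 32-bit draws.
double uniform01From32(ref Mt19937 gen)
{
    immutable ulong hi = gen.front; // becomes the upper 32 bits
    gen.popFront();
    immutable ulong lo = gen.front; // becomes the lower 32 bits
    gen.popFront();
    immutable ulong bits = (hi << 32) | lo;
    return (bits >> 11) * 0x1.0p-53;
}
```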
November 23, 2016
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
> On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
>>  - `opCall` API instead of range interface is used (similar to C++)
>
> This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei

It is a safe low-level architecture without performance or API issues. It prevents users from doing stupid things implicitly (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG). In addition, when you need to write algorithms or distributions, opCall is much more convenient than the range API. Furthermore, in 99% of cases users will not use the Engine API: they will just want to call `rand`, `uniform`, or another distribution.
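A minimal sketch of this layering, assuming a hypothetical opCall-only engine (xorshift32 stands in for a real generator) whose postblit is disabled so it cannot be copied implicitly, plus a range view that holds a pointer to it. The names are illustrative, not Mir's API. Because this engine only exposes `opCall`, which both advances and returns, the view has to cache the current variate itself.

```d
import std.traits : ReturnType;

/// Hypothetical opCall-style engine; xorshift32 is only a stand-in.
struct Xorshift32Engine
{
    private uint state;

    this(uint seed) { state = seed ? seed : 1; } // xorshift state must be non-zero

    @disable this(this); // no implicit copies of the RNG state

    uint opCall()
    {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }
}

/// Hypothetical high-level view: an infinite input range holding a pointer.
struct RandomRange(Engine)
{
    alias Value = ReturnType!(Engine.opCall);

    private Engine* engine; // copying the range copies the pointer, not the state
    private Value current;  // cached variate, since opCall both advances and returns

    this(Engine* engine)
    {
        this.engine = engine;
        current = (*engine)();
    }

    enum bool empty = false; // never empty: an infinite range
    @property Value front() const { return current; }
    void popFront() { current = (*engine)(); }
}

/// Convenience constructor taking the engine by reference.
auto randomRange(Engine)(ref Engine engine)
{
    return RandomRange!Engine(&engine);
}
```

Copying the range is cheap and never forks the generator: every copy draws from the one shared engine.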

I am sure that almost any library should first have a low-level API that fits its implementation. Additional API levels can be added later. Phobos is currently evolving toward generic degradation: more generic and "universal" code hides more undiscovered bugs. std.range is a good example of this degradation; it has a lot of API and implementation bugs.

### Example of an API + implementation bug:

#### Bug: RNGs have min and max parameters (hello, C++), but they are not used when a uniform integer is generated: `uniform!ulong` / `uniform!ulong(0, 100)`.

#### Solution: In Mir Random, every RNG must generate all 8/16/32/64 bits uniformly. How to do that is the RNG's problem.

I will not file this bug, or the dozen other std.random bugs, because the module should be rewritten anyway and I am working on it. std.random is a collection of bugs from C/C++ libraries extended with D generic idioms. For example, there is no reason for a 64-bit Xorshift: it is 32-bit by design. Furthermore, 64-bit extensions of 32-bit algorithms must be proved theoretically before we offer them to end users. 64-bit analogs exist, but they have different implementations. Phobos degrades because we add a lot of generic specializations and small utilities without understanding the use cases. Phobos really follows a stupid idealistic idea: more generic is better, more API is better, more universal algorithms are better. The problem is that Phobos/DRuntime is a soup where everything (because of its "universality") interacts with everything else.
November 23, 2016
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote:
> On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
>> On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
>>>  - `opCall` API instead of range interface is used (similar to C++)
>>
>> This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
>
> It is a safe low-level architecture without performance or API issues. It prevents users from doing stupid things implicitly (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG). In addition, when you need to write algorithms or distributions, opCall is much more convenient than the range API. Furthermore, in 99% of cases users will not use the Engine API: they will just want to call `rand`, `uniform`, or another distribution.
>
> I am sure that almost any library should first have a low-level API that fits its implementation. Additional API levels can be added later. Phobos is currently evolving toward generic degradation: more generic and "universal" code hides more undiscovered bugs. std.range is a good example of this degradation; it has a lot of API and implementation bugs.
>

EDIT: std.range -> std.random


November 23, 2016
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote:
> ### Example of an API + implementation bug:
>
> #### Bug: RNGs have min and max parameters (hello, C++), but they are not used when a uniform integer is generated: `uniform!ulong` / `uniform!ulong(0, 100)`.
>
> #### Solution: In Mir Random, every RNG must generate all 8/16/32/64 bits uniformly. How to do that is the RNG's problem.

Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits should validate that the generator's min/max values correspond to UIntType.min and UIntType.max.

That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future.
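A minimal sketch of this check, assuming Phobos's convention that engines expose compile-time `min` and `max` members; `fillsFullWord` and `fullWord` are hypothetical names. A 32- or 64-bit Mersenne Twister passes, while `MinstdRand0` (whose largest output is 2^31 - 2) is rejected at compile time instead of silently producing biased high bits.

```d
import std.random : isUniformRNG;

/// Hypothetical trait: does `Rng` fill every bit of its output word?
template fillsFullWord(Rng)
{
    static if (isUniformRNG!Rng)
        enum bool fillsFullWord = Rng.min == typeof(Rng.init.front).min
                               && Rng.max == typeof(Rng.init.front).max;
    else
        enum bool fillsFullWord = false;
}

/// Hypothetical full-word draw that refuses generators with a narrower range.
auto fullWord(Rng)(ref Rng rng)
    if (fillsFullWord!Rng)
{
    auto word = rng.front;
    rng.popFront();
    return word;
}
```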
November 23, 2016
On Wednesday, 23 November 2016 at 10:27:00 UTC, Joseph Rushton Wakeling wrote:
> On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote:
>> ### Example of an API + implementation bug:
>>
>> #### Bug: RNGs have min and max parameters (hello, C++), but they are not used when a uniform integer is generated: `uniform!ulong` / `uniform!ulong(0, 100)`.
>>
>> #### Solution: In Mir Random, every RNG must generate all 8/16/32/64 bits uniformly. How to do that is the RNG's problem.
>
> Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits should validate that the generator's min/max values correspond to UIntType.min and UIntType.max.
>
> That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future.

Good point, will add this. --Ilya
November 23, 2016
On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko wrote:
> Mir Random is going to be a library with saturated uniform FP RNGs and almost saturated exponential FP RNGs. Unlike all other libraries (in any language), its basic uniform FP numbers will be generated in the interval (-1, +1) and will contain _all_ possible values, including all subnormal numbers. 64-bit generators are twice as fast for this task if you need to generate a 64-bit floating-point number. An explanation of the technique will be in my post/article. --Ilya

All of which is fine in its own terms, but why prioritize the speed of the default behaviour over its reliability and reproducibility?

Anyone who cares about that combination of speed and statistical quality will have enough information in the documentation to know what to do.  By contrast, producing different results for identical user code depending on whether you're making a 32- or 64-bit build is an unpleasant complication that it would be better to avoid.

In any case, given what you say above, shouldn't the choice of 32- vs. 64-bit RNG depend on whether one is using a distribution that generates 32- vs. 64-bit floating-point variates, rather than whether one is building for a 32- or 64-bit target?  In which case it needs to be a user choice, since it's only the user who knows what distribution they're using.
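This alternative can be sketched as a user-side choice keyed on the variate type rather than on the target's word size. `EngineFor` is a hypothetical name, and the mapping (a 64-bit engine for `double`, a 32-bit one for `float`) is an assumption for illustration.

```d
import std.random : Mt19937, Mt19937_64;

/// Hypothetical selector: the engine width follows the floating-point
/// variate type, so the same code picks the same engine on every target.
template EngineFor(F)
    if (is(F == float) || is(F == double))
{
    static if (is(F == double))
        alias EngineFor = Mt19937_64;
    else
        alias EngineFor = Mt19937;
}
```

Unlike a `size_t`-based default, this choice never changes with the build target; it changes only when the user changes the distribution's output type.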
November 23, 2016
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote:
> I'm unclear on what that statistically unsafe default behavior is - my understanding is it has to do with RNGs being inadvertently copied. It would be great to formalize that in a well-explained issue.

I'll see if I can write that up in depth some time soon.  TBH though I think the problem is less about RNGs and more about stuff like RandomSample and RandomCover (and, in future, random distributions that have internal state, like a struct implementing a normal distribution using the Ziggurat algorithm internally).

It's not so difficult to stop RNG state being copied inadvertently, but when you have ranges wrapping ranges wrapping ranges, each containing their own extra state that cannot be copied by value, things get a bit more complicated.
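The basic inadvertent-copy problem is easy to reproduce with Phobos's struct-based engines: passing one by value silently duplicates its state, so two supposedly independent draws return the same number. The helper names below are hypothetical.

```d
import std.random : Mt19937, uniform;

/// Takes the engine by value: the caller's state is copied and never advances.
uint drawByValue(Mt19937 gen)
{
    return uniform(0u, 100u, gen);
}

/// Takes the engine by reference: the caller's state advances as expected.
uint drawByRef(ref Mt19937 gen)
{
    return uniform(0u, 100u, gen);
}
```

Calling `drawByValue(gen)` twice yields the same value both times, because each call works on a fresh copy of the unchanged caller state. Any wrapper that stores an engine by value has the same hazard, one level further removed from the user.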
November 23, 2016
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote:
> It is a safe low-level architecture without performance or API issues. It prevents users from doing stupid things implicitly (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG).

Note that if you want to do this, it's convenient to still preserve a separation between popping the RNG state versus getting the current variate.  Otherwise, the range interface will wind up having to separately cache the variate value, which is wasteful.

Something like this:

struct MyRNG(UIntType)
{
    private UIntType state = 1; // placeholder for the real internal state

    void popFront() { /* update internal state */ }

    @property UIntType front() const { return state; } // whatever part of internal state

    UIntType opCall()
    {
        this.popFront();
        return this.front;
    }
}

(The above is basically just the input range API less the `empty` property, and the `front` and `popFront()` are arguably a lower-level API than `opCall`.)
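A concrete, compilable version of this layering, assuming xorshift32 as a stand-in state update (`XorshiftRNG` and `RNGView` are hypothetical names). Because `front` can be read without advancing, a pointer-holding range view can forward to the engine directly instead of caching its own copy of the variate.

```d
/// Hypothetical engine with the popFront/front/opCall layering;
/// Marsaglia's xorshift32 is only a stand-in state update.
struct XorshiftRNG
{
    private uint state = 2463534242u; // any non-zero seed

    @disable this(this); // prevent implicit copies of the state

    void popFront()
    {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
    }

    @property uint front() const { return state; }

    uint opCall()
    {
        popFront();
        return front;
    }
}

/// Hypothetical range view: forwards through a pointer, no cached variate.
struct RNGView(Engine)
{
    private Engine* engine;

    enum bool empty = false; // infinite range
    @property auto front() const { return engine.front; }
    void popFront() { engine.popFront(); }
}
```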

> In addition, when you need to write algorithms or distributions, opCall is much more convenient than the range API.

Can you give some examples of what you mean here?

> Furthermore, in 99% of cases users will not use the Engine API: they will just want to call `rand`, `uniform`, or another distribution.

I don't think that's necessarily true, but in any case, we shouldn't restrict the use-cases of the 1% unless we have to.
November 23, 2016
On Wednesday, 23 November 2016 at 10:33:21 UTC, Joseph Rushton Wakeling wrote:
> On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko wrote:
>> [...]
>
> All of which is fine in its own terms, but why prioritize the speed of the default behaviour over its reliability and reproducibility?
>
> Anyone who cares about that combination of speed and statistical quality will have enough information in the documentation to know what to do.  By contrast, producing different results for identical user code depending on whether you're making a 32- or 64-bit build is an unpleasant complication that it would be better to avoid.
>
> In any case, given what you say above, shouldn't the choice of 32- vs. 64-bit RNG depend on whether one is using a distribution that generates 32- vs. 64-bit floating-point variates, rather than whether one is building for a 32- or 64-bit target?  In which case it needs to be a user choice, since it's only the user who knows what distribution they're using.

We have a `Random` alias. I think it is OK if it generates different numbers for different architectures and library versions. If users want exactly the same behaviour, they can use explicit names like Mt19937_32 and Mt19937_64. Both are defined for all architectures.
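A sketch of the proposed arrangement, with a hypothetical `DefaultRandom` alias standing in for `Random` and Phobos's `Mt19937` playing the role of Mt19937_32. The alias follows the target's word size, while code that needs reproducible output names the engine explicitly.

```d
import std.random : Mt19937, Mt19937_64;

// Hypothetical target-dependent default, as proposed for `Random`:
// its output type (and therefore its sequence) differs between targets.
static if (size_t.sizeof == 8)
    alias DefaultRandom = Mt19937_64;
else
    alias DefaultRandom = Mt19937;

/// Hypothetical helper, reproducible on every target: the engine is named explicitly.
ulong firstWordExplicit(ulong seed)
{
    auto gen = Mt19937_64(seed);
    return gen.front;
}
```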

On 64-bit machines, the 64-bit generator has no significant speed penalty for 32-bit random number generation, maybe a few percent at most. So it is the better choice for 64-bit machines.

It is only question of `Random` alias, which can be changed in the future anyway. Both Mt19937_32 and Mt19937_64 are defined.
November 23, 2016
On Wednesday, 23 November 2016 at 11:03:33 UTC, Ilya Yaroshenko wrote:
> It is only question of `Random` alias, which can be changed in the future anyway. Both Mt19937_32 and Mt19937_64 are defined.

I think we're at an impasse in terms of priorities, because that's exactly the reason that I think you should leave the Random alias pointing to the same generator, and let the people with speed/quality concerns select the alternative generator ;-)