Mir Random [WIP] (page 3) - D Programming Language Discussion Forum

November 23, 2016

Re: Mir Random [WIP]

Posted by Ilya Yaroshenko
in reply to Joseph Rushton Wakeling

On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton Wakeling wrote:
> On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote:
> >> It is a safe low-level architecture without performance or API issues. It prevents users from doing stupid things implicitly (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG).
>
> Note that if you want to do this, it's convenient to still preserve a separation between popping the RNG state versus getting the current variate.  Otherwise, the range interface will wind up having to separately cache the variate value, which is wasteful.
>
> Something like this:
>
> struct MyRNG
> {
>     void popFront() { /* update internal state */ }
>
>     UIntType front() @property { return /* whatever part of internal state */; }
>
>     UIntType opCall()
>     {
>         this.popFront();
>         return this.front;
>     }
> }
>
> (The above is basically just the input range API less the `empty` property, and the `front` and `popFront()` are arguably a lower-level API than `opCall`.)
>
> >> In addition, when you need to write algorithms or distributions, opCall is much more convenient than the range API.
>
> Can you give some examples of what you mean here?

For example, the Mir Random basic utilities (lower level than distributions):
https://github.com/libmir/mir-random/blob/master/source/random/package.d

You can also explore the std.random code. opCall would almost always be more convenient to use.

>> In addition, users would not use the Engine API in 99% of cases: they will just want to call `rand` or `uniform`, or another distribution.
>
> I don't think that's necessarily true, but in any case, we shouldn't restrict the use-cases of the 1% unless we have to.

We don't restrict. It takes 1 minute to write a Range wrapper. This wrapper can be added to the library if we find a real-world use case. Again, I have not seen a real-world algorithm that looks better with the Range API for generators. RandomSample can be created without the Range API, and it would be more convenient than it is now. I think that the Range API is bad and useless overengineering for RNGs.
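For readers following the argument, here is a minimal sketch of the kind of range wrapper being discussed, built on top of an engine that exposes only `opCall`. `ToyEngine` and `EngineRange` are hypothetical names invented for this illustration (the engine is a toy LCG, not a generator from Mir or Phobos). Because `front` must be callable repeatedly without advancing the engine, the wrapper has to store the last variate, which is the caching cost raised earlier in the thread.

```d
import std.algorithm.comparison : equal;
import std.range : take;

// Hypothetical opCall-only engine: a toy LCG used purely for illustration.
struct ToyEngine
{
    uint state = 1;

    @disable this(this); // copying the engine is a compile-time error

    uint opCall()
    {
        state = state * 1664525u + 1013904223u; // wraps modulo 2^32
        return state;
    }
}

// Hypothetical wrapper: a pointer to the engine plus a cached variate.
struct EngineRange
{
    private ToyEngine* engine;
    private uint cached; // extra field needed because the engine only has opCall

    this(ToyEngine* e)
    {
        engine = e;
        cached = engine.opCall(); // front must be valid right after construction
    }

    enum bool empty = false; // infinite range

    @property uint front() const { return cached; }

    void popFront() { cached = engine.opCall(); }
}

unittest
{
    ToyEngine viaRange, direct;
    auto r = EngineRange(&viaRange);

    // The range yields the same sequence as calling the engine directly.
    uint[3] expected = [direct(), direct(), direct()];
    assert(r.take(3).equal(expected[]));
}
```

The wrapper itself is cheap to copy because it only holds a pointer, while the engine stays non-copyable.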

November 23, 2016

Re: Mir Random [WIP]

Posted by Joseph Rushton Wakeling
in reply to Ilya Yaroshenko

On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko wrote:
> For example, the Mir Random basic utilities (lower level than distributions):
> https://github.com/libmir/mir-random/blob/master/source/random/package.d
>
> You can also explore the std.random code. opCall would almost always be more convenient to use.

Yes, but as described, `opCall` can easily be created as a composition of `popFront` and `front` for these convenience purposes.

> We don't restrict. It takes 1 minute to write a Range wrapper.

If all you have is `opCall` then the range wrapper is less efficient than it need be.

> This wrapper can be added to the library if we find a real-world use case. Again, I have not seen a real-world algorithm that looks better with the Range API for generators. RandomSample can be created without the Range API, and it would be more convenient than it is now. I think that the Range API is bad and useless overengineering for RNGs.

Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round).  But it's readily possible to imagine range-based use-cases for random distributions along the lines of,

    myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20);
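A runnable approximation of that pipeline, using only Phobos, might look like the sketch below. `normalDistribution` and `somethingElse` are placeholders in the post, so this version substitutes `std.random.uniform` and a simple `map`; the helper name `positiveSamples` is made up for the example.

```d
import std.algorithm.iteration : filter, map;
import std.algorithm.searching : all;
import std.array : array;
import std.random : Mt19937, uniform;
import std.range : generate, take;

// Hypothetical helper: draws from [-5, 5), keeps positive values,
// doubles them, and collects the first n results.
double[] positiveSamples(size_t n, uint seed)
{
    auto rng = Mt19937(seed); // fixed seed so runs are repeatable

    return generate!(() => uniform(-5.0, 5.0, rng))()
        .filter!(a => a > 0)
        .map!(a => a * 2)
        .take(n)
        .array;
}

unittest
{
    auto samples = positiveSamples(20, 42);
    assert(samples.length == 20);
    assert(samples.all!(a => a > 0 && a < 10));
}
```

Here `generate` turns repeated calls into an infinite input range, which is what lets the standard range algorithms compose around the generator.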

But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this).  Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.

November 23, 2016

Re: Mir Random [WIP]

Posted by Ilya Yaroshenko
in reply to Joseph Rushton Wakeling

On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton Wakeling wrote:
> On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko wrote:
>> For example, the Mir Random basic utilities (lower level than distributions):
>> https://github.com/libmir/mir-random/blob/master/source/random/package.d
>>
>> You can also explore the std.random code. opCall would almost always be more convenient to use.
>
> Yes, but as described, `opCall` can easily be created as a composition of `popFront` and `front` for these convenience purposes.
>
>> We don't restrict. It takes 1 minute to write a Range wrapper.
>
> If all you have is `opCall` then the range wrapper is less efficient than it need be.

Inlining will solve this problem with data duplication (if any) for most real world cases.

>
>> This wrapper can be added to the library if we find a real-world use case. Again, I have not seen a real-world algorithm that looks better with the Range API for generators. RandomSample can be created without the Range API, and it would be more convenient than it is now. I think that the Range API is bad and useless overengineering for RNGs.
>
> Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round).  But it's readily possible to imagine range-based use-cases for random distributions along the lines of,
>
>     myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20);
>
> But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this).  Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.

Yes, please file a GitHub issue.

November 23, 2016

Re: Mir Random [WIP]

Posted by Andrea Fontana
in reply to Ilya Yaroshenko

On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote:
>  - 64-bit Mt19937 initialization is fixed
>  - 64-bit Mt19937 is default for 64-bit targets

I wonder why Mersenne Twister is *always* chosen over other algorithms.
My vote goes for CMWC, of course.

Andrea

November 23, 2016

Re: Mir Random [WIP]

Posted by Ilya Yaroshenko
in reply to Joseph Rushton Wakeling

On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton Wakeling wrote:
> Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round).  But it's readily possible to imagine range-based use-cases for random distributions along the lines of,
>
>     myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20);
>
> But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this).  Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.

Added RandomRangeAdaptor for URBGs:
https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d

November 23, 2016

Re: Mir Random [WIP]

Posted by Joseph Rushton Wakeling
in reply to Ilya Yaroshenko

On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote:
> Added RandomRangeAdaptor for URBGs:
> https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d

This has exactly the problem I identified above, though: you're unnecessarily caching the latest variate rather than just using the RNG state directly.  Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it.
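To make the point concrete, here is a minimal sketch of an engine that separates state update (`popFront`) from state access (`front`), with `opCall` composed from the two as suggested earlier in the thread. `SplitEngine` and `EngineView` are hypothetical names, and the toy LCG is only a stand-in for a real generator. With this split, the range view needs nothing beyond a pointer to the engine.

```d
import std.range.primitives : isInfinite, isInputRange;

// Hypothetical engine with separate update and access steps.
struct SplitEngine
{
    private uint state = 1;

    @disable this(this); // no implicit copies of the generator state

    void popFront() { state = state * 1664525u + 1013904223u; }

    @property uint front() const { return state; }

    // Convenience call composed from the two lower-level primitives.
    uint opCall()
    {
        popFront();
        return front;
    }
}

// Hypothetical range view: one pointer, no cached variate.
struct EngineView
{
    private SplitEngine* engine;

    this(SplitEngine* e) { engine = e; }

    enum bool empty = false; // infinite range

    @property uint front() const { return engine.front; }

    void popFront() { engine.popFront(); }
}

static assert(isInputRange!EngineView && isInfinite!EngineView);
static assert(EngineView.sizeof == (SplitEngine*).sizeof); // no extra storage
static assert(!__traits(compiles, { SplitEngine a; SplitEngine b = a; }));

unittest
{
    SplitEngine viaView, direct;
    auto view = EngineView(&viaView);

    foreach (_; 0 .. 3)
    {
        view.popFront();
        assert(view.front == direct());
    }
}
```

Whether the saved field matters in practice depends on the generator and on how well the compiler optimises the wrapper, which is the point debated in the replies.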

November 23, 2016

Re: Mir Random [WIP]

Posted by Joseph Rushton Wakeling
in reply to Andrea Fontana

On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana wrote:
> I wonder why Mersenne Twister is *always* chosen over other algorithms.

The weight of history, I suspect.  Mersenne Twister was the major new high-quality RNG back when people started getting really concerned about having good defaults, and when the Diehard Tests were the state of the art in tests of randomness.  IIRC there's also a benefit in terms of dimensionality, which some more recent generators don't address, which can make it a safer "all-round default".

Agree that there are probably better choices for today, though.

November 23, 2016

Re: Mir Random [WIP]

Posted by Ilya Yaroshenko
in reply to Joseph Rushton Wakeling

On Wednesday, 23 November 2016 at 13:26:41 UTC, Joseph Rushton Wakeling wrote:
> On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote:
>> Added RandomRangeAdaptor for URBGs:
>> https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
>
> This has exactly the problem I identified above, though: you're unnecessarily caching the latest variate rather than just using the RNG state directly.  Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it.

1. The current default RNG (Mt19937) has no state for the latest value.
2. The structure is allocated on the stack, and compilers can recognise loop patterns and eliminate additional memory movements in many cases.

November 23, 2016

Re: Mir Random [WIP]

Posted by Ilya Yaroshenko
in reply to Andrea Fontana

On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana wrote:
> On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote:
>>  - 64-bit Mt19937 initialization is fixed
>>  - 64-bit Mt19937 is default for 64-bit targets
>
> I wonder why Mersenne Twister is *always* chosen over other algorithms.
> My vote goes for CMWC, of course.
>
> Andrea

A PR for CMWC is highly welcome!

November 23, 2016

Re: Mir Random [WIP]

Posted by Andrei Alexandrescu
in reply to Ilya Yaroshenko

On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote:
> On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
>> On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
>>>  - `opCall` API instead of range interface is used (similar to C++)
>>
>> This seems like a gratuitous departure from common D practice. Random
>> number generators are most naturally modeled in D as infinite ranges.
>> -- Andrei
>
> It is a safe low-level architecture without performance or API issues.

I don't understand this. Can you please be more specific? I don't see a major issue wrt offering opCall() vs. front/popFront. (empty is always false.)

> It
> prevents users from doing stupid things implicitly (like copying RNGs).

An input range can be made noncopyable.

> A
> high-level range interface can be added in the future (it will hold a
> _pointer_ to an RNG).

Is there a reason to not have that now? Again, I'm literally talking about offering front/popFront in lieu of opCall(). The only implementation difference is you keep the currently-generated number as a member instead of returning it from opCall. I doubt one could measure a performance difference.

If you implement it as a noncopyable input range, you get a large support network working for you. With opCall, we have virtually no such support - you need to do everything once again.

> In addition, when you need to write algorithms
> or distributions, opCall is much more convenient than the range API.

Could you please be more specific? On the face of it I'd agree one call is less than two, but I don't see a major drawback here.

> In
> addition, users would not use the Engine API in 99% of cases: they will just
> want to call `rand` or `uniform`, or another distribution.
>
> I am sure that almost any library should first have a low-level API that fits
> its implementation. Additional API levels may also be added.

Is there a large difference between opCall and front/popFront?

Actually I can think of one - the matter of getting things started. Ranges have this awkwardness of starting the iteration: either you fill the current front eagerly in the constructor, or you have some sort of means to detect initialization has not yet been done and do it lazily upon the first use of front. The best strategy would depend on the actual generator, and admittedly would be a bit more of a headache compared to opCall. Was this the motivation?
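The two start-up strategies mentioned above can be sketched as follows. All names here (`Counter`, `EagerRange`, `LazyRange`) are hypothetical; `Counter` is a trivial stand-in for an `opCall`-only generator that also records how many values have been drawn, so the difference between the strategies is observable.

```d
// Hypothetical opCall-only source that counts how often it is called.
struct Counter
{
    uint calls;

    uint opCall() { return ++calls; }
}

// Strategy 1: fill front eagerly in the constructor.
struct EagerRange
{
    private Counter* gen;
    private uint current;

    this(Counter* g)
    {
        gen = g;
        current = gen.opCall(); // draws a value even if front is never read
    }

    enum bool empty = false;

    @property uint front() const { return current; }

    void popFront() { current = gen.opCall(); }
}

// Strategy 2: detect missing initialisation and draw on first use of front.
struct LazyRange
{
    private Counter* gen;
    private uint current;
    private bool primed; // true once current holds the front element

    this(Counter* g) { gen = g; }

    enum bool empty = false;

    @property uint front()
    {
        if (!primed)
        {
            current = gen.opCall();
            primed = true;
        }
        return current;
    }

    void popFront()
    {
        if (!primed)
            gen.opCall(); // consume the element that was never read
        primed = false;   // the next element is drawn on demand
    }
}

unittest
{
    Counter a, b;

    auto eager = EagerRange(&a);
    assert(a.calls == 1); // one value already consumed

    auto lazy = LazyRange(&b);
    assert(b.calls == 0); // nothing drawn yet
    assert(lazy.front == 1 && lazy.front == 1);
    assert(b.calls == 1);
}
```

The lazy variant pays for its flexibility with a flag and a branch in `front`, and `front` can no longer be `const`; the eager variant consumes a value that may never be used.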

> Current Phobos evolution is generic degradation: more generic and
> "universal" code hides more uncovered bugs in the code. The std.range is
> a good example of degradation; it has a lot of API and implementation bugs.

Do you have examples of issues outside random number generators?

> ### Example of API+implementation bug:
>
> #### Bug: RNGs have min and max params (hello C++). But they are not
> used when a uniform integer number is generated: `uniform!ulong` /
> `uniform!ulong(0, 100)`.
>
> #### Solution: In Mir Random any RNG must generate all 8/16/32/64 bits
> uniformly. It is the RNG's problem how to do it.

Min and max are not parameters, they are bounds provided by each generator. I agree their purpose is unclear. We could require all generators to provide min = 0 and max = UIntType.max without breaking APIs. In that case we only need to renounce LinearCongruentialEngine with c = 0 (see https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact that's the main reason for introducing min and max in the first place. All other code stays unchanged, and we can easily deprecate min and max for RNGs.
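As an illustration of the proposed requirement, a compile-time check along these lines could distinguish generators that cover the full integer range from those that do not. The trait name `fillsAllBits` is hypothetical and is not part of Phobos or Mir; the sketch only relies on the `min` and `max` members that std.random generators already expose.

```d
import std.random : MinstdRand0, Mt19937;
import std.range.primitives : ElementType;

// Hypothetical trait: true when a generator's bounds span its whole element type.
enum bool fillsAllBits(G) = G.min == 0 && G.max == ElementType!(G).max;

// 32-bit Mersenne Twister reports min == 0 and max == uint.max.
static assert(fillsAllBits!Mt19937);

// MinstdRand0 is a LinearCongruentialEngine with c = 0, so it never yields 0.
static assert(MinstdRand0.min == 1);
static assert(!fillsAllBits!MinstdRand0);
```

Under such a rule, generators like `MinstdRand0` would be the ones affected, which matches the `c = 0` case singled out above.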

(I do see min and max used by uniform at https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm not sure I get what you mean, but anyhow the idea that we require RNGs to fill a uint/ulong with all random bits simplifies a lot of matters.)

> I will not file this bug, or another dozen std.random bugs, because
> the module should be rewritten anyway and I am working on it. std.random
> is a collection of bugs from C/C++ libraries extended with D generic
> idioms. For example, there is no reason for a 64-bit Xorshift. It is 32-bit
> by design. Furthermore, a 64-bit expansion of a 32-bit algorithm must be proved
> theoretically before we allow it for end users. 64-bit analogs
> exist, but they have different implementations.

One matter that I see is there's precious little difference between mir.random and std.random. Much of the code seems copied, which is an inefficient way to go about things. We shouldn't fork everything if we don't like a bit of it, though admittedly the path toward making changes in std is more difficult. Is your intent to work on mir.random on the side and then submit it as a wholesale replacement of std.random under a different name? In that case you'd have my support, but you'd need to convince me the replacement is necessary. You'd probably have a good case for eliminating xorshift/64, but then we may simply deprecate that outright. You'd possibly have a more difficult time with opCall.

> Phobos degrades because
> we add a lot of generic specializations and small utilities without
> understanding use cases.

This is really difficult to parse. Are you using "degrades" the way it's meant? What is a "generic specialization"? What are examples of "small utilities without understanding use cases"?

> Phobos really follows a stupid idealistic idea:
> more generic is better, more API is better, more universal algorithms are
> better. The problem is that Phobos/DRuntime is a soup where everything (because
> of its "universality") interacts with everything.

I do think more generic is better, of course within reason. It would be a tenuous statement that generic designs in Phobos such as ranges, algorithms, and allocators are stupid and idealistic. So I'd be quite interested in hearing more about this. What's that bouillabaisse about?

Thanks,

Andrei

Copyright © 1999-2021 by the D Language Foundation