Mir Random [WIP]

On Wednesday, 23 November 2016 at 15:29:14 UTC, Kagamin wrote:
> On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei
>
> Consider this okayish looking code:
> consume(rng());
> consume(rng.take(2)); //reuses previous value
> consume(rng()); //discards unused value

Also consider the case of using a single generator in your program but calling it from different places. In one place you use opCall, which calls popFront and then front; in another you use the range interface directly, with the order reversed. You would reuse values there too.

Offering both interfaces together makes it pretty easy to create silly bugs like this, especially in a large project.

It might be useful to have a basic RNG interface and then wrap it with a template that provides either of the other interfaces, but not both at once.
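
A minimal sketch of the reuse problem, assuming a toy generator that offers both interfaces. `Counter` and `mixedUse` are hypothetical names, and a plain counter stands in for real randomness so the reused and skipped values are easy to see:

```d
import std.stdio : writeln;

// Hypothetical toy generator exposing BOTH interfaces (illustration only).
struct Counter
{
    uint state;   // latest generated value; a counter stands in for randomness

    // Range interface: `front` reads the current value, `popFront` advances.
    enum bool empty = false;
    @property uint front() const { return state; }
    void popFront() { ++state; }

    // opCall interface: advance first, then return the fresh value.
    uint opCall() { popFront(); return front; }
}

// Mixes the two calling conventions on one generator.
uint[] mixedUse()
{
    Counter rng;                // state == 0
    uint a = rng();             // opCall: advances to 1, returns 1
    uint b = rng.front;         // range order: reads 1 again (reused)
    rng.popFront();             // advances to 2
    uint c = rng();             // advances to 3, so 2 is never observed
    return [a, b, c];
}

void main()
{
    writeln(mixedUse());        // prints [1, 1, 3]
}
```

The value 1 is handed out twice and 2 is silently dropped, which is the same pattern as in Kagamin's example.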

November 23, 2016

Re: Mir Random [WIP]

Posted by Ilya Yaroshenko
in reply to Andrei Alexandrescu

On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei Alexandrescu wrote:
> On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote:
>> On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
>>> On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
>>>>  - `opCall` API instead of range interface is used (similar to C++)
>>>
>>> This seems like a gratuitous departure from common D practice. Random
>>> number generators are most naturally modeled in D as infinite ranges.
>>> -- Andrei
>>
>> It is a safe low level architecture without performance and API issues.
>
> I don't understand this. Can you please be more specific? I don't see a major issue wrt offering opCall() vs. front/popFront. (empty is always false.)

To be used with std.algorithm and std.range, a range must be copyable (it is passed by value).

>> It
>> prevents users from doing stupid things implicitly (like copying RNGs).
>
> An input range can be made noncopyable.

Ditto. A noncopyable input range is useless.
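
A small sketch of why pass-by-value matters here, assuming a toy engine written directly as an infinite input range. `Lcg` is a hypothetical type, not taken from Phobos or Mir:

```d
import std.array : array;
import std.range : take;

// Hypothetical toy engine modeled as an infinite input range.
struct Lcg
{
    uint state = 1;
    enum bool empty = false;
    @property uint front() const { return state; }
    void popFront() { state = state * 1664525u + 1013904223u; }
}

void main()
{
    auto rng = Lcg(42);
    auto a = rng.take(3).array;   // `take` receives a copy of rng
    auto b = rng.take(3).array;   // rng itself never advanced
    assert(a == b);               // the same "random" values are replayed
    assert(rng.state == 42);
}
```

Both calls silently work on copies, so the caller's generator never moves forward. Disabling the postblit would stop this from compiling, but then the range could not be passed to `take` at all.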

>> A
>> high level range interface can be added in the future (it will hold a
>> _pointer_ to an RNG).
>
> Is there a reason to not have that now?

Done. See `RandomRangeAdaptor`:
https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
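
For readers who do not want to open the link, here is a rough sketch of the pointer-holding idea. It is not the actual `RandomRangeAdaptor` code; `ToyEngine` and `PtrRange` are hypothetical names, and the engine is a counter for readability:

```d
import std.array : array;
import std.range : take;

// Hypothetical opCall-only engine (a counter, for readability); not a Mir type.
struct ToyEngine
{
    uint state;
    @disable this(this);                 // copying the engine is a compile error
    uint opCall() { return ++state; }
}

// Hypothetical adaptor: a copyable range that only stores a pointer to the engine.
struct PtrRange(Engine)
{
    private Engine* engine;
    private uint cached;                 // latest generated value

    this(Engine* e) { engine = e; cached = (*engine)(); }

    enum bool empty = false;
    @property uint front() const { return cached; }
    void popFront() { cached = (*engine)(); }
}

void main()
{
    ToyEngine eng;
    auto r = PtrRange!ToyEngine(&eng);
    auto a = r.take(3).array;            // `take` copies the adaptor, not the engine
    assert(a == [1u, 2u, 3u]);
    assert(eng.state >= 3);              // the one shared engine has advanced
}
```

Copies of the adaptor all advance the same engine, so std.range algorithms can take it by value. The cached `front` value is still copied along with the adaptor, so this sketch does not remove every reuse hazard.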

>
>> In addition, when you need to write algorithms
>> or distributions opCall is much more convenient than range API.
>
> Could you please be more specific? On the face of it I'd agree one call is less than two, but I don't see a major drawback here.

The main reason is implementation simplicity. Engines should be simple to create,
simple to maintain, and simple to use. opCall is simpler than the range interface because:
1. One declaration instead of four (three range functions plus, optionally, the latest generated value).
2. An input range is useless if it is not copyable.
3. `randomRangeAdaptor` is implemented for Engines and will be done for Distributions too. So the range API is supported better than in std.range (where Engines are copied).

>> In
>> addition, users would not use the Engine API in 99% of cases: they will just
>> want to call `rand` or `uniform`, or another distribution.
>>
>> I am sure that almost any library should first have a low level API that fits
>> its implementation. Additional API levels may also be added.
>
> Is there a large difference between opCall and front/popFront?
>
> Actually I can think of one - the matter of getting things started. Ranges have this awkwardness of starting the iteration: either you fill the current front eagerly in the constructor, or you have some sort of means to detect initialization has not yet been done and do it lazily upon the first use of front. The best strategy would depend on the actual generator, and admittedly would be a bit more of a headache compared to opCall. Was this the motivation?

Simplicity is the main motivation.

>> ### Example of API+implementation bug:
>>
>> #### Bug: RNGs have min and max params (hello C++). But they are not
>> used when a uniform integer number is generated: `uniform!ulong` /
>> `uniform!ulong(0, 100)`.
>>
>> #### Solution: In Mir Random every RNG must generate all 8/16/32/64 bits
>> uniformly. How to do it is the RNG's problem.
>
> Min and max are not parameters, they are bounds provided by each generator. I agree their purpose is unclear. We could require all generators to provide min = 0 and max = UIntType.max without breaking APIs. In that case we only need to renounce LinearCongruentialEngine with c = 0 (see https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact that's the main reason for introducing min and max in the first place. All other code stays unchanged, and we can easily deprecate min and max for RNGs.
>
> (I do see min and max used by uniform at https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm not sure I get what you mean, but anyhow the idea that we require RNGs to fill an uint/ulong with all random bits simplifies a lot of matters.)

The current Mir solution is a pair of traits, isURBG and isSURBG. The `S` prefix (Saturated) means `T.max == ReturnType!T.max`, where T is an Engine. Functions use isSURBG for now. The min property is not required: we can just subtract the actual min from the returned value.

An adaptor can be added to convert a URBG to a Saturated URBG.
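
The condition above can be sketched as a pair of compile-time checks. This is a hypothetical re-creation for illustration only; the real Mir definitions may differ. `Full32` and `Only31` are made-up engines, and `typeof(T.init.opCall())` stands in for `ReturnType!T`:

```d
import std.traits : isUnsigned;

// Sketch: T is callable, returns an unsigned integer, and exposes `max`.
template isURBG(T)
{
    static if (is(typeof(T.init.opCall()) R))
        enum bool isURBG = isUnsigned!R && is(typeof(T.max) : R);
    else
        enum bool isURBG = false;
}

// Sketch: additionally, `max` equals the largest value of the return type,
// so every bit of the output is random.
template isSURBG(T)
{
    static if (isURBG!T)
        enum bool isSURBG = T.max == typeof(T.init.opCall()).max;
    else
        enum bool isSURBG = false;
}

// Hypothetical engines used only to exercise the checks.
struct Full32 { enum uint max = uint.max; uint s; uint opCall() { return ++s; } }
struct Only31 { enum uint max = int.max;  uint s; uint opCall() { return ++s & max; } }

static assert( isURBG!Full32 &&  isSURBG!Full32);
static assert( isURBG!Only31 && !isSURBG!Only31);   // top bit is never set
static assert(!isURBG!int);

void main() {}
```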

>> I will not file this bug, or another dozen std.random bugs, because
>> the module should be rewritten anyway and I am working on it. std.random
>> is a collection of bugs from C/C++ libraries extended with D generic
>> idioms. For example, there is no reason for a 64 bit Xorshift. It is 32 bit
>> by design. Furthermore, a 64 bit expansion of a 32 bit algorithm must be proved
>> theoretically before we allow it for end users. 64 bit analogs
>> exist, but they have different implementations.
>
> One matter that I see is there's precious little difference between mir.random and std.random. Much of the code seems copied, which is an inefficient way to go about things. We shouldn't fork everything if we don't like a bit of it, though admittedly the path toward making changes in std is more difficult. Is your intent to work on mir.random on the side and then submit it as a wholesale replacement of std.random under a different name? In that case you'd have my support, but you'd need to convince me the replacement is necessary. You'd probably have a good case for eliminating xorshift/64, but then we may simply deprecate that outright. You'd possibly have a more difficult time with opCall.

I started with Engines as the basis. The library will be very different from Phobos and _any_ other RNG library in terms of floating point generation quality. None of the FP generators I have seen are saturated: the number of distinct FP values they can produce is very small compared with the ideal, because of IEEE arithmetic. I have not found this idea described by others, so it may become an article in the future.
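
One way to read "not saturated" is to count values. The sketch below compares the number of distinct `float` values in [0, 1) with the number reachable by scaling 24 random bits as `k * 2^-24`. That mapping is an assumption about common practice, not a statement about any particular library, and `floatsBelowOne` is a hypothetical helper:

```d
import std.stdio : writeln;

// Non-negative IEEE floats are ordered like their bit patterns, so the
// bit pattern of 1.0f equals the number of distinct floats in [0, 1).
uint floatsBelowOne()
{
    float one = 1.0f;
    return *cast(uint*)&one;             // 0x3F80_0000
}

void main()
{
    uint representable = floatsBelowOne();
    uint reachable = 1u << 24;           // outputs of the k * 2^-24 mapping
    writeln(representable, " vs ", reachable);
    assert(representable == 1_065_353_216);
    assert(reachable == 16_777_216);
}
```

Under that assumption the generator can produce only about 1.6% of the representable values in the interval.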

A set of new, modern Engines will be added (by Nicholas Wilson, and maybe Joseph). Seb and I will also add a set of distributions.

>> Phobos degrades because
>> we add a lot of generic specializations and small utilities without
>> understanding use cases.
>
> This is really difficult to parse. Are you using "degrades" the way it's meant? What is a "generic specialization"? What are examples of "small utilities without understanding use cases"?

Sorry, my English is ... .
It is not clear to me which subset of generic code is nothrow (reduce, for example). The same is true for the BetterC concept: it is hard to predict when an algorithm requires DRuntime to be linked or initialised. It is also not clear which modules a given module imports.

"small utilities without understanding use cases" -
Numeric code in std.algorithm:
minElement, sum. They should not be in std.algorithm. A user can use `reduce`. Or, if speed is required, we need to move to a numeric solution suitable for vectorization, and std.algorithm seems to be the wrong module for vectorised numeric code.
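
To make the `reduce` point concrete, here is a minimal sketch on integer data showing that both utilities can be written as one-line reductions:

```d
import std.algorithm.comparison : min;
import std.algorithm.iteration : reduce, sum;
import std.algorithm.searching : minElement;

void main()
{
    auto data = [3, 1, 4, 1, 5];

    // `sum` versus an explicit reduction with a seed of 0.
    assert(data.sum == 14);
    assert(reduce!((a, b) => a + b)(0, data) == 14);

    // `minElement` versus reducing with `min`.
    assert(data.minElement == 1);
    assert(reduce!min(data) == 1);
}
```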

>> Phobos really follows a stupid idealistic idea:
>> more generic is better, more API is better, more universal algorithms are
>> better. The problem is that Phobos/DRuntime is a soup where everything
>> (because of its "universality") interacts with everything.
>
> I do think more generic is better, of course within reason. It would be a tenuous statement that generic designs in Phobos such as ranges, algorithms, and allocators are stupid and idealistic. So I'd be quite interested in hearing more about this. What's that bouillabaisse about?

For example, std.allocator. It is awesome! But I cannot use it in GLAS, because I don't understand whether it will work without linking with DRuntime.

So, I copy-pasted and modified your code for AlignedMallocator:
https://github.com/libmir/mir-glas/blob/master/source/glas/internal/memory.d

Ranges and algorithms seem good to me, except that it is not clear when code is nothrow / BetterC. std.math is a problem: we are adding new API without solving existing API problems and C compatibility. std.complex prevents math optimisations (this cannot be solved without compiler hacks), so GLAS migrated to native (old) complex numbers.

I like generics when they make D usage simpler. If someone adds random number generation to a Phobos sorting algorithm, that will make it useless for BetterC (because it will require linking an RNG). Such issues are not reviewed during the Phobos review process. Linking Phobos / DRuntime is not an option because they have no backward binary compatibility, so packages cannot be distributed as precompiled libraries.

std.traits, std.meta, std.range.primitives, std.ndslice, and part of std.math are the only modules I am using in Mir libraries.

It is very important to me to have BetterC guarantees across Phobos versions. Super generic code in which different modules import each other is hard to review.

Best regards,
Ilya
