New Lagged Fib. PRNG gen and random2.d (page 3)

August 29, 2013

Re: New Lagged Fib. PRNG gen and random2.d

Posted by Joseph Rushton Wakeling
in reply to monarch_dodra


On 27/08/13 08:16, monarch_dodra wrote:
> What bothers me about these is that they have a fixed seed value,
> which (IMO) is even worse than default seeding to
> "unpredictableSeed"
>
> C's "rand" also "defaults seed". I've "tutored" C++ on codeguru
> for years, and years in and years out, I'd get asked "why does my
> random program keep creating the same output". This behavior
> surprises people.
>
> A PRNG should either error-out on no-seed, or be run-time
> unpredictably seeded. The middle ground just gives you the worst
> of both...

For what it's worth, I think that your experience is a consequence of the particular case of C.  The "easy" way to generate a random number in C is via some variation on rand() / RAND_MAX, so that's what people do, and so they get hit by the default seed.

On the other hand with D the default random number generator is rndGen, which is unpredictably seeded, and which is used if no other RNG is specified -- so _default_ random behaviour in D mimics that in interpreted languages and is different per program run.  (Actually, I sometimes find myself seeding rndGen in order to get _the same_ results, which of course sometimes you do want:-)
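To make that concrete, here is a minimal sketch of the two behaviours, using only rndGen and uniform from std.random.  The helper drawDefault is just a name I made up for illustration:

```d
import std.random : rndGen, uniform;
import std.stdio : writeln;

// Hypothetical helper: draws n values from the thread-local default generator.
int[] drawDefault(size_t n)
{
    int[] result;
    foreach (i; 0 .. n)
        result ~= uniform(0, 100);   // no RNG argument, so rndGen is used
    return result;
}

void main()
{
    // rndGen is unpredictably seeded: this line differs from run to run.
    writeln(drawDefault(3));

    // Seeding rndGen explicitly gives the same values each time.
    rndGen.seed(42);
    auto a = drawDefault(3);
    rndGen.seed(42);
    auto b = drawDefault(3);
    assert(a == b);
    writeln(a);
}
```

So reproducibility is opt-in, which seems to be the opposite of the C situation you describe.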

Personally I think there may be some value in having an accepted default configuration for RNGs, so long as it's correctly signposted what will happen, and so long as the "easy" thing to do is not going to fall into the trap you described.

However, I think initialization is an important issue not just for RNGs but for the diversity of other entities that use them.  This also impacts on the class/struct discussion.

For example, with RNGs per se it's pretty trivial (with a class-based approach) to have a constructor requiring a seed, and in addition (possibly) a default constructor that seeds with some default condition (let's leave aside preferences on this for now:-).

The latter default-seed approach cannot be implemented with structs -- you can't have a no-parameter constructor -- so struct-based RNGs have to work round this in one of two ways.  Either they have default settings for all the internal values of the RNG state data (which e.g. Xorshift can do because it has only a small number of state variables), or they have conditionals which get triggered when front(), popFront() etc. are called.  The Mersenne Twister implementation does the latter: it calls seed() if the value of the internal parameter mti is equal to size_t.max.

The latter approach is very annoying because it means that e.g. front() cannot be const, which we'd like it to be.
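A rough sketch of the contrast, using a toy LCG rather than any real Phobos generator.  LazyGen, EagerGen, defaultSeed and the boolean seeded flag are all made up for illustration (the Mersenne Twister uses the mti sentinel rather than a flag):

```d
enum uint defaultSeed = 12345u;   // hypothetical default seed

// Struct version: no default constructor, so seeding happens lazily.
struct LazyGen
{
    private uint state;
    private bool seeded = false;

    void seed(uint s) { state = s; seeded = true; }

    // Cannot be const: the first call may have to mutate via seed().
    @property uint front()
    {
        if (!seeded) seed(defaultSeed);
        return state;
    }

    void popFront()
    {
        if (!seeded) seed(defaultSeed);
        state = state * 1664525u + 1013904223u;   // toy LCG step
    }

    enum bool empty = false;
}

// Final class version: the default constructor seeds up front.
final class EagerGen
{
    private uint state;

    this() { this(defaultSeed); }
    this(uint s) { state = s; }

    // No lazy check needed, so front() can be const.
    @property uint front() const { return state; }
    void popFront() { state = state * 1664525u + 1013904223u; }

    enum bool empty = false;
}

void main()
{
    LazyGen lazyGen;
    assert(lazyGen.front == defaultSeed);

    const EagerGen eager = new EagerGen;
    assert(eager.front == defaultSeed);   // callable through a const reference
}
```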

So, all of that makes final classes a nice approach, albeit we might compromise for other reasons.  But it's not so simple for other random functions.  Now consider random sampling, and the following 3 versions of code:

    import std.random, std.range, std.stdio;  // needed by all three snippets

    {   // 1
        auto gen = Random(unpredictableSeed);
        auto sample = randomSample(iota(100), 10, gen);
        writeln(sample);
        writeln(uniform(0.0, 1.0, gen));
    }

    {   // 2
        auto gen = Random(unpredictableSeed);
        auto sample = randomSample(iota(100), 10, gen);
        writeln(uniform(0.0, 1.0, gen));
        writeln(sample);
    }

    {   // 3
        auto gen = Random(unpredictableSeed);
        writeln(uniform(0.0, 1.0, gen));
        auto sample = randomSample(iota(100), 10, gen);
        writeln(sample);
    }

What would you _expect_ to happen to the values of the random number and the random sample in each case?

The most intuitive thing would be that the values of the sample are determined at the point it's created, so they would depend on where the line sample = ... occurs and on nothing else.  However, we know that random samples are lazily evaluated.

So, perhaps the second logical alternative is that its values should be determined _when it is read_, i.e. at the point where we writeln(sample).

Now consider -- when is the _first_ "front" value of the sample set?  If it's set at construction time (which would be normal for a range), the remaining values in the sample will depend on what calls to the RNG are made between construction and reading.  So, in this case, the samples from programs 1 and 2 will have the same _first_ value, but different subsequent values.

That, to me, is both unintuitive and undesirable.
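Here is a toy reproduction of the effect.  It assumes a reference-type generator, so that the sample and the caller share one RNG as they would with final classes; ToyGen and EagerFirst are made-up types, not the Phobos ones:

```d
import std.array : array;
import std.stdio : writeln;

// Hypothetical reference-type generator (a toy full-period LCG).
final class ToyGen
{
    private uint state;
    this(uint seed) { state = seed; }
    uint next()
    {
        state = state * 1664525u + 1013904223u;
        return state;
    }
}

// Hypothetical lazy "sample" that fixes its first value in the constructor.
struct EagerFirst
{
    private ToyGen gen;
    private uint current;
    private size_t remaining;

    this(ToyGen gen, size_t n)
    {
        this.gen = gen;
        remaining = n;
        current = gen.next();        // first value drawn at construction
    }

    @property bool empty() const { return remaining == 0; }
    @property uint front() const { return current; }
    void popFront()
    {
        if (--remaining > 0)
            current = gen.next();    // later values drawn only when read
    }
}

// Like snippet 1: read the sample straight away.
uint[] readImmediately(uint seed)
{
    auto gen = new ToyGen(seed);
    return array(EagerFirst(gen, 3));
}

// Like snippet 2: use the generator between construction and reading.
uint[] readAfterOtherDraw(uint seed)
{
    auto gen = new ToyGen(seed);
    auto sample = EagerFirst(gen, 3);
    gen.next();                      // unrelated draw in between
    return array(sample);
}

void main()
{
    auto a = readImmediately(1);
    auto b = readAfterOtherDraw(1);
    assert(a[0] == b[0]);            // same first value ...
    assert(a[1 .. $] != b[1 .. $]);  // ... but the rest has shifted
    writeln(a, " vs ", b);
}
```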

Now it gets more complicated.  In the case of RandomSample, we have a bunch of different public functions we can call.  When front() is called, we need to check if the sample has been initialized before we return a value.  Similarly, popFront() should behave differently depending on whether the first sample value has been determined yet.  Then there is a method index() that returns the numerical index of the sampled value (so, if you sample from 0, 1, 2, ... you should have sample.front == sample.index always), and this too clearly depends on the sample being initialized before it can return.

None of this can be solved with a struct vs. class difference, assuming we agree that it's inherently undesirable to initialize at construction time.

This problem is potentially a rich source of bugs, as we saw with RandomCover in our recent pull request discussion.  Relying on manual solutions seems undesirable -- you will get people (as I did) fixing front() and popFront(), but forgetting about a function like RandomSample.index().  (Fix in the works:-)
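To show the shape of the manual solution: every public method has to remember to call the same guard.  This is a deterministic stand-in (it just takes every step-th element), and EveryNth and initialize() are names invented for this sketch, not the real RandomSample internals:

```d
// Hypothetical stand-in for a lazily initialized sample.
struct EveryNth
{
    private const(int)[] source;
    private size_t step;       // assumed to be > 0
    private size_t current;
    private bool initialized;

    this(const(int)[] source, size_t step)
    {
        this.source = source;
        this.step = step;
        // Deliberately no "first draw" here: that happens on first use.
    }

    // The guard that every public method must call.
    private void initialize()
    {
        if (!initialized)
        {
            current = step - 1;    // stand-in for "draw the first index"
            initialized = true;
        }
    }

    @property bool empty() { initialize(); return current >= source.length; }
    @property int front() { initialize(); return source[current]; }

    // Easy to forget: without the guard this would return 0 before first use.
    @property size_t index() { initialize(); return current; }

    void popFront() { initialize(); current += step; }
}

void main()
{
    auto s = EveryNth([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3);
    assert(s.index == 2);          // index() called first, still correct
    assert(s.front == s.index);    // sampling from 0, 1, 2, ...
    s.popFront();
    assert(s.front == 5 && s.index == 5);
}
```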

So, I'm wondering if there is scope for some syntactic sugar like invariant(), but which will not get stripped out when compiled with -release and which will enable essential checking of the internal state of the system whenever a public method is called.

I'd suggest there actually be two, one to check the entry condition, one the exit condition.
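Absent such a feature, the nearest thing I can write by hand looks something like the following.  It relies on enforce() from std.exception, which is not removed by -release the way assert, contracts and invariant() are.  Checked, onEntry and onExit are hypothetical names, and the point of the proposed sugar would be that the compiler inserts these calls rather than the programmer:

```d
import std.exception : enforce;

// Hypothetical type with hand-written entry and exit hooks.
struct Checked
{
    private int value;
    private bool ready;

    // Entry hook: runs in every build, including -release.
    private void onEntry()
    {
        if (!ready)
        {
            value = 1;
            ready = true;
        }
    }

    // Exit hook: enforce() throws rather than being compiled out.
    private void onExit() const
    {
        enforce(ready && value > 0, "Checked: invalid state on exit");
    }

    int next()
    {
        onEntry();
        immutable result = value++;
        onExit();
        return result;
    }

    int peek()
    {
        onEntry();
        immutable result = value;
        onExit();
        return result;
    }
}

void main()
{
    Checked c;
    assert(c.peek() == 1);   // initialized on first use
    assert(c.next() == 1);
    assert(c.next() == 2);
}
```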

On Thursday, 29 August 2013 at 10:16:17 UTC, Joseph Rushton Wakeling wrote:
> On 29/08/13 08:45, monarch_dodra wrote:
>> Yes, that'd be the idea, but I mean that if you do that, you close the door to
>> giving straight access to a struct-implementation.
>
> Understood, but I'm not sure that's as big an issue as you think.  I exchanged some private emails with Manu and with Adam Ruppe, on how RNGs-as-final-classes might play with games or embedded programming, and I think the consensus was that it wasn't a big deal, so long as all the allocation happened up front.

Alright. Nice to know.

> Adam suggested that e.g. one could avoid GC by allocating memory and using emplace(), so as long as the class is compatible with that approach, it's fine.

This is actually much trickier than it looks. Check out "Scoped".

AFAIK, it is the only type which does this, and it is not copyable.
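For reference, a minimal sketch of both routes, using emplace from std.conv and scoped from std.typecons.  ToyGen and the two helper functions are made up for the example, and the sketch sidesteps the tricky parts: the class has no destructor and holds no GC pointers, so there is nothing to destroy or register with the GC.

```d
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;
import std.typecons : scoped;

// Hypothetical class-based generator with trivial state.
final class ToyGen
{
    uint state;
    this(uint seed) { state = seed; }
}

// Builds a ToyGen in malloc'd memory, bypassing the GC.
uint stateViaEmplace(uint seed)
{
    enum size = __traits(classInstanceSize, ToyGen);
    void* memory = malloc(size);
    assert(memory !is null);
    scope (exit) free(memory);

    ToyGen gen = emplace!ToyGen(memory[0 .. size], seed);
    return gen.state;
}

// Builds a ToyGen on the stack via scoped.
uint stateViaScoped(uint seed)
{
    auto gen = scoped!ToyGen(seed);
    // auto copy = gen;   // would not compile: the wrapper disables copying
    return gen.state;
}

void main()
{
    assert(stateViaEmplace(42) == 42);
    assert(stateViaScoped(7) == 7);
}
```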
