January 26, 2016
On Tuesday, 26 January 2016 at 12:07:59 UTC, Andrei Alexandrescu wrote:
> I think a pseudo-rng as a forward range is useful. It's good in testing and experimentation to fork a sequence of pseudo-random numbers, turn the clock back, etc. Essentially I see no harm in it; it's always easy to make a forward range into an input range, it's the opposite that's hard. -- Andrei

Yes, but there are other ways to fork that sequence that don't involve implementing the .save primitive.  That's why in my talk (and here) I distinguish between the .save primitive versus some other possible primitive -- say, .dup -- whose promise is, "You can use me to duplicate the state of this range, but you must use me carefully and in an appropriate context."

This is important, because if you just define pseudo-RNGs as forward ranges, most people won't read the small print about when you need to wrap it in an input range (which is pretty much any time you want to pass it into phobos code and have it behave like an actual source of randomness).  As a result, they'll get unintended and possibly unnoticed statistical correlations -- annoying for something like a computer game, possibly a serious consequence if occurring in something like a research simulation.
February 07, 2016
On Monday, 25 January 2016 at 22:06:31 UTC, Era Scarecrow wrote:
> On Monday, 25 January 2016 at 21:22:13 UTC, Joseph Rushton Wakeling wrote:
>> I have been wondering about how allocators could help to deal with these problems.  Could you put forward a minimal example of how you would see it working?
>
>  Most likely alloca would have to be built into the compiler. Here's a crash course in how the stack memory management works. sp=stack pointer, bp=base pointer (more relevant pre 386).

Apologies for the delay in writing back about this.

What you describe makes sense, but I don't quite follow what you mean in one particular case:

>  Technically alloca simply returns the current sp, then adds to it the number of bytes you requested. This means you have to run it at the function stack where you want to use it (and not in a called function, otherwise corruption). So inlined functions where alloca's data would remain would be a must.

I don't quite follow your remark about inlined functions; do you mean that the function where the RNG instance is generated must be inlined?  (That would make sense in order to avoid the internal state being deallocated immediately.)

I think there might be more complications here than just allocating individual RNG instances, though (which can happen quite early on in the program); what about stuff like random algorithms (RandomCover, RandomSample) which might be generated deep in internal loops, passed to other functionality as rvalues, etc. etc.?

> Then it comes down to a simple:
>
> [code]
> struct RNG {
>   int *state; //add assert to functions that this isn't null
>   this(int seed) {
>     state = alloca(sizeof(int));
>     *state = seed;
>   }
> }
> [/code]

Yes, missing your understanding of the details of how it would have to happen, this is pretty much what I had in mind for random ranges; a pointer to internal state nevertheless allocated on the stack.  But the already-mentioned concerns about some of the ways that stack could be deallocated make for some concerns.

It might be simpler, in practice, to just have the state refcounted.

>>>  I suppose the alternate is an option to skip/throw-away some numbers that should've been consumed (assuming you want to keep using the same seed), or seeding each per use.
>>
>> I'm not sure I follow what you mean here or why you think this would work?  Could you give a concrete example?
>
> certainly.
>
> [code]
> struct RNG {
>   ...
>
>   void skip(int x) {assert(x>0); while(x--) popfront();}
> }
>
> RNG rnd = RND(seed);
> rnd.take(10).writeln;  //loosely based on talk examples
> rnd.skip(10);          //skips the 'consumed' numbers.
> rnd.take(10).writeln;  //won't have identical output
>
> [/code]

I'm afraid that's not really viable :-(  In the best case, it's just working around the fundamental design problem via programmer virtue.  But the problem is, in the general case, you can't anticipate how many random variates may be popped from your random number generator inside a function (it might depend on other input factors over which you as programmer have no control).
February 08, 2016
On Sunday, 7 February 2016 at 22:27:40 UTC, Joseph Rushton Wakeling wrote:
> On Monday, 25 January 2016 at 22:06:31 UTC, Era Scarecrow wrote:
> What you describe makes sense, but I don't quite follow what you mean in one particular case:
>
>>  Technically alloca simply returns the current sp, then adds to it the number of bytes you requested. This means you have to run it at the function stack where you want to use it (and not in a called function, otherwise corruption). So inlined functions where alloca's data would remain would be a must.

 That's the low level assembly language that's generated i'm referring to; At least based on what i've read and seen for code output from a compiler to a .s file or similar.

> I don't quite follow your remark about inlined functions; do you mean that the function where the RNG instance is generated must be inlined?  (That would make sense in order to avoid the internal state being deallocated immediately.)

Assuming alloca moves to the inlined function. Although i had another idea thrown in my head where the memory would be pre-allocated and you could just point to it when requested via an allocator. So assume

@alloca(sizeof(int)) struct RNG {

 During instantiation it would know the size ahead of time and just append that to the end of the structure. That extra padding space could be handled manually instead.

 this(int seed) {
   state = cast(void*)(this+1);
 }

 But this forced type breaking is quite clunky (and obviously the wrong way to write it).

 I recall in C there was suppose to be a way to attach an array (of unknown size) immediately following a struct by making the length of the array 0, then accessing it directly. But you'd still need to guarantee somehow that the access rights are in place and not referencing other data which you could screw up (via optimizations or something).

> I think there might be more complications here than just allocating individual RNG instances, though (which can happen quite early on in the program); what about stuff like random algorithms (RandomCover, RandomSample) which might be generated deep in internal loops, passed to other functionality as rvalues, etc. etc.?

 Either they use more stack space, or they act normally after their call is done and are deallocated normally (automatically, unless they are passed outside of the scope where they were generated).

> It might be simpler, in practice, to just have the state refcounted.
>
>>>>  I suppose the alternate is an option to skip/throw-away some numbers that should've been consumed
>>> I'm not sure I follow what you mean here or why you think this would work?  Could you give a concrete example?

>>   void skip(int x) {assert(x>0); while(x--) popfront();}

>> rnd.take(10).writeln;  //loosely based on talk examples
>> rnd.skip(10);          //skips the 'consumed' numbers.
>> rnd.take(10).writeln;  //won't have identical output


> I'm afraid that's not really viable :-(
> But the problem is, in the general case, you can't anticipate how many random variates may be popped from your random number generator inside a function.

 True; Perhaps have one RNG for seeding and one RNG for passing, then reseed after passing the function off, although how far deep some of this could go with it's deeper copying; I don't know.

 Perhaps RNG should be a class outright, which probably removes a lot of these problems.
February 08, 2016
On Monday, 8 February 2016 at 00:54:24 UTC, Era Scarecrow wrote:
>  Either they use more stack space, or they act normally after their call is done and are deallocated normally (automatically, unless they are passed outside of the scope where they were generated).

It's that "passed outside of the scope where they were generated" that is bothering me.

Consider:

    auto foo (Range) (Range r)
        if (isInputRange!Range)
    {
        return r.randomSample(100).take(10);
    }

    void main ()
    {
        iota(1000).foo.writeln;
    }

If the RandomSample internals are allocated on the stack using the technique you propose, what's going to happen here?


>  True; Perhaps have one RNG for seeding and one RNG for passing, then reseed after passing the function off, although how far deep some of this could go with it's deeper copying; I don't know.

No, I really don't think that's an approach that's worth pursuing, except as a desperate workaround for the faults of the existing design.  RNGs should as much as possible "just work" and the user shouldn't have to concern themselves like this with


>  Perhaps RNG should be a class outright, which probably removes a lot of these problems.

It does indeed give you the desired reference semantics very cheaply.  But now suppose you have lots of RandomSamples being instantiated in the inner loop of your program.  If _they_ are also a class, how do you handle their instantiation and deallocation?
February 08, 2016
On Monday, 8 February 2016 at 19:46:19 UTC, Joseph Rushton Wakeling wrote:
> [snip]

This might be a stupid idea, but perhaps there's something useful in it:

Determinism isn't the same thing as "one long chain of numbers that everybody reads from".

It can be acceptable to seed a set of reasonable pseudo-random number generators with consecutive integers (indeed seeding randomly can be dangerous because of the birthday problem). More generally, any change of the state of the rng in "seed-space" should produce an output equivalent to taking a sample from the output distribution.

Can you not have a random number generator make a copy of itself like this:

struct RNG
{
    State state;
    static State.ModifierT modifier;
    this(this)
    {
        this.state.alterBy(modifier++);
        //recalculate output sample etc...
    }
}

Then any time you copy a RNG, the copy is kicked on to a new path in state-space. Basically we're deterministically re-seeding on copy.
1 2 3
Next ›   Last »