January 25, 2016
On Monday, 25 January 2016 at 21:20:09 UTC, Joseph Rushton Wakeling wrote:
> I thought that too, for a long time, but I came to the conclusion it's not the case.
>
> <snip>
>
> That's fine if you're dealing with something whose behaviour is meant to be deterministic, but if you call this with a pseudo-random sequence, than every time you call it, you will get the same result.  It won't matter if what you pass is a  reference-type -- the .save (which is correct behaviour for handling a forward range) means you're just repeatedly using a copy of the random sequence.
>
> <snip>
>
> Right -- non-PRNGs must be input ranges by design.  I came to the conclusion that pseudo-RNGs need to be input ranges, but that implement an alternative to .save: a .dup method whose promise is, "You can duplicate the state and hence behaviour of this range, but you can't make any assumptions that this is a safe or sane thing to do."


 So in short, the RNG shouldn't be a range at all. Of course it could be a struct (for sanity and other reasons), but not a range.

 I wonder then, assuming we remove RNG from being a range, the a RNG could give out a delegate of it's current data, and that delegate get passed to a range-like wrapper? Or maybe the RNG returns a Voldemort range that referrs to the original RNG which isn't a range anymore... Actually that sounds promising. Aliasing could deal with it automatically converting/creating the range.
January 25, 2016
On Monday, 25 January 2016 at 21:30:47 UTC, Era Scarecrow wrote:
> On Monday, 25 January 2016 at 21:20:09 UTC, Joseph Rushton Wakeling wrote:
>> Right -- non-PRNGs must be input ranges by design.  I came to the conclusion that pseudo-RNGs need to be input ranges, but that implement an alternative to .save: a .dup method whose promise is, "You can duplicate the state and hence behaviour of this range, but you can't make any assumptions that this is a safe or sane thing to do."
>
>
>  So in short, the RNG shouldn't be a range at all. Of course it could be a struct (for sanity and other reasons), but not a range.

Why? They work fine as input ranges regardless of whether having them be forward ranges make sense. By its very nature, an input range cannot be saved, and it's not assumed that it's deterministic. The issue that we run into isn't that they're ranges but that they're not implemented as reference types, and copies of them accidentally get used, resulting in subtle bugs. But I'd strongly argue that we should at least consider requiring that all input ranges be reference types, since they don't make sense as value types, and pseudo-reference types are just plain buggy. That's a wider discussion than random number generator ranges though.

- Jonathan M Davis
January 25, 2016
On Monday, 25 January 2016 at 21:30:47 UTC, Era Scarecrow wrote:
>  So in short, the RNG shouldn't be a range at all. Of course it could be a struct (for sanity and other reasons), but not a range.
>
>  I wonder then, assuming we remove RNG from being a range, the a RNG could give out a delegate of it's current data, and that delegate get passed to a range-like wrapper? Or maybe the RNG returns a Voldemort range that referrs to the original RNG which isn't a range anymore... Actually that sounds promising. Aliasing could deal with it automatically converting/creating the range.

Well, an idea that was suggested to me at DConf (by several people; thanks in particular to Steve S. and to DiceBot!) was indeed to separate RNG state from the range interface, and to disable copy-by-value for the state data structure.

One option would be to implement the basic RNG data structor à la C++, as a functor: so it'd be something like:

    struct SomeRNG(UIntType, ...)
    {
      private:
        // the RNG state variables

      public:
        UIntType opCall()
        {
            // returns a random variate
        }
    }

... and again, you'd disable copy-by-value for this entity.  I had some fun a while ago playing with this and the appropriate technique to wrap it into a range (it's not as trivial as you think, because you need to use some tricks to ensure truly lazy evaluation of the range, where D ranges typically evaluate the first element eagerly).

Where I ran into trouble was considering how to extend this approach in a viable way to stuff like randomSample, i.e. the stuff which wraps randomness, which also needs to ensure its internal state is never copied by value, and yet which needs (ideally) to fit nicely into a UFCS chain of ranges:

    someInput.filter!(WHATEVER).randomSample(n, rng).map!(...).writeln;

I suppose it might be possible to implement a functor-based approach for this, that could be wrapped in a range, but it feels nasty for something which really ought to fit more naturally into the range syntax: a random sample is just an algorithm (similar to those in std.algorithm), but one whose elements need to be truly lazily evaluated and whose values are not deterministic but depend on a source of randomness.

The entire complication comes out of the fact that in order to play nice in a statistical sense, you need the RandomSample range to be a reference type, but in order to play nice with the kind of in-the-middle-of-the-inner-loop use above, you need it to be cheap and quick to instantiate and destroy (so classes and heap allocation are a problem).

Basically, what you probably want is for RandomSample to be a struct, but with a reference-type internal state that is nevertheless allocated on the stack and that is cheap to create and let go of when you're done with it.
January 25, 2016
On Monday, 25 January 2016 at 21:22:13 UTC, Joseph Rushton Wakeling wrote:
> On Monday, 25 January 2016 at 20:26:12 UTC, Era Scarecrow wrote:
>>  What about an alternative allocator? Specifically I'm thinking in C's equivalent which is alloca (allocates directly on the stack with the rest of the variables). If the constructor is a function then that won't work; but if it's inlined then it should work.
>
> I have been wondering about how allocators could help to deal with these problems.  Could you put forward a minimal example of how you would see it working?

 Most likely alloca would have to be built into the compiler. Here's a crash course in how the stack memory management works. sp=stack pointer, bp=base pointer (more relevant pre 386).

 When you prepare to enter a function (or block) the bp register is backed up, the sp is saved in the bp register, then you add your arguments, then the function is entered. After the function is entered it adds to the sp register which give it a block of space for all known local variables at once; then does any assignments it needs to. When you leave the function/block you simply replace the sp register with your backup the bp register. Once you return you restore the old bp register.

 So something close to this:
[code]
  ...
  push bp
  push calling_argument(s)
  call func
  sub sp, -N //size of total arguments pushed
  pop bp //completely done with function call.
  ...


  func: mov bp, sp
  add sp, +N //size of local variables
  // setup variables
  // code

  // cleanup/return
  mov sp, bp
  ret
[/code]

 Technically alloca simply returns the current sp, then adds to it the number of bytes you requested. This means you have to run it at the function stack where you want to use it (and not in a called function, otherwise corruption). So inlined functions where alloca's data would remain would be a must. Then it comes down to a simple:

[code]
struct RNG {
  int *state; //add assert to functions that this isn't null
  this(int seed) {
    state = alloca(sizeof(int));
    *state = seed;
  }
}
[/code]

>>  I suppose the alternate is an option to skip/throw-away some numbers that should've been consumed (assuming you want to keep using the same seed), or seeding each per use.
>
> I'm not sure I follow what you mean here or why you think this would work?  Could you give a concrete example?

certainly.

[code]
struct RNG {
  ...

  void skip(int x) {assert(x>0); while(x--) popfront();}
}

RNG rnd = RND(seed);
rnd.take(10).writeln;  //loosely based on talk examples
rnd.skip(10);          //skips the 'consumed' numbers.
rnd.take(10).writeln;  //won't have identical output

[/code]

January 25, 2016
On Monday, 25 January 2016 at 22:06:31 UTC, Era Scarecrow wrote:
>  Most likely alloca would have to be built into the compiler. Here's a crash course in how the stack memory management works. sp=stack pointer, bp=base pointer (more relevant pre 386).

An apology in advance: I have an early start tomorrow, and a busy few days coming up, so I probably won't be able to follow up on this discussion until the weekend.

In the meantime, thanks for the food for thought.  I'll try to be in touch about this topic soon :-)
January 25, 2016
On 01/25/2016 04:20 PM, Joseph Rushton Wakeling wrote:
> I thought that too, for a long time, but I came to the conclusion it's
> not the case.

I see no problem with adding a category of rngs that are not forward. Naming is of course the hardest problem in our community :o). A good stard would be a /dev/random wrapper. -- Andrei
January 25, 2016
On 01/25/2016 05:05 PM, Joseph Rushton Wakeling wrote:
> One option would be to implement the basic RNG data structor à la C++,
> as a functor

That's semantically the same as an input range. -- Andrei

January 26, 2016
On Monday, 25 January 2016 at 23:34:56 UTC, Andrei Alexandrescu wrote:
> I see no problem with adding a category of rngs that are not forward. Naming is of course the hardest problem in our community :o). A good stard would be a /dev/random wrapper. -- Andrei

It's not about different categories of RNG in this case -- really, NO RNGs should be forward ranges.

If ONLY it was just a naming problem... :-(
January 26, 2016
On Monday, 25 January 2016 at 23:37:25 UTC, Andrei Alexandrescu wrote:
> On 01/25/2016 05:05 PM, Joseph Rushton Wakeling wrote:
>> One option would be to implement the basic RNG data structor à la C++,
>> as a functor
>
> That's semantically the same as an input range. -- Andrei

“Yes, but...” :-P

There are actually some interesting subtleties required for the input range design, not just for RNGs but for ANY range whose elements are generated randomly.

I will try and write this up properly later, but the TL;DR is, it involves doing extra work to ensure every element is _truly_ lazily evaluated.

To some extent, splitting into functor and range wrapper can help clarify the code design there (even if the functor stays hidden as an implementation detail).
January 26, 2016
On 01/26/2016 02:07 AM, Joseph Rushton Wakeling wrote:
> On Monday, 25 January 2016 at 23:34:56 UTC, Andrei Alexandrescu wrote:
>> I see no problem with adding a category of rngs that are not forward.
>> Naming is of course the hardest problem in our community :o). A good
>> stard would be a /dev/random wrapper. -- Andrei
>
> It's not about different categories of RNG in this case -- really, NO
> RNGs should be forward ranges.

I think a pseudo-rng as a forward range is useful. It's good in testing and experimentation to fork a sequence of pseudo-random numbers, turn the clock back, etc. Essentially I see no harm in it; it's always easy to make a forward range into an input range, it's the opposite that's hard. -- Andrei