June 14, 2012
> Assuming API changes are permissible (a big IF), I think the way to do it might be something like
>
> auto randomSample(R, UniformRNG = Random, seed)(R r, size_t n, seed s = unpredictableSeed)
>
> ... meaning that the user has an option to specify an RNG _type_ without specifying a seed, and that the seed can be either unpredictable or specific.
>
> The user could also not specify a RNG type, in which case the default RNG type would be used, but would be seeded unpredictably.

Issues with copying rngs are not unique to randomSample - anyone
who uses random number generators will run into them
sooner or later. So the problem is not that randomSample takes
a rng as a parameter - it's that rngs are value types. The
proper solution would be to make them reference types. I don't
know if that can be done now as it would surely break some code,
but people can use pointers to rngs instead of rngs themselves.
I think not doing that will cause bugs irregardless of how
randomSample works. I know it would cause bugs in my code.

June 14, 2012
On 15.06.2012 1:37, jerro wrote:
>> Assuming API changes are permissible (a big IF), I think the way to do
>> it might be something like
>>
>> auto randomSample(R, UniformRNG = Random, seed)(R r, size_t n, seed s
>> = unpredictableSeed)
>>
>> ... meaning that the user has an option to specify an RNG _type_
>> without specifying a seed, and that the seed can be either
>> unpredictable or specific.
>>
>> The user could also not specify a RNG type, in which case the default
>> RNG type would be used, but would be seeded unpredictably.
>
> Issues with copying rngs are not unique to randomSample - anyone
> who uses random number generators will run into them
> sooner or later. So the problem is not that randomSample takes
> a rng as a parameter - it's that rngs are value types. The
> proper solution would be to make them reference types.

Good insight.

> I don't
> know if that can be done now as it would surely break some code,
> but people can use pointers to rngs instead of rngs themselves.
> I think not doing that will cause bugs irregardless of how
> randomSample works. I know it would cause bugs in my code.
>



-- 
Dmitry Olshansky
June 14, 2012
On Thursday, June 14, 2012 23:37:05 jerro wrote:
> Issues with copying rngs are not unique to randomSample - anyone
> who uses random number generators will run into them
> sooner or later. So the problem is not that randomSample takes
> a rng as a parameter - it's that rngs are value types. The
> proper solution would be to make them reference types.

Exactly. This has been discussed before (IIRC, there's a bug report which discusses it). The most straightforward solution is to change the random number generator ranges to classes, but that _will_ break code, because they're constructed completely differently.

However, it should be quite possible to make the structs reference types. The simplest way to do that would probably be to just move their member variables to a struct which they hold a pointer to rather than holding them directly as member variables. The only code that that would break would be code that relied on assigning one such range to another (or passing it to a function) effectively resulting in a save call. But since we're talking about random number generators, I have a hard time believing that that will actually break much of anything. It's more likely to _fix_ code (such as the problem being discussed in this thread).

The change could probably be effected in an hour or two without much difficulty. Doing something like making the state refcounted would be more of a pain, but it would also be quite doable if it were considered desirable enough.

- Jonathan M Davis
June 14, 2012
On 14/06/12 21:27, Jens Mueller wrote:
> What are the advantages of the above in regard to
> auto randomSample(R, Random)(R r, size_t n, Random r = Random(unpredictableSeed))
> ?
> Is it because you got bitten that the generators are passed by value?

It's because it relies on the user understanding that they have to _seed_ any RNG they pass -- they can't just e.g. pass the RNG of the current thread they are in, because then they will have correlations between the sample and any subsequent random numbers they generate.  So the possibility to do this:

    sample = randomSample(iota(0, 100), 5, rndGen);

... should probably be disallowed on grounds of safety.

Now, since you HAVE to seed the RNG in order to take a sample safely, you might as well make the _seed_ the thing which is passed, rather than the RNG.
June 14, 2012
On Thursday, 14 June 2012 at 22:17:44 UTC, Jonathan M Davis wrote:
> On Thursday, June 14, 2012 23:37:05 jerro wrote:
>> Issues with copying rngs are not unique to randomSample - anyone
>> who uses random number generators will run into them
>> sooner or later. So the problem is not that randomSample takes
>> a rng as a parameter - it's that rngs are value types. The
>> proper solution would be to make them reference types.
>
> Exactly. This has been discussed before (IIRC, there's a bug report which
> discusses it). The most straightforward solution is to change the random
> number generator ranges to classes, but that _will_ break code, because
> they're constructed completely differently.

I don't think the solution is to pass the RNGs as refs.  I'm not familiar with the precise bug report you mention, but the reasoning is clear: it's because if you do something like,

    auto getRandomSample(size_t N, size_t n)
    {
        auto rng = Random(unpredictableSeed);
        auto sample = randomSample(iota(0, N), n, rng);
    }

    void main()
    {
        writeln(getRandomSample(100, 5));
    }

... then if the RNG were stored in RandomSample by ref rather than value, this will fail, because the variable rng will no longer exist when the sample is evaluated in main().

As far as I can see the only _safe_ option is that randomSample must always be passed a uniquely seeded RNG.  And if you take that line then it's a small step to conclude that you can dispense with passing an RNG at all and just pass the seed, with the RNG type passed as a template parameter.
June 14, 2012
On Friday, June 15, 2012 01:04:15 Joseph Rushton Wakeling wrote:
> On Thursday, 14 June 2012 at 22:17:44 UTC, Jonathan M Davis wrote:
> > On Thursday, June 14, 2012 23:37:05 jerro wrote:
> >> Issues with copying rngs are not unique to randomSample -
> >> anyone
> >> who uses random number generators will run into them
> >> sooner or later. So the problem is not that randomSample takes
> >> a rng as a parameter - it's that rngs are value types. The
> >> proper solution would be to make them reference types.
> > 
> > Exactly. This has been discussed before (IIRC, there's a bug
> > report which
> > discusses it). The most straightforward solution is to change
> > the random
> > number generator ranges to classes, but that _will_ break code,
> > because
> > they're constructed completely differently.
> 
> I don't think the solution is to pass the RNGs as refs.

I'm not talking about passing them as refs. I'm talking about making them reference types. It arguably makes no sense for random number generators to be passed around by value. It makes it ludicrously easy to get the same value multiple times simply because you passed it around. If you want to guarantee that you get the same value multiple times, then you should have to use save.

> I'm not
> familiar with the precise bug report you mention, but the
> reasoning is clear:it's because if you do something like,
> 
> auto getRandomSample(size_t N, size_t n)
> {
> auto rng = Random(unpredictableSeed);
> auto sample = randomSample(iota(0, N), n, rng);
> }
> 
> void main()
> {
> writeln(getRandomSample(100, 5));
> }
> 
> ... then if the RNG were stored in RandomSample by ref rather than value, this will fail, because the variable rng will no longer exist when the sample is evaluated in main().

I don't believe that the bug report had anything to do with randomSample. It had to do with using random number generator ranges in general. The problem is that if do something like

auto func(R)(R r)
{
 r.popFront();
 auto value = r.front;
 ///...
}

func(generator);
generator.popFront();
auto value = generator.front;

then both of those values will be the same, because the generator was passed by value. You don't have that problem if the generator is a reference type. And if you really want the same value twice, then you can use save.

> As far as I can see the only _safe_ option is that randomSample must always be passed a uniquely seeded RNG. And if you take that line then it's a small step to conclude that you can dispense with passing an RNG at all and just pass the seed, with the RNG type passed as a template parameter.

You need to be able to pass a random number generator range, because otherwise, you can't use any number generator other than the default. But you could just reseed the thing at the beginning of randomSample as long as isSeedable!R is true if you really want to ensure that it's seeded right before randomSample does its thing.

- Jonathan M Davis
June 15, 2012
On Thursday, 14 June 2012 at 23:32:41 UTC, Jonathan M Davis wrote:
> I'm not talking about passing them as refs. I'm talking about making them
> reference types. It arguably makes no sense for random number generators to be
> passed around by value. It makes it ludicrously easy to get the same value
> multiple times simply because you passed it around.

I didn't have time to look at the specific issue with RandomSample yet, but I can confirm that the implications of the current RNG-as-a-value design were surprising for me as well when I first used std.random – I see the point in avoiding heap allocations whenever possible, though.

David
June 15, 2012
On 06/15/12 00:55, Joseph Rushton Wakeling wrote:
>     sample = randomSample(iota(0, 100), 5, rndGen);
> 
> ... should probably be disallowed on grounds of safety.

Considering the output of this program:

   import std.stdio;
   import std.random;

   void main() {
      foreach (i; 0..20)
         writeln(randomSample([0,1,2,4,5,6,7,8,9], 3, Random(unpredictableSeed)));
   }

I'd say the use of std.random should be disallowed on grounds of safety...

Does it work for someone else? (JIC it's only my old GDC installation that fails)

artur
June 15, 2012
Jonathan M Davis wrote:
> On Thursday, June 14, 2012 23:37:05 jerro wrote:
> > Issues with copying rngs are not unique to randomSample - anyone
> > who uses random number generators will run into them
> > sooner or later. So the problem is not that randomSample takes
> > a rng as a parameter - it's that rngs are value types. The
> > proper solution would be to make them reference types.
> 
> Exactly. This has been discussed before (IIRC, there's a bug report which discusses it). The most straightforward solution is to change the random number generator ranges to classes, but that _will_ break code, because they're constructed completely differently.

Is this
http://d.puremagic.com/issues/show_bug.cgi?id=7067
the bug report you are referring to?

Jens
June 15, 2012
On Friday, June 15, 2012 10:09:28 Jens Mueller wrote:
> Jonathan M Davis wrote:
> > On Thursday, June 14, 2012 23:37:05 jerro wrote:
> > > Issues with copying rngs are not unique to randomSample - anyone
> > > who uses random number generators will run into them
> > > sooner or later. So the problem is not that randomSample takes
> > > a rng as a parameter - it's that rngs are value types. The
> > > proper solution would be to make them reference types.
> > 
> > Exactly. This has been discussed before (IIRC, there's a bug report which discusses it). The most straightforward solution is to change the random number generator ranges to classes, but that _will_ break code, because they're constructed completely differently.
> 
> Is this
> http://d.puremagic.com/issues/show_bug.cgi?id=7067
> the bug report you are referring to?

I think so.

- Jonathan M Davis