Jump to page: 1 2
Thread overview
randomShuffle
Jun 03, 2013
Yann
Jun 03, 2013
maarten van damme
Jun 03, 2013
bearophile
Jun 03, 2013
Yann
Jun 03, 2013
Diggory
Jun 03, 2013
Diggory
Jun 03, 2013
Diggory
Jun 04, 2013
Diggory
Jun 04, 2013
Diggory
Jun 04, 2013
Diggory
Jun 21, 2013
Diggory
June 03, 2013
Hey,

I am trying to generate an array of 10 unique (!) random numbers from 1 to 1000 each (for example). The best I could come up with is:

auto arr = iota(1, 1000).array;
randomShuffle(arr);
return arr.take(10).array.sort;

This leaves me quite unhappy, because I would like the code
a) to be one line
b) not use "array" more than once

Is there a better way to accomplish this? Naively, I would expect something like
"return iota(1, 1000).randomShuffle.take(10).sort;"

Thanks,
Yann
June 03, 2013
How about using std.random.randomSample?


2013/6/3 Yann <skratchie@gmx.de>

> Hey,
>
> I am trying to generate an array of 10 unique (!) random numbers from 1 to
> 1000 each (for example). The best I could come up with is:
>
> auto arr = iota(1, 1000).array;
> randomShuffle(arr);
> return arr.take(10).array.sort;
>
> This leaves me quite unhappy, because I would like the code
> a) to be one line
> b) not use "array" more than once
>
> Is there a better way to accomplish this? Naively, I would expect
> something like
> "return iota(1, 1000).randomShuffle.take(10).**sort;"
>
> Thanks,
> Yann
>


June 03, 2013
Yann:

> Is there a better way to accomplish this? Naively, I would expect something like
> "return iota(1, 1000).randomShuffle.take(10).sort;"

Two ways, the first gives items in random order, the second ordered as you seem to desire:

import std.stdio, std.random, std.range, std.array;
void main() {
    iota(1, 1001).randomCover(rndGen).take(10).writeln;
    iota(1, 1001).randomSample(10).writeln;
}


But be careful with randomCover, because maybe it takes the rnd generator by value.

Bye,
bearophile
June 03, 2013
Thanks a lot for the suggestions!

Cheers,
Yann

On Monday, 3 June 2013 at 10:06:30 UTC, bearophile wrote:
> Yann:
>
>> Is there a better way to accomplish this? Naively, I would expect something like
>> "return iota(1, 1000).randomShuffle.take(10).sort;"
>
> Two ways, the first gives items in random order, the second ordered as you seem to desire:
>
> import std.stdio, std.random, std.range, std.array;
> void main() {
>     iota(1, 1001).randomCover(rndGen).take(10).writeln;
>     iota(1, 1001).randomSample(10).writeln;
> }
>
>
> But be careful with randomCover, because maybe it takes the rnd generator by value.
>
> Bye,
> bearophile

June 03, 2013
On Monday, 3 June 2013 at 10:10:15 UTC, Yann wrote:
> Thanks a lot for the suggestions!
>
> Cheers,
> Yann
>
> On Monday, 3 June 2013 at 10:06:30 UTC, bearophile wrote:
>> Yann:
>>
>>> Is there a better way to accomplish this? Naively, I would expect something like
>>> "return iota(1, 1000).randomShuffle.take(10).sort;"
>>
>> Two ways, the first gives items in random order, the second ordered as you seem to desire:
>>
>> import std.stdio, std.random, std.range, std.array;
>> void main() {
>>    iota(1, 1001).randomCover(rndGen).take(10).writeln;
>>    iota(1, 1001).randomSample(10).writeln;
>> }
>>
>>
>> But be careful with randomCover, because maybe it takes the rnd generator by value.
>>
>> Bye,
>> bearophile

For small samples from very large ranges an efficient algorithm would be:

int[] randomGen(int N, int M) {
    if (N == 0)
        return [];

    int[] result = randomGen(N-1, M-1);

    int num = rand(M);
    foreach (ref int i; result)
        if (i >= num)
            ++i;

    result ~= num;

    return result;
}
June 03, 2013
On 06/03/2013 10:48 AM, Yann wrote:
> I am trying to generate an array of 10 unique (!) random numbers from 1 to 1000
> each (for example). The best I could come up with is:

You mean you want a random sample of the numbers 1, 2, ..., 1000?

That is, you want to pick 10 unique numbers from 1, ..., 1000 with equal probability of picking any one?

randomSample is your friend:

	auto arr = randomSample(iota(1, 1001), 10).array;

This picks a random sample of 10 values from the range iota(1, 1001).  (Note the
1001 is because iota iterates over [start, finish) not [start, finish], so you
need (1, 1001) if you want it to cover the values 1 to 1000.  I wish there was a
better/safer way of having a "closed-interval" iota.)
June 03, 2013
On 06/03/2013 01:29 PM, Diggory wrote:
> For small samples from very large ranges an efficient algorithm would be:
> 
> int[] randomGen(int N, int M) {
>     if (N == 0)
>         return [];
> 
>     int[] result = randomGen(N-1, M-1);
> 
>     int num = rand(M);
>     foreach (ref int i; result)
>         if (i >= num)
>             ++i;
> 
>     result ~= num;
> 
>     return result;
> }

Are you sure about the efficiency of that algorithm?  Looks like it's be O(M) to me.

randomSample currently has a computational complexity (and number of random
numbers required) of O(n) where n is sample size.

June 03, 2013
On 06/03/2013 02:30 PM, Joseph Rushton Wakeling wrote:
> On 06/03/2013 01:29 PM, Diggory wrote:
>> For small samples from very large ranges an efficient algorithm would be:
>>
>> int[] randomGen(int N, int M) {
>>     if (N == 0)
>>         return [];
>>
>>     int[] result = randomGen(N-1, M-1);
>>
>>     int num = rand(M);
>>     foreach (ref int i; result)
>>         if (i >= num)
>>             ++i;
>>
>>     result ~= num;
>>
>>     return result;
>> }
> 
> Are you sure about the efficiency of that algorithm?  Looks like it's be O(M) to me.
> 
> randomSample currently has a computational complexity (and number of random
> numbers required) of O(n) where n is sample size.

There are at least two other key faults with that algorithm compared to
randomSample -- you're alloc'ing repeatedly inside it, and you return the whole
sample.  randomSample doesn't require any such storage, it can take any input
range as the sample source and just jumps across the range; and what it returns
is another range interface, from which you can take the sample points sequentially.
June 03, 2013
On 06/03/2013 02:30 PM, Joseph Rushton Wakeling wrote:
> On 06/03/2013 01:29 PM, Diggory wrote:
>> For small samples from very large ranges an efficient algorithm would be:
>>
>> int[] randomGen(int N, int M) {
>>     if (N == 0)
>>         return [];
>>
>>     int[] result = randomGen(N-1, M-1);
>>
>>     int num = rand(M);
>>     foreach (ref int i; result)
>>         if (i >= num)
>>             ++i;
>>
>>     result ~= num;
>>
>>     return result;
>> }
> 
> Are you sure about the efficiency of that algorithm?  Looks like it's be O(M) to me.

Apologies, I misread the algorithm slightly.  Your qualifier is quite correct -- when the number of sample points is very small (e.g. 5 out of some arbitrary very large number), that algorithm is very quick indeed (I ran some tests as I was curious:-), and can outperform D's randomSample.

It doesn't scale with the size of the input (your value M), but with the number of sample points n, but that scaling is very bad -- so as the sample size grows it becomes _much_ worse than randomSample.

Do you have a reference for the algorithm?  Off the top of my head I don't recognize it from any of the texts I've read.
June 03, 2013
On Monday, 3 June 2013 at 13:18:30 UTC, Joseph Rushton Wakeling wrote:
> On 06/03/2013 02:30 PM, Joseph Rushton Wakeling wrote:
>> On 06/03/2013 01:29 PM, Diggory wrote:
>>> For small samples from very large ranges an efficient algorithm would be:
>>>
>>> int[] randomGen(int N, int M) {
>>>     if (N == 0)
>>>         return [];
>>>
>>>     int[] result = randomGen(N-1, M-1);
>>>
>>>     int num = rand(M);
>>>     foreach (ref int i; result)
>>>         if (i >= num)
>>>             ++i;
>>>
>>>     result ~= num;
>>>
>>>     return result;
>>> }
>> 
>> Are you sure about the efficiency of that algorithm?  Looks like it's be O(M) to me.
>
> Apologies, I misread the algorithm slightly.  Your qualifier is quite correct --
> when the number of sample points is very small (e.g. 5 out of some arbitrary
> very large number), that algorithm is very quick indeed (I ran some tests as I
> was curious:-), and can outperform D's randomSample.
>
> It doesn't scale with the size of the input (your value M), but with the number
> of sample points n, but that scaling is very bad -- so as the sample size grows
> it becomes _much_ worse than randomSample.
>
> Do you have a reference for the algorithm?  Off the top of my head I don't
> recognize it from any of the texts I've read.

Thanks for testing before dismissing completely :P The way it returns results can be improved a lot by pre-allocating a range of the necessary size/using a range passed in.

The complexity is O(N²) where N is the number of samples out of M. The algorithm is something I just thought of although I've possibly used it before and I'm sure someone else has thought of it too. It's quite simple, it effectively says "pick a value from the range except for this value" recursively but instead of dividing the range in two, it shifts the top part down first to make a contiguous range and then shifts the results which are past the split up one afterward.

I expect the phobos implementation to be something like O(M) in which case my method would be faster whenever N < sqrt(M) (with the optimisations)
« First   ‹ Prev
1 2