February 13, 2009
Andrei Alexandrescu Wrote:

> Andrei Alexandrescu wrote:
> > Andrei Alexandrescu wrote:
> >> Given an array of length m, return a range that iterates the array in random order such that the entire array is visited without going through the same element more than once. Consume minimal amounts of memory and time. Baseline solution: allocate an array of indices, shuffle it, then follow indices stored in the array.
> > 
> > Ok, here's something that should work.
> > 
> > Start with array a of length a.length, allocate a bitmap with one bit per element in the map. Also the number of unselected elements left in the array is kept, call it k.
> > 
> > To implement next():
> > 
> > 1. Walk from the beginning of the array to the first unselected element.
> > 
> > 2. Roll a fair dice with k faces. If the dice chooses a specific face, choose that first unselected element. Otherwise continue.
> > 
> > 3. Continue walking until the next unselected element.
> > 
> > 4. Roll a fair dice with k-1 faces. If the dice chooses a specific face, choose that unselected element. Otherwise continue from step 3 using dices with k-1, k-2, ..., 1 faces.
> > 
> > This has O(n log n) complexity. There is one obvious optimization: eliminate the first selected elements so we don't need to walk over them at each iteration. It's unclear to me how this affects complexity.
> > 
> > 
> > Andrei
> 
> This seems to work for forward ranges as well. I'll be darned.
> 
> Andrei

Random ranges and forward ranges should have different implementations. Most notably, random ranges can avoid a lot of unnecessary calls to next.

February 13, 2009
dsimcha Wrote:

> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> > bearophile wrote:
> > > Andrei Alexandrescu>The quest for finding a random cover of an array with as little extra memory consumed and with good complexity is on!<
> > >
> > >
> > > This thread is very long and complex, I am unable to understand where to start reading, etc. So can someone explain the problem to me, so I may be able to suggest some idea/code?
> > Given an array of length m, return a range that iterates the array in
> > random order such that the entire array is visited without going through
> > the same element more than once. Consume minimal amounts of memory and
> > time. Baseline solution: allocate an array of indices, shuffle it, then
> > follow indices stored in the array.
> > Andrei
> 
> One important thing to add, since we already have a solution that would be good except for this requirement:
> 
> The frequency with which each of the N! permutations of each array occur must be uniform.
> 

Believe it or not, that's an unreasonable requirement. IIRC, most RNG's have periods such as 2^32. You can't get more permutations than that. Somewhere in the ballpark of 17 elements, your requurement becones impossible to meet without a highly specialized RNG. I hope someone can prove me wrong about that!
February 13, 2009
On Sat, 14 Feb 2009 02:34:48 +0300, Jason House <jason.james.house@gmail.com> wrote:

> dsimcha Wrote:
>
>> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> > bearophile wrote:
>> > > Andrei Alexandrescu>The quest for finding a random cover of an array
>> > > with as little extra memory consumed and with good complexity is  
>> on!<
>> > >
>> > >
>> > > This thread is very long and complex, I am unable to understand  
>> where
>> > > to start reading, etc. So can someone explain the problem to me, so  
>> I
>> > > may be able to suggest some idea/code?
>> > Given an array of length m, return a range that iterates the array in
>> > random order such that the entire array is visited without going  
>> through
>> > the same element more than once. Consume minimal amounts of memory and
>> > time. Baseline solution: allocate an array of indices, shuffle it,  
>> then
>> > follow indices stored in the array.
>> > Andrei
>>
>> One important thing to add, since we already have a solution that would be good except for this
>> requirement:
>>
>> The frequency with which each of the N! permutations of each array occur must be uniform.
>>
>
> Believe it or not, that's an unreasonable requirement. IIRC, most RNG's have periods such as 2^32. You can't get more permutations than that. Somewhere in the ballpark of 17 elements, your requurement becones impossible to meet without a highly specialized RNG. I hope someone can prove me wrong about that!

You are wrong!

It's 13 elements :p

February 13, 2009
Jason House:
> Believe it or not, that's an unreasonable requirement. IIRC, most RNG's have periods such as 2^32. You can't get more permutations than that. Somewhere in the ballpark of 17 elements, your requurement becones impossible to meet without a highly specialized RNG. I hope someone can prove me wrong about that!<

Mersenne Twister has a period much longer, if you want.
For even longer permutations you have to generate them in a different way.

Bye,
bearophile
February 14, 2009
Steven Schveighoffer wrote:
> "Steven Schveighoffer" wrote
>> "Andrei Alexandrescu" wrote
>>> Andrei Alexandrescu wrote:
>>>> Andrei Alexandrescu wrote:
>>>>> Given an array of length m, return a range that iterates the array in random order such that the entire array is visited without going through the same element more than once. Consume minimal amounts of memory and time. Baseline solution: allocate an array of indices, shuffle it, then follow indices stored in the array.
>>>> Ok, here's something that should work.
>>>>
>>>> Start with array a of length a.length, allocate a bitmap with one bit per element in the map. Also the number of unselected elements left in the array is kept, call it k.
>>>>
>>>> To implement next():
>>>>
>>>> 1. Walk from the beginning of the array to the first unselected element.
>>>>
>>>> 2. Roll a fair dice with k faces. If the dice chooses a specific face, choose that first unselected element. Otherwise continue.
>>>>
>>>> 3. Continue walking until the next unselected element.
>>>>
>>>> 4. Roll a fair dice with k-1 faces. If the dice chooses a specific face, choose that unselected element. Otherwise continue from step 3 using dices with k-1, k-2, ..., 1 faces.
>>>>
>>>> This has O(n log n) complexity. There is one obvious optimization: eliminate the first selected elements so we don't need to walk over them at each iteration. It's unclear to me how this affects complexity.
>>
>> This significantly increases the runtime by executing the random number algorithm up to k-1 times per element.
>>
>> How is this different than just a linear search through the sparse array for the random(0..k)th unused element?
>>
>> But I do concur that it provides a uniform map.  However, wouldn't the worst runtime be O(n^2)?  i.e. k + k-1 + k-2 + k-3... + 1 iterations == k *(k+1) / 2?  I'm not sure if average runtime is going to be O(nlgn).
> 
> Average runtime is still going to be O(n^2), because the average element chosen should be k / 2 elements into the range.

So at some point there are k slots for grabs out of n. The number of steps taken is composed of some steps taken because of the already-taken slots, plus the number of steps taken because the dice tells me to ignore the slot even if it could be taken. The first number of steps is proportional to the number of taken slots, i.e. n-k. The second number of steps is proportional to the number of options, i.e. k. If you sum you get indeed O(n) per pass.

Now let us consider the following improvement: whenever an element at the front of the array is taken, we eliminate all elements in the front that were taken already. That way there's no more need to skip each time over already-occupied slots. In other words, it is an invariant that the first slot we'll look at is not taken. How does this affect complexity?


Andrei
February 14, 2009
Yigal Chripun wrote:
> Andrei Alexandrescu wrote:
>> Andrei Alexandrescu wrote:
>>> Andrei Alexandrescu wrote:
>>>> Given an array of length m, return a range that iterates the array in
>>>> random order such that the entire array is visited without going
>>>> through the same element more than once. Consume minimal amounts of
>>>> memory and time. Baseline solution: allocate an array of indices,
>>>> shuffle it, then follow indices stored in the array.
>>>
>>> Ok, here's something that should work.
>>>
>>> Start with array a of length a.length, allocate a bitmap with one bit
>>> per element in the map. Also the number of unselected elements left in
>>> the array is kept, call it k.
>>>
>>> To implement next():
>>>
>>> 1. Walk from the beginning of the array to the first unselected element.
>>>
>>> 2. Roll a fair dice with k faces. If the dice chooses a specific face,
>>> choose that first unselected element. Otherwise continue.
>>>
>>> 3. Continue walking until the next unselected element.
>>>
>>> 4. Roll a fair dice with k-1 faces. If the dice chooses a specific
>>> face, choose that unselected element. Otherwise continue from step 3
>>> using dices with k-1, k-2, ..., 1 faces.
>>>
>>> This has O(n log n) complexity. There is one obvious optimization:
>>> eliminate the first selected elements so we don't need to walk over
>>> them at each iteration. It's unclear to me how this affects complexity.
>>>
>>>
>>> Andrei
>>
>> This seems to work for forward ranges as well. I'll be darned.
>>
>> Andrei
> 
> what do you mean by "If the dice chooses a specific face"?
> I don't understand this wording..

If you have a dice with 6 faces (the classic dice), a "specific face" means you bet the dice will show e.g. "1" after rolling.

Andrei
February 14, 2009
bearophile wrote:
> Andrei Alexandrescu:
>> Given an array of length m, return a range that iterates the array in random order such that the entire array is visited without going through the same element more than once. Consume minimal amounts of memory and time. Baseline solution: allocate an array of indices, shuffle it, then follow indices stored in the array.
> 
> I have faced such problem more than once, and I have tried to solve it in various ways. The faster solution was:
> - create an array of m bits, set to zero
> - Use a random generator to generate integer numbers in [0, m-1], if the corresponding big is set, extract another random number, if it's not set then set it, increment a counter, and return the array item that corresponds to the bit.
> - When about 90% of the bits is set, create an array with the remaining 10% items of the array, shuffle them in place with the usual algorithm by Knuth, and return them.

A quadratic algorithm applied to a fixed fraction of the input size will still yield quadratic behavior. I understand how your solution may be practical for a variety of needs, but if something is to be put in Phobos, we need something principled.

Andrei
February 14, 2009
Andrei Alexandrescu:
> A quadratic algorithm applied to a fixed fraction of the input size will still yield quadratic behavior.

It's not quadratic, an in-place shuffling done right, by Knuth algorithm, is linear. I have a linear shuffling in my dlibs too.
http://en.wikipedia.org/wiki/Shuffling
(And my name is bearophile, thank you).

Bye,
bearophile
February 14, 2009
bearophile wrote:
> Andrei Alexandrescu:
>> A quadratic algorithm applied to a fixed fraction of the input size will still yield quadratic behavior.
> 
> It's not quadratic, an in-place shuffling done right, by Knuth algorithm, is linear. I have a linear shuffling in my dlibs too.
> http://en.wikipedia.org/wiki/Shuffling

Well we were discussing how to tackle the algorithm that does not touch the original array/range.

Andrei
February 14, 2009
On Fri, 13 Feb 2009 19:04:54 +0300, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> bearophile wrote:
>> Andrei Alexandrescu>The quest for finding a random cover of an array
>> with as little extra memory consumed and with good complexity is on!<
>>   This thread is very long and complex, I am unable to understand where
>> to start reading, etc. So can someone explain the problem to me, so I
>> may be able to suggest some idea/code?
>
> Given an array of length m, return a range that iterates the array in random order such that the entire array is visited without going through the same element more than once. Consume minimal amounts of memory and time. Baseline solution: allocate an array of indices, shuffle it, then follow indices stored in the array.
>
>
> Andrei

In ideal world I'd like to see the following permutations generator in std library, instead:

auto range = [0, 1, 2, 3, 4, 5];
auto perms = permutations(range); // a random-access range that returns lazy permutation generators.
auto permutation = perms[0]; // get first permutation out of range.length!
// auto permutation = perms(uniform(0, perms.length)); // get random permutation

// print all possible permutations of a given array
foreach (p; perms) {

  // print a given permutation
  foreach (e; p) {
    write(e, " ");
  }

  writeln(); // new line
}

BTW, I think it's ok to have permutation generator that consumes O(N) memory and runs at constant time.
It would be totally awesome if generator would accept temporary buffer as an optional parameter:

auto range = ...;
auto buffer = cast(int*)alloca(int.sizeof * range.length)[0..range.length];
auto perms = permutations(range, buffer);
// same code afterwards