February 14, 2009
Denis Koroskin wrote:
> On Fri, 13 Feb 2009 19:04:54 +0300, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> 
>> bearophile wrote:
>>> Andrei Alexandrescu>The quest for finding a random cover of an array
>>> with as little extra memory consumed and with good complexity is on!<
>>>   This thread is very long and complex; I am unable to understand where
>>> to start reading. So can someone explain the problem to me, so that I
>>> may be able to suggest some ideas/code?
>>
>> Given an array of length m, return a range that iterates the array in random order such that the entire array is visited without going through the same element more than once. Consume minimal amounts of memory and time. Baseline solution: allocate an array of indices, shuffle it, then follow indices stored in the array.
>>
>>
>> Andrei
> 
> In an ideal world I'd like to see the following permutations generator in the std library instead:
> 
> auto range = [0, 1, 2, 3, 4, 5];
> auto perms = permutations(range); // a random-access range that returns lazy permutation generators.
> auto permutation = perms[0]; // get the first of range.length! permutations
> // auto permutation = perms[uniform(0, perms.length)]; // get a random permutation

That can be done easily if you settle for forward iteration. The problem is that there are N! permutations of N objects, so reaching any particular permutation by stepping forward can take a very long time.
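
For illustration, a rough sketch of the forward-iteration version (untested; it assumes a nextPermutation helper in the spirit of C++'s std::next_permutation -- not something Phobos gives you as of this writing, so the import is hypothetical):

import std.algorithm : nextPermutation; // hypothetical helper, see above
import std.stdio : writeln;

void main()
{
    // Start from the sorted order so the walk covers all N! permutations.
    auto items = [0, 1, 2, 3];
    do
    {
        writeln(items); // visit the current permutation
    } while (nextPermutation(items)); // step to the lexicographic successor
}

Random access is the hard part: an index into that sequence needs log2(N!) bits just to name a permutation.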


Andrei
February 14, 2009
Andrei Alexandrescu:
> Well we were discussing how to tackle the algorithm that does not touch the original array/range.

What I have said so far still applies: at the end, instead of copying the remaining 10% of the items, create an array of the indexes of those 10% and then shuffle that.
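
A minimal sketch of that endgame (untested; it assumes the array of taken flags from the bit-array scheme):

import std.random : randomShuffle;

// Gather the indexes of the not-yet-visited slots in random order;
// the caller walks the result to finish the cover.
size_t[] remainingIndexes(const(bool)[] taken)
{
    size_t[] left;
    foreach (i, t; taken)
        if (!t)
            left ~= i;
    randomShuffle(left);
    return left;
}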

Bye,
bearophile
February 14, 2009
Denis Koroskin:
> In an ideal world I'd like to see the following permutations generator in the std library instead:

In my dlibs there are already versions of that, both lazy and eager, and more (but it's D1 code, not range-based): http://www.fantascienza.net/leonardo/so/dlibs/comb.html

Bye,
bearophile
February 14, 2009
bearophile wrote:
> Andrei Alexandrescu:
>> Well we were discussing how to tackle the algorithm that does not touch the original array/range.
> 
> What I have said so far still applies: at the end, instead of copying the remaining 10% of the items, create an array of the indexes of those 10% and then shuffle that.

I understood that. My problem is that a quadratic algorithm is applied to 0.9 of the input. That makes overall behavior quadratic even if you complete the last 10% in no time.

Andrei
February 14, 2009
Andrei Alexandrescu:
> I understood that. My problem is that a quadratic algorithm is applied to 0.9 of the input. That makes overall behavior quadratic even if you complete the last 10% in no time.

No part of the algorithm I have shown is quadratic, I think.

Bye,
bearophile
February 14, 2009
bearophile wrote:
> Andrei Alexandrescu:
>> I understood that. My problem is that a quadratic algorithm is applied to 0.9 of the input. That makes overall behavior quadratic even if you complete the last 10% in no time.
> 
> No part of the algorithm I have shown is quadratic, I think.

Your algo is:

==========
- Create an array of m bits, all set to zero.
- Use a random generator to produce integers in [0, m-1]; if the corresponding bit is set, extract another random number; if it's not set, then set it, increment a counter, and return the array item that corresponds to the bit.
- When about 90% of the bits are set, [... alternate impl ...]
==========

Say at some point there are k available (not taken) slots out of "n". There is a k/n chance that a random selection finds an unoccupied slot. The average number of random trials needed to find an unoccupied slot is proportional to 1/(k/n) = n/k. So the total number of random trials to span the entire array is quadratic. Multiplying that by 0.9 leaves it quadratic.


Andrei
February 14, 2009
Andrei Alexandrescu:
> Say at some point there are k available (not taken) slots out of "n". There is a k/n chance that a random selection finds an unoccupied slot. The average number of random trials needed to find an unoccupied slot is proportional to 1/(k/n) = n/k. So the total number of random trials to span the entire array is quadratic. Multiplying that by 0.9 leaves it quadratic.

It's like in hashing: if you want to fill 99% of the available space in a hash, it takes ages to find the empty slots.
But if you fill it only to 75-90%, then on average you need only one or two tries to find an empty slot. So your time is linear, with a small multiplicative constant. When the slots start to get mostly full, you switch algorithms, copying the indexes of the empty slots elsewhere.
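
For concreteness, here's an untested sketch of the whole two-phase idea (the name randomCover and the eager, array-building shape are mine; a real version would be a lazy range, and the 90% cutoff is the tunable part):

import std.random : randomShuffle, uniform;

T[] randomCover(T)(T[] data)
{
    auto taken = new bool[data.length];
    T[] result;
    size_t visited = 0;

    // Phase 1: throw darts while at most ~90% of the slots are taken.
    while (visited < data.length * 9 / 10)
    {
        auto i = uniform(0, data.length);
        if (taken[i])
            continue; // occupied slot: retry, cheap while density is low
        taken[i] = true;
        result ~= data[i];
        ++visited;
    }

    // Phase 2: the index-shuffle endgame, for the ~10% of stragglers.
    size_t[] left;
    foreach (i, t; taken)
        if (!t)
            left ~= i;
    randomShuffle(left);
    foreach (i; left)
        result ~= data[i];

    return result;
}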

Bye,
bearophile
February 14, 2009
bearophile wrote:
> Andrei Alexandrescu:
>> Say at some point there are k available (not taken) slots out of
>> "n". There is a k/n chance that a random selection finds an
>> unoccupied slot. The average number of random trials needed to find
>> an unoccupied slot is proportional to 1/(k/n) = n/k. So the total
>> number of random trials to span the entire array is quadratic.
>> Multiplying that by 0.9 leaves it quadratic.
> 
> It's like in hashing: if you want to fill 99% of the available space
> in a hash, it takes ages to find the empty slots. But if you fill it
> only to 75-90%, then on average you need only one or two tries to
> find an empty slot. So your time is linear, with a small
> multiplicative constant. When the slots start to get mostly full, you
> switch algorithms, copying the indexes of the empty slots elsewhere.

Well, I don't buy it. If you make a point, you need to be more precise than such hand-waving. It's not like in hashing; it's like in the algorithm we're discussing. If you have a clear argument that performance is better than O(n*n) when you stop at 90%, then make it. I didn't go through much formalism, but my napkin says you're firmly in quadratic territory.

Andrei
February 14, 2009
On Sat, Feb 14, 2009 at 1:03 PM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> bearophile wrote:
>
>> Andrei Alexandrescu:
>>
>>> Say at some point there are k available (not taken) slots out of "n". There is a k/n chance that a random selection finds an unoccupied slot. The average number of random trials needed to find an unoccupied slot is proportional to 1/(k/n) = n/k. So the total number of random trials to span the entire array is quadratic. Multiplying that by 0.9 leaves it quadratic.
>>>
>>
>> It's like in hashing: if you want to fill 99% of the available space in a hash, it takes ages to find the empty slots. But if you fill it only to 75-90%, then on average you need only one or two tries to find an empty slot. So your time is linear, with a small multiplicative constant. When the slots start to get mostly full, you switch algorithms, copying the indexes of the empty slots elsewhere.
>>
>
> Well, I don't buy it. If you make a point, you need to be more precise than such hand-waving. It's not like in hashing; it's like in the algorithm we're discussing. If you have a clear argument that performance is better than O(n*n) when you stop at 90%, then make it. I didn't go through much formalism, but my napkin says you're firmly in quadratic territory.


Well, he has a point: the number of trials required to find an empty slot depends not on the absolute number of empty items, but only on the ratio of empties to fulls. Even your own claim about the average number of trials was n/k -- not sure how you got that, though. If he stops when that ratio of fulls to empties reaches a maximum of 9 (i.e. at 90% occupancy), the algo can't be quadratic up to that point. It's O(n) expected, with a possibly fairly high hidden constant. It also has a fairly long tail, as in there's a small probability of it taking a very long time to complete.
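
Back-of-the-envelope, if I've got the sum right: covering from k = n empty slots down to k = n/10 costs about

  sum_{k = n/10 .. n} n/k  =  n * (H(n) - H(n/10))  ~=  n * ln(10)  ~=  2.3 * n

draws in expectation, H(x) being the harmonic number. That's linear in n, just with the hidden constant mentioned above.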

--bb


February 14, 2009
On Fri, 13 Feb 2009 16:45:21 -0800, Andrei Alexandrescu wrote:

> Steven Schveighoffer wrote:
>> Average runtime is still going to be O(n^2), because the average element chosen should be k / 2 elements into the range.
> 
> So at some point there are k slots up for grabs out of n. The number of steps taken is composed of some steps taken because of the already-taken slots, plus the number of steps taken because the dice tells me to ignore the slot even if it could be taken. The first number of steps is proportional to the number of taken slots, i.e. n-k. The second number of steps is proportional to the number of options, i.e. k. If you sum, you indeed get O(n) per pass.
> 
> Now let us consider the following improvement: whenever an element at the front of the array is taken, we eliminate all elements at the front that were taken already. That way there's no need to skip over already-occupied slots each time. In other words, it is an invariant that the first slot we'll look at is not taken. How does this affect complexity?

It does not affect the complexity. It is an optimization, but its savings are proportional to n (because k is proportional to n), so they amount to a constant factor that gets factored out, and each pass is still O(n).

Even if you removed the used elements so they are not traversed, you still get O(n^2) because of the linear search.

-Steve