Jump to page: 1 2 3
Thread overview
And here's another interesting algorithm/structure: Randomized Slide to Front
Nov 30, 2015
H. S. Teoh
Nov 30, 2015
H. S. Teoh
Nov 30, 2015
Chris Wright
Nov 30, 2015
deadalnix
Nov 30, 2015
Denis Koroskin
OT: Denis is back! :)
Nov 30, 2015
Ali Çehreli
Dec 01, 2015
Iain Buclaw
Nov 30, 2015
deadalnix
Nov 30, 2015
H. S. Teoh
Dec 01, 2015
Chris Wright
Dec 01, 2015
Marcelo Juchem
Dec 01, 2015
Richard Hodges
Dec 01, 2015
CraigDillabaugh
Dec 08, 2015
Jakob Ovrum
November 30, 2015
Now that we got talking about searching in arrays, allow me to also share an idea I've had a short while ago.

(Again, we're in the "I'd prefer to use an array if at all possible" mindset. So let's see how we can help searching an array with as little work as possible.)

One well-known search strategy is "Bring to front" (described by Knuth in TAoCP). A BtF-organized linear data structure is searched with the classic linear algorithm. The difference is what happens after the search: whenever the search is successful, the found element is brought to the front of the structure. If we're looking most often for a handful of elements, in time these will be near the front of the searched structure.

For a linked list, bringing an element to the front is O(1) (just rewire the pointers). For an array, things are not so pleasant - rotating the found element to the front of the array is O(n).

So let's see how we can implement a successful BtF for arrays.

The first idea is to just swap the found element with the first element of the array. That's O(1) but has many disadvantages - if you search e.g. for two elements, they'll compete for the front of the array and they'll go back and forth without making progress.

Another idea is to just swap the found element with the one just before it. The logic is, each successful find will shift the element closer to the front, in a bubble sort manner. In time, the frequently searched elements will slowly creep toward the front. The resulting performance is not appealing - you need O(n) searches to bring a given element to the front, for a total of O(n * n) steps spent in the n searches. Meh.

So let's improve on that: whenever an element is found in position k, pick a random number i in the range 0, 1, 2, ..., k inclusive. Then swap the array elements at indexes i and k. This is the Randomized Slide to Front strategy.

With RStF, worst case search time remains O(n), as is the unsuccessful search. However, frequently searched elements migrate quickly to the front - it only takes O(log n) searches to bring a given value at the front of the array.

Insertion and removal are both a sweet O(1), owing to the light structuring: to insert just append the element (and perhaps swap it in a random position of the array to prime searching for it). Removal by position simply swaps the last element into the position to be removed and then reduces the size of the array.

So the RStF is suitable in all cases where BtF would be recommended, but allows an array layout without considerable penalty.

Related work: Theodoulos Garefalakis' Master's thesis "A Family of Randomized Algorithms for List Accessing" describes Markov Move to Front, which brings the searched element to front according to a Markov chain schedule; and also Randomized Move to Front, which decides whether a found element is brought to front depending on a random choice. These approaches are similar in that they both use randomization, but different because neither has good complexity on array storage.


Andrei
November 30, 2015
On 11/30/15 4:33 PM, Andrei Alexandrescu wrote:
[snip]

I just posted to reddit: https://www.reddit.com/r/programming/comments/3uwp42/its_my_birthday_so_heres_some_cake_for_thought/

Andrei
November 30, 2015
On Mon, Nov 30, 2015 at 04:33:27PM -0500, Andrei Alexandrescu via Digitalmars-d wrote: [...]
> One well-known search strategy is "Bring to front" (described by Knuth in TAoCP). A BtF-organized linear data structure is searched with the classic linear algorithm. The difference is what happens after the search: whenever the search is successful, the found element is brought to the front of the structure. If we're looking most often for a handful of elements, in time these will be near the front of the searched structure.
[...]
> So let's see how we can implement a successful BtF for arrays.
> 
> The first idea is to just swap the found element with the first element of the array. That's O(1) but has many disadvantages - if you search e.g. for two elements, they'll compete for the front of the array and they'll go back and forth without making progress.
> 
> Another idea is to just swap the found element with the one just before it.  The logic is, each successful find will shift the element closer to the front, in a bubble sort manner. In time, the frequently searched elements will slowly creep toward the front. The resulting performance is not appealing - you need O(n) searches to bring a given element to the front, for a total of O(n * n) steps spent in the n searches. Meh.
> 
> So let's improve on that: whenever an element is found in position k, pick a random number i in the range 0, 1, 2, ..., k inclusive. Then swap the array elements at indexes i and k. This is the Randomized Slide to Front strategy.

What about when element i is matched, swap it with the (i/2)'th element?

Then it will take just log(n) searches to bring it to the front of the array, but it won't (immediately) compete with whatever's currently the most popular item in the array. Furthermore, when it does compete with it, it will already have been moved closer to the front of the array, so the previous most-popular element won't be pushed far back into the deep recesses of the array, but remain close to the front where it will be quickly found.

More generally, you could pick the (i/k)'th element for swapping when you've matched an element, with k being a constant, chosen parameter. You may be able to optimize for certain use cases (e.g., if you knew beforehand the average number of "popular" elements) by choosing an appropriate value of k.


T

-- 
Nobody is perfect.  I am Nobody. -- pepoluan, GKC forum
November 30, 2015
On 11/30/15 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
> What about when element i is matched, swap it with the (i/2)'th element?

Randomization is essential - without it you have thrashing if you search for 2 elements in alternation. -- Andrei

November 30, 2015
On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu wrote:
> Now that we got talking about searching in arrays, allow me to also share an idea I've had a short while ago.
>
> (Again, we're in the "I'd prefer to use an array if at all possible" mindset. So let's see how we can help searching an array with as little work as possible.)
>
> One well-known search strategy is "Bring to front" (described by Knuth in TAoCP). A BtF-organized linear data structure is searched with the classic linear algorithm. The difference is what happens after the search: whenever the search is successful, the found element is brought to the front of the structure. If we're looking most often for a handful of elements, in time these will be near the front of the searched structure.
>
> For a linked list, bringing an element to the front is O(1) (just rewire the pointers). For an array, things are not so pleasant - rotating the found element to the front of the array is O(n).
>
> So let's see how we can implement a successful BtF for arrays.
>
> The first idea is to just swap the found element with the first element of the array. That's O(1) but has many disadvantages - if you search e.g. for two elements, they'll compete for the front of the array and they'll go back and forth without making progress.
>
> Another idea is to just swap the found element with the one just before it. The logic is, each successful find will shift the element closer to the front, in a bubble sort manner. In time, the frequently searched elements will slowly creep toward the front. The resulting performance is not appealing - you need O(n) searches to bring a given element to the front, for a total of O(n * n) steps spent in the n searches. Meh.
>
> So let's improve on that: whenever an element is found in position k, pick a random number i in the range 0, 1, 2, ..., k inclusive. Then swap the array elements at indexes i and k. This is the Randomized Slide to Front strategy.
>
> With RStF, worst case search time remains O(n), as is the unsuccessful search. However, frequently searched elements migrate quickly to the front - it only takes O(log n) searches to bring a given value at the front of the array.
>
> Insertion and removal are both a sweet O(1), owing to the light structuring: to insert just append the element (and perhaps swap it in a random position of the array to prime searching for it). Removal by position simply swaps the last element into the position to be removed and then reduces the size of the array.
>
> So the RStF is suitable in all cases where BtF would be recommended, but allows an array layout without considerable penalty.
>
> Related work: Theodoulos Garefalakis' Master's thesis "A Family of Randomized Algorithms for List Accessing" describes Markov Move to Front, which brings the searched element to front according to a Markov chain schedule; and also Randomized Move to Front, which decides whether a found element is brought to front depending on a random choice. These approaches are similar in that they both use randomization, but different because neither has good complexity on array storage.
>
>
> Andrei

What is the advantage compared to let's say a ringbuffer ? On find you can put the element to the front, and swap the old element with the new element ?

I guess randomizing would avoid hitting pathological cases too often, but would converge more slowly ?
November 30, 2015
On Mon, Nov 30, 2015 at 01:41:12PM -0800, H. S. Teoh via Digitalmars-d wrote: [...]
> What about when element i is matched, swap it with the (i/2)'th
> element?
> 
> Then it will take just log(n) searches to bring it to the front of the array, but it won't (immediately) compete with whatever's currently the most popular item in the array. Furthermore, when it does compete with it, it will already have been moved closer to the front of the array, so the previous most-popular element won't be pushed far back into the deep recesses of the array, but remain close to the front where it will be quickly found.

In fact, it's probably provable that if there are 2 most popular items in the array, they will eventually migrate to the 1st two positions of the array. Not so sure about the case of n most popular items for n>2, as position 3 is a kind of odd case where it gets displaced only by elements from indices that aren't a power of 2, but it would seem at a cursory glance that the 3 most popular items would tend to settle around the first 4 elements of the array.

Hmm... it seems that in the worst case (the most popular n items all lie precisely at indices of the form 2^j) the most popular items will end up within the first 2^n positions of the array. Not sure how to compute the average case; intuitively at least it seems that it should lie somewhere between the first n positions and the first 2^n positions.


T

-- 
Любишь кататься - люби и саночки возить.
November 30, 2015
On 11/30/15 4:50 PM, Andrei Alexandrescu wrote:
> On 11/30/15 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
>> What about when element i is matched, swap it with the (i/2)'th element?
>
> Randomization is essential - without it you have thrashing if you search
> for 2 elements in alternation. -- Andrei
>

What about selecting a random element in 0..k/2 instead of 0..k-1?

-Steve
November 30, 2015
On 11/30/15 4:53 PM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Nov 30, 2015 at 01:41:12PM -0800, H. S. Teoh via Digitalmars-d wrote:
> [...]
>> What about when element i is matched, swap it with the (i/2)'th
>> element?
>>
>> Then it will take just log(n) searches to bring it to the front of the
>> array, but it won't (immediately) compete with whatever's currently
>> the most popular item in the array. Furthermore, when it does compete
>> with it, it will already have been moved closer to the front of the
>> array, so the previous most-popular element won't be pushed far back
>> into the deep recesses of the array, but remain close to the front
>> where it will be quickly found.
>
> In fact, it's probably provable that if there are 2 most popular items
> in the array, they will eventually migrate to the 1st two positions of
> the array. Not so sure about the case of n most popular items for n>2,
> as position 3 is a kind of odd case where it gets displaced only by
> elements from indices that aren't a power of 2, but it would seem at a
> cursory glance that the 3 most popular items would tend to settle around
> the first 4 elements of the array.
>
> Hmm... it seems that in the worst case (the most popular n items all lie
> precisely at indices of the form 2^j) the most popular items will end up
> within the first 2^n positions of the array. Not sure how to compute the
> average case; intuitively at least it seems that it should lie somewhere
> between the first n positions and the first 2^n positions.

With RStF it's easy to prove (e.g. by reductio ad absurdum) that if you search only for k items out of n, they will end up in the top k positions of the array. Then they'll churn there :o). The key to the proof is that in the stationary state no element migrates in our out of the top k slots. I think it would be difficult to achieve this property with a deterministic approach.

The more interesting question would be what the element distribution is if both elements and searches are Gaussian-distributed (probably a frequent case in practice).


Andrei

November 30, 2015
On Mon, Nov 30, 2015 at 04:58:16PM -0500, Steven Schveighoffer via Digitalmars-d wrote:
> On 11/30/15 4:50 PM, Andrei Alexandrescu wrote:
> >On 11/30/15 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
> >>What about when element i is matched, swap it with the (i/2)'th element?
> >
> >Randomization is essential - without it you have thrashing if you search for 2 elements in alternation. -- Andrei
> >
> 
> What about selecting a random element in 0..k/2 instead of 0..k-1?
[...]

Or selecting the (i/k)'th element for k = uniform(1..i)?


T

-- 
People walk. Computers run.
November 30, 2015
On 11/30/15 4:55 PM, deadalnix wrote:
> I guess randomizing would avoid hitting pathological cases too often,
> but would converge more slowly ?

That's it. Problem is with deterministic approaches pathological cases are easy to hit and relatively common. -- Andrei
« First   ‹ Prev
1 2 3