OT: Denis is back! :) (page 3)

December 01, 2015

Re: And here's another interesting algorithm/structure: Randomized Slide to Front

Posted by Richard Hodges
in reply to Andrei Alexandrescu

Permalink

Richard Hodges

Posted in reply to Andrei Alexandrescu

Permalink

I had a little think about the pathological case of most searches being for one of a few elements.

I'm sure my idea is not new, but it seems to me that keeping a 'hit count' for each element solves this. The decision of whether to swap then becomes:

++ Frequency[I]
if I >= Threshold1 then
  choose an index J that is [either I/2 or rand(0...I-1), depending on preference]
  if (Frequency[I] - Frequency[J]) >= Threshold2 then
    swap Item[I] and Item[J]
    swap Frequency[I] and Frequency[J]
    I = J
  endif
endif
return I

I wrote a little test in c++ (sorry guys, old habits die hard):

template<class T>
struct mrucache
{
    using vector_type  = std::vector<T>;
    using iterator = typename vector_type::iterator;
    using const_iterator = typename vector_type::const_iterator;

    void add(T item) {
        _items.push_back(std::move(item));
        _frequency.push_back(0);
    }

    template<class Pred>
    iterator find_if(Pred&& pred)
    {
        using std::begin;
        using std::end;
        auto iter = std::find_if(begin(_items),
                                 end(_items),
                                 std::forward<Pred>(pred));
        if (iter != end(_items))
        {
            auto i = std::distance(_items.begin(), iter);
            ++ _frequency[i];
            i = maybe_swap(i);
            iter = std::next(begin(_items), i);
        }
        return iter;
    }

    std::size_t maybe_swap(std::size_t i)
    {
        if (i >= closeness_threshold)
        {
            auto candidate_i = i / 2;
            if ((_frequency[i] - _frequency[candidate_i]) >= difference_threshold) {
                swap(_items[i], _items[candidate_i]);
                swap(_frequency[i], _frequency[candidate_i]);
                i = candidate_i;
            }
        }
        return i;
    }

    auto begin() const { return _items.begin(); }
    auto end() const { return _items.end(); }

    static const size_t closeness_threshold = 4;
    static const int difference_threshold = 1;

    std::vector<T> _items;
    std::vector<int> _frequency;
};

On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu wrote: > Now that we got talking about searching in arrays, allow me to also share an idea I've had a short while ago. > > [...] Perhaps some strategy similar to Working Sets: https://en.wikipedia.org/wiki/Iacono's_working_set_structure would work (or inspired by the same idea). You move the element from where it is found to T_1, move a random element from T_1 to T_2, from T_2 to T_3 and so on to T_i. In this case rather than trees you would have lists. Maybe that has poor cache locality properties though.

On 11/30/15 5:07 PM, Andrei Alexandrescu wrote: > On 11/30/15 4:58 PM, Steven Schveighoffer wrote: >> >> What about selecting a random element in 0..k/2 instead of 0..k-1? > > I think complexity would stay the same. Choosing a tighter range puts a > greater weight on the last search than on recent searches. In the case where you search for a very small number of elements, it puts an upper bound on how soon they make it to the front (log(n) instead of n) > One thing I like is that I choose 0..k, not 0..k-1, which means it's > possible that the element stays put (no change in position). That > reduces thrashing for the top (most frequently searched) few elements. I think insignificantly, depending on the number of "frequently searched" elements. But you could tune it, let's say to 8 elements: const upperBound = max(k/2, min(8, k)); There are a lot of options for tuning that can be played with, probably the best way to "prove" what is best is to just try some experiments :) -Steve

On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu wrote: > Now that we got talking about searching in arrays, allow me to also share an idea I've had a short while ago. I wrote a range-based implementation to see how it would look like. https://gist.github.com/JakobOvrum/45a37f55ba5c9a7501d6 The tests are very thin but it seems to do the trick.

Forums