An interesting data structure with search time O(sqrt n)
November 30, 2015
Okasaki's book is a continued inspiration for data structures and algorithms.

I was thinking along the following lines: typical collections are searchable in linear time. Then we have highly structured collections that feature logarithmic search time. But there seems to be nothing in between. So I was thinking, what would be a data structure that allows O(sqrt n) search times?

After a little tinkering, here's what I came up with.

Start with a simple array of data. Then mentally decompose that array into a concatenation of smaller arrays: first has size 1, second has size 4, third has size 9, then 16, 25, 36, ... Generally the size of these imaginary subarrays grows quadratically. And they're adjacent to each other. The last array may be incomplete.

Example: we decompose an array of size 35 into: an array of size 1 followed by arrays of sizes 4, 9, 16, 5 (last being an incomplete fragment of an array of size 25).
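
Here's a quick D sketch of the decomposition (untested; subarraySizes is just an illustrative name):

size_t[] subarraySizes(size_t n)
{
    size_t[] sizes;
    for (size_t k = 1; n > 0; ++k)
    {
        // the k-th subarray wants k^2 elements; the last may be a fragment
        auto s = k * k <= n ? k * k : n;
        sizes ~= s;
        n -= s;
    }
    return sizes;
}
// subarraySizes(35) yields [1, 4, 9, 16, 5]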

Now each of these arrays we organize as a max heap. Moreover, we arrange data such that the maximums of these heaps are in INCREASING order. That means the smallest element of the entire (initial) array is at the first position, then followed by the next 4 smallest organized in a max heap, followed by the next 9 smallest organized in a max heap, ... etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.

That's the layout! Now, to search we look at one heap at a time. Whenever the maximum element (first element in the subarray) is smaller than the searched value, we skip over that entire heap and go to the next one. In the worst case we'll skip O(sqrt n) heaps. When the max value in a heap is at least the searched element, we have found the candidate heap, and we run a linear search among its O(sqrt n) elements.
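
A quick D sketch of that search (untested; assumes distinct elements, otherwise equal keys could spill into the next heap):

// Returns the index of x in a, or -1. a is assumed to have the layout above.
ptrdiff_t search(T)(const(T)[] a, T x)
{
    size_t offset = 0;
    for (size_t k = 1; offset < a.length; ++k)
    {
        auto len = offset + k * k <= a.length ? k * k : a.length - offset;
        auto heap = a[offset .. offset + len];
        if (heap[0] >= x) // heap[0] is this subarray's maximum
        {
            // candidate heap found: linear scan over its O(sqrt n) elements
            foreach (i, e; heap)
                if (e == x) return cast(ptrdiff_t) (offset + i);
            return -1;
        }
        offset += len; // the max is smaller than x: skip the entire heap
    }
    return -1;
}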

The structure is nice for sorting and in particular parallel sorting because each subarray can be sorted independently - there's no migration into or out of each subarray. Inside each subarray, of course heapsort would be a good choice because it takes advantage of the existing max heap. 	
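
For instance (a D sketch, untested; std.algorithm's sort stands in for the suggested heapsort):

import std.algorithm.sorting : sort;
import std.parallelism : parallel;

// Since the subarrays partition the data by rank, sorting each one
// independently leaves the entire array sorted.
void sortStructure(T)(T[] a)
{
    T[][] chunks;
    for (size_t offset = 0, k = 1; offset < a.length; ++k)
    {
        auto len = offset + k * k <= a.length ? k * k : a.length - offset;
        chunks ~= a[offset .. offset + len];
        offset += len;
    }
    foreach (chunk; parallel(chunks)) // one task per chunk
        sort(chunk);
}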

I haven't worked out the math for insertion and deletion yet, but they seem to also be around O(sqrt n) or O(log(n) * sqrt(n)). Just wanted to share with you and discuss what seems to be an interesting data structure.

Please share any thoughts!


Andrei
November 30, 2015
On Mon, Nov 30, 2015 at 03:13:11PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> Okasaki's book is a continued inspiration for data structures and algorithms.
> 
> I was thinking along the following lines: typical collections are searchable in linear time. Then we have highly structured collections that feature logarithmic search time. But there seems to be nothing in between. So I was thinking, what would be a data structure that allows O(sqrt n) search times?
> 
> After a little tinkering, here's what I came up with.

Interesting indeed.

It leaves me wondering, though, what's the point of having an O(sqrt n) collection? What are the advantages?  Why would I use this structure instead of, say, a traditional array heap with O(log n) search time?


T

-- 
Why are you blatanly misspelling "blatant"? -- Branden Robinson
November 30, 2015
On 30-Nov-2015 23:13, Andrei Alexandrescu wrote:
> Okasaki's book is a continued inspiration for data structures and
> algorithms.
>
> I was thinking along the following lines: typical collections are
> searchable in linear time. Then we have highly structured collections
> that feature logarithmic search time. But there seems to be nothing in
> between. So I was thinking, what would be a data structure that allows
> O(sqrt n) search times?
>
> After a little tinkering, here's what I came up with.
>
> Start with a simple array of data. Then mentally decompose that array
> into a concatenation of smaller arrays: first has size 1, second has
> size 4, third has size 9, then 16, 25, 36, ... Generally the size of
> these imaginary subarrays grows quadratically. And they're adjacent to
> each other. The last array may be incomplete.
>
> Example: we decompose an array of size 35 into: an array of size 1
> followed by arrays of sizes 4, 9, 16, 5 (last being an incomplete
> fragment of an array of size 25).
>
> Now each of these arrays we organize as a max heap. Moreover, we arrange
> data such that the maximums of these heaps are in INCREASING order. That
> means the smallest element of the entire (initial) array is at the first
> position, then followed by the next 4 smallest organized in a max heap,
> followed by the next 9 smallest organized in a max heap, ... etc. There
> are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.
>
> That's the layout! Now, to search we look at one heap at a time.
> Whenever the maximum element (first element in the subarray) is smaller
> than the searched value, we skip over that entire heap and go to the
> next one. In the worst case we'll skip O(sqrt n) heaps. When the max
> value in a heap is at least the searched element, we have found the candidate heap and
> we run a linear search among O(sqrt n) elements.

Reminds me of the van Emde Boas layout, which is however fractal in nature: sqrt(N) pieces, each having sqrt(N) elements, are subdivided again into sqrt(sqrt(N)) pieces, and so on.

Not sure if you have seen it, but see also cache-oblivious data structures:
http://erikdemaine.org/papers/BRICS2002/paper.pdf


-- 
Dmitry Olshansky
November 30, 2015
On 11/30/15 3:20 PM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Nov 30, 2015 at 03:13:11PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
>> Okasaki's book is a continued inspiration for data structures and
>> algorithms.
>>
>> I was thinking along the following lines: typical collections are
>> searchable in linear time. Then we have highly structured collections
>> that feature logarithmic search time. But there seems to be nothing in
>> between. So I was thinking, what would be a data structure that allows
>> O(sqrt n) search times?
>>
>> After a little tinkering, here's what I came up with.
>
> Interesting indeed.
>
> It leaves me wondering, though, what's the point of having an O(sqrt n)
> collection? What are the advantages?  Why would I use this structure
> instead of, say, a traditional array heap with O(log n) search time?

(Heaps offer only linear search time. You may take advantage of the heap structure to skip portions of the array, but on average and in the worst case the search is still O(n). So I assume you meant "sorted array or one of the classic search trees".)

The motivation starts with a desire to use arrays as the fundamental layout. Among the many recent trends in computing: cache hierarchies are here to stay, and contiguous layout is preferable.

The short of it is, arrays are king. No two ways about it - following pointers is a losing strategy when there's an alternative. A variety of new data structures (Clojure's arrays, heaps with tombstones) avoid classic pointer-based data structures in favor of adding structure on top of arrays.

So now consider the question "how do we organize an array for good search performance?" There is a spectrum of answers. Generally we also care about insertion and removal.

At one end of the spectrum there's doing nothing at all - that means O(1) build (nothing to do), O(n) search, O(1) insert (just append it), O(n) removal. Not very nice.

At the other end, the absolute best structuring on top of an array for search is sorting. With sorting you get great O(log n) search performance. But the others are not nice - O(n log n) build, O(n) add, O(n) remove.

So now consider my square heaps. We have O(n) build time once the elements are distributed among the right subarrays (just a bunch of heapifications; the initial distribution itself costs extra) and O(sqrt n) search. Assume for now (again I haven't worked out the math yet) that insertion and removal are both O(sqrt n). Then you get something less structured than full sorting, but also just good enough to offer the same complexity for each of search, insert, and delete. That would be pretty neat.
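
In tabular form:

              build       search     insert      remove
plain array   O(1)        O(n)       O(1)        O(n)
sorted array  O(n log n)  O(log n)   O(n)        O(n)
square heaps  O(n)*       O(sqrt n)  O(sqrt n)?  O(sqrt n)?

(* heapification, after the elements are distributed; ? conjectured)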


Andrei

November 30, 2015
On 11/30/15 3:29 PM, Dmitry Olshansky wrote:
> Reminds me of Van Emde Boas layout which is however fractal in nature:
> sqrt(N) pieces each having sqrt(N) element are subdivided again into
> sqrt(sqrt(N)) pieces and so on.
>
> Not sure if you have seen, but see also cache-oblivious data-structures:
> http://erikdemaine.org/papers/BRICS2002/paper.pdf

Thanks, I'll look these up! -- Andrei
November 30, 2015
On Mon, Nov 30, 2015 at 03:57:24PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> On 11/30/15 3:20 PM, H. S. Teoh via Digitalmars-d wrote:
> >On Mon, Nov 30, 2015 at 03:13:11PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> >>Okasaki's book is a continued inspiration for data structures and algorithms.
> >>
> >>I was thinking along the following lines: typical collections are searchable in linear time. Then we have highly structured collections that feature logarithmic search time. But there seems to be nothing in between. So I was thinking, what would be a data structure that allows O(sqrt n) search times?
> >>
> >>After a little tinkering, here's what I came up with.
> >
> >Interesting indeed.
> >
> >It leaves me wondering, though, what's the point of having an O(sqrt n) collection? What are the advantages?  Why would I use this structure instead of, say, a traditional array heap with O(log n) search time?
> 
> (Heaps offer only linear search time. You may take advantage of the heap structure to skip portions of the array, but on average and in the worst case the search is still O(n). So I assume you meant "sorted array or one of the classic search trees".)

Right, I wrote that without thinking. :-P


> The motivation starts with a desire to use arrays as the fundamental layout.  There have been many trends in computing as of late, among which: cache hierarchies are here to stay and contiguous layout is preferable.
> 
> The short of it is, arrays are king. No two ways about it - following pointers is a losing strategy when there's an alternative. A variety of new data structures (Clojure's arrays, heaps with tombstones) avoid classic pointer-based data structures in favor of adding structure on top of arrays.

I'm not so sure about that. At the micro level, yes, following pointers for a tree / graph / etc. that could easily fit in a small or medium-sized array is a losing strategy, due to cache misses, hardware prefetchers, etc. When you're dealing with larger amounts of data, though, things become less clear-cut. If your array size is on the order of MBs or GBs, pointers start looking a lot more attractive. IMO in the long run what will win is data structures that use tree- or pointer-based implementations at the large scale, but are built on cache-friendly array-based structures below a certain level of granularity.

I agree with you, though, that array-based structures of intermediate big-O complexities are very interesting. If you can come up with an array structure that has the same complexity for all common operations, that could be useful as a quick-fix drop-in solution in cases where top performance isn't required, but you'd like something better than O(n) array scanning.


T

-- 
Philosophy: how to make a career out of daydreaming.
November 30, 2015
On Monday, 30 November 2015 at 20:13:15 UTC, Andrei Alexandrescu wrote:
> Now each of these arrays we organize as a max heap. Moreover, we arrange data such that the maximums of these heaps are in INCREASING order. That means the smallest element of the entire (initial) array is at the first position, then followed by the next 4 smallest organized in a max heap, followed by the next 9 smallest organized in a max heap, ... etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.

Isn't the size of each heap a little larger than O(sqrt n)? The total number of elements you can hold in k heaps is the kth square pyramidal number, k(k+1)(2k+1)/6 = O(k^3), so the largest heap has size k^2 = O(n^(2/3)).
November 30, 2015
On Monday, 30 November 2015 at 20:13:15 UTC, Andrei Alexandrescu wrote:
...
> smallest organized in a max heap, ... etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.
...

Hi Andrei,
If I'm not mistaken, the number of heaps is proportional to n^(1/3), not n^(1/2):
1^2 + 2^2 + ... + k^2 = k(k+1)(2k+1)/6, which is proportional to k^3.
https://en.wikipedia.org/wiki/Square_pyramidal_number
So if k heaps hold on the order of k^3 elements, write m = k^3; then m elements are stored in m^(1/3) heaps.
Interesting data structure! Need to look over the details some more.
Titus
November 30, 2015
On Monday, 30 November 2015 at 23:19:44 UTC, Titus Nicolae wrote:
> On Monday, 30 November 2015 at 20:13:15 UTC, Andrei Alexandrescu wrote:
> ...
>> smallest organized in a max heap, ... etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.
> ...
>
> Hi Andrei,
> If I'm not mistaken, the number of heaps is proportional to n^(1/3), not n^(1/2):
> 1^2 + 2^2 + ... + k^2 = k(k+1)(2k+1)/6, which is proportional to k^3.
> https://en.wikipedia.org/wiki/Square_pyramidal_number
> So if k heaps hold on the order of k^3 elements, write m = k^3; then m elements are stored in m^(1/3) heaps.
> Interesting data structure! Need to look over the details some more.
> Titus

I think Titus and I noticed the same thing at the same time.

You could achieve O(sqrt n) lookup time by having each heap contain ceil(sqrt(n)) elements (except for the last one), giving you O(sqrt n) heaps of size O(sqrt n), but it may make insertion trickier as you try to keep the heaps the same size.
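
Something like this for the chunk size (a D sketch; chunkSize is a hypothetical helper):

import std.math : sqrt;

// common chunk size for n elements: ceil(sqrt(n))
size_t chunkSize(size_t n)
{
    auto s = cast(size_t) sqrt(cast(double) n);
    while (s * s < n) ++s; // correct any rounding from the double sqrt
    return s;
}
// e.g. chunkSize(35) == 6: five full heaps of 6 plus a fragment of 5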
November 30, 2015
On Monday, 30 November 2015 at 23:19:44 UTC, Titus Nicolae wrote:
> On Monday, 30 November 2015 at 20:13:15 UTC, Andrei Alexandrescu wrote:
> ...
>> smallest organized in a max heap, ... etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.
> ...
>
> Hi Andrei,
> If I'm not mistaken, the number of heaps is proportional to n^(1/3), not n^(1/2):
> 1^2 + 2^2 + ... + k^2 = k(k+1)(2k+1)/6, which is proportional to k^3.
> https://en.wikipedia.org/wiki/Square_pyramidal_number
> So if k heaps hold on the order of k^3 elements, write m = k^3; then m elements are stored in m^(1/3) heaps.
> Interesting data structure! Need to look over the details some more.
> Titus

This might give a better-balanced structure.

The sum of the first k odd numbers is 1+3+5+...+(2k-1) = k^2, so for n total elements there are about sqrt(n) heaps, with about 2*sqrt(n) elements in the largest heap.
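
A quick D sketch of that decomposition (untested; oddSizes is an illustrative name):

size_t[] oddSizes(size_t n)
{
    size_t[] sizes;
    for (size_t s = 1; n > 0; s += 2) // sizes 1, 3, 5, ...
    {
        auto len = s <= n ? s : n;    // the last may be a fragment
        sizes ~= len;
        n -= len;
    }
    return sizes;
}
// oddSizes(16) yields [1, 3, 5, 7]: sqrt(16) = 4 heaps, largest 7 < 2*sqrt(16)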
@Andrei: Many happy returns! :)
