Thread overview  


November 30, 2015 An interesting data structure with search time O(sqrt n)  

 
Okasaki's book is a continued inspiration of data structures and algorithms.

I was thinking along the following lines: typical collections are searchable in linear time. Then we have highly structured collections that feature logarithmic search time. But there seems to be nothing in between. So I was thinking, what would be a data structure that allows O(sqrt n) search times?

After a little tinkering, here's what I came up with.

Start with a simple array of data. Then mentally decompose that array into a concatenation of smaller arrays: the first has size 1, the second size 4, the third size 9, then 16, 25, 36, ... Generally the sizes of these imaginary subarrays grow quadratically, and they're adjacent to each other. The last subarray may be incomplete.

Example: we decompose an array of size 35 into an array of size 1 followed by arrays of sizes 4, 9, 16, and 5 (the last being an incomplete fragment of an array of size 25).

Now we organize each of these subarrays as a max heap. Moreover, we arrange the data such that the maximums of these heaps are in INCREASING order. That means the smallest element of the entire (initial) array is at the first position, followed by the next 4 smallest organized in a max heap, followed by the next 9 smallest organized in a max heap, etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.

That's the layout! Now, to search we look at one heap at a time. Whenever the maximum element (the first element of the subarray) is smaller than the searched value, we skip over that entire heap and go to the next one. In the worst case we'll skip O(sqrt n) heaps. When the max value in a heap is no less than the searched element, we've found the right heap, and we run a linear search among its O(sqrt n) elements.

The structure is nice for sorting, and in particular parallel sorting, because each subarray can be sorted independently - there's no migration into or out of any subarray. Inside each subarray, heapsort would of course be a good choice because it takes advantage of the existing max heap.

I haven't worked out the math for insertion and deletion yet, but they also seem to be around O(sqrt n) or O(log(n) * sqrt(n)).

Just wanted to share with you and discuss what seems to be an interesting data structure. Please share any thoughts!

Andrei
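The layout and search described above can be sketched in a few lines of Python - a hypothetical translation, not from the post (`build` and `search` are made-up names; it builds from already-sorted input and uses the fact that a descending array is one valid max-heap):

```python
def build(sorted_data):
    """Lay out sorted input as blocks of sizes 1, 4, 9, ...,
    each arranged as a max heap (a descending array already
    satisfies the max-heap property), so block maxima increase."""
    blocks, i, k = [], 0, 1
    while i < len(sorted_data):
        blocks.append(list(reversed(sorted_data[i:i + k * k])))
        i += k * k
        k += 1
    return blocks

def search(blocks, target):
    """Skip every block whose max (element 0) is below the target;
    the first block whose max is >= target is the only one that
    can contain it, so scan just that block."""
    for block in blocks:
        if block[0] < target:
            continue  # all elements of this block are < target
        return target in block
    return False
```

For an array of size 35 this produces blocks of sizes 1, 4, 9, 16, and 5, matching the example above.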
November 30, 2015 Re: An interesting data structure with search time O(sqrt n)  

 
Posted in reply to Andrei Alexandrescu  On Mon, Nov 30, 2015 at 03:13:11PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> Okasaki's book is a continued inspiration of data structures and algorithms.
>
> I was thinking along the following lines: typical collections are searchable in linear time. Then we have highly structured collections that feature logarithmic search time. But there seems to be nothing in between. So I was thinking, what would be a data structure that allows O(sqrt n) search times?
>
> After a little tinkering, here's what I came up with.

Interesting indeed.

It leaves me wondering, though, what's the point of having an O(sqrt n) collection? What are the advantages? Why would I use this structure instead of, say, a traditional array heap with O(log n) search time?

T

-- 
Why are you blatanly misspelling "blatant"? -- Branden Robinson
November 30, 2015 Re: An interesting data structure with search time O(sqrt n)  

 
Posted in reply to Andrei Alexandrescu  On 30-Nov-2015 23:13, Andrei Alexandrescu wrote:
> Okasaki's book is a continued inspiration of data structures and algorithms.
> [...]
> That's the layout! Now, to search we look at one heap at a time. Whenever the maximum element (first element in the subarray) is smaller than the searched value, we skip over that entire heap and go to the next one. In the worst case we'll skip O(sqrt n) heaps.
Reminds me of the Van Emde Boas layout, which is however fractal in nature: sqrt(N) pieces each having sqrt(N) elements are subdivided again into sqrt(sqrt(N)) pieces, and so on.

Not sure if you have seen it, but see also cache-oblivious data structures: http://erikdemaine.org/papers/BRICS2002/paper.pdf

-- 
Dmitry Olshansky
November 30, 2015 Re: An interesting data structure with search time O(sqrt n)  

 
Posted in reply to H. S. Teoh  On 11/30/15 3:20 PM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Nov 30, 2015 at 03:13:11PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
>> Okasaki's book is a continued inspiration of data structures and
>> algorithms.
>>
>> I was thinking along the following lines: typical collections are
>> searchable in linear time. Then we have highly structured collections
>> that feature logarithmic search time. But there seems to be nothing in
>> between. So I was thinking, what would be a data structure that allows
>> O(sqrt n) search times?
>>
>> After a little tinkering, here's what I came up with.
>
> Interesting indeed.
>
> It leaves me wondering, though, what's the point of having an O(sqrt n)
> collection? What are the advantages? Why would I use this structure
> instead of, say, a traditional array heap with O(log n) search time?
(Heaps offer only linear search time. You may take advantage of the heap structure to skip portions of the array, but on average and in the worst case the search is still O(n). So I assume you meant "sorted array or one of the classic search trees".)
The motivation starts with a desire to use arrays as the fundamental layout. There have been many trends in computing as of late, among which: cache hierarchies are here to stay and contiguous layout is preferable.
The short of it is, arrays are king. No two ways about it - following pointers is a losing strategy when there's an alternative. A variety of new data structures (Clojure's arrays, heaps with tombstones) avoid classic pointer-based data structures in favor of adding structure on top of arrays.
So now if we consider the question "how do we organize an array for good search performance?", we have a spectrum. Generally we also care about insertion and removal.
At one end of the spectrum there's doing nothing at all - that means O(1) build (nothing to do), O(n) search, O(1) insert (just append it), O(n) removal. Not very nice.
At the other end, the absolute best structuring on top of an array for search is sorting. With sorting you get great O(log n) search performance. But the others are not nice - O(n log n) build, O(n) add, O(n) remove.
So now consider my square heaps. We have O(n) build time (just a bunch of heapifications) and O(sqrt n) search. Then (again I haven't worked out the math yet) let's assume insertion and removal are both O(sqrt n). Then you get something less structured than full sorting, but also just good enough to offer the same complexity for each of search, insert, and delete. That would be pretty neat.
Andrei
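The "bunch of heapifications" build step could be sketched as follows - hypothetical helpers, not from the thread, assuming elements have already been distributed so that each block holds the right range of values:

```python
def max_heapify(a):
    """Standard bottom-up (Floyd) heap construction; O(len(a))."""
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):
        root = start
        while True:  # sift a[root] down until the max-heap property holds
            child = 2 * root + 1
            if child >= n:
                break
            if child + 1 < n and a[child + 1] > a[child]:
                child += 1  # pick the larger child
            if a[root] >= a[child]:
                break
            a[root], a[child] = a[child], a[root]
            root = child

def build_in_place(a):
    """Max-heapify each quadratically sized block of a. Total work
    is O(n), since each heapify is linear in its block's size."""
    i, k = 0, 1
    while i < len(a):
        block = a[i:i + k * k]
        max_heapify(block)
        a[i:i + k * k] = block
        i += k * k
        k += 1
```

Starting from a sorted array, this leaves the block maxima in increasing order, as the layout requires.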

November 30, 2015 Re: An interesting data structure with search time O(sqrt n)  

 
Posted in reply to Dmitry Olshansky  On 11/30/15 3:29 PM, Dmitry Olshansky wrote:
> Reminds me of Van Emde Boas layout which is however fractal in nature:
> sqrt(N) pieces each having sqrt(N) element are subdivided again into
> sqrt(sqrt(N)) pieces and so on.
>
> Not sure if you have seen, but see also cacheoblivious datastructures:
> http://erikdemaine.org/papers/BRICS2002/paper.pdf
Thanks, I'll look these up! -- Andrei

November 30, 2015 Re: An interesting data structure with search time O(sqrt n)  

 
Posted in reply to Andrei Alexandrescu  On Mon, Nov 30, 2015 at 03:57:24PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> (Heaps offer only linear search time. You may take advantage of the heap structure to skip portions of the array, but on average and in the worst case the search is still O(n). So I assume you meant "sorted array or one of the classic search trees".)

Right, I wrote that without thinking. :P

> The motivation starts with a desire to use arrays as the fundamental layout. There have been many trends in computing as of late, among which: cache hierarchies are here to stay and contiguous layout is preferable.
>
> The short of it is, arrays are king. No two ways about it - following pointers is a losing strategy when there's an alternative. A variety of new data structures (Clojure's arrays, heaps with tombstones) avoid classic pointer-based data structures in favor of adding structure on top of arrays.

I'm not so sure about that.
At the micro level, yes, following pointers for a tree / graph / etc. that could easily fit in a small/medium sized array is a losing strategy, due to cache misses, hardware prefetchers, etc.. When you're dealing with larger amounts of data, though, things become less clear-cut. If your array size is on the order of MBs or GBs, pointers start looking a lot more attractive.

IMO in the long run what will win is data structures that use tree- or pointer-based implementations at the large scale, but are built on cache-friendly array-based structures below a certain level of granularity.

I agree with you, though, that array-based structures of intermediate big-O complexities are very interesting. If you can come up with an array structure that has the same complexity for all common operations, that could be useful as a quick-fix drop-in solution in cases where top performance isn't required, but you'd like something better than O(n) array scanning.

T

-- 
Philosophy: how to make a career out of daydreaming.
November 30, 2015 Re: An interesting data structure with search time O(sqrt n)  

 
Posted in reply to Andrei Alexandrescu  On Monday, 30 November 2015 at 20:13:15 UTC, Andrei Alexandrescu wrote:
> Now each of these arrays we organize as a max heap. Moreover, we arrange data such that the maximums of these heaps are in INCREASING order. That means the smallest element of the entire (initial) array is at the first position, then followed by the next 4 smallest organized in a max heap, followed by the next 9 smallest organized in a max heap, ... etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.
Isn't the size of each heap a little larger than O(sqrt n)? The total number of elements you can hold in k heaps is equal to the k-th square pyramidal number, which is O(k^3), so the largest heap is of size k^2 = O(n^(2/3)).
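This point is easy to check numerically - a hypothetical helper, not from the thread:

```python
def num_blocks(n):
    """Number of heaps when block sizes are 1, 4, 9, ...
    Since 1^2 + ... + k^2 = k(k+1)(2k+1)/6 ~ k^3/3, roughly
    (3n)^(1/3) blocks cover n elements, and the largest block
    then has ~ (3n)^(2/3) elements - more than sqrt(n)."""
    k, total = 0, 0
    while total < n:
        k += 1
        total += k * k
    return k
```

For example, `num_blocks(35)` is 5 (the 1+4+9+16+5 example from the original post), and `num_blocks(10**6)` is 144, close to (3 * 10^6)^(1/3) ≈ 144.2.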

November 30, 2015 Re: An interesting data structure with search time O(sqrt n)  

 
Posted in reply to Andrei Alexandrescu  On Monday, 30 November 2015 at 20:13:15 UTC, Andrei Alexandrescu wrote: ... > smallest organized in a max heap, ... etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements. ... Hi Andrei, If I'm not mistaken, the number of heaps is proportional to n^(1/3) not n^(1/2) 1+2^2+..+n^2 is proportional to n^3 https://en.wikipedia.org/wiki/Square_pyramidal_number If n heaps have n^3 elements, rewrite n^3=m, then m elements will be stored in m^(1/3) heaps Interesting data structure! need to look over the details some more Titus 
November 30, 2015 Re: An interesting data structure with search time O(sqrt n)  

 
Posted in reply to Titus Nicolae  On Monday, 30 November 2015 at 23:19:44 UTC, Titus Nicolae wrote:
> On Monday, 30 November 2015 at 20:13:15 UTC, Andrei Alexandrescu wrote:
> ...
>> smallest organized in a max heap, ... etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.
> ...
>
> Hi Andrei,
> If I'm not mistaken, the number of heaps is proportional to n^(1/3) not n^(1/2)
> 1+2^2+..+n^2 is proportional to n^3
> https://en.wikipedia.org/wiki/Square_pyramidal_number
> If n heaps have n^3 elements, rewrite n^3=m, then m elements will be stored in m^(1/3) heaps
> Interesting data structure! need to look over the details some more
> Titus
I think Titus and I noticed the same thing at the same time.
You could achieve O(sqrt n) lookup time by having each heap contain ceil(sqrt(n)) elements (except for the last one), giving you O(sqrt n) heaps of size O(sqrt n), but it may make insertion trickier as you try to keep the heaps the same size.
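A sketch of this equal-size variant (hypothetical name, not from the thread; assumes sorted input and uses the fact that a descending slice is already a valid max-heap):

```python
import math

def equal_blocks(sorted_data):
    """Partition sorted input into blocks of fixed size ceil(sqrt(n)),
    giving O(sqrt n) max-heap blocks of O(sqrt n) elements each,
    with block maxima in increasing order."""
    n = len(sorted_data)
    if n == 0:
        return []
    size = math.isqrt(n - 1) + 1  # ceil(sqrt(n)) for n > 0
    return [list(reversed(sorted_data[i:i + size]))
            for i in range(0, n, size)]
```

The search procedure is unchanged: skip blocks whose max is below the target, then scan the first candidate block - now both phases are O(sqrt n).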

November 30, 2015 Re: An interesting data structure with search time O(sqrt n)  

 
Posted in reply to Titus Nicolae  On Monday, 30 November 2015 at 23:19:44 UTC, Titus Nicolae wrote:
> On Monday, 30 November 2015 at 20:13:15 UTC, Andrei Alexandrescu wrote:
> ...
>> smallest organized in a max heap, ... etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.
> ...
>
> Hi Andrei,
> If I'm not mistaken, the number of heaps is proportional to n^(1/3) not n^(1/2)
> 1+2^2+..+n^2 is proportional to n^3
> https://en.wikipedia.org/wiki/Square_pyramidal_number
> If n heaps have n^3 elements, rewrite n^3=m, then m elements will be stored in m^(1/3) heaps
> Interesting data structure! need to look over the details some more
> Titus
This might give a better-balanced structure.

Sum of odd numbers: 1 + 3 + 5 + ... + (2n-1) = n^2.

For N total elements, there are about sqrt(N) heaps, and about 2*sqrt(N) elements in the largest heap.
@Andrei: La multi ani! ("Many happy returns!") :)
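The odd-sizes partition can be sketched the same way as the other variants (hypothetical helper, assuming sorted input):

```python
def odd_blocks(sorted_data):
    """Partition sorted input into blocks of sizes 1, 3, 5, ...
    Since 1 + 3 + ... + (2k-1) = k^2, N elements fill about sqrt(N)
    blocks, and the largest has about 2*sqrt(N) elements. Each block
    is stored descending, which is a valid max-heap."""
    blocks, i, size = [], 0, 1
    n = len(sorted_data)
    while i < n:
        blocks.append(list(reversed(sorted_data[i:i + size])))
        i += size
        size += 2
    return blocks
```

For N = 100 this yields exactly 10 blocks of sizes 1, 3, ..., 19, so both the number of blocks and the largest block are O(sqrt N).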


Copyright © 1999-2018 by the D Language Foundation