Thread overview | |||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
March 08, 2013 how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
I wonder if exists a way to get top n distinct elements from a range (infinite too!) A (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function? |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | On 3/8/13 10:33 AM, Andrea Fontana wrote:
> I wonder if exists a way to get top n distinct elements from a range
> (infinite too!)
Top from an infinite range? o_O
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | Andrea Fontana: > I wonder if exists a way to get top n distinct elements from a range (infinite too!) > > A (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function? What does it mean "top"? I think you are not missing functions. One solution (not tested): bool[ForeachType!(typeof(myRange))] mySet; foreach (item; myRange) { mySet[item] = true; if (mySet.length >= n) break; } auto top = mySet.byKey; Bye, bearophile |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Friday, 8 March 2013 at 14:10:55 UTC, bearophile wrote:
> Andrea Fontana:
>
>> I wonder if exists a way to get top n distinct elements from a range (infinite too!)
>>
>> A (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function?
>
> What does it mean "top"?
>
> I think you are not missing functions.
>
> One solution (not tested):
>
>
> bool[ForeachType!(typeof(myRange))] mySet;
> foreach (item; myRange) {
> mySet[item] = true;
> if (mySet.length >= n)
> break;
> }
> auto top = mySet.byKey;
>
>
> Bye,
> bearophile
Yes I wonder if a lazy function like this function was in phobos. It's quite common, and I didn't want to reinvent wheel :)
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | On Friday, 8 March 2013 at 13:33:24 UTC, Andrea Fontana wrote: > I wonder if exists a way to get top n distinct elements from a range (infinite too!) It's impossible to do that for infinite ranges > A (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function? You could do something like (untested code): for(size_t m = 0, mnext = n; ; n = mnext, mnext = min(2 * mnext, range.length)) { partialSort!condition(range[m .. $], mnext - m); if(range[0 .. mnext].uniq.count >= n) break; assert(mnext < range.length); } auto topN = range.uniq.take(n); You will need a random access range with slicing for that, and it will modify it. If I am not mistaken, this has time complexity O(log(ndup / n) * range.length), where ndup is the number of all elements (including duplicates) with values among the top n. |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | > I wonder if exists a way to get top n distinct elements from a range (infinite too!) > A (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function? For an arbitrary infinite range, it is impossible from the mathematical point of view. An impossible example would be doing that with the infinite range of natural numbers, stored in bignums. For a finite range, you can iterate over the range while maintaining a collection of the top n elements and discarding duplicates as you find them. For example, using a RedBlackTree as the collection, you will get O(log(n) * range.length) total time and O(n) total memory used. If n is small (on the order of 10s), an array will suffice, resulting in O(n * range.length) total time but with lower constant factor. |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ivan Kazmenko | On Friday, 8 March 2013 at 15:04:32 UTC, Ivan Kazmenko wrote:
>
> For a finite range, you can iterate over the range while maintaining a collection of the top n elements and discarding duplicates as you find them. For example, using a RedBlackTree as the collection, you will get O(log(n) * range.length) total time and O(n) total memory used. If n is small (on the order of 10s), an array will suffice, resulting in O(n * range.length) total time but with lower constant factor.
You can also use an array if you use a BinaryHeap. If you use a heap then it should have a much lower constant factor than a red-black tree so, even though they would both have O(m*log(n)) (where n is the number of elements you want the top of and m is the size of the collection) running time, a heap would be faster.
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to jerro | On Friday, 8 March 2013 at 14:43:29 UTC, jerro wrote: > On Friday, 8 March 2013 at 13:33:24 UTC, Andrea Fontana wrote: >> I wonder if exists a way to get top n distinct elements from a range (infinite too!) > > It's impossible to do that for infinite ranges Why? sequence!"n*2".myTopDistinct!"a==b"(3); will give [2,4,6] We just need to keep a list of "just-seen" elements But I don't know how to make it lazy |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | >>> I wonder if exists a way to get top n distinct elements from a range (infinite too!) >> >> It's impossible to do that for infinite ranges > > Why? > > sequence!"n*2".myTopDistinct!"a==b"(3); > > will give [2,4,6] I don't quite get it, do you want the least n elements? What result would you want if the sequence was the following: sequence!"-n*2".myTopDistinct!"a==b"(3); The sequence goes like [-2, -4, -6, -8, -10, ...] ad infinitum. To clarify a doubt, do you mean "top" as in "front", or "top" as the "greatest" or "least" in terms of comparison? Or maybe "the first n distinct elements found"? If so, why "sort" in the original post? It could bring middle elements to front. |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Chris Cain | >> For a finite range, you can iterate over the range while maintaining a collection of the top n elements and discarding duplicates as you find them. For example, using a RedBlackTree as the collection, you will get O(log(n) * range.length) total time and O(n) total memory used. If n is small (on the order of 10s), an array will suffice, resulting in O(n * range.length) total time but with lower constant factor.
>
> You can also use an array if you use a BinaryHeap. If you use a heap then it should have a much lower constant factor than a red-black tree so, even though they would both have O(m*log(n)) (where n is the number of elements you want the top of and m is the size of the collection) running time, a heap would be faster.
My first thought was about a heap, too, but it does not allow to search for a duplicate in O(log(n)) as it guarantees only partial ordering.
|
Copyright © 1999-2021 by the D Language Foundation