March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ivan Kazmenko | On Friday, 8 March 2013 at 16:12:37 UTC, Ivan Kazmenko wrote: >>>> I wonder if exists a way to get top n distinct elements from a range (infinite too!) >>> >>> It's impossible to do that for infinite ranges >> >> Why? >> >> sequence!"n*2".myTopDistinct!"a==b"(3); >> >> will give [2,4,6] > > I don't quite get it, do you want the least n elements? What > result would you want if the sequence was the following: > > sequence!"-n*2".myTopDistinct!"a==b"(3); Infinite loop as: sequence!"n*2".filter!"a>1"; and a lot of other cases. > > The sequence goes like [-2, -4, -6, -8, -10, ...] ad infinitum. > > To clarify a doubt, do you mean "top" as in "front", or "top" as the "greatest" or "least" in terms of comparison? Or maybe "the first n distinct elements found"? If so, why "sort" in the original post? It could bring middle elements to front. I mean "the first n distinct elements found". Example was a mistake. Uniq works with sorted range (if you want to get distinct element) so I sort it to make it works. But I should not :) Whoops. Bearophile got it btw. So, is there any lazy way to do it? |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | On Friday, 8 March 2013 at 15:53:56 UTC, Andrea Fontana wrote: > On Friday, 8 March 2013 at 14:43:29 UTC, jerro wrote: >> On Friday, 8 March 2013 at 13:33:24 UTC, Andrea Fontana wrote: >>> I wonder if exists a way to get top n distinct elements from a range (infinite too!) >> >> It's impossible to do that for infinite ranges > > Why? > > sequence!"n*2".myTopDistinct!"a==b"(3); > > will give [2,4,6] I didn't understand you correctly then. I thought that top n distinct elements meant n largest distinct elements. Now, I'm assuming that you just need first n distinct elements. > We just need to keep a list of "just-seen" elements > But I don't know how to make it lazy You could write your own range that uses an associative array to check for duplicates: struct Unique(R) { R r; bool[elementType!R] seen; @property front(){ return r.front; } @property empty(){ return r.empty; } void popFront() { seen[r] = bool.init; while(r.front in seen) r.popFront(); } } I don't think there's anything like this in Phobos. |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | Andrea Fontana:
> So, is there any lazy way to do it?
Take my code and put it inside a struct, add empty, popFront and front.
Otherwise return an impure take of a filter closure from a function that keeps the set. Something like (untested):
auto firstDistinct(Range)(Range r, size_t n) {
bool[ForeachType!(Range)] mySet;
return f.filter!((k) {
if (k in mySet)
return false;
mySet[k] = true;
return true;
}).take(n);
}
Bye,
bearophile
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | > Otherwise return an impure take of a filter closure from a function that keeps the set. Something like (untested):
It seems to work:
import std.stdio, std.range, std.algorithm, std.traits;
auto firstDistinct(Range)(Range r, in size_t n) {
bool[ForeachType!Range] mySet;
return r.filter!((k) {
if (k in mySet)
return false;
mySet[k] = true;
return true;
}).take(n);
}
void main() {
immutable data = [1, 0, 0, 0, 0, 3, 3, 0, 3, 3, 0, 2, 3, 2, 3, 2,
3, 1, 1, 3, 2, 1, 0, 0, 0, 1, 0, 3, 0, 3];
foreach (i; 0 .. 6)
data.firstDistinct(i).writeln;
}
Bye,
bearophile
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Friday, 8 March 2013 at 18:17:22 UTC, bearophile wrote:
>> Otherwise return an impure take of a filter closure from a function that keeps the set. Something like (untested):
>
> It seems to work:
>
>
>
> import std.stdio, std.range, std.algorithm, std.traits;
>
> auto firstDistinct(Range)(Range r, in size_t n) {
> bool[ForeachType!Range] mySet;
>
> return r.filter!((k) {
> if (k in mySet)
> return false;
> mySet[k] = true;
> return true;
> }).take(n);
> }
>
I have to say, that's a great example of beautiful simplicity in programming. Bravo.
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | It needs a constraint: auto firstDistinct(Range)(Range r, in size_t n) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); } Bye, bearophile |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | > auto firstDistinct(Range)(Range r, in size_t n)
> if (isInputRange!Range) {
> bool[ForeachType!Range] mySet;
>
> return r.filter!((k) {
> if (k in mySet)
> return false;
> mySet[k] = true;
> return true;
> }).take(n);
> }
I think the standard library of Ada2012 has bounded containers, so you can use them to stack-allocate a hash set that can contain up to n items (plus some more free cells to keep its load factor low enough).
In Rust there are fully safe stack-allocated closures. Combining the two things it's possible to create a function like that firstDistinct() with zero heap allocations, that is probably fast.
Bye,
bearophile
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | On 2013-03-08 14:33, Andrea Fontana wrote: > I wonder if exists a way to get top n distinct elements from a range > (infinite too!) > > A (not efficient) way to to this is range.array.sort.uniq.take(n) but > it's a bit overkill, it sorts elements, and of course doesn't work with > infinite ranges. Am i missing any function? There's a function called "topN" in std.algorithm, I don't know if that's what you're looking for. http://dlang.org/phobos/std_algorithm.html#topN -- /Jacob Carlborg |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Friday, 8 March 2013 at 19:36:04 UTC, bearophile wrote: >> auto firstDistinct(Range)(Range r, in size_t n) >> if (isInputRange!Range) { >> bool[ForeachType!Range] mySet; >> >> return r.filter!((k) { >> if (k in mySet) >> return false; >> mySet[k] = true; >> return true; >> }).take(n); >> } > > I think the standard library of Ada2012 has bounded containers, so you can use them to stack-allocate a hash set that can contain up to n items (plus some more free cells to keep its load factor low enough). > > In Rust there are fully safe stack-allocated closures. Combining the two things it's possible to create a function like that firstDistinct() with zero heap allocations, that is probably fast. > > Bye, > bearophile Something goes wrong by the way. I removed "n" template params and "take(n)" (i can do after distinct() call, isn't it the same?). Try this (sorted for better reading): iota(100).map!((x) => uniform(0,10))().distinct().array.sort.writeln; iota(100).map!((x) => uniform(0,10))().array.distinct().array.sort.writeln; output: [0, 0, 3, 3, 4, 4, 4, 6, 7, 8] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | Andrea Fontana:
> Something goes wrong by the way. I removed "n" template params and "take(n)" (i can do after distinct() call, isn't it the same?).
I think take() is not the cause.
See this program:
import std.stdio, std.range, std.algorithm, std.traits, std.random;
auto distinct(Range)(Range r) if (isInputRange!Range) {
bool[ForeachType!Range] mySet;
return r.filter!((k) {
if (k in mySet)
return false;
mySet[k] = true;
return true;
});
}
void main() {
5.iota.map!((_) {
auto x = uniform(0, 10);
write("*");
return x;
}).distinct.writeln;
}
It outputs something like:
*[*2*, *0*, *2**, *1]
You expect that output to contain only 5 stars.
Maybe map() or filter() is the cause of this.
Bye,
bearophile
|
Copyright © 1999-2021 by the D Language Foundation