how to get top N distinct elements from range? (page 2)

On Friday, 8 March 2013 at 16:12:37 UTC, Ivan Kazmenko wrote: >>>> I wonder if exists a way to get top n distinct elements from a range (infinite too!) >>> >>> It's impossible to do that for infinite ranges >> >> Why? >> >> sequence!"n*2".myTopDistinct!"a==b"(3); >> >> will give [2,4,6] > > I don't quite get it, do you want the least n elements? What > result would you want if the sequence was the following: > > sequence!"-n*2".myTopDistinct!"a==b"(3); Infinite loop as: sequence!"n*2".filter!"a>1"; and a lot of other cases. > > The sequence goes like [-2, -4, -6, -8, -10, ...] ad infinitum. > > To clarify a doubt, do you mean "top" as in "front", or "top" as the "greatest" or "least" in terms of comparison? Or maybe "the first n distinct elements found"? If so, why "sort" in the original post? It could bring middle elements to front. I mean "the first n distinct elements found". Example was a mistake. Uniq works with sorted range (if you want to get distinct element) so I sort it to make it works. But I should not :) Whoops. Bearophile got it btw. So, is there any lazy way to do it?

On Friday, 8 March 2013 at 15:53:56 UTC, Andrea Fontana wrote: > On Friday, 8 March 2013 at 14:43:29 UTC, jerro wrote: >> On Friday, 8 March 2013 at 13:33:24 UTC, Andrea Fontana wrote: >>> I wonder if exists a way to get top n distinct elements from a range (infinite too!) >> >> It's impossible to do that for infinite ranges > > Why? > > sequence!"n*2".myTopDistinct!"a==b"(3); > > will give [2,4,6] I didn't understand you correctly then. I thought that top n distinct elements meant n largest distinct elements. Now, I'm assuming that you just need first n distinct elements. > We just need to keep a list of "just-seen" elements > But I don't know how to make it lazy You could write your own range that uses an associative array to check for duplicates: struct Unique(R) { R r; bool[elementType!R] seen; @property front(){ return r.front; } @property empty(){ return r.empty; } void popFront() { seen[r] = bool.init; while(r.front in seen) r.popFront(); } } I don't think there's anything like this in Phobos.

Andrea Fontana: > So, is there any lazy way to do it? Take my code and put it inside a struct, add empty, popFront and front. Otherwise return an impure take of a filter closure from a function that keeps the set. Something like (untested): auto firstDistinct(Range)(Range r, size_t n) { bool[ForeachType!(Range)] mySet; return f.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); } Bye, bearophile

> Otherwise return an impure take of a filter closure from a function that keeps the set. Something like (untested): It seems to work: import std.stdio, std.range, std.algorithm, std.traits; auto firstDistinct(Range)(Range r, in size_t n) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); } void main() { immutable data = [1, 0, 0, 0, 0, 3, 3, 0, 3, 3, 0, 2, 3, 2, 3, 2, 3, 1, 1, 3, 2, 1, 0, 0, 0, 1, 0, 3, 0, 3]; foreach (i; 0 .. 6) data.firstDistinct(i).writeln; } Bye, bearophile

On Friday, 8 March 2013 at 18:17:22 UTC, bearophile wrote: >> Otherwise return an impure take of a filter closure from a function that keeps the set. Something like (untested): > > It seems to work: > > > > import std.stdio, std.range, std.algorithm, std.traits; > > auto firstDistinct(Range)(Range r, in size_t n) { > bool[ForeachType!Range] mySet; > > return r.filter!((k) { > if (k in mySet) > return false; > mySet[k] = true; > return true; > }).take(n); > } > I have to say, that's a great example of beautiful simplicity in programming. Bravo.

It needs a constraint: auto firstDistinct(Range)(Range r, in size_t n) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); } Bye, bearophile

> auto firstDistinct(Range)(Range r, in size_t n) > if (isInputRange!Range) { > bool[ForeachType!Range] mySet; > > return r.filter!((k) { > if (k in mySet) > return false; > mySet[k] = true; > return true; > }).take(n); > } I think the standard library of Ada2012 has bounded containers, so you can use them to stack-allocate a hash set that can contain up to n items (plus some more free cells to keep its load factor low enough). In Rust there are fully safe stack-allocated closures. Combining the two things it's possible to create a function like that firstDistinct() with zero heap allocations, that is probably fast. Bye, bearophile

On 2013-03-08 14:33, Andrea Fontana wrote: > I wonder if exists a way to get top n distinct elements from a range > (infinite too!) > > A (not efficient) way to to this is range.array.sort.uniq.take(n) but > it's a bit overkill, it sorts elements, and of course doesn't work with > infinite ranges. Am i missing any function? There's a function called "topN" in std.algorithm, I don't know if that's what you're looking for. http://dlang.org/phobos/std_algorithm.html#topN -- /Jacob Carlborg

March 08, 2013

Re: how to get top N distinct elements from range?

Posted by Andrea Fontana
in reply to bearophile

Permalink

Andrea Fontana

Posted in reply to bearophile

Permalink

On Friday, 8 March 2013 at 19:36:04 UTC, bearophile wrote:
>> auto firstDistinct(Range)(Range r, in size_t n)
>> if (isInputRange!Range) {
>>    bool[ForeachType!Range] mySet;
>>
>>    return r.filter!((k) {
>>        if (k in mySet)
>>            return false;
>>        mySet[k] = true;
>>        return true;
>>    }).take(n);
>> }
>
> I think the standard library of Ada2012 has bounded containers, so you can use them to stack-allocate a hash set that can contain up to n items (plus some more free cells to keep its load factor low enough).
>
> In Rust there are fully safe stack-allocated closures. Combining the two things it's possible to create a function like that firstDistinct() with zero heap allocations, that is probably fast.
>
> Bye,
> bearophile

Something goes wrong by the way. I removed "n" template params and "take(n)" (i can do after distinct() call, isn't it the same?).

Try this (sorted for better reading):

iota(100).map!((x) => uniform(0,10))().distinct().array.sort.writeln;
iota(100).map!((x) => uniform(0,10))().array.distinct().array.sort.writeln;

output:
[0, 0, 3, 3, 4, 4, 4, 6, 7, 8]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Andrea Fontana: > Something goes wrong by the way. I removed "n" template params and "take(n)" (i can do after distinct() call, isn't it the same?). I think take() is not the cause. See this program: import std.stdio, std.range, std.algorithm, std.traits, std.random; auto distinct(Range)(Range r) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }); } void main() { 5.iota.map!((_) { auto x = uniform(0, 10); write("*"); return x; }).distinct.writeln; } It outputs something like: *[*2*, *0*, *2**, *1] You expect that output to contain only 5 stars. Maybe map() or filter() is the cause of this. Bye, bearophile

Forums