March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | > You expect that output to contain only 5 stars.
Better seen with this code:
import std.stdio, std.range, std.algorithm, std.traits, std.random;
auto distinct(Range)(Range r) if (isInputRange!Range) {
bool[ForeachType!Range] mySet;
return r.filter!((k) {
if (k in mySet)
return false;
mySet[k] = true;
return true;
});
}
void main() {
5.iota.map!((_) {
auto x = uniform(0, 10);
write("*");
return x;
}).distinct.array;
}
Bye,
bearophile
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | On Friday, 8 March 2013 at 22:36:35 UTC, Andrea Fontana wrote:
> On Friday, 8 March 2013 at 19:36:04 UTC, bearophile wrote:
>>> auto firstDistinct(Range)(Range r, in size_t n)
>>> if (isInputRange!Range) {
>>> bool[ForeachType!Range] mySet;
>>>
>>> return r.filter!((k) {
>>> if (k in mySet)
>>> return false;
>>> mySet[k] = true;
>>> return true;
>>> }).take(n);
>>> }
>>
>> I think the standard library of Ada2012 has bounded containers, so you can use them to stack-allocate a hash set that can contain up to n items (plus some more free cells to keep its load factor low enough).
>>
>> In Rust there are fully safe stack-allocated closures. Combining the two things it's possible to create a function like that firstDistinct() with zero heap allocations, that is probably fast.
>>
>> Bye,
>> bearophile
>
> Something goes wrong by the way. I removed "n" template params and "take(n)" (i can do after distinct() call, isn't it the same?).
>
> Try this (sorted for better reading):
>
> iota(100).map!((x) => uniform(0,10))().distinct().array.sort.writeln;
> iota(100).map!((x) => uniform(0,10))().array.distinct().array.sort.writeln;
>
> output:
> [0, 0, 3, 3, 4, 4, 4, 6, 7, 8]
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Here the answer:
auto r=iota(100).map!((x) => uniform(0,10))();
writeln(r.front," ", r.front, " ", r.front, " ", r.front);
But i think "front" was "cached", but it seems not...
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | On Friday, 8 March 2013 at 23:37:20 UTC, Andrea Fontana wrote:
> On Friday, 8 March 2013 at 22:36:35 UTC, Andrea Fontana wrote:
>> On Friday, 8 March 2013 at 19:36:04 UTC, bearophile wrote:
>>>> auto firstDistinct(Range)(Range r, in size_t n)
>>>> if (isInputRange!Range) {
>>>> bool[ForeachType!Range] mySet;
>>>>
>>>> return r.filter!((k) {
>>>> if (k in mySet)
>>>> return false;
>>>> mySet[k] = true;
>>>> return true;
>>>> }).take(n);
>>>> }
>>>
>>> I think the standard library of Ada2012 has bounded containers, so you can use them to stack-allocate a hash set that can contain up to n items (plus some more free cells to keep its load factor low enough).
>>>
>>> In Rust there are fully safe stack-allocated closures. Combining the two things it's possible to create a function like that firstDistinct() with zero heap allocations, that is probably fast.
>>>
>>> Bye,
>>> bearophile
>>
>> Something goes wrong by the way. I removed "n" template params and "take(n)" (i can do after distinct() call, isn't it the same?).
>>
>> Try this (sorted for better reading):
>>
>> iota(100).map!((x) => uniform(0,10))().distinct().array.sort.writeln;
>> iota(100).map!((x) => uniform(0,10))().array.distinct().array.sort.writeln;
>>
>> output:
>> [0, 0, 3, 3, 4, 4, 4, 6, 7, 8]
>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>
> Here the answer:
>
> auto r=iota(100).map!((x) => uniform(0,10))();
> writeln(r.front," ", r.front, " ", r.front, " ", r.front);
>
> But i think "front" was "cached", but it seems not...
>
This is my "struct" version that helped me to findout that problem.
auto distinct(Range)(Range rs) if (isInputRange!(Unqual!Range))
{
return DistinctResult!Range(rs);
}
struct DistinctResult(Range) if (isInputRange!(Unqual!Range))
{
alias Unqual!Range T;
T range;
bool[ElementType!T] justSeen;
@property front() { return range.front; }
@property empty() { return range.empty; }
void popFront()
{
justSeen[range.front] = true;
while(true)
{
range.popFront();
if (range.empty || range.front !in justSeen) break;
}
}
}
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | Andrea Fontana:
> But i think "front" was "cached", but it seems not...
The caching of map front was recently removed.
More test code:
import std.stdio, std.range, std.algorithm, std.traits, std.random;
void main() {
5.iota.map!((_) {
auto x = uniform(0, 10);
write("*");
return x;
}).filter!(_ => true).array;
}
Prints:
**********
Bye,
bearophile
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | Simpler: import std.stdio, std.range, std.algorithm; void main() { iota(5).map!((x) { write("*"); return x; }).filter!(_ => true).array; } |
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | Andrea Fontana:
> Here the answer:
>
> auto r=iota(100).map!((x) => uniform(0,10))();
> writeln(r.front," ", r.front, " ", r.front, " ", r.front);
I think that's not the answer. I think the answer is a bug in filter().
Bye,
bearophile
|
March 08, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Friday, 8 March 2013 at 23:51:48 UTC, bearophile wrote:
> Andrea Fontana:
>
>> Here the answer:
>>
>> auto r=iota(100).map!((x) => uniform(0,10))();
>> writeln(r.front," ", r.front, " ", r.front, " ", r.front);
>
> I think that's not the answer. I think the answer is a bug in filter().
>
> Bye,
> bearophile
My struct version never work if map doesn't cache. Every time I call (inside popFront() too!) front() gives different values.
If i do array.distinct() it works fine: an array is filled with random values and they will never changes again.
|
March 09, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Friday, 8 March 2013 at 23:44:40 UTC, bearophile wrote:
> Simpler:
>
>
> import std.stdio, std.range, std.algorithm;
>
> void main() {
> iota(5).map!((x) {
> write("*");
> return x;
> }).filter!(_ => true).array;
> }
Chech this:
5.iota.map!((_) {
auto x = uniform(0, 1000);
writeln(" MAP-1 = ", x);
return x;
}).filter!(_ => true).map!((x){ writeln(" MAP-2 = ", x); return
x; })().array.writeln;
Here some function calls front() internally (filter? map?) to
check some conditions and triggers random creation again.
|
March 09, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | This is a shortened version of the filter(), it calls pred only n times, but it calls _input.front n*2 times. If _input.front is not deterministic (that means it's not pure, as in the case of distinct filtering function) it gives wrong results: struct FilterResult(alias pred, Range) { Range _input; this(Range r) { _input = r; while (!_input.empty && !pred(_input.front)) { _input.popFront(); } } @property bool empty() { return _input.empty; } void popFront() { do { _input.popFront(); } while (!_input.empty && !pred(_input.front)); } @property auto ref front() { return _input.front; } } This isn't exactly a bug of filter(), it's a design decision of not caching _input.front. Bye, bearophile |
March 09, 2013 Re: how to get top N distinct elements from range? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Saturday, 9 March 2013 at 01:00:26 UTC, bearophile wrote:
> This is a shortened version of the filter(), it calls pred only n times, but it calls _input.front n*2 times. If _input.front is not deterministic (that means it's not pure, as in the case of distinct filtering function) it gives wrong results:
>
>
> struct FilterResult(alias pred, Range) {
> Range _input;
>
> this(Range r) {
> _input = r;
> while (!_input.empty && !pred(_input.front)) {
> _input.popFront();
> }
> }
>
> @property bool empty() { return _input.empty; }
>
> void popFront() {
> do {
> _input.popFront();
> } while (!_input.empty && !pred(_input.front));
> }
>
> @property auto ref front() {
> return _input.front;
> }
> }
>
>
> This isn't exactly a bug of filter(), it's a design decision of not caching _input.front.
>
> Bye,
> bearophile
Not funny btw :) It's not so easy to find out. And it's not easy to write a "cached map" function if you use classes or some pointers as ElementType...
|
Copyright © 1999-2021 by the D Language Foundation