March 09, 2013
Andrea Fontana:

> Not funny btw :) It's not so easy to find out. And it's not easy to write a "cached map" function if you use classes or some pointers as ElementType...

Not caching _input.front in filter() means that filter() assumes the front() of its argument range to act as a pure function (even if this isn't enforced by the type system). At least this should be documented in std.algorithm ddocs...


Python filter() does not work that way:


from sys import stdout
def fun(x):
    stdout.write("*")
    return x
list(filter(lambda x: True, map(fun, xrange(5))))


Prints:

*****

Bye,
bearophile
March 09, 2013
> from sys import stdout
> def fun(x):
>     stdout.write("*")
>     return x
> list(filter(lambda x: True, map(fun, xrange(5))))

Sorry, I meant:


from itertools import imap, ifilter
from sys import stdout
def fun(x):
    stdout.write("*")
    return x
list(ifilter(lambda x: True, imap(fun, xrange(5))))
March 09, 2013
With this modified filter (that caches _input.front) it seems to work:


import std.stdio, std.range, std.algorithm, std.traits, std.random;

struct FilterResult2(alias pred, Range) {
    import std.traits: ForeachType;

    Range _input;
    ForeachType!Range rFront;

    this(Range r) {
        _input = r;

        while (!_input.empty) {
            this.rFront = _input.front;
            if (pred(this.rFront))
                break;
            _input.popFront();
        }
    }

    @property bool empty() { return _input.empty; }

    void popFront() {
        do {
            _input.popFront();
            if (_input.empty)
                break;
            this.rFront = _input.front;
            if (pred(this.rFront))
                break;
        } while (true);
    }

    @property auto ref front() {
        return this.rFront;
    }
}

FilterResult2!(pred, Range) filter2(alias pred, Range)(Range rs) {
    return FilterResult2!(pred, Range)(rs);
}


auto distinct(Range)(Range r)
if (isInputRange!Range) {
    bool[ForeachType!Range] mySet;

    return r.filter2!((k) {
        if (k in mySet)
            return false;
        mySet[k] = true;
        return true;
    });
}

void main() {
    100.iota.map!(_ => uniform(0, 10)).distinct.array.sort.writeln;
}


Is it a good idea to replace std.algorithm.filter with something like that filter2 (plus some missing methods)?

Bye,
bearophile
March 09, 2013
On Saturday, 9 March 2013 at 01:36:53 UTC, bearophile wrote:
>
> Is it a good idea to replace std.algorithm.filter with something like that filter2 (plus some missing methods)?
>
> Bye,
> bearophile

Maybe two versions (filter and cachedFilter) or a bool template param?

I was thinking about @pure front() too: but I think it's a wrong assumption. The right assumption would be that front should return the same value until popFront is called again. It can read front value lazily from front() call. It can do a lot of impure things (lol) but it shouldn't change front "randomly" at each call.

I would improve distinct to support an alias pred = "a < b" to build a bst instead of an AA.

Or just a field like distinct!"a.id" (that should work with aa backend)
March 09, 2013
On Saturday, 9 March 2013 at 09:13:26 UTC, Andrea Fontana wrote:
> On Saturday, 9 March 2013 at 01:36:53 UTC, bearophile wrote:
>>
>> Is it a good idea to replace std.algorithm.filter with something like that filter2 (plus some missing methods)?
>>
>> Bye,
>> bearophile
>
> Maybe two versions (filter and cachedFilter) or a bool template param?
>
> I was thinking about @pure front() too: but I think it's a wrong assumption. The right assumption would be that front should return the same value until popFront is called again. It can read front value lazily from front() call. It can do a lot of impure things (lol) but it shouldn't change front "randomly" at each call.
>
> I would improve distinct to support an alias pred = "a < b" to build a bst instead of an AA.
>
> Or just a field like distinct!"a.id" (that should work with aa backend)

I almost got it:

auto distinct(alias fun = "a", Range)(Range r)
if (isInputRange!Range) {
	alias unaryFun!fun _fun;

	bool[ReturnType!_fun] mySet;
	
	return r.filter2!((k) {
					  	if (_fun(k) in mySet)
					  		return false;
					  	mySet[_fun(k)] = true;
					  	return true;
					  });
}

ReturnType!_fun doesn't work, if i set to the right type, function works as expected
March 09, 2013
On Saturday, 9 March 2013 at 09:32:43 UTC, Andrea Fontana wrote:
> On Saturday, 9 March 2013 at 09:13:26 UTC, Andrea Fontana wrote:
>> On Saturday, 9 March 2013 at 01:36:53 UTC, bearophile wrote:
>>>
>>> Is it a good idea to replace std.algorithm.filter with something like that filter2 (plus some missing methods)?
>>>
>>> Bye,
>>> bearophile
>>
>> Maybe two versions (filter and cachedFilter) or a bool template param?
>>
>> I was thinking about @pure front() too: but I think it's a wrong assumption. The right assumption would be that front should return the same value until popFront is called again. It can read front value lazily from front() call. It can do a lot of impure things (lol) but it shouldn't change front "randomly" at each call.
>>
>> I would improve distinct to support an alias pred = "a < b" to build a bst instead of an AA.
>>
>> Or just a field like distinct!"a.id" (that should work with aa backend)
>
> I almost got it:
>
> auto distinct(alias fun = "a", Range)(Range r)
> if (isInputRange!Range) {
> 	alias unaryFun!fun _fun;
>
> 	bool[ReturnType!_fun] mySet;
> 	
> 	return r.filter2!((k) {
> 					  	if (_fun(k) in mySet)
> 					  		return false;
> 					  	mySet[_fun(k)] = true;
> 					  	return true;
> 					  });
> }
>
> ReturnType!_fun doesn't work, if i set to the right type, function works as expected


auto distinct(alias fun = "a", Range)(Range r)
if (isInputRange!Range) {
	alias unaryFun!fun _fun;

	bool[typeof(_fun((ElementType!Range).init))] mySet;
	
	return r.filter2!((k) {
					  	if (_fun(k) in mySet)
					  		return false;
					  	mySet[_fun(k)] = true;
					  	return true;
					  });
}


struct User
{
	int id;
	string name;
}

void main() {
	User[] users = [ {1, "first"}, {2, "second"}, {3, "first"}];

	users.distinct!("a.id")().writeln;
	users.distinct!("a.name")().writeln;

}

this works. Is this:
bool[typeof(_fun((ElementType!Range).init))] mySet;
	
the right way?
March 09, 2013
Andrea Fontana:

> Maybe two versions (filter and cachedFilter) or a bool template param?

In the meantime I have filed the problem, I hope Andrei will take a look:

http://d.puremagic.com/issues/show_bug.cgi?id=9674

Bye,
bearophile
March 09, 2013
Andrea Fontana:

> Is this:
> bool[typeof(_fun((ElementType!Range).init))] mySet;
> 	
> the right way?

I don't know if that's the best way, but I think it's OK. (I think it should even go in the D wiki, so if it's not the best to do it, someone will improve it.)

Bye,
bearophile
March 09, 2013
On 3/8/13 1:34 PM, Chris Cain wrote:
> On Friday, 8 March 2013 at 18:17:22 UTC, bearophile wrote:
>> auto firstDistinct(Range)(Range r, in size_t n) {
>> bool[ForeachType!Range] mySet;
>>
>> return r.filter!((k) {
>> if (k in mySet)
>> return false;
>> mySet[k] = true;
>> return true;
>> }).take(n);
>> }
>>
>
> I have to say, that's a great example of beautiful simplicity in
> programming. Bravo.

Yah, impressive. Nice work bearophile.

Andrei


March 09, 2013
Andrei Alexandrescu:

> Yah,

There are 3 problems:
- In a successive post I have also added a template constraint, so Range is an InputRange.
- firstDistinct() packs two behaviours that are better kept separated, it's like a takeDistinct. It's better to remove the final take, and return all distinct items. If a user wants the first ones, he/she/shi will add a take after it. This is why later in this thread we have used distinct(). In very few cases I'd like some extra functions in Phobos like flatMap and amap, because they are very common operations. But in most cases it's better to keep only the simplest algorithms (like distinct and take), so you can combine them in many different ways.
- Successive posts have shown that the code doesn't work. After several tests I have opened this: http://d.puremagic.com/issues/show_bug.cgi?id=9674  Please take a look at it.
- I have recently written a little D program, about something else, that I think you will appreciate even more than that firstDistinct/distinct :-)

Bye,
bearophile