D idom for removing array elements

What is the D idiom for removing array elements that are present in another array? Is this the right/fastest way? int[] a = [1, 2, 3, 4, 5, 6, 7, 4]; int[] b = [3, 4, 6]; auto c = a.remove!(x => b.canFind(x)); assert(c == [1, 2, 5, 7]);

On Thursday, 26 January 2017 at 08:22:09 UTC, albert-j wrote: > What is the D idiom for removing array elements that are present in another array? > > Is this the right/fastest way? > > int[] a = [1, 2, 3, 4, 5, 6, 7, 4]; > int[] b = [3, 4, 6]; > auto c = a.remove!(x => b.canFind(x)); > assert(c == [1, 2, 5, 7]); import std.stdio, std.algorithm, std.range, std.array; int[] a = [1, 2, 3, 4, 5, 6, 7, 4]; int[] b = [3, 4, 6]; auto sortedB = sort(b.dup); auto c = a . filter!(i => !sortedB.contains(i)) . array ; assert(c == [1, 2, 5, 7]); If arrays get large, this should be more efficient since it performs O(n * n.log) instead of O(n * n). On the other hand you could also just use assumeSorted if b is already sorted, as in this case. But if you do, I still recommend adding an assert to check the range truly is sorted when debugging. It can be made to perform O(n) by sorting both a and b, but then you need to figure a way to return a back to normal order after filtering. I tried, and found it to be so hard I didn't bother.

On Thursday, 26 January 2017 at 11:44:27 UTC, Nicholas Wilson wrote: > On Thursday, 26 January 2017 at 08:22:09 UTC, albert-j wrote: >> What is the D idiom for removing array elements that are present in another array? >> >> Is this the right/fastest way? >> >> int[] a = [1, 2, 3, 4, 5, 6, 7, 4]; >> int[] b = [3, 4, 6]; >> auto c = a.remove!(x => b.canFind(x)); >> assert(c == [1, 2, 5, 7]); > > filter. > > auto c = a.filter!(x => !b.canFind(x)).array; He wanted to remove elements. This will allocate a freaking new array. And do N function calls to build it. This is WORSE!

On Thursday, 26 January 2017 at 13:21:38 UTC, Dukc wrote: > import std.stdio, std.algorithm, std.range, std.array; > int[] a = [1, 2, 3, 4, 5, 6, 7, 4]; > int[] b = [3, 4, 6]; > auto sortedB = sort(b.dup); > auto c = a > . filter!(i => !sortedB.contains(i)) > . array > ; > assert(c == [1, 2, 5, 7]); > > If arrays get large, this should be more efficient since it performs O(n * n.log) instead of O(n * n). It does look much faster than my solution. Will it also work correctly and fast for arrays of custom objects? How should opCmp() be defined if objects don't have a meaningful ordering? The order of elements in the original array does not matter.

On Thursday, 26 January 2017 at 23:10:02 UTC, albert-j wrote: > On Thursday, 26 January 2017 at 13:21:38 UTC, Dukc wrote: > >> import std.stdio, std.algorithm, std.range, std.array; >> int[] a = [1, 2, 3, 4, 5, 6, 7, 4]; >> int[] b = [3, 4, 6]; >> auto sortedB = sort(b.dup); >> auto c = a >> . filter!(i => !sortedB.contains(i)) >> . array >> ; >> assert(c == [1, 2, 5, 7]); >> >> If arrays get large, this should be more efficient since it performs O(n * n.log) instead of O(n * n). > > It does look much faster than my solution. Will it also work correctly and fast for arrays of custom objects? How should opCmp() be defined if objects don't have a meaningful ordering? The order of elements in the original array does not matter. To me it looks rather slow. please benchmark!

January 27, 2017

Re: D idom for removing array elements

Posted by Dukc
in reply to Stefan Koch

Permalink

Dukc

Posted in reply to Stefan Koch

Permalink

On Friday, 27 January 2017 at 05:48:27 UTC, Stefan Koch wrote:
> To me it looks rather slow.
> please benchmark!

void main()
{   import std.stdio, std.algorithm, std.range, std.array, std.datetime;
    int[] a = [1, 2, 3, 4, 5, 6, 7, 4].cycle.take(2000).array;
    int[] b = [3, 4, 6].cycle.take(2000).array;

    void originalMethod()
    {   auto c = a.remove!(x => b.canFind(x));
        assert(c[0 .. 4] == [1, 2, 5, 7]);
    }

    void WilsonMethod()
    {   auto c = a.filter!(x => !b.canFind(x)).array;
        assert(c[0 .. 4] == [1, 2, 5, 7]);
    }

    void myMethod()
    {   auto sortedB = sort(b.dup);
        auto c = a
        .   filter!(i => !sortedB.contains(i))
        .   array
        ;
        assert(c[0 .. 4] == [1, 2, 5, 7]);
    }

    auto r = benchmark!(originalMethod, WilsonMethod, myMethod)(1);
    foreach(result; r) result.writeln;
}

resulted in:
TickDuration(28085)
TickDuration(42868)
TickDuration(1509)

So, you were correct that the original method is better than Wilsons filter. My method is much better for large arrays I tested here. But what I think you were afraid of is that it needlessly wastes performance by allocating a new array, and I agree.

Also, as I said, it could be made O(n).

On Thursday, 26 January 2017 at 23:10:02 UTC, albert-j wrote: > Will it also work correctly and fast for arrays of custom objects? How should opCmp() be defined if objects don't have a meaningful ordering? The order of elements in the original array does not matter. Two options: either define opCmp for the custom objects, or define ordering for the sort algorithm (see std.algorithm in phobos documentation)

On Friday, 27 January 2017 at 08:15:56 UTC, Dukc wrote: > My method is much better for large arrays I tested here. Trough, considering the size of the arrays the performance difference should be even greater, like 1000X better instead of 15X better so it's definitely not that great.

Forums