Jump to page: 1 2 3
Thread overview
D idom for removing array elements
Jan 26, 2017
albert-j
Jan 26, 2017
Nicholas Wilson
Jan 26, 2017
Stefan Koch
Jan 26, 2017
Dukc
Jan 26, 2017
albert-j
Jan 27, 2017
Stefan Koch
Jan 27, 2017
Dukc
Jan 27, 2017
Dukc
Jan 27, 2017
cym13
Jan 27, 2017
cym13
Jan 27, 2017
albert-j
Jan 27, 2017
Dukc
Jan 28, 2017
cym13
Jan 28, 2017
albert-j
Jan 28, 2017
cym13
Jan 29, 2017
albert-j
Jan 29, 2017
Jordan Wilson
Jan 29, 2017
Jordan Wilson
Jan 30, 2017
albert-j
Jan 30, 2017
ag0aep6g
Jan 30, 2017
albert-j
Jan 30, 2017
albert-j
Jan 30, 2017
cym13
Jan 30, 2017
cym13
Jan 30, 2017
albert-j
Jan 30, 2017
albert-j
Jan 30, 2017
ag0aep6g
Jan 27, 2017
Dukc
Jan 26, 2017
Jordan Wilson
January 26, 2017
What is the D idiom for removing array elements that are present in another array?

Is this the right/fastest way?

int[] a = [1, 2, 3, 4, 5, 6, 7, 4];
int[] b = [3, 4, 6];
auto c = a.remove!(x => b.canFind(x));
assert(c == [1, 2, 5, 7]);
January 26, 2017
On Thursday, 26 January 2017 at 08:22:09 UTC, albert-j wrote:
> What is the D idiom for removing array elements that are present in another array?
>
> Is this the right/fastest way?
>
> int[] a = [1, 2, 3, 4, 5, 6, 7, 4];
> int[] b = [3, 4, 6];
> auto c = a.remove!(x => b.canFind(x));
> assert(c == [1, 2, 5, 7]);

filter.

auto c = a.filter!(x => !b.canFind(x)).array;


January 26, 2017
On Thursday, 26 January 2017 at 08:22:09 UTC, albert-j wrote:
> What is the D idiom for removing array elements that are present in another array?
>
> Is this the right/fastest way?
>
> int[] a = [1, 2, 3, 4, 5, 6, 7, 4];
> int[] b = [3, 4, 6];
> auto c = a.remove!(x => b.canFind(x));
> assert(c == [1, 2, 5, 7]);

import std.stdio, std.algorithm, std.range, std.array;
int[] a = [1, 2, 3, 4, 5, 6, 7, 4];
int[] b = [3, 4, 6];
auto sortedB = sort(b.dup);
auto c = a
.   filter!(i => !sortedB.contains(i))
.   array
;
assert(c == [1, 2, 5, 7]);

If arrays get large, this should be more efficient since it performs O(n * n.log) instead of O(n * n).

On the other hand you could also just use assumeSorted if b is already sorted, as in this case. But if you do, I still recommend adding an assert to check the range truly is sorted when debugging.

It can be made to perform O(n) by sorting both a and b, but then you need to figure a way to return a back to normal order after filtering. I tried, and found it to be so hard I didn't bother.
January 26, 2017
On Thursday, 26 January 2017 at 11:44:27 UTC, Nicholas Wilson wrote:
> On Thursday, 26 January 2017 at 08:22:09 UTC, albert-j wrote:
>> What is the D idiom for removing array elements that are present in another array?
>>
>> Is this the right/fastest way?
>>
>> int[] a = [1, 2, 3, 4, 5, 6, 7, 4];
>> int[] b = [3, 4, 6];
>> auto c = a.remove!(x => b.canFind(x));
>> assert(c == [1, 2, 5, 7]);
>
> filter.
>
> auto c = a.filter!(x => !b.canFind(x)).array;

He wanted to remove elements.
This will allocate a freaking new array.
And do N function calls to build it.
This is WORSE!

January 26, 2017
On Thursday, 26 January 2017 at 08:22:09 UTC, albert-j wrote:
> What is the D idiom for removing array elements that are present in another array?
>
> Is this the right/fastest way?
>
> int[] a = [1, 2, 3, 4, 5, 6, 7, 4];
> int[] b = [3, 4, 6];
> auto c = a.remove!(x => b.canFind(x));
> assert(c == [1, 2, 5, 7]);

If you don't care about array a being mutated, then I think what you have is best (although I would suggest that if you don't care about a being mutated, just reassign the results back to a again).

Otherwise, I think you need to allocate a new array, so the other answers using filter are good.
January 26, 2017
On Thursday, 26 January 2017 at 13:21:38 UTC, Dukc wrote:

> import std.stdio, std.algorithm, std.range, std.array;
> int[] a = [1, 2, 3, 4, 5, 6, 7, 4];
> int[] b = [3, 4, 6];
> auto sortedB = sort(b.dup);
> auto c = a
> .   filter!(i => !sortedB.contains(i))
> .   array
> ;
> assert(c == [1, 2, 5, 7]);
>
> If arrays get large, this should be more efficient since it performs O(n * n.log) instead of O(n * n).

It does look much faster than my solution. Will it also work correctly and fast for arrays of custom objects? How should opCmp() be defined if objects don't have a meaningful ordering? The order of elements in the original array does not matter.
January 27, 2017
On Thursday, 26 January 2017 at 23:10:02 UTC, albert-j wrote:
> On Thursday, 26 January 2017 at 13:21:38 UTC, Dukc wrote:
>
>> import std.stdio, std.algorithm, std.range, std.array;
>> int[] a = [1, 2, 3, 4, 5, 6, 7, 4];
>> int[] b = [3, 4, 6];
>> auto sortedB = sort(b.dup);
>> auto c = a
>> .   filter!(i => !sortedB.contains(i))
>> .   array
>> ;
>> assert(c == [1, 2, 5, 7]);
>>
>> If arrays get large, this should be more efficient since it performs O(n * n.log) instead of O(n * n).
>
> It does look much faster than my solution. Will it also work correctly and fast for arrays of custom objects? How should opCmp() be defined if objects don't have a meaningful ordering? The order of elements in the original array does not matter.

To me it looks rather slow.
please benchmark!
January 27, 2017
On Friday, 27 January 2017 at 05:48:27 UTC, Stefan Koch wrote:
> To me it looks rather slow.
> please benchmark!

void main()
{   import std.stdio, std.algorithm, std.range, std.array, std.datetime;
    int[] a = [1, 2, 3, 4, 5, 6, 7, 4].cycle.take(2000).array;
    int[] b = [3, 4, 6].cycle.take(2000).array;

    void originalMethod()
    {   auto c = a.remove!(x => b.canFind(x));
        assert(c[0 .. 4] == [1, 2, 5, 7]);
    }

    void WilsonMethod()
    {   auto c = a.filter!(x => !b.canFind(x)).array;
        assert(c[0 .. 4] == [1, 2, 5, 7]);
    }

    void myMethod()
    {   auto sortedB = sort(b.dup);
        auto c = a
        .   filter!(i => !sortedB.contains(i))
        .   array
        ;
        assert(c[0 .. 4] == [1, 2, 5, 7]);
    }

    auto r = benchmark!(originalMethod, WilsonMethod, myMethod)(1);
    foreach(result; r) result.writeln;
}

resulted in:
TickDuration(28085)
TickDuration(42868)
TickDuration(1509)

So, you were correct that the original method is better than Wilsons filter. My method is much better for large arrays I tested here. But what I think you were afraid of is that it needlessly wastes performance by allocating a new array, and I agree.

Also, as I said, it could be made O(n).
January 27, 2017
On Thursday, 26 January 2017 at 23:10:02 UTC, albert-j wrote:
> Will it also work correctly and fast for arrays of custom objects? How should opCmp() be defined if objects don't have a meaningful ordering? The order of elements in the original array does not matter.

Two options: either define opCmp for the custom objects, or define ordering for the sort algorithm (see std.algorithm in phobos documentation)


January 27, 2017
On Friday, 27 January 2017 at 08:15:56 UTC, Dukc wrote:
> My method is much better for large arrays I tested here.

Trough, considering the size of the arrays the performance difference should be even greater, like 1000X better instead of 15X better so it's definitely not that great.

« First   ‹ Prev
1 2 3