January 27, 2017
On Friday, 27 January 2017 at 08:15:56 UTC, Dukc wrote:

> TickDuration(28085)
> TickDuration(42868)
> TickDuration(1509)


Thank you, this is very helpful. I am also wondering why the standard library doesn't have convenience functions for this, e.g. like Java's removeAll? Now there's more typing than necessary for a relatively common task.

January 27, 2017
On Friday, 27 January 2017 at 10:20:19 UTC, albert-j wrote:
> I am also wondering why the standard library doesn't have convenience functions for this, e.g. like Java's removeAll? Now there's more typing than necessary for a relatively common task.

That might be a good addition considering how well it can be optimized compared to a naive implementation. Of course, D version of that would accept ranges as input, not only collections.
January 27, 2017
On Friday, 27 January 2017 at 08:30:41 UTC, Dukc wrote:
> On Friday, 27 January 2017 at 08:15:56 UTC, Dukc wrote:
>> My method is much better for large arrays I tested here.
>
> Trough, considering the size of the arrays the performance difference should be even greater, like 1000X better instead of 15X better so it's definitely not that great.

Note that if the set of values to be excluded isn't smaller than the haystack then using partition is way faster and your method is the slowest of all. If the order of the array matters stable partition can be used but is slower than the original method.

Added test cases and benchmark results:


    void partitionMethod() {
        auto c = a.partition!(v => b.canFind(v));
        // Sort to recover the order
        assert(sort(c[0..4]).array == [1, 2, 5, 7]);
    }

    void stablePartitionMethod() {
        auto c = a.partition!(v => b.canFind(v), SwapStrategy.stable);
        assert(c[0..4] == [1, 2, 5, 7]);
    }

    benchmark!(originalMethod,
               WilsonMethod,
               myMethod,
               partitionMethod,
               stablePartitionMethod)(1)
    .each!writeln;


With "a" of length 4000 and "b" of length 4000:

/*
int[] a = [1, 2, 3, 4, 5, 6, 7, 4].cycle.take(4000).array;
int[] b = [3, 4, 6].cycle.take(4000).array;
*/

TickDuration(51482830)
TickDuration(43634792)
TickDuration(1085159)  // myMethod is faster
TickDuration(44216820) // 3rd
TickDuration(42920567) // 2nd

With "a" of length 400000 and "b" of length 3:

/*
int[] a = [1, 2, 3, 4, 5, 6, 7, 4].cycle.take(400000).array;
int[] b = [3, 4, 6];
*/

TickDuration(312038) // 2nd
TickDuration(541912)
TickDuration(606883)
TickDuration(190662) // partitionMethod is way faster
TickDuration(418751) // 3rd

January 27, 2017
On Friday, 27 January 2017 at 15:39:57 UTC, cym13 wrote:
> On Friday, 27 January 2017 at 08:30:41 UTC, Dukc wrote:
>> [...]
>
> Note that if the set of values to be excluded isn't smaller than the haystack then using partition is way faster and your method is the slowest of all. If the order of the array matters stable partition can be used but is slower than the original method.
>
> [...]

I forgot to say that the main reason to use partition is that it's easy to use it to remove the elements of the array in place:

         auto r = a.partition!(v => !b.canFind(v));
         a.length -= r.length;

It just puts the "bad" elements at the end and changes the length of the array.
January 28, 2017
On Friday, 27 January 2017 at 08:15:56 UTC, Dukc wrote:

> void main()
> {   import std.stdio, std.algorithm, std.range, std.array, std.datetime;
>     int[] a = [1, 2, 3, 4, 5, 6, 7, 4].cycle.take(2000).array;
>     int[] b = [3, 4, 6].cycle.take(2000).array;
>
>     void originalMethod()
>     {   auto c = a.remove!(x => b.canFind(x));
>         assert(c[0 .. 4] == [1, 2, 5, 7]);
>     }
>
>     void WilsonMethod()
>     {   auto c = a.filter!(x => !b.canFind(x)).array;
>         assert(c[0 .. 4] == [1, 2, 5, 7]);
>     }
>
>     void myMethod()
>     {   auto sortedB = sort(b.dup);
>         auto c = a
>         .   filter!(i => !sortedB.contains(i))
>         .   array
>         ;
>         assert(c[0 .. 4] == [1, 2, 5, 7]);
>     }
>
>     auto r = benchmark!(originalMethod, WilsonMethod, myMethod)(1);
>     foreach(result; r) result.writeln;
> }


How to make it work with std.container.array?


January 28, 2017
On Friday, 27 January 2017 at 11:56:02 UTC, Dukc wrote:
> On Friday, 27 January 2017 at 10:20:19 UTC, albert-j wrote:
>> I am also wondering why the standard library doesn't have convenience functions for this, e.g. like Java's removeAll? Now there's more typing than necessary for a relatively common task.
>
> That might be a good addition considering how well it can be optimized compared to a naive implementation. Of course, D version of that would accept ranges as input, not only collections.

Just to follow up, I've been playing a bit and the fastest I've got so far is:

#!/usr/bin/env rdmd

import std.conv;
import std.stdio;
import std.array;
import std.range;
import std.datetime;
import std.algorithm;


void main() {
    int[] a = [1, 2, 3, 4, 5, 6, 7, 4].cycle.take(2000).array;
    int[] b = [3, 4, 6].cycle.take(2000).array;

    void originalMethod()
    {   auto c = a.remove!(x => b.canFind(x));
        assert(c[0 .. 4] == [1, 2, 5, 7]);
    }

    void WilsonMethod()
    {   auto c = a.filter!(x => !b.canFind(x)).array;
        assert(c[0 .. 4] == [1, 2, 5, 7]);
    }

    void myMethod()
    {   auto sortedB = sort(b.dup);
        auto c = a
        .   filter!(i => !sortedB.contains(i))
        .   array
        ;
        assert(c[0 .. 4] == [1, 2, 5, 7]);
    }

    void partitionMethod() {
        auto c = a.partition!(v => b.canFind(v));
        // Sort to recover the order
        assert(sort(c[0..4]).array == [1, 2, 5, 7]);
    }

    void stablePartitionMethod() {
        auto c = a.partition!(v => b.canFind(v), SwapStrategy.stable);
        assert(c[0..4] == [1, 2, 5, 7]);
    }

    void newPartitionMethod()
    {   auto sortedB = sort(b.dup).uniq.array;
        auto c = a.partition!(v => sortedB.assumeSorted.contains(v));
        assert(c[0 .. 4] == [1, 2, 5, 7], c[0..4].to!string);
    }

    void newStablePartitionMethod()
    {   auto sortedB = sort(b.dup).uniq.array;
        auto c = a.partition!(v => sortedB.assumeSorted.contains(v),
                              SwapStrategy.stable);
        assert(c[0 .. 4] == [1, 2, 5, 7], c[0..4].to!string);
    }

    benchmark!(originalMethod,
               WilsonMethod,
               myMethod,
               partitionMethod,
               stablePartitionMethod,
               newPartitionMethod,
               newStablePartitionMethod)(1)
        .each!writeln;
}

/* Results (one of many runs, considered characteristic):
 *     TickDuration(18129442) // originalMethod
 *     TickDuration(28536187) // WilsonMethod
 *     TickDuration(845396)   // myMethod
 *     TickDuration(29936970) // partitionMethod
 *     TickDuration(33447345) // stablePartitionMethod
 *     TickDuration(548226)   // newPartitionMethod
 *     TickDuration(597183)   // newStablePartitionMethod
 */

January 28, 2017
On Saturday, 28 January 2017 at 10:46:29 UTC, albert-j wrote:
> On Friday, 27 January 2017 at 08:15:56 UTC, Dukc wrote:
>
>> void main()
>> {   import std.stdio, std.algorithm, std.range, std.array, std.datetime;
>>     int[] a = [1, 2, 3, 4, 5, 6, 7, 4].cycle.take(2000).array;
>>     int[] b = [3, 4, 6].cycle.take(2000).array;
>>
>>     void originalMethod()
>>     {   auto c = a.remove!(x => b.canFind(x));
>>         assert(c[0 .. 4] == [1, 2, 5, 7]);
>>     }
>>
>>     void WilsonMethod()
>>     {   auto c = a.filter!(x => !b.canFind(x)).array;
>>         assert(c[0 .. 4] == [1, 2, 5, 7]);
>>     }
>>
>>     void myMethod()
>>     {   auto sortedB = sort(b.dup);
>>         auto c = a
>>         .   filter!(i => !sortedB.contains(i))
>>         .   array
>>         ;
>>         assert(c[0 .. 4] == [1, 2, 5, 7]);
>>     }
>>
>>     auto r = benchmark!(originalMethod, WilsonMethod, myMethod)(1);
>>     foreach(result; r) result.writeln;
>> }
>
>
> How to make it work with std.container.array?

Here is an example:

import std.container.array;
alias ArrI = Array!int;

auto removeAll(ArrI a, ArrI b) {
    import std.array;
    import std.range;
    import std.algorithm;

    auto sortedB = sort(b[]).uniq.array;
    auto c = a[].partition!(v => sortedB.assumeSorted.contains(v),
                          SwapStrategy.stable);
    return ArrI(c);
}

void main() {
    import std.stdio;

    auto a = ArrI(1, 2, 3, 4, 5, 6, 7, 4);
    auto b = ArrI(3, 4, 6);

    a.removeAll(b)[].writeln;
}

Note especially how you slice an Array to access its data.
January 29, 2017
On Saturday, 28 January 2017 at 11:54:58 UTC, cym13 wrote:
> 

I am trying to wrap my head around lazy evaluation during filtering/mapping, but there's something I don't understand.
I want to create an array, square some elements, remove some elements from original array and add the squared ones to the original array:

    import std.stdio, std.algorithm, std.array;

    int[] arr;
    foreach (i; 0..10)
        arr ~= i;

    writeln("Original array: ",arr);
    // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] -- OK

    auto arrMap = arr.filter!(x => x > 5).map!(x => x^^2);
    writeln("arrMap: ", arrMap);
    // [36, 49, 64, 81] -- OK

    int[] toRemove = [1, 2, 9];
    arr = arr.remove!(x => toRemove.canFind(x)).array;

    writeln("Original array after removal: ", arr);
    // [0, 3, 4, 5, 6, 7, 8] -- OK

    arr ~= arrMap.array;

    writeln("Original array after removal and concatenation: ", arr);
    // [0, 3, 4, 5, 6, 7, 8, 64, 49, 64, 81] -- what?

The last result is not what I wanted. I would expect [0, 3, 4, 5, 6, 7, 8] and [36, 49, 64, 81] concatenated into [0, 3, 4, 5, 6, 7, 8, 36, 49, 64, 81], but something else is happening here.
It looks like arr = arr.remove!....  is messing things up, but why?

January 29, 2017
On Sunday, 29 January 2017 at 21:41:57 UTC, albert-j wrote:
> On Saturday, 28 January 2017 at 11:54:58 UTC, cym13 wrote:
>> [...]
>
> I am trying to wrap my head around lazy evaluation during filtering/mapping, but there's something I don't understand.
> I want to create an array, square some elements, remove some elements from original array and add the squared ones to the original array:
>
> [...]

You need to do something like this:
auto arrMap = arr.filter!(x => x > 5).map!(x => x^^2).array;

It's because arrMap is lazy evaluated.
January 29, 2017
On Sunday, 29 January 2017 at 23:42:40 UTC, Jordan Wilson wrote:
> On Sunday, 29 January 2017 at 21:41:57 UTC, albert-j wrote:
>> On Saturday, 28 January 2017 at 11:54:58 UTC, cym13 wrote:
>>> [...]
>>
>> I am trying to wrap my head around lazy evaluation during filtering/mapping, but there's something I don't understand.
>> I want to create an array, square some elements, remove some elements from original array and add the squared ones to the original array:
>>
>> [...]
>
> You need to do something like this:
> auto arrMap = arr.filter!(x => x > 5).map!(x => x^^2).array;
>
> It's because arrMap is lazy evaluated.

Specifically:
arr ~= arrMap.array;
will cause arrMap to be evaluated again using whatever arr is.

So instead of:
auto arrMap = arr.filter!(x => x > 5).map!(x => x^^2);
// mutate arr
arr ~= arrMap.array;

you would want:
auto arrMap = arr.filter!(x => x > 5).map!(x => x^^2).array;
// mutate arr
arr ~= arrMap;