Thread overview
Remove duplicates
May 21, 2013
bearophile
May 21, 2013
Namespace
May 21, 2013
bearophile
May 21, 2013
Namespace
May 21, 2013
bearophile
May 22, 2013
bearophile
May 21, 2013
Sometimes I have need a simple function like this, related to std.string.squeeze:


// Must keep the original order of the items.
// Slow implementation that shows the semantics.
T[] noDupes(T)(in T[] s) {
    import std.algorithm: canFind;
    T[] result;
    foreach (T c; s)
        if (!result.canFind(c))
            result ~= c;
    return result;
}

void main() {
    import std.string: squeeze;
    assert("AAAA".noDupes == "A");
    assert("AAAA".squeeze == "A");
    assert("ABAC".noDupes == "ABC");
    assert("ABAC".squeeze == "ABAC");
}


Do you know if this function (or a simple way to implement it) already in Phobos?

Bye,
bearophile
May 21, 2013
On Tuesday, 21 May 2013 at 22:00:09 UTC, bearophile wrote:
> Sometimes I have need a simple function like this, related to std.string.squeeze:
>
>
> // Must keep the original order of the items.
> // Slow implementation that shows the semantics.
> T[] noDupes(T)(in T[] s) {
>     import std.algorithm: canFind;
>     T[] result;
>     foreach (T c; s)
>         if (!result.canFind(c))
>             result ~= c;
>     return result;
> }
>
> void main() {
>     import std.string: squeeze;
>     assert("AAAA".noDupes == "A");
>     assert("AAAA".squeeze == "A");
>     assert("ABAC".noDupes == "ABC");
>     assert("ABAC".squeeze == "ABAC");
> }
>
>
> Do you know if this function (or a simple way to implement it) already in Phobos?
>
> Bye,
> bearophile

I would prefer a map solution like this:

----
@property
T[] noDupes(T)(const T[] arr) {
    bool[T] map;

    foreach (T item; arr) {
        if (item !in map)
            map[item] = true;
    }

    return map.keys;
}
----

I do not know if this solution would be much faster. But if I should make a statement, I would tend to yes.
May 21, 2013
Namespace:

>> // Slow implementation that shows the semantics.
> ...
> I do not know if this solution would be much faster. But if I should make a statement, I would tend to yes.

(The point of my code was to show the semantics as much clearly as possible, it was not to show fast code.)

If no similar function is in Phobos, and there is no trivial way to implement it efficiently, then maybe it's worth writing a Phobos enhancement request.

Bye,
bearophile
May 21, 2013
On Tuesday, 21 May 2013 at 22:51:26 UTC, bearophile wrote:
> Namespace:
>
>>> // Slow implementation that shows the semantics.
>> ...
>> I do not know if this solution would be much faster. But if I should make a statement, I would tend to yes.
>
> (The point of my code was to show the semantics as much clearly as possible, it was not to show fast code.)
>
> If no similar function is in Phobos, and there is no trivial way to implement it efficiently, then maybe it's worth writing a Phobos enhancement request.
>
> Bye,
> bearophile

http://dlang.org/phobos/std_algorithm.html#uniq
May 21, 2013
Namespace:

> http://dlang.org/phobos/std_algorithm.html#uniq

Just like squeeze, uniq requires the items to be sorted, to remove the duplicates. The function I am discussing here must keep the order of the not removed items.

Bye,
bearophile
May 21, 2013
On Tue, 21 May 2013 18:00:07 -0400, bearophile <bearophileHUGS@lycos.com> wrote:

> Sometimes I have need a simple function like this, related to std.string.squeeze:
>
>
> // Must keep the original order of the items.
> // Slow implementation that shows the semantics.
> T[] noDupes(T)(in T[] s) {
>      import std.algorithm: canFind;
>      T[] result;
>      foreach (T c; s)
>          if (!result.canFind(c))
>              result ~= c;
>      return result;
> }
>
> void main() {
>      import std.string: squeeze;
>      assert("AAAA".noDupes == "A");
>      assert("AAAA".squeeze == "A");
>      assert("ABAC".noDupes == "ABC");
>      assert("ABAC".squeeze == "ABAC");
> }
>
>
> Do you know if this function (or a simple way to implement it) already in Phobos?

This seems to work for ASCII strings, but the conversion to array is required (I think because writeln may re-evaluate the range's front more than once):

Should be O(n), but doesn't really get around the memory allocation.  You could probably write a custom range that does this properly (doesn't set present until popFront is called):

import std.algorithm;
import std.array;
import std.stdio;

void main()
{
    bool present[256];
    // writes "ABC"
    writeln(array("ABAC".filter!((a){scope(exit) present[a] = true; return !present[a];})));
}

Kind of a violation of how a range should work, front should be the same across multiple calls!  But it does the trick :)

-Steve
May 22, 2013
Steven Schveighoffer:

> This seems to work for ASCII strings, but the conversion to array is required (I think because writeln may re-evaluate the range's front more than once):

Thank you, I have opened an ER:

http://d.puremagic.com/issues/show_bug.cgi?id=10131

Bye,
bearophile