Jump to page: 1 2
Thread overview
Is there a function for this?
Oct 06, 2018
bauss
Oct 06, 2018
Adam D. Ruppe
Oct 06, 2018
bauss
Oct 06, 2018
Basile B
Oct 06, 2018
bauss
Oct 06, 2018
Seb
Oct 06, 2018
Basile B
Oct 06, 2018
bauss
Oct 06, 2018
H. S. Teoh
Oct 06, 2018
Ali Çehreli
Oct 06, 2018
Nicholas Wilson
Oct 08, 2018
Piotr Mitana
October 06, 2018
Let's say you have a range with struct, but some of the struct are duplicates of each other.

Is there a standard function in Phobos to remove duplicates?

My first thought was "uniq", but it can't really do it like that, but it doesn't work.

See: https://run.dlang.io/is/IcFEtw

Is there another function in Phobos that perhaps would work?

I can of course write my own function, but if there is a standard function that can do it, then I'd rather use that.
October 06, 2018
On Saturday, 6 October 2018 at 13:17:22 UTC, bauss wrote:
> My first thought was "uniq", but it can't really do it like that, but it doesn't work.

uniq needs it to be sorted first, it only compares side-by-side (to avoid allocating space to remember what it has already seen)

> Is there another function in Phobos that perhaps would work?

If the set is small enough and it is at least a forward range, you could do a O(n^2) search with like filter with !canFind (making sure the one it finds is not the one you are currently looking at).

But I think you are better off doing your own thing. I like to just stick them all in an associative array, then loop over that. The duplicate keys will naturally be filtered that way.

Of course, it allocates memory for that but that's often preferable to running the expensive search anyway.
October 06, 2018
On Saturday, 6 October 2018 at 13:17:22 UTC, bauss wrote:
> Let's say you have a range with struct, but some of the struct are duplicates of each other.
>
> Is there a standard function in Phobos to remove duplicates?
>
> My first thought was "uniq", but it can't really do it like that, but it doesn't work.
>
> See: https://run.dlang.io/is/IcFEtw
>
> Is there another function in Phobos that perhaps would work?
>
> I can of course write my own function, but if there is a standard function that can do it, then I'd rather use that.

see https://www.programming-idioms.org/idiom/119/deduplicate-list.
October 06, 2018
On Saturday, 6 October 2018 at 13:17:22 UTC, bauss wrote:
> Let's say you have a range with struct, but some of the struct are duplicates of each other.
>
> Is there a standard function in Phobos to remove duplicates?
>
> My first thought was "uniq", but it can't really do it like that, but it doesn't work.
>
> See: https://run.dlang.io/is/IcFEtw
>
> Is there another function in Phobos that perhaps would work?
>
> I can of course write my own function, but if there is a standard function that can do it, then I'd rather use that.

range
    .array.sort // make equal elements be adjacent
    .chunks!"a==b" // get a range over those equal elements
    .map(e => e.front); // get the first one

should work, possibly not the most efficient way to do it though.

I don't know off the top of mu head if there are any to preserve the order or the original range.
October 06, 2018
On Saturday, 6 October 2018 at 13:34:44 UTC, Adam D. Ruppe wrote:
> On Saturday, 6 October 2018 at 13:17:22 UTC, bauss wrote:
>> My first thought was "uniq", but it can't really do it like that, but it doesn't work.
>
> uniq needs it to be sorted first, it only compares side-by-side (to avoid allocating space to remember what it has already seen)
>
>> Is there another function in Phobos that perhaps would work?
>
> If the set is small enough and it is at least a forward range, you could do a O(n^2) search with like filter with !canFind (making sure the one it finds is not the one you are currently looking at).
>
> But I think you are better off doing your own thing. I like to just stick them all in an associative array, then loop over that. The duplicate keys will naturally be filtered that way.
>
> Of course, it allocates memory for that but that's often preferable to running the expensive search anyway.

It "can" be a big set, but it's probably the most straightforward way for now thought.

On Saturday, 6 October 2018 at 13:38:34 UTC, Nicholas Wilson wrote:
> On Saturday, 6 October 2018 at 13:17:22 UTC, bauss wrote:
>> Let's say you have a range with struct, but some of the struct are duplicates of each other.
>>
>> Is there a standard function in Phobos to remove duplicates?
>>
>> My first thought was "uniq", but it can't really do it like that, but it doesn't work.
>>
>> See: https://run.dlang.io/is/IcFEtw
>>
>> Is there another function in Phobos that perhaps would work?
>>
>> I can of course write my own function, but if there is a standard function that can do it, then I'd rather use that.
>
> range
>     .array.sort // make equal elements be adjacent
>     .chunks!"a==b" // get a range over those equal elements
>     .map(e => e.front); // get the first one
>
> should work, possibly not the most efficient way to do it though.
>
> I don't know off the top of mu head if there are any to preserve the order or the original range.

That doesn't work tbh. won't even compile.


October 06, 2018
On Saturday, 6 October 2018 at 13:35:38 UTC, Basile B wrote:
> On Saturday, 6 October 2018 at 13:17:22 UTC, bauss wrote:
>> Let's say you have a range with struct, but some of the struct are duplicates of each other.
>>
>> Is there a standard function in Phobos to remove duplicates?
>>
>> My first thought was "uniq", but it can't really do it like that, but it doesn't work.
>>
>> See: https://run.dlang.io/is/IcFEtw
>>
>> Is there another function in Phobos that perhaps would work?
>>
>> I can of course write my own function, but if there is a standard function that can do it, then I'd rather use that.
>
> see https://www.programming-idioms.org/idiom/119/deduplicate-list.

Did you even read my post? I stated I could already write the function myself, but if a standard function existed I'd rather use that.
October 06, 2018
On Saturday, 6 October 2018 at 13:56:32 UTC, bauss wrote:
> On Saturday, 6 October 2018 at 13:35:38 UTC, Basile B wrote:
>> On Saturday, 6 October 2018 at 13:17:22 UTC, bauss wrote:
>>> [...]
>>
>> see https://www.programming-idioms.org/idiom/119/deduplicate-list.
>
> Did you even read my post? I stated I could already write the function myself, but if a standard function existed I'd rather use that.

He did and x.redBlackTree[] isn't too bad!
October 06, 2018
On Saturday, 6 October 2018 at 13:56:32 UTC, bauss wrote:
> On Saturday, 6 October 2018 at 13:35:38 UTC, Basile B wrote:
>> On Saturday, 6 October 2018 at 13:17:22 UTC, bauss wrote:
>>> Let's say you have a range with struct, but some of the struct are duplicates of each other.
>>>
>>> Is there a standard function in Phobos to remove duplicates?
>>>
>>> My first thought was "uniq", but it can't really do it like that, but it doesn't work.
>>>
>>> See: https://run.dlang.io/is/IcFEtw
>>>
>>> Is there another function in Phobos that perhaps would work?
>>>
>>> I can of course write my own function, but if there is a standard function that can do it, then I'd rather use that.
>>
>> see https://www.programming-idioms.org/idiom/119/deduplicate-list.
>
> Did you even read my post? I stated I could already write the function myself, but if a standard function existed I'd rather use that.

There are two solutions there. One shows how to use uniq, which is why i posted this since you said that it doesn't work as you expected. I didn't suggest you to write a full double loop dedup. routine, i.e àla Golang. Unfortunately the site doesn't allow to link the solutions of a particular language among those who implement the idiom.
October 06, 2018
On Saturday, 6 October 2018 at 15:35:39 UTC, Basile B wrote:
> On Saturday, 6 October 2018 at 13:56:32 UTC, bauss wrote:
>> On Saturday, 6 October 2018 at 13:35:38 UTC, Basile B wrote:
>>> On Saturday, 6 October 2018 at 13:17:22 UTC, bauss wrote:
>>>> Let's say you have a range with struct, but some of the struct are duplicates of each other.
>>>>
>>>> Is there a standard function in Phobos to remove duplicates?
>>>>
>>>> My first thought was "uniq", but it can't really do it like that, but it doesn't work.
>>>>
>>>> See: https://run.dlang.io/is/IcFEtw
>>>>
>>>> Is there another function in Phobos that perhaps would work?
>>>>
>>>> I can of course write my own function, but if there is a standard function that can do it, then I'd rather use that.
>>>
>>> see https://www.programming-idioms.org/idiom/119/deduplicate-list.
>>
>> Did you even read my post? I stated I could already write the function myself, but if a standard function existed I'd rather use that.
>
> There are two solutions there. One shows how to use uniq, which is why i posted this since you said that it doesn't work as you expected. I didn't suggest you to write a full double loop dedup. routine, i.e àla Golang. Unfortunately the site doesn't allow to link the solutions of a particular language among those who implement the idiom.

uniq will not work with, say a class and the class will require you to implement opCmp, which you can't always do for classes you don't have access to.

That's the problem here. It's easy enough without, but you cannot do it by ex. a property of a class. It will not work properly.

You can't even use .group and then .map because you need to use .sort and it requires opCmp again.

The whole problem is actually that they do not work with ranges that aren't sorted. Things like .group and .uniq should work without sorted ranges. You can't always expect a range to be sorted to perform such algorithms.

October 06, 2018
On Sat, Oct 06, 2018 at 08:07:42PM +0000, bauss via Digitalmars-d-learn wrote: [...]
> The whole problem is actually that they do not work with ranges that aren't sorted. Things like .group and .uniq should work without sorted ranges. You can't always expect a range to be sorted to perform such algorithms.

The reason they don't work with ranges that aren't sorted is because the algorithms that don't require sorted ranges are more expensive, or need to allocate (non-@nogc), and are sometimes application-specific, meaning that you're better off writing your own algorithm for it that takes advantage of the specific knowledge you have about your data, rather than some poorly-performing generic algorithm that can't possibly optimize for every possible use case.

Not to mention, the expensiveness of such algorithms comes from having to do work that basically amounts to sorting the data, so you might as well sort them yourself in the first place.


T

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
« First   ‹ Prev
1 2