Jump to page: 1 24  
Page
Thread overview
We need to rethink remove in std.container
Feb 22, 2011
Jonathan M Davis
Feb 22, 2011
%u
Feb 22, 2011
Jonathan M Davis
Feb 22, 2011
%u
Feb 22, 2011
%u
Feb 22, 2011
Jonathan M Davis
Feb 22, 2011
%u
Feb 22, 2011
Jonathan M Davis
Feb 22, 2011
Denis Koroskin
Feb 22, 2011
Philippe Sigaud
Feb 23, 2011
Philippe Sigaud
Feb 22, 2011
Lutger Blijdestijn
Feb 22, 2011
Jonathan M Davis
Feb 22, 2011
Lutger Blijdestijn
Feb 22, 2011
Jonathan M Davis
Feb 22, 2011
Lutger Blijdestijn
Feb 22, 2011
Lutger Blijdestijn
Feb 22, 2011
Jonathan M Davis
Feb 23, 2011
Jonathan M Davis
Feb 23, 2011
Jonathan M Davis
Feb 22, 2011
Lutger Blijdestijn
Feb 22, 2011
spir
February 22, 2011
Okay, removing elements from a container sucks right now. You can do stuff like removeAny (generally pretty useless IMHO) or removeFront just fine, but removing an arbitrary range from a container just plain sucks.

remove takes a range type which is the range type for the container that it's on. That makes sense. It's obviously not going to be able to a take an arbitrary range and remove that from itself. How would it know which elements in that range corresponded to which elements in itself - especially when it could be a range which skips elements or something similar? So, _that_ part makes sense.

But have you actually tried to get a range of the appropriate type to remove from a container? It seems like almost ever function in std.range and std.algorithm returns a new range type, making them completely useless for processing a range to be removed from a container.

I was looking to remove a single element for a RedBlackTree. The best function that I could think to get the proper range was findSplit. The middle portion of the return value would be the portion that I was looking for. I'd take that and pass it to RedBlackTree's remove. Wrong. It uses takeExactly in its implementation and the first two portions of the result of findSplit aren't the right range type.

So, what do I do? The _only_ thing that I can think of at the moment is to use find to locate the beginning of the range that I want, take the length of the range with walkLength and then use popBackN to pop of the elements I don't want. e.g.

======
import std.algorithm, std.container, std.conv,
       std.range, std.stdio;

void main()
{
    auto rbt = RedBlackTree!int([0, 2, 5, 12, 59, 22]);
    assert(to!string(rbt[]) == "[0, 2, 5, 12, 22, 59]");

    auto found = find(rbt[], 5);
    assert(to!string(found) == "[5, 12, 22, 59]");

    popBackN(found, walkLength(found) - 1);
    assert(to!string(found) == "[5]");

    rbt.remove(found);
    assert(to!string(rbt[]) == "[0, 2, 12, 22, 59]");
}
======

That's disgusting. All that just to remove one element? And what if the range isn't bidirectional (e.g. SList)? Well, then you have no popBack, and as far as I can tell, you're screwed, since you can't use either take or takeExactly, because both of them return new range types.

In particular, the fact that you can't take a range and create a new one of the same type from its first n elements is highly problematic. Maybe we need to add a function to ForwardRange which returns a new range with the first n elements (it certainly looks like that's the key element that's missing here). I don't know if that would be reasonable, but the fact that you can't create a range of the same type as the original range when taking just its first n elements seems crippling in this situation.

I don't know what the proper solution to this is, but the current situation strikes me as untenable. I had to think through this problem for a while before I came to a solution that came even close to working, let alone get one that actually works. Removing elements from a container should _not_ be this hard. The situation with remove _needs_ to be improved.

- Jonathan M Davis
February 22, 2011
> remove takes a range type which is the range type for the
container that it's on. That makes sense. It's obviously not going to be able to a take an arbitrary range and remove that from itself. How would it know which elements in that range corresponded to which elements in itself - especially when it could be a range which skips elements or something similar? So, _that_ part makes sense.

I'm wondering, what was the reason for making a "remove range X from Y" method in the first place? To me, that's not a "remove" operation; if anything, it's the "removeAll" operation, but I'd actually call it "subtract" (because it's the set subtraction operation... although there can be duplicates).

If we change remove to only take in a single element, wouldn't its implementation then be trivial? (For a range, just return a filter that removes the element. For a container, actually remove the element. For ranges that also behave like containers -- like arrays -- I have no idea.)

Does this sound like a good idea?
February 22, 2011
On Monday 21 February 2011 21:04:47 %u wrote:
> > remove takes a range type which is the range type for the
> 
> container that it's on. That makes sense. It's obviously not going to be able to a take an arbitrary range and remove that from itself. How would it know which elements in that range corresponded to which elements in itself - especially when it could be a range which skips elements or something similar? So, _that_ part makes sense.
> 
> I'm wondering, what was the reason for making a "remove range X from Y" method in the first place? To me, that's not a "remove" operation; if anything, it's the "removeAll" operation, but I'd actually call it "subtract" (because it's the set subtraction operation... although there can be duplicates).
> 
> If we change remove to only take in a single element, wouldn't its implementation then be trivial? (For a range, just return a filter that removes the element. For a container, actually remove the element. For ranges that also behave like containers -- like arrays -- I have no idea.)
> 
> Does this sound like a good idea?

Removing a range is not necessarily related to removing an element with a particular value. Sure, a method could be added to the various container types which takes a value and removes the first instance of that value from the container, but that's not what remove is necessarily trying to do. It's like erase in C++'s standard library which takes two iterators. There are plenty of situations where removing a range makes sense. The odd bit with remove in Phobos vs erase in the STL is that with iterators, you can give a single iterator to indicate a single element whereas you can't do that with a range. So, remove takes a range, and if you want to remove a single element, that range only has one element in it. It's a bit weird, but that's fine.

The typical way to remove an element in the STL is to use find to find an element and then erase to remove it. remove in std.container is doing the same thing. The problem is that you can't give the result of find to remove, because instead of a single iterator, find gives you a whole range, and you probably don't want to remove that whole range. You generally either want to remove the first element or some set of elements at the front of the range. So, you need to able to remove the other elements from the range so that you can give that range to the remove method on the container. That's fine in principle, but since so many of the range-based algorithms give you a new type back and you need the exact type that that container uses for its ranges, it's very hard to get a range of the correct type with the correct elements in it.

So, while find will give you the beginning of the range you want, there's not an easy way to remove the end of that range. Functions like take and takeExactly return new types, so they don't work. The more I look at it, the more I'm convinced that we really need to add a primitive to forward ranges that returns the first n elements of that range. Without that, I don't see how you can get a range of the correct type with only those elements in it unless it's also a bidirectional range, which some ranges (like SList's range) aren't.

- Jonathan M Davis
February 22, 2011
> The more I look at it, the more I'm convinced that we really need to
add a primitive to forward ranges that returns the first n elements of that range. Without that, I don't see how you can get a range of the correct type with only those elements in it unless it's also a bidirectional range, which some ranges (like SList's range) aren't.

Huh, yeah I think I agree. It seems like ranges are just delayed evaluation, and that, at some point, we need to be able to force their evaluation -- otherwise, it's like combining Scheme's (delay X) and (force Y) operations with mutation, which just doesn't work sensibly in non-functional programming.
February 22, 2011
You know, I'm actually now questioning whether STL did the right thing with requiring iterators for the erase() method. It actually seems quite pointless -- there's no reason why integer indices wouldn't work. I don't think we really need to follow suit with STL here... is there some situation with erase() that I'm missing that can't be covered with plain integer (or rather, size_t) indices, and that requires ranges?
February 22, 2011
On Monday 21 February 2011 21:51:51 %u wrote:
> You know, I'm actually now questioning whether STL did the right thing with requiring iterators for the erase() method. It actually seems quite pointless -- there's no reason why integer indices wouldn't work. I don't think we really need to follow suit with STL here... is there some situation with erase() that I'm missing that can't be covered with plain integer (or rather, size_t) indices, and that requires ranges?

Some of them do take indices, but you can't generally operate on indices. To do that, you need the original container. But you can operate on an iterate just fine. So, it's generally easier to deal with iterators.

However, the big thing would be that it actually doesn't make sense to use indices in many cases. Think about RedBlackTree for a moment. If you remove something from it or add something to it, that doesn't invalidate any range that you have which points to it (assuming that the end points haven't changed). You can still take that range and use it any any algorithm that you want - including remove (assuming that the algorithm takes the appropriate kind of range of course), and it will work. But using indices wouldn't work. They would have changed. Adding or removing anything to a container at or before an index that you have would make it so that that index no longer points to the same element. However, for many containers (though not all), any iterators or ranges would still be valid.

So, on the whole, using iterators or ranges makes great sense. I have no problem with how the STL does that. The problem is transitioning that to ranges. The STL doesn't go and change your iterator types on you when you feed them to algorithms, but std.range and std.algorithm typically do. So, the result is sub- optimal in this case, to say the least.

- Jonathan M Davis
February 22, 2011
Hm... so the entire issue here seems to be the capability to do iteration and modification in a concurrent manner, right?

IMHO that may not be worth the costs we're paying -- I would argue that you normally shouldn't be modifying a collection that you're iterating over in the first place; it just doesn't make any sense when you're delaying everything. Sure, it would work fine if you couldn't have more than one reference to any container (since every range would would then have the same view of the container), but when we're introducing delayed evaluation with ranges, I just don't see how (or even why) it should be combinable with modification. It's like trying to read from a database and concurrently modifying the underlying storage by hand, bypassing the database... it just doesn't work that way, outside of functional programming. So is it really worth paying the price?
February 22, 2011
On Monday 21 February 2011 22:51:25 %u wrote:
> Hm... so the entire issue here seems to be the capability to do iteration and modification in a concurrent manner, right?
> 
> IMHO that may not be worth the costs we're paying -- I would argue that you normally shouldn't be modifying a collection that you're iterating over in the first place; it just doesn't make any sense when you're delaying everything. Sure, it would work fine if you couldn't have more than one reference to any container (since every range would would then have the same view of the container), but when we're introducing delayed evaluation with ranges, I just don't see how (or even why) it should be combinable with modification. It's like trying to read from a database and concurrently modifying the underlying storage by hand, bypassing the database... it just doesn't work that way, outside of functional programming. So is it really worth paying the price?

_Of course_, it's worth it. Do you never alter anything in a range? Functions like sort need to be able to alter the elements that the range refers to. Not to mention, there's generally no other way to alter elements in a container other than through a range to it unless you remove them and add new ones to replace them. The one exception would be if the container is random access, and then you can use the subscript operator to get at its elements. Ranges definitely need to be able to alter the elements that they contain. Sure, there are plenty of times when you don't need to do that, but there are times when you definitely do.

And really, the problem here isn't really the concept of ranges - it's their implementation. The fact that you can't get the right type of range with the right elements in it is the problem. Doing something like building the take functionality into forward ranges would fix that problem.

- Jonathan M Davis
February 22, 2011
On Tue, 22 Feb 2011 05:55:20 +0300, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> Okay, removing elements from a container sucks right now. You can do stuff like
> removeAny (generally pretty useless IMHO) or removeFront just fine, but removing
> an arbitrary range from a container just plain sucks.
>
> remove takes a range type which is the range type for the container that it's
> on. That makes sense. It's obviously not going to be able to a take an arbitrary
> range and remove that from itself. How would it know which elements in that
> range corresponded to which elements in itself - especially when it could be a
> range which skips elements or something similar? So, _that_ part makes sense.
>
> But have you actually tried to get a range of the appropriate type to remove
> from a container? It seems like almost ever function in std.range and
> std.algorithm returns a new range type, making them completely useless for
> processing a range to be removed from a container.
>
> I was looking to remove a single element for a RedBlackTree. The best function
> that I could think to get the proper range was findSplit. The middle portion of
> the return value would be the portion that I was looking for. I'd take that and
> pass it to RedBlackTree's remove. Wrong. It uses takeExactly in its
> implementation and the first two portions of the result of findSplit aren't the
> right range type.
>
> So, what do I do? The _only_ thing that I can think of at the moment is to use
> find to locate the beginning of the range that I want, take the length of the
> range with walkLength and then use popBackN to pop of the elements I don't want.
> e.g.
>
> ======
> import std.algorithm, std.container, std.conv,
>        std.range, std.stdio;
>
> void main()
> {
>     auto rbt = RedBlackTree!int([0, 2, 5, 12, 59, 22]);
>     assert(to!string(rbt[]) == "[0, 2, 5, 12, 22, 59]");
>
>     auto found = find(rbt[], 5);
>     assert(to!string(found) == "[5, 12, 22, 59]");
>
>     popBackN(found, walkLength(found) - 1);
>     assert(to!string(found) == "[5]");
>
>     rbt.remove(found);
>     assert(to!string(rbt[]) == "[0, 2, 12, 22, 59]");
> }
> ======
>
> That's disgusting. All that just to remove one element? And what if the range
> isn't bidirectional (e.g. SList)? Well, then you have no popBack, and as far as
> I can tell, you're screwed, since you can't use either take or takeExactly,
> because both of them return new range types.
>
> In particular, the fact that you can't take a range and create a new one of the
> same type from its first n elements is highly problematic. Maybe we need to add a
> function to ForwardRange which returns a new range with the first n elements (it
> certainly looks like that's the key element that's missing here). I don't know
> if that would be reasonable, but the fact that you can't create a range of the
> same type as the original range when taking just its first n elements seems
> crippling in this situation.
>
> I don't know what the proper solution to this is, but the current situation
> strikes me as untenable. I had to think through this problem for a while before
> I came to a solution that came even close to working, let alone get one that
> actually works. Removing elements from a container should _not_ be this hard.
> The situation with remove _needs_ to be improved.
>
> - Jonathan M Davis

I believe remove operation is better expressed in terms of a Cursor (iterator in C++). It also doesn't make much sense (at least of me) for a find to return a subrange of the source range. While it does generalize well, it's not really what most people expect.

What I'd love to have instead is something like this:

int[] arr = [1, 2, 3, 4, 5];
auto cursor = arr.find(3); // either points to 3 or null

int[] before = arr[0..cursor];
int value = arr[cursor]; // for consistency, C++ uses *cursor
int[] after = arr[arr.advance(cursor)..$];

Note that typeof(cursor) could/should actually be int*, but I don't use arr[cursor+1..$] because cursor doesn't know about range bounds, and as such it can't advance on it's own.

For dynamic arrays:

V[K] dynamicArray;
V* cursor = key in dynamicArray;
dynamicArray.remove(key); // requires an additional lookup, which is sub-optimal
// dynamicArray.remove(cursor); // optimal but doesn't work atm. I think it should


I think it's also worth noting that "find" for trees makes even less sense (to me). You can't "slice" a tree arbitrarily, yet find tries to do that:

>     auto rbt = RedBlackTree!int([0, 2, 5, 12, 59, 22]);
>     assert(to!string(rbt[]) == "[0, 2, 5, 12, 22, 59]");
>
>     auto found = find(rbt[], 5);
>     assert(to!string(found) == "[5, 12, 22, 59]");

What is the type of "found"? It's clearly not a tree anymore. Don't know why, but it's still a range. How is it implemented? Without looking at the source I can tell it's implemented as an original tree + a cursor. And I believe the two are better be separated.

As a side note, in my own code I have 2 version of remove - remove and removeUnstable, which work differently. Simple example:

// swaps current element with last one and pops last
void removeUnstable(T)(ref T[] array, T* cursor)
{
    auto length = array.length;
    assert(length > 0);
    --length;

    T* last = array.ptr + length;
    assert(array.ptr <= cursor && cursor <= last);
    *cursor = *last;

    array.length = length;
}

// Moves elements
T* remove(T)(ref T[] array, T* cursor)
{
    auto length = array.length;
    assert(length > 0);
    --length;
	
    assert(array.ptr <= cursor && cursor <= array.ptr + length);
	
    auto numElementsToMove = length - (cursor - array.ptr);
    memmove(cursor, cursor + 1, numElementsToMove * T.sizeof);

    array.length = length;

    return cursor;
}
February 22, 2011
Jonathan M Davis wrote:

> Okay, removing elements from a container sucks right now. You can do stuff like removeAny (generally pretty useless IMHO) or removeFront just fine, but removing an arbitrary range from a container just plain sucks.
> 
> remove takes a range type which is the range type for the container that it's on. That makes sense. It's obviously not going to be able to a take an arbitrary range and remove that from itself. How would it know which elements in that range corresponded to which elements in itself - especially when it could be a range which skips elements or something similar? So, _that_ part makes sense.

I don't understand why not. Given, as you said, that a lot of functions return a new type it would make sense. My answer as to how would be something like:

while range not empty
    container.remove range.front
    range.popFront

Is there a problem with that?

> But have you actually tried to get a range of the appropriate type to remove from a container? It seems like almost ever function in std.range and std.algorithm returns a new range type, making them completely useless for processing a range to be removed from a container.
> 
> I was looking to remove a single element for a RedBlackTree. The best function that I could think to get the proper range was findSplit. The middle portion of the return value would be the portion that I was looking for. I'd take that and pass it to RedBlackTree's remove. Wrong. It uses takeExactly in its implementation and the first two portions of the result of findSplit aren't the right range type.
> 
> So, what do I do? The _only_ thing that I can think of at the moment is to use find to locate the beginning of the range that I want, take the length of the range with walkLength and then use popBackN to pop of the elements I don't want. e.g.

The table in the docs mention stableRemoveAny(v) which says "Same as c.removeAny(v), but is guaranteed to not invalidate any iterators."

Though c.removeAny(v) itself is not listed in the table nor implemented in RedBlackTree, isn't this the right function for the job? (I take it v stands for a value of the container, rather than a range). builtin aa's also implement this function.




« First   ‹ Prev
1 2 3 4