View mode: basic / threaded / horizontal-split · Log in · Help
February 22, 2011
We need to rethink remove in std.container
Okay, removing elements from a container sucks right now. You can do stuff like 
removeAny (generally pretty useless IMHO) or removeFront just fine, but removing 
an arbitrary range from a container just plain sucks.

remove takes a range type which is the range type for the container that it's 
on. That makes sense. It's obviously not going to be able to a take an arbitrary 
range and remove that from itself. How would it know which elements in that 
range corresponded to which elements in itself - especially when it could be a 
range which skips elements or something similar? So, _that_ part makes sense.

But have you actually tried to get a range of the appropriate type to remove 
from a container? It seems like almost ever function in std.range and 
std.algorithm returns a new range type, making them completely useless for 
processing a range to be removed from a container.

I was looking to remove a single element for a RedBlackTree. The best function 
that I could think to get the proper range was findSplit. The middle portion of 
the return value would be the portion that I was looking for. I'd take that and 
pass it to RedBlackTree's remove. Wrong. It uses takeExactly in its 
implementation and the first two portions of the result of findSplit aren't the 
right range type.

So, what do I do? The _only_ thing that I can think of at the moment is to use 
find to locate the beginning of the range that I want, take the length of the 
range with walkLength and then use popBackN to pop of the elements I don't want. 
e.g.

======
import std.algorithm, std.container, std.conv,
      std.range, std.stdio;

void main()
{
   auto rbt = RedBlackTree!int([0, 2, 5, 12, 59, 22]);
   assert(to!string(rbt[]) == "[0, 2, 5, 12, 22, 59]");

   auto found = find(rbt[], 5);
   assert(to!string(found) == "[5, 12, 22, 59]");

   popBackN(found, walkLength(found) - 1);
   assert(to!string(found) == "[5]");

   rbt.remove(found);
   assert(to!string(rbt[]) == "[0, 2, 12, 22, 59]");
}
======

That's disgusting. All that just to remove one element? And what if the range 
isn't bidirectional (e.g. SList)? Well, then you have no popBack, and as far as 
I can tell, you're screwed, since you can't use either take or takeExactly, 
because both of them return new range types.

In particular, the fact that you can't take a range and create a new one of the 
same type from its first n elements is highly problematic. Maybe we need to add a 
function to ForwardRange which returns a new range with the first n elements (it 
certainly looks like that's the key element that's missing here). I don't know 
if that would be reasonable, but the fact that you can't create a range of the 
same type as the original range when taking just its first n elements seems 
crippling in this situation.

I don't know what the proper solution to this is, but the current situation 
strikes me as untenable. I had to think through this problem for a while before 
I came to a solution that came even close to working, let alone get one that 
actually works. Removing elements from a container should _not_ be this hard. 
The situation with remove _needs_ to be improved.

- Jonathan M Davis
February 22, 2011
Re: We need to rethink remove in std.container
> remove takes a range type which is the range type for the
container that it's on. That makes sense. It's obviously not going
to be able to a take an arbitrary range and remove that from itself.
How would it know which elements in that range corresponded to which
elements in itself - especially when it could be a range which skips
elements or something similar? So, _that_ part makes sense.

I'm wondering, what was the reason for making a "remove range X from
Y" method in the first place? To me, that's not a "remove"
operation; if anything, it's the "removeAll" operation, but I'd
actually call it "subtract" (because it's the set subtraction
operation... although there can be duplicates).

If we change remove to only take in a single element, wouldn't its
implementation then be trivial? (For a range, just return a filter
that removes the element. For a container, actually remove the
element. For ranges that also behave like containers -- like arrays
-- I have no idea.)

Does this sound like a good idea?
February 22, 2011
Re: We need to rethink remove in std.container
On Monday 21 February 2011 21:04:47 %u wrote:
> > remove takes a range type which is the range type for the
> 
> container that it's on. That makes sense. It's obviously not going
> to be able to a take an arbitrary range and remove that from itself.
> How would it know which elements in that range corresponded to which
> elements in itself - especially when it could be a range which skips
> elements or something similar? So, _that_ part makes sense.
> 
> I'm wondering, what was the reason for making a "remove range X from
> Y" method in the first place? To me, that's not a "remove"
> operation; if anything, it's the "removeAll" operation, but I'd
> actually call it "subtract" (because it's the set subtraction
> operation... although there can be duplicates).
> 
> If we change remove to only take in a single element, wouldn't its
> implementation then be trivial? (For a range, just return a filter
> that removes the element. For a container, actually remove the
> element. For ranges that also behave like containers -- like arrays
> -- I have no idea.)
> 
> Does this sound like a good idea?

Removing a range is not necessarily related to removing an element with a 
particular value. Sure, a method could be added to the various container types 
which takes a value and removes the first instance of that value from the 
container, but that's not what remove is necessarily trying to do. It's like 
erase in C++'s standard library which takes two iterators. There are plenty of 
situations where removing a range makes sense. The odd bit with remove in Phobos 
vs erase in the STL is that with iterators, you can give a single iterator to 
indicate a single element whereas you can't do that with a range. So, remove 
takes a range, and if you want to remove a single element, that range only has 
one element in it. It's a bit weird, but that's fine.

The typical way to remove an element in the STL is to use find to find an element 
and then erase to remove it. remove in std.container is doing the same thing. 
The problem is that you can't give the result of find to remove, because instead 
of a single iterator, find gives you a whole range, and you probably don't want 
to remove that whole range. You generally either want to remove the first element 
or some set of elements at the front of the range. So, you need to able to 
remove the other elements from the range so that you can give that range to the 
remove method on the container. That's fine in principle, but since so many of 
the range-based algorithms give you a new type back and you need the exact type 
that that container uses for its ranges, it's very hard to get a range of the 
correct type with the correct elements in it.

So, while find will give you the beginning of the range you want, there's not an 
easy way to remove the end of that range. Functions like take and takeExactly 
return new types, so they don't work. The more I look at it, the more I'm 
convinced that we really need to add a primitive to forward ranges that returns 
the first n elements of that range. Without that, I don't see how you can get a 
range of the correct type with only those elements in it unless it's also a 
bidirectional range, which some ranges (like SList's range) aren't.

- Jonathan M Davis
February 22, 2011
Re: We need to rethink remove in std.container
> The more I look at it, the more I'm convinced that we really need to
add a primitive to forward ranges that returns the first n elements of
that range. Without that, I don't see how you can get a range of the
correct type with only those elements in it unless it's also a
bidirectional range, which some ranges (like SList's range) aren't.

Huh, yeah I think I agree. It seems like ranges are just delayed
evaluation, and that, at some point, we need to be able to force their
evaluation -- otherwise, it's like combining Scheme's (delay X) and
(force Y) operations with mutation, which just doesn't work sensibly
in non-functional programming.
February 22, 2011
Re: We need to rethink remove in std.container
You know, I'm actually now questioning whether STL did the right thing
with requiring iterators for the erase() method. It actually seems
quite pointless -- there's no reason why integer indices wouldn't
work. I don't think we really need to follow suit with STL here... is
there some situation with erase() that I'm missing that can't be
covered with plain integer (or rather, size_t) indices, and that
requires ranges?
February 22, 2011
Re: We need to rethink remove in std.container
On Monday 21 February 2011 21:51:51 %u wrote:
> You know, I'm actually now questioning whether STL did the right thing
> with requiring iterators for the erase() method. It actually seems
> quite pointless -- there's no reason why integer indices wouldn't
> work. I don't think we really need to follow suit with STL here... is
> there some situation with erase() that I'm missing that can't be
> covered with plain integer (or rather, size_t) indices, and that
> requires ranges?

Some of them do take indices, but you can't generally operate on indices. To do 
that, you need the original container. But you can operate on an iterate just 
fine. So, it's generally easier to deal with iterators.

However, the big thing would be that it actually doesn't make sense to use 
indices in many cases. Think about RedBlackTree for a moment. If you remove 
something from it or add something to it, that doesn't invalidate any range that 
you have which points to it (assuming that the end points haven't changed). You 
can still take that range and use it any any algorithm that you want - including 
remove (assuming that the algorithm takes the appropriate kind of range of 
course), and it will work. But using indices wouldn't work. They would have 
changed. Adding or removing anything to a container at or before an index that 
you have would make it so that that index no longer points to the same element. 
However, for many containers (though not all), any iterators or ranges would 
still be valid.

So, on the whole, using iterators or ranges makes great sense. I have no problem 
with how the STL does that. The problem is transitioning that to ranges. The STL 
doesn't go and change your iterator types on you when you feed them to 
algorithms, but std.range and std.algorithm typically do. So, the result is sub-
optimal in this case, to say the least.

- Jonathan M Davis
February 22, 2011
Re: We need to rethink remove in std.container
Hm... so the entire issue here seems to be the capability to do
iteration and modification in a concurrent manner, right?

IMHO that may not be worth the costs we're paying -- I would argue
that you normally shouldn't be modifying a collection that you're
iterating over in the first place; it just doesn't make any sense
when you're delaying everything. Sure, it would work fine if you
couldn't have more than one reference to any container (since every
range would would then have the same view of the container), but
when we're introducing delayed evaluation with ranges, I just don't
see how (or even why) it should be combinable with modification.
It's like trying to read from a database and concurrently modifying
the underlying storage by hand, bypassing the database... it just
doesn't work that way, outside of functional programming. So is it
really worth paying the price?
February 22, 2011
Re: We need to rethink remove in std.container
On Monday 21 February 2011 22:51:25 %u wrote:
> Hm... so the entire issue here seems to be the capability to do
> iteration and modification in a concurrent manner, right?
> 
> IMHO that may not be worth the costs we're paying -- I would argue
> that you normally shouldn't be modifying a collection that you're
> iterating over in the first place; it just doesn't make any sense
> when you're delaying everything. Sure, it would work fine if you
> couldn't have more than one reference to any container (since every
> range would would then have the same view of the container), but
> when we're introducing delayed evaluation with ranges, I just don't
> see how (or even why) it should be combinable with modification.
> It's like trying to read from a database and concurrently modifying
> the underlying storage by hand, bypassing the database... it just
> doesn't work that way, outside of functional programming. So is it
> really worth paying the price?

_Of course_, it's worth it. Do you never alter anything in a range? Functions 
like sort need to be able to alter the elements that the range refers to. Not to 
mention, there's generally no other way to alter elements in a container other 
than through a range to it unless you remove them and add new ones to replace 
them. The one exception would be if the container is random access, and then you 
can use the subscript operator to get at its elements. Ranges definitely need to 
be able to alter the elements that they contain. Sure, there are plenty of times 
when you don't need to do that, but there are times when you definitely do.

And really, the problem here isn't really the concept of ranges - it's their 
implementation. The fact that you can't get the right type of range with the 
right elements in it is the problem. Doing something like building the take 
functionality into forward ranges would fix that problem.

- Jonathan M Davis
February 22, 2011
Re: We need to rethink remove in std.container
On Tue, 22 Feb 2011 05:55:20 +0300, Jonathan M Davis <jmdavisProg@gmx.com>  
wrote:

> Okay, removing elements from a container sucks right now. You can do  
> stuff like
> removeAny (generally pretty useless IMHO) or removeFront just fine, but  
> removing
> an arbitrary range from a container just plain sucks.
>
> remove takes a range type which is the range type for the container that  
> it's
> on. That makes sense. It's obviously not going to be able to a take an  
> arbitrary
> range and remove that from itself. How would it know which elements in  
> that
> range corresponded to which elements in itself - especially when it  
> could be a
> range which skips elements or something similar? So, _that_ part makes  
> sense.
>
> But have you actually tried to get a range of the appropriate type to  
> remove
> from a container? It seems like almost ever function in std.range and
> std.algorithm returns a new range type, making them completely useless  
> for
> processing a range to be removed from a container.
>
> I was looking to remove a single element for a RedBlackTree. The best  
> function
> that I could think to get the proper range was findSplit. The middle  
> portion of
> the return value would be the portion that I was looking for. I'd take  
> that and
> pass it to RedBlackTree's remove. Wrong. It uses takeExactly in its
> implementation and the first two portions of the result of findSplit  
> aren't the
> right range type.
>
> So, what do I do? The _only_ thing that I can think of at the moment is  
> to use
> find to locate the beginning of the range that I want, take the length  
> of the
> range with walkLength and then use popBackN to pop of the elements I  
> don't want.
> e.g.
>
> ======
> import std.algorithm, std.container, std.conv,
>        std.range, std.stdio;
>
> void main()
> {
>     auto rbt = RedBlackTree!int([0, 2, 5, 12, 59, 22]);
>     assert(to!string(rbt[]) == "[0, 2, 5, 12, 22, 59]");
>
>     auto found = find(rbt[], 5);
>     assert(to!string(found) == "[5, 12, 22, 59]");
>
>     popBackN(found, walkLength(found) - 1);
>     assert(to!string(found) == "[5]");
>
>     rbt.remove(found);
>     assert(to!string(rbt[]) == "[0, 2, 12, 22, 59]");
> }
> ======
>
> That's disgusting. All that just to remove one element? And what if the  
> range
> isn't bidirectional (e.g. SList)? Well, then you have no popBack, and as  
> far as
> I can tell, you're screwed, since you can't use either take or  
> takeExactly,
> because both of them return new range types.
>
> In particular, the fact that you can't take a range and create a new one  
> of the
> same type from its first n elements is highly problematic. Maybe we need  
> to add a
> function to ForwardRange which returns a new range with the first n  
> elements (it
> certainly looks like that's the key element that's missing here). I  
> don't know
> if that would be reasonable, but the fact that you can't create a range  
> of the
> same type as the original range when taking just its first n elements  
> seems
> crippling in this situation.
>
> I don't know what the proper solution to this is, but the current  
> situation
> strikes me as untenable. I had to think through this problem for a while  
> before
> I came to a solution that came even close to working, let alone get one  
> that
> actually works. Removing elements from a container should _not_ be this  
> hard.
> The situation with remove _needs_ to be improved.
>
> - Jonathan M Davis

I believe remove operation is better expressed in terms of a Cursor  
(iterator in C++). It also doesn't make much sense (at least of me) for a  
find to return a subrange of the source range. While it does generalize  
well, it's not really what most people expect.

What I'd love to have instead is something like this:

int[] arr = [1, 2, 3, 4, 5];
auto cursor = arr.find(3); // either points to 3 or null

int[] before = arr[0..cursor];
int value = arr[cursor]; // for consistency, C++ uses *cursor
int[] after = arr[arr.advance(cursor)..$];

Note that typeof(cursor) could/should actually be int*, but I don't use  
arr[cursor+1..$] because cursor doesn't know about range bounds, and as  
such it can't advance on it's own.

For dynamic arrays:

V[K] dynamicArray;
V* cursor = key in dynamicArray;
dynamicArray.remove(key); // requires an additional lookup, which is  
sub-optimal
// dynamicArray.remove(cursor); // optimal but doesn't work atm. I think  
it should


I think it's also worth noting that "find" for trees makes even less sense  
(to me). You can't "slice" a tree arbitrarily, yet find tries to do that:

>     auto rbt = RedBlackTree!int([0, 2, 5, 12, 59, 22]);
>     assert(to!string(rbt[]) == "[0, 2, 5, 12, 22, 59]");
>
>     auto found = find(rbt[], 5);
>     assert(to!string(found) == "[5, 12, 22, 59]");

What is the type of "found"? It's clearly not a tree anymore. Don't know  
why, but it's still a range. How is it implemented? Without looking at the  
source I can tell it's implemented as an original tree + a cursor. And I  
believe the two are better be separated.

As a side note, in my own code I have 2 version of remove - remove and  
removeUnstable, which work differently. Simple example:

// swaps current element with last one and pops last
void removeUnstable(T)(ref T[] array, T* cursor)
{
    auto length = array.length;
    assert(length > 0);
    --length;

    T* last = array.ptr + length;
    assert(array.ptr <= cursor && cursor <= last);
    *cursor = *last;

    array.length = length;
}

// Moves elements
T* remove(T)(ref T[] array, T* cursor)
{
    auto length = array.length;
    assert(length > 0);
    --length;
	
    assert(array.ptr <= cursor && cursor <= array.ptr + length);
	
    auto numElementsToMove = length - (cursor - array.ptr);
    memmove(cursor, cursor + 1, numElementsToMove * T.sizeof);

    array.length = length;

    return cursor;
}
February 22, 2011
Re: We need to rethink remove in std.container
Jonathan M Davis wrote:

> Okay, removing elements from a container sucks right now. You can do stuff
> like removeAny (generally pretty useless IMHO) or removeFront just fine,
> but removing an arbitrary range from a container just plain sucks.
> 
> remove takes a range type which is the range type for the container that
> it's on. That makes sense. It's obviously not going to be able to a take
> an arbitrary range and remove that from itself. How would it know which
> elements in that range corresponded to which elements in itself -
> especially when it could be a range which skips elements or something
> similar? So, _that_ part makes sense.

I don't understand why not. Given, as you said, that a lot of functions  
return a new type it would make sense. My answer as to how would be 
something like:

while range not empty
   container.remove range.front
   range.popFront

Is there a problem with that? 

> But have you actually tried to get a range of the appropriate type to
> remove from a container? It seems like almost ever function in std.range
> and std.algorithm returns a new range type, making them completely useless
> for processing a range to be removed from a container.
> 
> I was looking to remove a single element for a RedBlackTree. The best
> function that I could think to get the proper range was findSplit. The
> middle portion of the return value would be the portion that I was looking
> for. I'd take that and pass it to RedBlackTree's remove. Wrong. It uses
> takeExactly in its implementation and the first two portions of the result
> of findSplit aren't the right range type.
> 
> So, what do I do? The _only_ thing that I can think of at the moment is to
> use find to locate the beginning of the range that I want, take the length
> of the range with walkLength and then use popBackN to pop of the elements
> I don't want. e.g.

The table in the docs mention stableRemoveAny(v) which says "Same as 
c.removeAny(v), but is guaranteed to not invalidate any iterators." 

Though c.removeAny(v) itself is not listed in the table nor implemented in 
RedBlackTree, isn't this the right function for the job? (I take it v stands 
for a value of the container, rather than a range). builtin aa's also 
implement this function.
« First   ‹ Prev
1 2 3 4
Top | Discussion index | About this forum | D home