to invalidate a range (page 2)

August 12, 2011

Re: to invalidate a range

Posted by Jonathan M Davis
in reply to Ellery Newcomer

Permalink

Jonathan M Davis

Posted in reply to Ellery Newcomer

Permalink

On Friday, August 12, 2011 16:16 Ellery Newcomer wrote:
> On 08/12/2011 05:51 PM, Jonathan M Davis wrote:
> > An implementation can guarantee it as long as your range doesn't directly point to an element being removed (i.e. as long as the element isn't on the ends - or maybe one past the end, depending on the implementation). But _no_ container can guarantee that an iterator or range which directly references an element which is removed is going to stay valid - not without playing some serious games internally which make iterators and ranges too inefficent, and possibly not even then. So, stableRemove is only going to guarantee that a range stays valid on as long as the end points of that range aren't what was being removed.
> 
> Forgive my being dense, but where is this 'as long as' coming from? If your range only points to ends in e.g. a linked list, how is it supposed to retrieve elements in the middle? I'm having a hard time visualizing a range over a node based container that doesn't point to a node in the middle (at some point in time). The range points to the node to retrieve the current front quickly, the node can get removed, the removed function won't know its invalidating the range without playing yon internal games, ergo stable remove cannot be.

Are you familiar with iterators? This will be a lot easier if you are. An iterator points to one element and one element only. In C++, you tend to pass around pairs of iterators - one pointing to the first element in a range of elements and one pointing to one past the end. You then usually iterate by incrementing the first iterator until it equals the second.

Ranges at their most basic level are a pair of iterators - one pointing to the first element in the range and one pointing either to the last element or one past that, depending on the implementation. The range API and concept is actually much more flexible than that, allowing us to implement stuff like the fibonacci sequence as a range, but when it comes to containers, they're almost certainly doing what C++ by using two iterators, except that it's wrapped them so that you don't ever have to worry about them pointing to separate containers or the first iterator actually being past the second one. Wrapping them as a range makes it much cleaner, but ultimately, for containers at least, you're still going to have those iterators internally.

So, front returns the element that the first iterator points to and popFront just increments that iterator by one. If the range has back, then back points to the last element of the range (though the internal iterator may point one elment past that), and popBack decrements that iterator by one. It doesn't directly refer to _any_ elements in the middle.

So, in a node-based container, adding or removing elements in the middle of the range will have no effect on the internal iterators. It'll have an effect on what elements you ultimately iterate over if you iterate over the range later, but the two iterators are still completely valid and point to the same elements that they always have. However, if you remove either of the elements that the iterators point to, _then_ the range is going to be invalidated, because its iterators are invalid. They don't point to valid elements in the container anymore.

In a contiguous container, such as Array, adding or removing elements is more disruptive, since the elements get copied around inside of the contiguous block of memory, and while the iterators may continue to point at the same indices as before, the exact elements will have shifted. So, they're still valid in the sense that they point to valid elements, but they don't point to the same elements. The two places where they end up no longer pointing to valid elements are when you remove enough elements that one or both iterators points at an index which is greater than the size of the container and when the container has to be reallocated (typically when you append enough elements that it no longer has the capacity to resize in place). So, in general, altering contiguous containers, such as Array, risks invalidating all iterators or ranges which point to them, whereas with node-based containers it's only when the end point of a range gets removed that you run into that kind of trouble.

Really, to understand range invalidation, you probably should have a fair understanding of iterators. But again, as long as you don't keep any ranges around when you alter a container by adding or removing elements from it, you don't have anything to worry about.

- Jonathan M Davis

On 08/12/2011 06:34 PM, Jonathan M Davis wrote: >> >> Forgive my being dense, but where is this 'as long as' coming from? If >> your range only points to ends in e.g. a linked list, how is it supposed >> to retrieve elements in the middle? I'm having a hard time visualizing a >> range over a node based container that doesn't point to a node in the >> middle (at some point in time). The range points to the node to retrieve >> the current front quickly, the node can get removed, the removed >> function won't know its invalidating the range without playing yon >> internal games, ergo stable remove cannot be. > > Are you familiar with iterators? This will be a lot easier if you are. An > iterator points to one element and one element only. In C++, you tend to pass > around pairs of iterators - one pointing to the first element in a range of > elements and one pointing to one past the end. You then usually iterate by > incrementing the first iterator until it equals the second. > Now you're just bludgeoning me into apathy (though my ability to communicate seems lacking). The iterator is an abstraction. Beneath it, in a node based container, [I expect] will be a pointer to a node, which might point to any node in the container. This means that removing any node could potentially invalidate a range somewhere. When such a conflict arises, you cannot both perform the removal and keep a valid range, regardless of whether you even knew of the conflict.

On Friday, August 12, 2011 20:03:59 Ellery Newcomer wrote: > On 08/12/2011 06:34 PM, Jonathan M Davis wrote: > >> Forgive my being dense, but where is this 'as long as' coming from? If > >> your range only points to ends in e.g. a linked list, how is it > >> supposed > >> to retrieve elements in the middle? I'm having a hard time visualizing > >> a > >> range over a node based container that doesn't point to a node in the > >> middle (at some point in time). The range points to the node to > >> retrieve > >> the current front quickly, the node can get removed, the removed > >> function won't know its invalidating the range without playing yon > >> internal games, ergo stable remove cannot be. > > > > Are you familiar with iterators? This will be a lot easier if you are. > > An > > iterator points to one element and one element only. In C++, you tend to > > pass around pairs of iterators - one pointing to the first element in a > > range of elements and one pointing to one past the end. You then > > usually iterate by incrementing the first iterator until it equals the > > second. > > Now you're just bludgeoning me into apathy (though my ability to communicate seems lacking). The iterator is an abstraction. Beneath it, in a node based container, [I expect] will be a pointer to a node, which might point to any node in the container. This means that removing any node could potentially invalidate a range somewhere. When such a conflict arises, you cannot both perform the removal and keep a valid range, regardless of whether you even knew of the conflict. It means that if you're dealing with a node-based container, and you remove an element from that container, and you have a range which does not have that element at either of its ends, then you know that your range is valid. If you're keeping random ranges around and removing elements from the container, then no, you can't know whether the range is still valid or not. What it really comes down to is that you don't keep ranges around long term, and that if you're altering a container, and you're using a range over that container at the same time, you need to be sure of what you're doing. If you are, then you can use the range in spite of altering the container. If you're not, then you're in trouble. The simplest thing to do, of course, is to just not alter a container while you have ranges which refer to it (or to get rid of any ranges that you have over a container when you do alter that container). - Jonathan M Davis

On Fri, 12 Aug 2011 16:58:15 -0400, Ellery Newcomer <ellery-newcomer@utulsa.edu> wrote: > On 08/12/2011 03:29 PM, Steven Schveighoffer wrote: >> On Fri, 12 Aug 2011 15:54:53 -0400, Ellery Newcomer >> <ellery-newcomer@utulsa.edu> wrote: >> >>> in std.container, the stable* container functions advocate that they >>> do not invalidate the ranges of their containers. What does it mean to >>> invalidate a range? >> >> Say for example, you are iterating a red black tree, and your current >> "front" points at a certain node. Then that node is removed from the >> tree. That range is now invalid, because the node it points to is not >> valid. > > Then there is no way to implement a stable remove from a node based container? Not one that guarantees stability. However, you can implement a remove that can be proven to be stable for certain cases (basically, as long as you don't remove one of the endpoints). >> Another example of an invalidated range, let's say you have a hash map. >> The range has a start and a finish, with the finish being iterated after >> the start. If you add a node, it could cause a rehash, which could >> potentially put the finish *before* the start! > > Then the invalidation is that the range failed to produce an element of the container? No, it may crash the program, for example if empty does: return this.beginNode is this.endNode; If beginNode is sequentially after endNode, this condition will never be true. But there are other definitions of "invalid", I'd call any of these cases invalidated ranges: * it fails to iterate valid nodes that were in the range before the operation, and are still valid after the operation. * it iterates a node more than once (for example, iterates a node before the operation, then iterates it again after the operation) * it iterates invalid nodes (nodes that have been removed). -Steve

On Fri, 12 Aug 2011 18:29:00 -0400, Ellery Newcomer <ellery-newcomer@utulsa.edu> wrote: > On 08/12/2011 03:54 PM, Jonathan M Davis wrote: >> >> In the case of container that uses nodes - such as a linked list - because you >> can add and remove elements without affecting other elements, iterators and >> ranges don't tend to get invalidated as easily. As long as you don't remove >> the element (or elements in the case of a range - assuming that it keeps track >> of its two end points, as is likely) that it points to, then adding or >> removing elements from the container shouldn't invalidate the iterator/range. > > "shouldn't" isn't a guarantee. Where there is "shouldn't", there can't be stableRemove*, no? I don't think it's possible to implement stableRemove IMO. I believe SList does claim it, but IMO once you remove an element from a container, any range that iterates that element is invalid. Once we get custom allocators, this is going to become a lot dicier, because removing elements actually may deallocate them. stableAdd is more possible for implementing, as long as adding does not significantly change the topology of the container (for example, adding to a hash may do a rehash which changes the topology). -Steve

Forums