March 08, 2010
On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford@jhu.edu> wrote:

> On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>>
>> How do you know when to stop?  A range has a beginning and an ending; otherwise it's an iterator.  Whether you store it via a pointer to the last (not-to-be-iterated) node, or via a length, or via a value to compare with for stopping, you have to use something.  Or are you asserting the only useful range for a linked list iterates the entire list?
>>
>> Think of it as the equivalent of a slice of an array.
>
> Please define for me an O(1) slice or index operation for a linked-list.

One for which you have references to the slice end points.  I think this will work, and I was planning on providing it in the upcoming dcollections port.  The only thing you cannot guarantee is that the order is correct.
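
Roughly, I mean something like this sketch (hypothetical names, not necessarily the final dcollections interface):

struct Node(T) { T value; Node* next; }

struct Cursor(T) { Node!T* node; }

// The slice is just the two references, so constructing it is O(1).
// The price is that the list cannot cheaply check that 'from' actually
// precedes 'to'; iterate past a mismatched end point and you walk off
// the intended slice.
struct Slice(T)
{
    Node!T* current;
    Node!T* end;   // one past the last node to be iterated

    @property bool empty() const { return current is end; }
    @property ref T front() { return current.value; }
    void popFront() { current = current.next; }
}

Slice!T slice(T)(Cursor!T from, Cursor!T to)
{
    return Slice!T(from.node, to.node);   // no search, no allocation
}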

> By the way, having ranges detect if they reach their end nodes or not is fairly easy to do.

You are correct on that point: you could throw an exception, as long as the end point is part of the range structure.  If you just use a current node plus a length, then you cannot do that.  But soft ops are not necessary to create this.

>
>> I still fail to see the difference between "soft" operations and non-soft.  What does soft guarantee?  Give me a concrete definition, an example would help too.
>
> There are a couple of possible definitions for soft operations: 1) the memory safety of the ranges of a collection is guaranteed. 2) The topology viewed by a range isn't logically changed, i.e. the range will continue to perform the same logical function if the topology it's operating on is updated. 3) The topology viewed by a range isn't actually changed and all elements selected at range creation will be viewed. 4) Like 3, but with all values being viewed.
>
> For example, modifying an array in any way doesn't change 1, 2 or 3 for any of its slices.
> For a linked list defining a forward range, mutation, insertion and removal can be done under 1 & 2.
> The same can be said about doubly linked lists and bidirectional ranges.
> For other containers, such as a sorted tree, mutation can break 2/3, though insertion and deletion don't break 2.
> Although the ranges will see many values, they may not see all the values currently in the collection, nor all the values that were in the collection when the iterator was generated. So code that relies on such properties would be logically invalid.
>
> I'd probably define hard ops as being 1) and soft ops at level 2. 4) is really only possible with immutable containers.

Hard ops definitely qualify as #1, since we are in a GC'd language.

I don't really understand 2, "the range will continue to perform the same logical function," what does that mean?  Define that function.  I would define it as #3, so obviously you think it's something else.

#3 would be most useful for soft operations if it could be guaranteed.  I don't think it can.

>>> Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
>>> stale view)
>>
>> God no.  If my hash collision solution is linked-list based (which it is in dcollections), why should I reallocate all those nodes?  I just rearrange them in a new bucket array.
>
> Sorry, I was assuming that if you were going to implement a hash collection you wouldn't be using a linked list approach, since that's what D's associative arrays already do. There are some really good reasons not to use a list-based hash in D, due to GC false-pointer issues, but basically none for re-implementing (poorly?) D's built-in data structure.

Hm... my hash outperforms the builtin AAs by a wide margin.  But this is not technically because my implementation is better; it's because AAs use "dumb" allocation methods.  I don't know about false pointers; the hash nodes in my implementation only contain pointers, so I'm not sure there is any possibility for false ones.

What my hash implementation *does* do that AA's don't is to provide a better interface -- removal while iterating, storing cursors for later reference and removal (i.e. avoid double lookups).

Also, technically AAs are implemented with a tree, not a linked list, so unlike with AAs, no opCmp is required for my hash.
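
The rearranging itself is just pointer surgery, along these lines (a simplified sketch, not the actual dcollections code):

struct Node(K, V) { K key; V value; Node* next; }

// Relink every existing node into a new bucket array.  No node is
// reallocated, so ranges and cursors pointing at nodes stay on live
// memory throughout the rehash.
void rehash(K, V)(ref Node!(K, V)*[] buckets, size_t newLength)
{
    auto newBuckets = new Node!(K, V)*[newLength];
    foreach (chain; buckets)
    {
        while (chain !is null)
        {
            auto node = chain;
            chain = chain.next;                          // detach from old chain
            auto b = typeid(K).getHash(&node.key) % newLength;
            node.next = newBuckets[b];                   // push onto new chain
            newBuckets[b] = node;
        }
    }
    buckets = newBuckets;
}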

>>> The difference is that algorithms can document in their template
>>> constraints that they need a container with 'soft' properties.
>>
>> What is the advantage?  Why would an algorithm require soft functions?  What is an example of such an algorithm?
>
> Something that uses toUpperCase or toLowerCase, for example.

I guess I won't get a real example.  I'm not sure it matters.  When Andrei starts implementing the soft methods, either they will be a huge win or obviously useless.  If I were a betting man, I'd bet on the latter, but I'm not really good at betting, and Andrei's ideas are usually good :)

>>> I wasn't thinking of multi-threaded containers. I was trying to point out
>>> that version ids have failed in lock-free containers, where things are
>>> happening on the order of a single atomic op or a context switch. Given
>>> the time a range could go unused in standard code, versioning won't work.
>>
>> Are you trying to say that if you don't use your range for exactly 2^32 mutations, it could mistakenly think the range is still valid?  That's a valid, but very, very weak point.
>
> Umm, no. That is a valid point that happens in production code with disastrous effects. Worse, it doesn't happen often, and there's no good unit test for it. I, for one, never want to debug a program that only glitches after days of intensive use in a live environment with real customer data. Integer overflow bugs like this are actually one of the few bugs that have ever killed anyone.

I think you are overestimating the life of ranges on a container.  They will not survive that many mutations, and the worst that happens is the range continues on fully allocated memory (i.e. your #1 above).

You can also force the container to invalidate itself once the first wrap occurs.  This at least will be a hard error.
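
Something like this sketch (hypothetical; as I note below, I'm not actually planning to add counters):

// Poison the container on the first version-counter wrap, so a stale
// range can never collide with a live version number.
struct Container(T)
{
    private uint version_;    // bumped on every mutation
    private bool poisoned;

    private void bumpVersion()
    {
        if (poisoned)
            throw new Exception("container invalidated: version counter wrapped");
        if (version_ == uint.max)
            poisoned = true;   // first wrap: hard error from now on
        else
            ++version_;
    }

    void insert(T value)
    {
        bumpVersion();
        // ... actual insertion elided ...
    }
}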

>>> No it's not. Version tags + integer overflow = bug. Doug Lea knew about
>>> the problem but thought it would never happen in real code. And Bill Gates
>>> thought no one would need more than 640k of RAM. They both have been
>>> proven wrong.
>>
>> Overflow != bug.  Wrapping completely around to the same value == bug, but that is so unlikely it's worth the risk.
>
> Statistics 101: do a test enough times and even the highly improbable will happen.

In statistics we generally ignore the outliers.  I normally am on your side in these types of cases, but we are talking about a statistical impossibility -- nobody will leave a range untouched for exactly 2^32 mutations; it simply won't happen.  Your computer will probably be obsolete before it does.

BTW, I'm not advocating adding mutation counters, I don't plan on adding them.  It's just another alternative to "soft" mutations.

-Steve
March 08, 2010
On Sun, 07 Mar 2010 06:42:10 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:


> Iterator invalidation has been a notion thoroughly explored by the STL and a commonly-mentioned liability of STL iterators. People find it very jarring that syntactically identical interfaces have distinct effects on iterators; it's a dependency that is very difficult to track. I'm glad that that experience has already been accumulated and that we can build upon it.

Is there any experience that led to a solution, even in theory?

I'm going to stop arguing this point any further, because the quickest path to resolution is probably for you to stop wasting time debating with me and implement what you think will work.  Then we can see how useful it might be.

If you can solve this problem in a way that is useful, I think it might be as revolutionary as ranges are.

-Steve
March 08, 2010
== Quote from Robert Jacques (sandford@jhu.edu)'s article
> > The mutation index has been used in Tango forever, and I think was in Doug Lea's original container implementations.  I'm pretty sure it is sound in single-threaded uses.
> No it's not. Version tags + integer overflow = bug. Doug Lea knew about the problem but thought it would never happen in real code. And Bill Gates thought no one would need more than 640k of RAM. They both have been proven wrong.

Using a 32-bit version tag probably could lead to overflow in some corner cases, but in the 64-bit case it can be shown that this is absurdly unlikely.  Assume that an increment operation takes about 1 clock cycle (in reality it's more) and that most CPUs are 10 GHz (to make things future-proof in case people figure out how to get the clock speed race going again).

This means that a version tag could, at most, increment at about 10^10 ticks per second.  A 64-bit unsigned int goes up to about 1.8x10^19.  This means that, even if our program does nothing but update the version counter in an empty for loop, it will take 1.8x10^19 / 10^10 = 1.8x10^9 seconds to overflow.  There are about 3.15x10^7 seconds in a year.

Therefore, even under very pessimistic assumptions the fastest you could overflow
a 64-bit unsigned int by incrementing it starting at zero is in about half a century.
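
For anyone who wants to check, the same arithmetic in a few lines of D:

import std.stdio;

void main()
{
    enum double ticksPerSecond = 1e10;    // optimistic: 10 GHz, one increment per cycle
    enum double counterMax     = 1.8e19;  // ~2^64
    enum double secondsPerYear = 3.15e7;
    writefln("%.0f years to overflow",
             counterMax / ticksPerSecond / secondsPerYear);
    // prints: 57 years to overflow
}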
March 08, 2010
On Sun, 07 Mar 2010 22:07:14 -0500, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford@jhu.edu> wrote:
>> On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
[snip]
>> Please define for me an O(1) slice or index operation for a linked-list.
>
> One for which you have references to the slice end points.  I think this will work, and I was planning on providing it in the upcoming dcollections port.  The only thing you cannot guarantee is that the order is correct.

The container would have to do an O(N) search to verify the ranges are actually part of the collection. And using two ranges as iterators to create a third range feels very wrong and very bug prone: see all the issues raised during Andrei's iterators vs ranges presentations. Similarly, it feels wrong for something to define slicing and not indexing.

>> By the way, having ranges detect if they reach their end nodes or not is fairly easy to do.
>
> You are correct on that point: you could throw an exception, as long as the end point is part of the range structure.  If you just use a current node plus a length, then you cannot do that.  But soft ops are not necessary to create this.

Soft ops are necessary to document in code whether that invalidation is expected to happen.

>>
>>> I still fail to see the difference between "soft" operations and non-soft.  What does soft guarantee?  Give me a concrete definition, an example would help too.
>>
>> There are a couple of possible definitions for soft operations: 1) the memory safety of the ranges of a collection is guaranteed. 2) The topology viewed by a range isn't logically changed, i.e. the range will continue to perform the same logical function if the topology it's operating on is updated. 3) The topology viewed by a range isn't actually changed and all elements selected at range creation will be viewed. 4) Like 3, but with all values being viewed.
>>
>> For example, modifying an array in any way doesn't change 1, 2 or 3 for any of its slices.
>> For a linked list defining a forward range, mutation, insertion and removal can be done under 1 & 2.
>> The same can be said about doubly linked lists and bidirectional ranges.
>> For other containers, such as a sorted tree, mutation can break 2/3, though insertion and deletion don't break 2.
>> Although the ranges will see many values, they may not see all the values currently in the collection, nor all the values that were in the collection when the iterator was generated. So code that relies on such properties would be logically invalid.
>>
>> I'd probably define hard ops as being 1) and soft ops at level 2. 4) is really only possible with immutable containers.
>
> Hard ops definitely qualify as #1, since we are in a GC'd language.
>
> I don't really understand 2, "the range will continue to perform the same logical function," what does that mean?  Define that function.  I would define it as #3, so obviously you think it's something else.
>
> #3 would be most useful for soft operations if it could be guaranteed.  I don't think it can.

I was thinking of "iterate from the start to the end", for example. One might better describe this concept as the topology of the container relative to the range doesn't change: things before it stay before, things after it stay after, and, in the case of bidirectional ranges, things in the middle stay in the middle.

>>>> Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
>>>> stale view)
>>>
>>> God no.  If my hash collision solution is linked-list based (which it is in dcollections), why should I reallocate all those nodes?  I just rearrange them in a new bucket array.
>>
>> Sorry, I was assuming that if you were going to implement a hash collection you wouldn't be using a linked list approach, since that's what D's associative arrays already do. There are some really good reasons not to use a list-based hash in D, due to GC false-pointer issues, but basically none for re-implementing (poorly?) D's built-in data structure.
>
> Hm... my hash outperforms the builtin AAs by a wide margin.  But this is not technically because my implementation is better; it's because AAs use "dumb" allocation methods.  I don't know about false pointers; the hash nodes in my implementation only contain pointers, so I'm not sure there is any possibility for false ones.

The GC isn't precise, so if you have a non-pointer type in a structure with a pointer, or in a class, you can get false pointers (e.g. the hash value at each node).
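
For example, with the usual chained-node layout (a sketch):

// On a 32-bit system the conservative GC scans the size_t 'hash' field
// as if it might be a pointer; an unlucky hash value can pin unrelated
// memory indefinitely.
struct HashNode(K, V)
{
    size_t    hash;   // cached hash value -- a potential false pointer
    K         key;    // a non-pointer K (e.g. int) can false-point too
    V         value;
    HashNode* next;   // a real pointer, scanned as intended
}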

>>>> The difference is that algorithms can document in their template
>>>> constraints that they need a container with 'soft' properties.
>>>
>>> What is the advantage?  Why would an algorithm require soft functions?  What is an example of such an algorithm?
>>
>> Something that uses toUpperCase or toLowerCase, for example.
>
> I guess I won't get a real example.  I'm not sure it matters.  When Andrei starts implementing the soft methods, either they will be a huge win or obviously useless.  If I were a betting man, I'd bet on the latter, but I'm not really good at betting, and Andrei's ideas are usually good :)

Though this was the first thing that popped into my head, it is a fairly real example. Consider that you are doing some processing involving a container, and you call a library function, like toUpperCase, that will perform some mutation. For some containers, this is completely reasonable. But for others, the container topology is going to change massively, invalidating all your current ranges. And you'd really like a guarantee that both the current implementation and the one six months from now don't mess up your ranges and cause a bunch of exceptions to be thrown.
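
In code, the guarantee could even be made machine-checkable, along these lines (hypothetical trait and primitive names):

// Detect at compile time whether a container's removal is "soft",
// i.e. leaves other ranges over the container valid.
enum hasSoftRemove(C) = is(typeof((C c)
{
    auto r = c[];          // assume opSlice yields a forward range
    r = c.softRemove(r);   // hypothetical primitive: erase r.front
}));

// The constraint documents the requirement; a container without the
// guarantee is rejected at compile time instead of misbehaving later.
void mutateWhileIterating(C)(ref C c)
    if (hasSoftRemove!C)
{
    // ... mutation that relies on ranges surviving removals ...
}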

>>>> I wasn't thinking of multi-threaded containers. I was trying to point out
>>>> that version ids have failed in lock-free containers, where things are
>>>> happening on the order of a single atomic op or a context switch. Given
>>>> the time a range could go unused in standard code, versioning won't work.
>>>
>>> Are you trying to say that if you don't use your range for exactly 2^32 mutations, it could mistakenly think the range is still valid?  That's a valid, but very, very weak point.
>>
>> Umm, no. That is a valid point that happens in production code with disastrous effects. Worse, it doesn't happen often, and there's no good unit test for it. I, for one, never want to debug a program that only glitches after days of intensive use in a live environment with real customer data. Integer overflow bugs like this are actually one of the few bugs that have ever killed anyone.
>
> I think you are overestimating the life of ranges on a container.  They will not survive that many mutations, and the worst that happens is the range continues on fully allocated memory (i.e. your #1 above).

A year ago I would have agreed with you, but then I saw a couple of articles about how this unlikely event does start to occur repeatably in certain circumstances. This made a bunch of news since it threw a massive spanner in lock-free containers, which relied on this never happening.

> You can also force the container to invalidate itself once the first wrap occurs.  This at least will be a hard error.

So every container will have a finite number of operations, and that's it?

>>>> No it's not. Version tags + integer overflow = bug. Doug Lea knew about
>>>> the problem but thought it would never happen in real code. And Bill Gates
>>>> thought no one would need more than 640k of RAM. They both have been
>>>> proven wrong.
>>>
>>> Overflow != bug.  Wrapping completely around to the same value == bug, but that is so unlikely it's worth the risk.
>>
>> Statistics 101: do a test enough times and even the highly improbable will happen.
>
> In statistics we generally ignore the outliers.  I normally am on your side in these types of cases, but we are talking about a statistical impossibility -- nobody will leave a range untouched for exactly 2^32 mutations; it simply won't happen.  Your computer will probably be obsolete before it does.

No, this is a statistical implausibility, one that has already happened to some people. ulongs, on the other hand...

> BTW, I'm not advocating adding mutation counters, I don't plan on adding them.  It's just another alternative to "soft" mutations.

I understand.
March 08, 2010
On Mon, 08 Mar 2010 00:20:31 -0500, Robert Jacques <sandford@jhu.edu> wrote:

> On Sun, 07 Mar 2010 22:07:14 -0500, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>> On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford@jhu.edu> wrote:
>>> On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> [snip]
>>> Please define for me an O(1) slice or index operation for a linked-list.
>>
>> One for which you have references to the slice end points.  I think this will work, and I was planning on providing it in the upcoming dcollections port.  The only thing you cannot guarantee is that the order is correct.
>
> The container would have to do an O(N) search to verify the ranges are actually part of the collection. And using two ranges as iterators to create a third range feels very wrong and very bug prone: see all the issues raised during Andrei's iterators vs ranges presentations. Similarly, it feels wrong for something to define slicing and not indexing.

Not two ranges, two references.  That is what cursors are in dcollections.  They currently are akin to iterators, but are being changed to immovable references.

BTW, I don't plan on adding this as a simple slice operation on a container that cannot quickly verify that the two cursors are ordered, to ensure it isn't accidentally used in functions that want safe slicing.

You don't need O(n) to verify the cursors are part of the collection if the cursors identify the collection they are from.
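
For example (a sketch with hypothetical names):

class List(T)
{
    static struct Node { T value; Node* next; }

    // The cursor carries a back-reference to its owner, so membership
    // is an O(1) identity check rather than an O(N) search.
    static struct Cursor { Node* node; List owner; }

    auto opSlice(Cursor from, Cursor to)
    {
        assert(from.owner is this && to.owner is this,
               "cursor does not belong to this container");
        // Ordering of 'from' and 'to' still cannot be verified in O(1).
        static struct Range
        {
            Node* cur, end;
            @property bool empty() const { return cur is end; }
            @property ref T front() { return cur.value; }
            void popFront() { cur = cur.next; }
        }
        return Range(from.node, to.node);
    }
}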

>> Hm... my hash outperforms the builtin AAs by a wide margin.  But this is not technically because my implementation is better; it's because AAs use "dumb" allocation methods.  I don't know about false pointers; the hash nodes in my implementation only contain pointers, so I'm not sure there is any possibility for false ones.
>
> The GC isn't precise, so if you have a non-pointer type in a structure with a pointer, or in a class, you can get false pointers (e.g. the hash value at each node).

I don't store the hash in the node, it is recalculated when a re-hash is done.

>> I think you are overestimating the life of ranges on a container.  They will not survive that many mutations, and the worst that happens is the range continues on fully allocated memory (i.e. your #1 above).
>
> A year ago I would have agreed with you, but then I saw a couple of articles about how this unlikely event does start to occur repeatably in certain circumstances. This made a bunch of news since it threw a massive spanner in lock-free containers, which relied on this never happening.

References?

>
>> You can also force the container to invalidate itself once the first wrap occurs.  This at least will be a hard error.
>
> So every container will have a finite number of operations, and that's it?

That's one solution, if you want to be sure this bug doesn't occur, and you want to be sure your ranges aren't invalidated.

-Steve
March 08, 2010
dsimcha wrote:
> == Quote from Robert Jacques (sandford@jhu.edu)'s article
>>> The mutation index has been used in Tango forever, and I think was in
>>> Doug Lea's original container implementations.  I'm pretty sure it is
>>> sound in single-threaded uses.
>> No it's not. Version tags + integer overflow = bug. Doug Lea knew about
>> the problem but thought it would never happen in real code. And Bill Gates
>> thought no one would need more than 640k of RAM. They both have been
>> proven wrong.
> 
> Using a 32-bit version tag probably could lead to overflow in some corner cases,
> but in the 64-bit case it can be shown that this is absurdly unlikely.  Assume
> that an increment operation takes about 1 clock cycle (in reality it's more) and
> that most CPUs are 10 GHz (to make things future-proof in case people figure out
> how to get the clock speed race going again).
> 
> This means that a version tag could, at most, increment at about 10^10 ticks per
> second.  A 64-bit unsigned int goes up to about 1.8x10^19.  This means that, even
> if our program does nothing but update the version counter in an empty for loop,
> it will take 1.8x10^19 / 10^10 = 1.8x10^9 seconds to overflow.  There are about
> 3.15x10^7 seconds in a year.
> 
> Therefore, even under very pessimistic assumptions the fastest you could overflow
> a 64-bit unsigned int by incrementing it starting at zero is in about half a century.

Yah, there are many similar assessments that for at least a couple of centuries to come we can assume that 64-bit counters never overflow and that 64-bit virtual address space is sparse. The first time I saw this was when I read about the Opal OS: http://www.cs.washington.edu/homes/levy/opal/sigops92.ps

Andrei
March 08, 2010
Steven Schveighoffer wrote:
> On Sun, 07 Mar 2010 06:42:10 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> 
> 
>> Iterator invalidation has been a notion thoroughly explored by the STL and a commonly-mentioned liability of STL iterators. People find it very jarring that syntactically identical interfaces have distinct effects on iterators; it's a dependency that is very difficult to track. I'm glad that that experience has already been accumulated and that we can build upon it.
> 
> Is there any experience that led to a solution, even in theory?

I don't know of one. There are of course the safe iterators that rely on a variety of techniques (some, such as serial numbers, discussed in this thread) to detect invalidation dynamically.

For the style of containers discussed herein, a static solution should be a better fit than a dynamic one. I plan to enact a solution by means of the softXxx primitives.

> I'm going to stop arguing this point any further, because the quickest path to resolution is probably for you to stop wasting time debating with me and implement what you think will work.  Then we can see how useful it might be.
> 
> If you can solve this problem in a way that is useful, I think it might be as revolutionary as ranges are.

Probably not. The way I see it, it would be simply the removal of a large annoyance that stands in the way of writing container-independent generic code.


Andrei
March 08, 2010
On 3/6/2010 5:18 PM, Andrei Alexandrescu wrote:
> Good question. In STL, invalidation roughly means undefined behavior if you use it. With GC in tow, the concept could be significantly milder. For example, reallocating an array would leave the old contents of the array sort of around, just obsoleted and also depleted of meaningful content if the data type has a destructor.

So it’s still a bug to use an invalidated iterator/range, and following one probably still has unusual (if no longer undefined) effects, but it (probably) won’t cause memory corruption.  Is that correct?

—Joel Salomon
March 08, 2010
Steven Schveighoffer wrote:
> On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford@jhu.edu> wrote:
> 
>> On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>>> What is the advantage?  Why would an algorithm require soft functions?  What is an example of such an algorithm?
>>
>> Something that uses toUpperCase or toLowerCase, for example.
> 
> I guess I won't get a real example.  I'm not sure it matters.  When Andrei starts implementing the soft methods, either they will be a huge win or obviously useless.  If I were a betting man, I'd bet on the latter, but I'm not really good at betting, and Andrei's ideas are usually good :)

The way people usually take advantage of non-invalidating container primitives is to alter the container during iteration. The canonical way to remove while iterating an erase-non-invalidating container (e.g. map) is this:

typedef ... Container;
Container c;
for (Container::iterator i = c.begin(); i != c.end(); )
{
    if (should_delete_this_guy)
        c.erase(i++);   // advance first, then erase the old position
    else
        ++i;
}

The canonical way to remove while iterating an erase-invalidating container (e.g. vector) is:

typedef ... Container;
Container c;
for (Container::iterator i = c.begin(); i != c.end(); )
{
    if (should_delete_this_guy)
        i = c.erase(i); // erase returns the next valid iterator
    else
        ++i;
}

I know, subtle :o/. Of course, things can quickly get more interesting when calling functions from within loops that may affect the container (think the Observer pattern).
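
A container-independent D rendition could collapse the two idioms into one (sketch; assumes a hypothetical softRemove primitive that erases r.front and returns the remaining range, leaving other ranges valid):

void eraseMatching(alias shouldDelete, C)(ref C c)
{
    for (auto r = c[]; !r.empty; )
    {
        if (shouldDelete(r.front))
            r = c.softRemove(r);   // same line for map-like and vector-like
        else
            r.popFront();
    }
}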


Andrei
March 08, 2010
Fawzi Mohamed wrote:
> On 2010-03-07 14:23:03 +0100, Steven Schveighoffer <schveiguy@yahoo.com> said:
> 
>> Robert Jacques Wrote:
>>
>>> On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
>>> <schveiguy@yahoo.com> wrote:
>>>> On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford@jhu.edu>
>>>> wrote:
>>>>
>>>>> On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
>>>>> <schveiguy@yahoo.com> wrote:
>>>>>>
>>>>>> How can softRemove not affect iterating ranges?  What if the range is
>>>>>> positioned on the element removed?
>>>>>
>>>>> It always affects ranges in so far as the range and container are
>>>>> inconsistent, but that is a problem of softInsert as well. Removing an
>>>>> element from an array doesn't invalidate the underlying range, since
>>>>> the memory is still there. And so long as you're not trying to use free
>>>>> lists, linked-lists, trees, etc. can be written so that the ranges
>>>>> never enter an invalid state. If they happen to be pointing at the
>>>>> removed node, they just end up stopping early.
>>>>
>>>> If my linked list range has two node pointers as the implementation, and
>>>> you remove the end node, it's going to end later, not early.
>>>
>>> A linked list maps to a forward range: so the range simply points to a
>>> single internal node. If you're going to store a pointer to the end, then
>>> you should be using a doubly linked list, not a singly linked list.
>>
>> How do you know when to stop?  A range has a beginning and an ending; otherwise it's an iterator.  Whether you store it via a pointer to the last (not-to-be-iterated) node, or via a length, or via a value to compare with for stopping, you have to use something.  Or are you asserting the only useful range for a linked list iterates the entire list?
>>
>> Think of it as the equivalent of a slice of an array.
>>
>>>
>>>> Of course, you can get around this by just storing a current node and a
>>>> length, but that may not be what the user is expecting.  For example, if
>>>> you did a range of a sorted tree from "a" to "f" and it stops at "n", I
>>>> think that would be extremely unexpected.
>>>
>>> By this logic, any and all forms of mutation, not just insertions and
>>> deletions, must not be allowed.
>>
>> Allowed, yes.  Not invalidating iterators, no.
> 
> I agree with this point of view; it makes sense to let some operations invalidate iteration, and even there there is a hierarchy of "invalidation":
> 
> 1) all iterators are still valid, and iterate on the same set that they were iterating before. This is the best thing from the user's perspective, and is what persistent structures do; all the structures described in the book I gave have this property.
> To achieve this you need to basically *always* create a copy when modifying a structure, but fortunately there are often ways to share most of the structure with the old value. Still, this has a cost, and normally puts pressure on the memory/GC (all the nodes, or groups of them, have to be single objects for the GC).
> 
> 2) iterators are still valid, but they might iterate on a different set of elements than originally planned.
> Array append with atomic update of elements gives this, for example; removal in general has more problems, even if the memory might still be valid. Invariants of the array might be violated (for example strict ordering). One could separate this into two sub-points, invariant-violating and not, but probably it is not worth the effort.
> 
> 3) iterators are fully invalid, and at worst they might iterate on completely wrong/invalid data without detection of failure; hopefully, at least in debug mode, detection should be an option.

These are great thoughts. We need to explore this niche because STL has a black-and-white approach to invalidation: once invalidated, behavior is undefined. D could get much more nuanced than that.

Perhaps for the beginning we could just say that invalidation entails implementation-specific behavior.
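
For instance, level 1 above is essentially what a persistent structure buys you. A minimal sketch:

// A persistent singly linked list: "insertion" allocates one new head
// that shares the entire old list as its tail, so every existing
// iterator keeps seeing exactly the set it started with.
final class PList(T)
{
    T value;
    PList next;   // shared tail; by convention never mutated

    this(T value, PList next) { this.value = value; this.next = next; }
}

// O(1) prepend; the old list, and every iterator over it, is untouched.
PList!T prepend(T)(T value, PList!T rest)
{
    return new PList!T(value, rest);
}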

>>>> Stopping early is invalidation also IMO.  If your program logic depends
>>>> on iterating over all the elements you originally intended to iterate,
>>>> then we have big problems if they stop early.
>>>
>>> Stopping early is the result of a logical error by the programmer. The
>>> code itself, however, is completely valid.
>>
>> I still fail to see the difference between "soft" operations and non-soft.  What does soft guarantee?  Give me a concrete definition, an example would help too.
> 
> I suppose soft should guarantee either 1 or 2; from what Andrei was saying, I suppose he wants 2

I think softXxx functions that add data guarantee that existing ranges continue iterating the new collection correctly. Depending on where insertion was effected, iteration may or may not span the newly-inserted elements. For example, inserting elements in a linked list does preserve ranges, so a linked list can afford to define softInsert and softPrepend.

softXxx functions that remove data should guarantee that existing ranges that do not include the removed elements will continue to operate without seeing them. There is no guarantee that soft removal offers for ranges spanning the removed elements. (I'm unclear on this last topic.)
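
Concretely, for a singly linked list the soft insertion is a two-pointer splice (sketch):

struct Node(T) { T value; Node* next; }

// Splice one node in after the cursor.  No existing node moves, so
// every range holding node pointers stays valid; ranges positioned
// before the cursor will additionally see the new element.
void softInsertAfter(T)(Node!T* cursor, T value)
{
    cursor.next = new Node!T(value, cursor.next);
}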

Andrei