September 28, 2020
On Monday, 28 September 2020 at 10:01:23 UTC, ikod wrote:
> Is it specific to some types? What if collection supports stable "foreach"?

Yes, it depends on how the collection members (such as insert, find, replace, erase, etc.) are implemented.

I presume we need attributes on mutating collection members that describe whether the operation invalidates ranges or not.

It is relatively simple to implement checks for range invalidation at run-time. This is basically Rust-style borrow checking at run-time instead of compile-time.
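
As an illustration of the run-time variant, here is a minimal sketch (the container, its members and the message are all made up, so this is not a proposal for druntime): the container bumps a modification counter on every invalidating mutation, and each range remembers the counter value it was created with and asserts if they no longer match.

// Illustrative sketch only: Bag and its members are made-up names.
struct Bag(T)
{
    private T[] items;
    private size_t modCount;   // bumped by every invalidating mutation

    void insert(T value) { items ~= value; ++modCount; }

    auto opSlice()
    {
        static struct Range
        {
            private Bag!T* owner;
            private size_t index;
            private size_t expected;

            @property bool empty() const { return index >= owner.items.length; }
            @property ref T front() { check(); return owner.items[index]; }
            void popFront() { check(); ++index; }

            private void check() const
            {
                assert(expected == owner.modCount,
                       "range was invalidated by a container mutation");
            }
        }
        return Range(&this, 0, modCount);
    }
}

unittest
{
    Bag!int bag;
    bag.insert(1);
    auto r = bag[];
    bag.insert(2);     // invalidates r
    // r.front;        // would now fail the assertion at run-time
}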

In C++ this is called "iterator invalidation". There, these attributes are not formally defined in code but are part of the documentation.
September 28, 2020
On Monday, 28 September 2020 at 10:10:10 UTC, Per Nordlöw wrote:
> On Monday, 28 September 2020 at 10:01:23 UTC, ikod wrote:
>> Is it specific to some types? What if collection supports stable "foreach"?
>
> Yes, it depends on how the collection members (such as insert, find, replace, erase, etc.) are implemented.
>
> I presume we need attributes on mutating collection members that describe whether the operation invalidates ranges or not.
>


Agree, nice idea!

September 28, 2020
On 9/27/20 4:43 PM, Per Nordlöw wrote:
> On Sunday, 27 September 2020 at 14:23:11 UTC, H. S. Teoh wrote:
>> No.  Modifying a container while iterating over it is, in general, a bad idea (unless the container is designed to be used that way, but even then, such removal is generally restricted), because it often leads to highly counterintuitive results.  In the case of AA's, removing an element may lead to a rehashing, which reorders elements, and your iteration may miss some elements or repeat some elements or terminate prematurely. Even without a rehashing, you may encounter inconsistent behaviours, like some elements going "missing".
> 
> I believe it's high time we start thinking about detecting these violations at compile-time. I recall it's in the spec somewhere so we should start a deprecation process at least for AAs.

So the major problem of removing keys while iterating is that the AA can decide to rehash and shrink. If it does this, your iterator is invalidated -- you might skip elements that are actually in the AA, and you might encounter elements that have already been iterated. Essentially, the ordering has completely been shuffled. Aside from that, there's the performance aspect that you have to re-run the hash lookup for no reason (you literally have a pointer to the element you are iterating).

How do you "detect" this at compile time?

One could write a specific function to iterate and remove. I did it for dcollections (this was actually a very important functionality that I missed from C++ std::map, which is why I created dcollections in the first place).

I don't think it's worth worrying about this at compile time, or even at runtime. Just identify that if you do it, you are subject to peculiarities.

Then, we can introduce a specific mechanism to do this (total strawman; I have no idea what the best thing looks like):

auto r = aa.byKeyValuePurging;
while(!r.empty)
{
   if(shouldPurge(r.front.key, r.front.value)) r.popFrontAndDelete;
   else r.popFront;
}

where popFrontAndDelete will move to the next element, remove the prior element, and ensure that a rehash does not occur.

Again, this is not a formal proposal. Just something like this could be possible.

In dcollections, I used an opApply trick to provide the "whether to remove" boolean as a ref parameter to foreach:

foreach(k, v, ref doPurge; &container.purge)
   doPurge = shouldPurge(k, v);
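
For reference, here is a rough sketch of how such a purge member can be written as an opApply-style function for a toy array-backed container (just a toy illustration, not the actual dcollections implementation):

// Toy container; purge matches the foreach usage shown above.
struct ToyMap(K, V)
{
    K[] keys;
    V[] vals;

    // usable as: foreach (k, v, ref doPurge; &map.purge)
    int purge(scope int delegate(K, V, ref bool) dg)
    {
        size_t write = 0, read = 0;
        int result = 0;
        for (; read < keys.length; ++read)
        {
            bool doPurge = false;
            result = dg(keys[read], vals[read], doPurge);
            if (!doPurge)
            {
                keys[write] = keys[read];
                vals[write] = vals[read];
                ++write;
            }
            if (result) { ++read; break; }   // the foreach body used break
        }
        // keep everything not visited because of an early break
        for (; read < keys.length; ++read, ++write)
        {
            keys[write] = keys[read];
            vals[write] = vals[read];
        }
        keys.length = write;
        vals.length = write;
        return result;
    }
}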

-Steve
September 28, 2020
On Monday, 28 September 2020 at 14:58:15 UTC, Steven Schveighoffer wrote:
> On 9/27/20 4:43 PM, Per Nordlöw wrote:
>> On Sunday, 27 September 2020 at 14:23:11 UTC, H. S. Teoh wrote:
>>> No.  Modifying a container while iterating over it is, in general, a bad idea (unless the container is designed to be

> So the major problem of removing keys while iterating is that the AA can decide to rehash and shrink. If it does this, your iterator is invalidated -- you might skip elements that are
...
>
> How do you "detect" this at compile time?

I think the idea was to have some attributes that the collection programmer may attach to the implementation, like enums: SafeToRemoveOnIteration, SafeToAddOnIteration and so on, which could be checked at compile time when foreach is handled by the compiler. Then, inside the foreach body, the compiler could refuse to call some unsafe methods. I do not know if it is worth implementing; it was just an idea.
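
Purely to illustrate the shape of the idea (every name below is made up; this is only a sketch, not a worked-out design): the container advertises its capabilities as enum flags, and a checked helper refuses at compile time to work with containers that lack them.

// Made-up capability flags and helper, only meant to show the idea.
struct MyMap(K, V)
{
    enum bool SafeToRemoveOnIteration = true;
    enum bool SafeToAddOnIteration = false;

    private V[K] impl;   // toy backing store for the sketch
    void remove(K key) { impl.remove(key); }
}

// Hypothetical checked helper; fails to compile for containers that
// do not advertise the capability.
void checkedRemove(C, K)(ref C c, K key)
{
    static assert(C.SafeToRemoveOnIteration,
        C.stringof ~ ": remove() is not safe while iterating");
    c.remove(key);
}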

>
> One could write a specific function to iterate and remove. I

This looks like a dead end to me, as you may not only remove items from arbitrary positions while iterating over the collection, but also insert items. Also, this can happen not only inside a foreach body, but in other kinds of iteration. And there can also be nested iterations over the same collection.

> did it for dcollections (this was actually a very important functionality that I missed from C++ std::map, which is why I created dcollections in the first place).
>
> I don't think it's worth worrying about this at compile time, or even at runtime. Just identify that if you do it, you are subject to peculiarities.

Yes, a compile-time check is just a nice option.

>
> Then, we can introduce a specific mechanism to do this (total strawman, have no idea what the best thing looks like):
>

I have an idea of how to implement this, trading extra memory for performance in some rare cases. I just need some time to estimate how it would work.

Igor
September 28, 2020
On Mon, Sep 28, 2020 at 05:18:41PM +0000, ikod via Digitalmars-d-learn wrote:
> On Monday, 28 September 2020 at 14:58:15 UTC, Steven Schveighoffer wrote:
[...]
> > One could write a specific function to iterate and remove. I
> 
> This looks like a dead end to me, as you may not only remove items from arbitrary positions while iterating over the collection, but also insert items.  Also, this can happen not only inside a foreach body, but in other kinds of iteration. And there can also be nested iterations over the same collection.

The problem with arbitrary, unrestricted modification of a container while iterating over it, is that it inevitably leads to counterintuitive results.

For example, suppose you have a container containing the elements A, B, C, D, E, and you're iterating over it in that order.

1) Suppose you're at element C, and you decide that you need to insert F. Then there's the question: should F be included in the current iteration or not?  What if F sorts before C in the current sort order? What if F sorts after C?  Should the two cases behave similarly (i.e., new elements consistently get included in the current iteration, or they consistently don't get included in the current iteration)?  What if the act of inserting F requires an internal reordering of elements in the container?

2) Suppose you're at element C, and you decide to delete D. "Obviously", the current iteration should not include D. Or should it? If you delete A while you're at C, you've already iterated over A, so there's an inconsistency: deleted elements sometimes get included in the current iteration, sometimes not.  The problem is, which case is the "correct" one depends on the user's purpose, which is information that the container may not have.

Also, what if the act of deleting D requires an internal reorganization of the container?  If this reorg changes the iteration order, how should the current iteration proceed? According to the old order, or the new order?  Note that proceeding with the new order may lead to counterintuitive results: suppose the new order becomes E -> C -> B -> A.  That means the current iteration will proceed with B and A, which have already been encountered before, and skips E.

However, proceeding with the old order may be inefficient: it may require allocating extra storage to preserve the current iteration order because the reorganized container no longer has that information.  Or it may require the container itself to maintain extra information in order to retain the current iteration order.

3) Suppose you're at element A, and you decide to insert F, with the resulting order A -> B -> C -> D -> E -> F.  So you proceed as usual. But at B, you decide to delete F, and that causes the container to reorganize itself to B -> A -> C -> D -> E.  So in the next iteration you encounter A again, and insert F again, which causes the container to reorganize itself to A -> B -> C -> D -> E -> F.  So you get stuck in a loop between A and B, and termination of the iteration is no longer guaranteed.

Basically, the whole concept of iteration is undermined when the container can mutate while you're iterating.  It leads to inconsistent behaviour -- sometimes new elements show up in the current iteration, sometimes they don't; or some elements may get visited multiple times or missed altogether -- and iteration may not terminate if your sequence of operations interacts badly with the container's internal reorganizations.

Furthermore, trying to work around this by postponing container reorganizations may lead to other problems, like container inconsistency (some data structures require immediate reorganization in order to maintain correctness, etc.).

//

The only way to support container mutation while iterating over it is if (1) the allowed operations are constrained (e.g., you can only delete the current element, not any arbitrary element), and (2) the container itself explicitly supports such an operation (otherwise deleting the current element may lead to premature termination of the current iteration, or to errors like dangling pointers or some compromise of container consistency).

Mixing iteration with arbitrary, unrestricted container mutation is a road that leads to madness.


T

-- 
Recently, our IT department hired a bug-fix engineer. He used to work for Volkswagen.
September 28, 2020
On Monday, 28 September 2020 at 19:18:20 UTC, H. S. Teoh wrote:
> On Mon, Sep 28, 2020 at 05:18:41PM +0000, ikod via Digitalmars-d-learn wrote:
>> On Monday, 28 September 2020 at 14:58:15 UTC, Steven Schveighoffer wrote:
> [...]
>> > One could write a specific function to iterate and remove. I
>> 
>> This looks like a dead end to me, as you may not only remove items from arbitrary positions while iterating over the collection, but also insert items.  Also, this can happen not only inside a foreach body, but in other kinds of iteration. And there can also be nested iterations over the same collection.
>
> The problem with arbitrary, unrestricted modification of a container while iterating over it, is that it inevitably leads to counterintuitive results.

Thanks for your reply, and sorry if I was not clear, but I meant AAs. An AA is an unordered container, so for me the intuitive behavior for the visibility of mutations during iteration is as follows:

1) You must not see removed keys. Say our keys are (unordered) A D C E F, you are positioned on D and remove E. Then E must not be seen on any future iteration step.

2) You may or may not see inserted keys. As an AA is an unordered container, there is no reason to expect anything about the position of a key you just inserted - it can fall before or after the current iteration position, so the implementation is free to show inserted keys or not. I prefer not to see new keys.

3) You expect to visit every key that was not removed exactly once.

Is it sane?

>
> For example, suppose you have a container containing the elements A, B, C, D, E, and you're iterating over it in that order.
>
> 1) Suppose you're at element C, and you decide that you need to insert F. Then there's the question: should F be included in
...
>
> 2) Suppose you're at element C, and you decide to delete D. "Obviously", the current iteration should not include D. Or
...

Sure, everything you said is correct for an ordered collection.


> Also, what if the act of deleting D requires an internal reorganization of the container?  If this reorg changes the iteration order, how should the current iteration proceed?

Intuitive behavior should follow the three rules mentioned above, regardless of the internal implementation.


September 28, 2020
On Mon, Sep 28, 2020 at 08:04:49PM +0000, ikod via Digitalmars-d-learn wrote:
> On Monday, 28 September 2020 at 19:18:20 UTC, H. S. Teoh wrote:
[...]
> > The problem with arbitrary, unrestricted modification of a container while iterating over it, is that it inevitably leads to counterintuitive results.
> 
> Thanks for your reply, and sorry if I was not clear, but I meant AAs. An AA is an unordered container, so for me the intuitive behavior for the visibility of mutations during iteration is as follows:
> 
> 1) You must not see removed keys. Say our keys are (unordered) A D C E F, you are positioned on D and remove E. Then E must not be seen on any future iteration step.
> 
> 2) You may or may not see inserted keys. As an AA is an unordered container, there is no reason to expect anything about the position of a key you just inserted - it can fall before or after the current iteration position, so the implementation is free to show inserted keys or not. I prefer not to see new keys.
> 
> 3) You expect to visit every key that was not removed exactly once.
> 
> Is it sane?
[...]

It sounds reasonable, but it does constrain your implementation. For example, (3) is likely to break during a rehash. But not rehashing may lead to other problems, depending on your hashing implementation.


T

-- 
Time flies like an arrow. Fruit flies like a banana.
September 28, 2020
On 9/28/20 1:18 PM, ikod wrote:
> On Monday, 28 September 2020 at 14:58:15 UTC, Steven Schveighoffer wrote:
>> On 9/27/20 4:43 PM, Per Nordlöw wrote:
>>> On Sunday, 27 September 2020 at 14:23:11 UTC, H. S. Teoh wrote:
>>>> No.  Modifying a container while iterating over it is, in general, a bad idea (unless the container is designed to be
> 
>> So the major problem of removing keys while iterating is that the AA can decide to rehash and shrink. If it does this, your iterator is invalidated -- you might skip elements that are
> ....
>>
>> How do you "detect" this at compile time?
> 
> I think the idea was to have some attributes that the collection programmer may attach to the implementation, like enums: SafeToRemoveOnIteration, SafeToAddOnIteration and so on, which could be checked at compile time when foreach is handled by the compiler. Then, inside the foreach body, the compiler could refuse to call some unsafe methods. I do not know if it is worth implementing; it was just an idea.

auto sneaky = aa;
foreach(k, v; aa)
{
   sneaky.remove(k); // oops
}

It doesn't work. You can't restrict this at compile time. Possibly runtime, but not compile time.

> 
>>
>> One could write a specific function to iterate and remove. I
> 
> This looks like a dead end to me, as you may not only remove items from arbitrary positions while iterating over the collection, but also insert items. Also, this can happen not only inside a foreach body, but in other kinds of iteration. And there can also be nested iterations over the same collection.

What do you mean, dead end? You provide what is supported through a controlled interface. It works well; I wrote and used it in dcollections. Using the standard insertion and removal functions is not allowed (at least not without invalidating the range).

If you want insertion, then potentially you could expose those methods via the specific iteration construct as well. But you must be prepared to do some really funky things.

If we are talking about AAs, insertion may end up expanding the bucket list, which means rehashing. That leads to iterating over the same keys again. Unless you do some kind of copy-on-write, or combination of iterated container + true container? I don't know.

However, deletion is possible (either delay the rehash until after, or don't do a rehash during that iteration) via the iteration construct (range). You still need to disallow removal via the original type, or you might get a rehash.
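
Along those lines, a user-level approximation of the earlier strawman is sketched below: instead of removing immediately, the range records the keys and applies the removals (and any resulting rehash) only after iteration ends. The names byKeyValuePurging, popFrontAndDelete and commit are made up; a real solution would instead live inside druntime, where the rehash can simply be suppressed.

// Sketch only: defers removals so the underlying AA is never mutated
// while the wrapped byKeyValue range is alive.
auto byKeyValuePurging(K, V)(V[K] aa)
{
    alias Inner = typeof(aa.byKeyValue());

    static struct Purger
    {
        V[K] aa;
        Inner inner;
        K[] pending;

        @property bool empty() { return inner.empty; }
        @property auto front() { return inner.front; }
        void popFront() { inner.popFront(); }

        // remember the key; the actual removal is deferred to commit()
        void popFrontAndDelete()
        {
            pending ~= inner.front.key;
            inner.popFront();
        }

        // apply the deferred removals once iteration is done
        void commit()
        {
            foreach (k; pending)
                aa.remove(k);
            pending = null;
        }
    }

    return Purger(aa, aa.byKeyValue(), null);
}

Usage would mirror the strawman loop above, with an extra r.commit() after the while loop.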

-Steve
September 29, 2020
On Sunday, 27 September 2020 at 13:02:04 UTC, Per Nordlöw wrote:
> Is it safe to remove AA-elements from an `aa` I'm iterating over via aa.byKeyValue?
>
> I'm currently doing this:
>
>     foreach (ref kv; aa.byKeyValue)
>     {
>         if (pred(kv.key))
>             aa.remove(kv.key); // ok?
>     }
>     if (aa.length == 0)
>         aa = null;
>
> Is there a better way?

If you're okay with the allocation that comes with it (.keys builds a new array holding all the keys, so the loop iterates over that snapshot and removing entries from the AA cannot invalidate it):

foreach(k; aa.keys)
{
    if(pred(k)) aa.remove(k);
}



September 29, 2020
Hello,

Sorry if I unintentionally hurt anybody in this thread.
I'll try to implement sane and correct iteration behavior for AAs without noticeable performance loss, and propose it if I succeed.

Igor