January 08, 2012
On 08.01.2012 11:27, Jonathan M Davis wrote:
> On Sunday, January 08, 2012 11:02:41 simendsjo wrote:
>> On 08.01.2012 10:43, Jonathan M Davis wrote:
>>> On Sunday, January 08, 2012 03:24:22 Kapps wrote:
>>>> Ah, found the bug / pull request.
>>>>
>>>> https://github.com/D-Programming-Language/dmd/pull/597
>>>> http://d.puremagic.com/issues/show_bug.cgi?id=4523
>>>
>>> Ah, TDPL says that it returns bool. Well, then it definitely needs to be
>>> changed, and it's good to see that that there's a pull request for it.
>>> So, it should be fixed in the next release. In fact, the number of
>>> TDPL-related bugs fixed for the next release should be quite high,
>>> which is great.
>>>
>>> - Jonathan M Davis
>>
>> Wouldn't it make sense to return a pointer to the item being
>> removed/null? This way you can continue to work with the item, and it
>> would be consistent with `in`.
>>
>> if (auto item = aa.remove(key)) {
>>     // some additional work on item
>> } else { // not found
>>     // throw or something
>> }
>
> Since the element has been removed from the container, what would the pointer
> point to? The element doesn't exist anymore.
>
> - Jonathan m Davis

I was thinking it could be a shorthand for the following:

auto item = key in aa;
if (item) key.remove(key);

January 08, 2012
On 1/8/12, simendsjo <simendsjo@gmail.com> wrote:
> Wouldn't it make sense to return a pointer to the item being removed/null?

Seems like that would be even more costly. Personally I think returning bool is unnecessary, if we really want to know if something is in the hash we can check with the `in` operator.

I filed that bug ages ago simply because it was in TDPL, but I didn't really give it much thought.
January 08, 2012
On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
> I was thinking it could be a shorthand for the following:
> 
> auto item = key in aa;
> if (item) key.remove(key);

I take it that you then intend to use item after that? I'm not sure that item is actually valid at that point. It points to memory which _was_ inside of aa but may or may not be now, depending on the AA implementation, and it may or may not be reused by aa. Because it's on the GC heap, the memory itself won't be invalid, but the memory could be reused by the AA for another element, depending on its implementation. If you're lucky, it's memory that's just sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it.

I would _not_ consider that to be good code. It's just asking for trouble.

- Jonathan M Davis
January 08, 2012
simendsjo wrote:

> it doesn't state return value
Yes, I haven't read carefully enough.

-manfred
January 08, 2012
On 08.01.2012 12:18, Jonathan M Davis wrote:
> On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
>> I was thinking it could be a shorthand for the following:
>>
>> auto item = key in aa;
>> if (item) key.remove(key);
>
> I take it that you then intend to use item after that? I'm not sure that item
> is actually valid at that point. It points to memory which _was_ inside of aa
> but may or may not be now, depending on the AA implementation, and it may or
> may not be reused by aa. Because it's on the GC heap, the memory itself won't
> be invalid, but the memory could be reused by the AA for another element,
> depending on its implementation. If you're lucky, it's memory that's just
> sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it.
>
> I would _not_ consider that to be good code. It's just asking for trouble.
>
> - Jonathan M Davis

Ah.. This should be documented in http://dlang.org/hash-map.html
The only documentation for remove is:
"
Particular keys in an associative array can be removed with the remove function:
    b.remove("hello");
"
January 08, 2012
On Sunday, January 08, 2012 12:35:27 simendsjo wrote:
> On 08.01.2012 12:18, Jonathan M Davis wrote:
> > On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
> >> I was thinking it could be a shorthand for the following:
> >> 
> >> auto item = key in aa;
> >> if (item) key.remove(key);
> > 
> > I take it that you then intend to use item after that? I'm not sure that item is actually valid at that point. It points to memory which _was_ inside of aa but may or may not be now, depending on the AA implementation, and it may or may not be reused by aa. Because it's on the GC heap, the memory itself won't be invalid, but the memory could be reused by the AA for another element, depending on its implementation. If you're lucky, it's memory that's just sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it.
> > 
> > I would _not_ consider that to be good code. It's just asking for trouble.
> > 
> > - Jonathan M Davis
> 
> Ah.. This should be documented in http://dlang.org/hash-map.html
> The only documentation for remove is:
> "
> Particular keys in an associative array can be removed with the remove
> function:
>      b.remove("hello");
> "

I confess that it's one of those things that I would have thought would have been fairly obvious, but it certainly wouldn't hurt to add it. The documentation does tend to be sparse of details in any case.

- Jonathan M Davis
January 08, 2012
On 08.01.2012 12:57, Jonathan M Davis wrote:
> On Sunday, January 08, 2012 12:35:27 simendsjo wrote:
>> On 08.01.2012 12:18, Jonathan M Davis wrote:
>>> On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
>>>> I was thinking it could be a shorthand for the following:
>>>>
>>>> auto item = key in aa;
>>>> if (item) key.remove(key);
>>>
>>> I take it that you then intend to use item after that? I'm not sure that
>>> item is actually valid at that point. It points to memory which _was_
>>> inside of aa but may or may not be now, depending on the AA
>>> implementation, and it may or may not be reused by aa. Because it's on
>>> the GC heap, the memory itself won't be invalid, but the memory could
>>> be reused by the AA for another element, depending on its
>>> implementation. If you're lucky, it's memory that's just sitting on the
>>> GC heap and now unrelated to the AA, but I wouldn't bet on it.
>>>
>>> I would _not_ consider that to be good code. It's just asking for
>>> trouble.
>>>
>>> - Jonathan M Davis
>>
>> Ah.. This should be documented in http://dlang.org/hash-map.html
>> The only documentation for remove is:
>> "
>> Particular keys in an associative array can be removed with the remove
>> function:
>>       b.remove("hello");
>> "
>
> I confess that it's one of those things that I would have thought would have
> been fairly obvious, but it certainly wouldn't hurt to add it. The
> documentation does tend to be sparse of details in any case.
>
> - Jonathan M Davis

Certainly not obvious to me :)

auto c = new C();
C[string] aa;
aa["c"] = c;
aa.remove("c");

I guess using c from this point is valid, but if the only reference to c was inside the aa (and we got c using `in`), using c would have been undefined..?

January 08, 2012
On Sunday, January 08, 2012 13:06:06 simendsjo wrote:
> Certainly not obvious to me :)

Well, what's obvious to one person is not always obvious to another. I'm sure that there are plenty of things that Walter thinks should be perfectly obvious which 90% of programmers would never think of. A lot of it depends on what you're experience level is and what you have experience in. In this case, it's very much like dealing with C++ iterators, so if you're used to dealing with them and when they do or don't get invalidate, then the situation with in and remove is probably quite straightforward and intuitive.

> auto c = new C();
> C[string] aa;
> aa["c"] = c;
> aa.remove("c");
> 
> I guess using c from this point is valid, but if the only reference to c was inside the aa (and we got c using `in`), using c would have been undefined..?

Using c is valid regardless. The problem is the in operator. It returns a pointer to the element inside of the AA. So, if you did

C* d = aa["c"];

then d would be a pointer to a reference to a C, and after the call to remove, the pointer is pointing to memory which may or may not be part of the AA anymore and which may or may not hold a reference to what c refers to. It's bit like doing this

int* a()
{
    int b;
    return &b;
}

though the compiler should catch this, since it's obviously wrong. In both cases, you're referring to memory which your really not supposed to be accessing anymore and may point to invalid memory. The case of the AA probably isn't as bad, since either you're pointing to memory on the GC which isn't part of the AA (but hasn't been collected, since you have a pointer to it), or you're pointing to an element in the AA which may still be the same as it was before it was removed, or it may be another element, or it may be garbage memory. You really don't know. It's undefined and therefore shouldn't be used.

I don't believe that it's ever safe to assume that the pointer returned by the in operator is still valid after the AA that it comes from has had an element added or removed. It's the same with iterators or ranges with many iterators and ranges. If they point to elements within a container, and that container is altered, they're invalidated and aren't safe to use.

So, your example is fine, because it doesn't involve the in operator, and therefore doesn't involve any pointers into the AA. It's only once you have pointers into the AA that you get into trouble.

- Jonathan M Davis
January 08, 2012
On 08.01.2012 13:49, Jonathan M Davis wrote:
> On Sunday, January 08, 2012 13:06:06 simendsjo wrote:
>> Certainly not obvious to me :)
>
> Well, what's obvious to one person is not always obvious to another. I'm sure
> that there are plenty of things that Walter thinks should be perfectly obvious
> which 90% of programmers would never think of. A lot of it depends on what
> you're experience level is and what you have experience in. In this case, it's
> very much like dealing with C++ iterators, so if you're used to dealing with
> them and when they do or don't get invalidate, then the situation with in and
> remove is probably quite straightforward and intuitive.
>
>> auto c = new C();
>> C[string] aa;
>> aa["c"] = c;
>> aa.remove("c");
>>
>> I guess using c from this point is valid, but if the only reference to c
>> was inside the aa (and we got c using `in`), using c would have been
>> undefined..?
>
> Using c is valid regardless. The problem is the in operator. It returns a
> pointer to the element inside of the AA. So, if you did
>
> C* d = aa["c"];
>
> then d would be a pointer to a reference to a C, and after the call to remove,
> the pointer is pointing to memory which may or may not be part of the AA
> anymore and which may or may not hold a reference to what c refers to. It's
> bit like doing this
>
> int* a()
> {
>      int b;
>      return&b;
> }
>
> though the compiler should catch this, since it's obviously wrong. In both
> cases, you're referring to memory which your really not supposed to be
> accessing anymore and may point to invalid memory. The case of the AA probably
> isn't as bad, since either you're pointing to memory on the GC which isn't
> part of the AA (but hasn't been collected, since you have a pointer to it), or
> you're pointing to an element in the AA which may still be the same as it was
> before it was removed, or it may be another element, or it may be garbage
> memory. You really don't know. It's undefined and therefore shouldn't be used.
>
> I don't believe that it's ever safe to assume that the pointer returned by the
> in operator is still valid after the AA that it comes from has had an element
> added or removed. It's the same with iterators or ranges with many iterators
> and ranges. If they point to elements within a container, and that container
> is altered, they're invalidated and aren't safe to use.
>
> So, your example is fine, because it doesn't involve the in operator, and
> therefore doesn't involve any pointers into the AA. It's only once you have
> pointers into the AA that you get into trouble.
>
> - Jonathan M Davis

Thanks for your clarifications.

Does this mean even this is undefined?
aa["a"] = new C();
auto c = "a" in aa;
aa["b"] = new C();
// Using c here is undefined as an element was added to aa
January 08, 2012
On Sunday, January 08, 2012 14:24:32 simendsjo wrote:
> Thanks for your clarifications.
> 
> Does this mean even this is undefined?
> aa["a"] = new C();
> auto c = "a" in aa;
> aa["b"] = new C();
> // Using c here is undefined as an element was added to aa

Yes. Depending on the current implementation of AAs and the state of aa when "b" is added to it, c could still point to exactly what it did before, or aa could have been rehashed, which could move any or all of its elements around, in which case who knows what c points to.

- Jonathan M Davis