Jump to page: 1 2
Thread overview
Has AA a bad implementation?
Feb 12, 2021
frame
Feb 12, 2021
welkam
Feb 12, 2021
frame
Feb 12, 2021
frame
Feb 13, 2021
frame
Feb 13, 2021
Paul Backus
Feb 16, 2021
Elronnd
February 12, 2021
Talking about this:

aaA.d: 157
> void clear() pure nothrow
> {
>    import core.stdc.string : memset;
>    // clear all data, but don't change bucket array length
>    memset(&buckets[firstUsed], 0, (buckets.length - firstUsed) * Bucket.sizeof);
>    deleted = used = 0;
>    firstUsed = cast(uint) dim;
> }

I see this is for performance but it also means that an associative array never frees the memory it has allocated for life time - or I'm wrong? I am missing a real empty() method here.
February 12, 2021
On Friday, 12 February 2021 at 12:41:32 UTC, frame wrote:
> Talking about this:
>
> aaA.d: 157
>> void clear() pure nothrow
>> {
>>    import core.stdc.string : memset;
>>    // clear all data, but don't change bucket array length
>>    memset(&buckets[firstUsed], 0, (buckets.length - firstUsed) * Bucket.sizeof);
>>    deleted = used = 0;
>>    firstUsed = cast(uint) dim;
>> }
>
> I see this is for performance but it also means that an associative array never frees the memory it has allocated for life time - or I'm wrong? I am missing a real empty() method here.

There can be slices that point to the memory owned by this array so you cant just free it. If you want manually manage the memory then you should look at dub for different containers.
February 12, 2021
On Friday, 12 February 2021 at 12:48:30 UTC, welkam wrote:

> There can be slices that point to the memory owned by this array so you cant just free it. If you want manually manage the memory then you should look at dub for different containers.

The array is overwritten with 0 so any pointers are lost anyway. Yeah, I can use any element I want. But my problem is that any thirdparty code may not care about. In any static context it produces leaks. This behavior is not documented - anyone thinks that clear() means clear, not: fill with 0.

In some context it's needed to reduce memory. As a standard element it should at least have an explicit ability to free memory when it's really desired.
February 12, 2021
On 2/12/21 3:35 PM, frame wrote:
> On Friday, 12 February 2021 at 12:48:30 UTC, welkam wrote:
> 
>> There can be slices that point to the memory owned by this array so you cant just free it. If you want manually manage the memory then you should look at dub for different containers.
> 
> The array is overwritten with 0 so any pointers are lost anyway. Yeah, I can use any element I want. But my problem is that any thirdparty code may not care about. In any static context it produces leaks. This behavior is not documented - anyone thinks that clear() means clear, not: fill with 0.
> 
> In some context it's needed to reduce memory. As a standard element it should at least have an explicit ability to free memory when it's really desired.

There's a misunderstanding here. The bucket array is not the data itself, its just basically a pointer and the hash. The Bucket struct is here: https://github.com/dlang/druntime/blob/a026ea46088b369f1182c397e7e046f5fc950358/src/rt/aaA.d#L173

When you use clear, it removes the pointers to all the existing key/value pairs. Those data items are separate GC blocks. If you no longer have any pointers to them, they will be garbage collected.

If you want the bucket memory itself to be freed on the next GC cycle, you can just set the AA to null. But the bucket memory shouldn't be that cumbersome.

I don't think there's any current way to explicitly free all the items in the AA. Even aa.remove(key) is not going to free the item.

-Steve
February 12, 2021
On Friday, 12 February 2021 at 20:59:50 UTC, Steven Schveighoffer wrote:

> There's a misunderstanding here. The bucket array is not the data itself, its just basically a pointer and the hash. The Bucket struct is here:

No, I'm talking about those Buckets.

> If you want the bucket memory itself to be freed on the next GC cycle, you can just set the AA to null. But the bucket memory shouldn't be that cumbersome.

Maybe not, but it comes in mind for daemon applications. 10000 items are consuming 600 ~ 800K each array.

Yes, nulling helps to get rid of the Bucket list. Also destroy(). But that's not so intentional than using clear() - this is my point.



February 12, 2021
On 2/12/21 4:56 PM, frame wrote:
> On Friday, 12 February 2021 at 20:59:50 UTC, Steven Schveighoffer wrote:
> 
>> There's a misunderstanding here. The bucket array is not the data itself, its just basically a pointer and the hash. The Bucket struct is here:
> 
> No, I'm talking about those Buckets.
> 
>> If you want the bucket memory itself to be freed on the next GC cycle, you can just set the AA to null. But the bucket memory shouldn't be that cumbersome.
> 
> Maybe not, but it comes in mind for daemon applications. 10000 items are consuming 600 ~ 800K each array.

Your numbers are a bit high, but I don't know what else you would expect. If you have an AA with 10000 elements, it's going to consume a multiple of that number in memory.

> Yes, nulling helps to get rid of the Bucket list. Also destroy(). But that's not so intentional than using clear() - this is my point.

The intent of clear is to reset the array so I DON'T have to reallocate the bucket array.

Before clear existed, you had to remove each element one at a time, and then it would resize down throwing away all that space. If your use case is to clear an aa to fill it up again, then clear is perfect for that. It also has implications if you have multiple references to the same AA.

Note, I thought rehash would shrink the bucket array down, but it looks like it ignores that when the AA is empty. That seems like an omission.

Other types in Phobos treat clear the same way (e.g. appender).

I think you are just misunderstanding what clear is for. It means, reset but don't deallocate.

-Steve
February 12, 2021
On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:
> Note, I thought rehash would shrink the bucket array down, but it looks like it ignores that when the AA is empty. That seems like an omission.

Do we necessarily want that shrinkage by default?  If an AA has had a certain size in the past, and then is cleared, it seems likely that the typical use case is that it will fill up to a similar size again.

For comparison, does `rehash` shrink the bucket array when the AA is non-empty but much, much smaller than the bucket array?

I don't object to the possibility to shrink the bucket array where possible but it should probably require an explicit and unambiguous request.
February 12, 2021
On 2/12/21 6:13 PM, Joseph Rushton Wakeling wrote:
> On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:
>> Note, I thought rehash would shrink the bucket array down, but it looks like it ignores that when the AA is empty. That seems like an omission.
> 
> Do we necessarily want that shrinkage by default?  If an AA has had a certain size in the past, and then is cleared, it seems likely that the typical use case is that it will fill up to a similar size again.

The shrinkage happens by default if you remove an element and it goes below a threshold.

It does not shrink on clear (on purpose, for the reason you say).

> 
> For comparison, does `rehash` shrink the bucket array when the AA is non-empty but much, much smaller than the bucket array?

Yes. If it has one element, then it will resize the array.

> I don't object to the possibility to shrink the bucket array where possible but it should probably require an explicit and unambiguous request.

I think it already does shrink for a rehash. It's just set up to not shrink when the AA is "empty". I'm not really sure if it's a problem, just a weird inconsistency. Technically nobody should care about the bucket array, it's an implementation detail.

-Steve
February 13, 2021
On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:
> On 2/12/21 4:56 PM, frame wrote:

> The intent of clear is to reset the array so I DON'T have to reallocate the bucket array.

I know.

> Other types in Phobos treat clear the same way (e.g. appender).

Is this really the same? This looks more like clearing in Appender:

> void clear() @trusted pure nothrow
> {
>    if (_data)
>    {
>        _data.arr = _data.arr.ptr[0 .. 0];
>    }
> }

> I think you are just misunderstanding what clear is for. It means, reset but don't deallocate.

So it should be called reset() or blank() then.

> I'm not really sure if it's a problem, just a weird inconsistency. Technically nobody should care about the bucket array, it's an implementation detail.

I just don't like a widely used but leaky standard element that imply it does not. Better correct than error. But ok, it's not a big deal. Should be maybe documented that null can be useful.

February 13, 2021
On Saturday, 13 February 2021 at 03:43:13 UTC, frame wrote:
> On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:
>> I think you are just misunderstanding what clear is for. It means, reset but don't deallocate.
>
> So it should be called reset() or blank() then.

Think of it this way: when you "clear" a room, you remove all of its current occupants, but you do not demolish the room itself.
« First   ‹ Prev
1 2