Thread overview
Impressed with Appender - Is there design/implementation description?
Dec 04, 2016
Jon Degenhardt
Dec 06, 2016
thedeemon
Dec 06, 2016
Anonymouse
Dec 06, 2016
Jonathan M Davis
Dec 06, 2016
Jon Degenhardt
December 04, 2016
I've been using Appender quite a bit recently, typically when I need append-only arrays with variable and unknown final sizes. I had been prepared to write a custom data structure when I started using it with large amounts of data, but very nicely this has not surfaced as a need. Appender has held up quite well.

I haven't actually benchmarked it against competing data structures, nor have I studied the implementation. I'd be very interested in understanding the design and how it compares to other data structures. Are there any write-ups or articles describing it?

--Jon
December 06, 2016
On Sunday, 4 December 2016 at 20:03:37 UTC, Jon Degenhardt wrote:
> I've been using Appender quite a bit recently, typically when I need append-only arrays with variable and unknown final sizes. I had been prepared to write a custom data structure when I started using it with large amounts of data, but very nicely this has not surfaced as a need. Appender has held up quite well.
>
> I haven't actually benchmarked it against competing data structures, nor have I studied the implementation. I'd be very interested in understanding the design and how it compares to other data structures. Are there any write-ups or articles describing it?
>
> --Jon

It's rather simple, just take a look at its source code in std.array.
It's an ordinary linear array partially filled with your data. When your data fills it up, it gets resized to make more space. Just two interesting points:

1. When it needs to grow it uses a formula like k = 1 + 10 / log2(newsize) for the growth factor (but with a limit of k <= 2), which means up to 16 KB it doubles the size each time, then tries to grow by a factor of 2/3, then by lower and lower factors.

2. Up until 4 KB it reallocates when growing, but after 4 KB the array lives in a larger pool of memory where it can often grow a lot without reallocating, so in many scenarios where other allocations do not interfere, the data array of appender grows in place without copying any data, thanks to GC.extend() method.
December 06, 2016
On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote:
> It's rather simple, just take a look at its source code in std.array.
> It's an ordinary linear array partially filled with your data.

[...]

> 2. Up until 4 KB it reallocates when growing, but after 4 KB the array lives in a larger pool of memory where it can often grow a lot without reallocating, so in many scenarios where other allocations do not interfere, the data array of appender grows in place without copying any data, thanks to GC.extend() method.

I always assumed it kept its own manually allocated array on a malloc heap :O

Hence a practice of using .dup and .idup on the .data member when you're happy with the result?
December 06, 2016
On Tuesday, December 06, 2016 13:19:22 Anonymouse via Digitalmars-d-learn wrote:
> On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote:
> > It's rather simple, just take a look at its source code in
> > std.array.
> > It's an ordinary linear array partially filled with your data.
>
> [...]
>
> > 2. Up until 4 KB it reallocates when growing, but after 4 KB the array lives in a larger pool of memory where it can often grow a lot without reallocating, so in many scenarios where other allocations do not interfere, the data array of appender grows in place without copying any data, thanks to GC.extend() method.
>
> I always assumed it kept its own manually allocated array on a malloc heap :O

No. The main thing that Appender does is reduce the number of checks required for whether there's room for the array to append in place, because that check is a good chunk of why ~= is expensive for arrays. And using ~= isn't all that bad. It's just that it adds up, and when you're clearly going to be doing a lot of appending when you first create a dynamic array, it's just more efficient to use Appender and avoid the extra overhead that ~= has. What Appender does with put is actually pretty close to what druntime does with ~=. It's just that Appender can just look at the member variable to know the capacity of the dynamic array, whereas ~= has to look it up by looking at the data for the memory block that the dynamic array currently points to.

All Appender really is is a way to make appending to dynamic arrays more efficient. It doesn't do anything magical with data structures. At present, this is what it's member variables look like

    private alias T = ElementEncodingType!A;
    private struct Data
    {
        size_t capacity;
        Unqual!T[] arr;
        bool canExtend = false;
    }

    private Data* _data;

The magic is in that extra capacity variable, but otherwise, you're basically dealing with the same array appending you get out of ~=.

> Hence a practice of using .dup and .idup on the .data member when you're happy with the result?

That isn't necessary. I don't think that I ever do it. If you want to keep appending with Appender, and you want a dynamic array that has its own memory to grow into without being affected by Appender, then it would make sense to use dup/idup, but that's true any time that you append to a dynamic array. And you might want dup/idup to get a dynamic array with a different type of mutability than the dynamic array in the Appender just like you might want with a naked dynamic array, but since you define the mutability with Appender, you'd usually just change the array type that you instantiated Appender with.

- Jonathan M Davis

December 06, 2016
On Tuesday, 6 December 2016 at 15:29:59 UTC, Jonathan M Davis wrote:
> On Tuesday, December 06, 2016 13:19:22 Anonymouse via Digitalmars-d-learn wrote:
>> On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote:
>>
>> [...]
>>
>> > 2. Up until 4 KB it reallocates when growing, but after 4 KB the array lives in a larger pool of memory where it can often grow a lot without reallocating, so in many scenarios where other allocations do not interfere, the data array of appender grows in place without copying any data, thanks to GC.extend() method.
>>
>> I always assumed it kept its own manually allocated array on a malloc heap :O
>
> No. The main thing that Appender does is reduce the number of checks required for whether there's room for the array to append in place, because that check is a good chunk of why ~= is expensive for arrays.
> [...]

Thanks everyone for the explanations. I should probably look into my data and see how often I'm reaching the 4kb size triggering GC.extend() use.

--Jon