August 14, 2014
On Thursday, 14 August 2014 at 21:34:04 UTC, Jonathan M Davis wrote:
> On Thursday, 14 August 2014 at 21:11:51 UTC, safety0ff wrote:
>> IIRC it manages the capacity information manually instead of calling the runtime which reduces appending overhead.
>
> That would make some sense, though it must be completely avoiding ~= then and probably is even GC-mallocing the array itself. Regardless, I clearly need to study the code if I want to know what it's actually doing.
>
> - Jonathan M Davis

I wonder if using plain `Array` instead may result in better performance where immutability is not needed.
August 15, 2014
On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud wrote:
> From time to time, I try to speed up some array-heavy code by using std.array.Appender, reserving some capacity and so on.
>
> It never works. Never. It gives me executables that are maybe 30-50% slower than bog-standard array code.
>
> I don't do anything fancy: I just replace the types, use clear() instead of = null...
>
> Do people here get good results from Appender? And if yes, how are you using it?

Quick test...

----------------
import std.array;
import std.datetime;
import std.stdio;

enum size = 1000;

void test1()
{
    auto arr = appender!(int[])();
    foreach(n; 0 .. size)
        arr.put(n);
}

void test2()
{
    int[] arr;
    foreach(n; 0 .. size)
        arr ~= n;
}

void test3()
{
    auto arr = new int[](size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void test4()
{
    auto arr = uninitializedArray!(int[])(size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void main()
{
    auto result = benchmark!(test1, test2, test3, test4)(10_000);
    writeln(cast(Duration)result[0]);
    writeln(cast(Duration)result[1]);
    writeln(cast(Duration)result[2]);
    writeln(cast(Duration)result[3]);
}

----------------

With size being 1000, I get

178 ms, 229 μs, and 4 hnsecs
321 ms, 272 μs, and 8 hnsecs
27 ms, 138 μs, and 7 hnsecs
24 ms, 970 μs, and 9 hnsecs

With size being 100,000, I get

15 secs, 731 ms, 499 μs, and 1 hnsec
29 secs, 339 ms, 553 μs, and 8 hnsecs
2 secs, 602 ms, 385 μs, and 2 hnsecs
2 secs, 409 ms, 448 μs, and 9 hnsecs

So, it looks like using Appender to create an array of ints is about twice as fast as appending to the array directly, and unsurprisingly, creating the array at the correct size up front and filling in the values is far faster than either, whether the array's elements are default-initialized or not. And the numbers are about the same if it's an array of char rather than an array of int.

Also, curiously, if I add a test which is the same as test2 (so it's just appending to the array) except that it calls reserve on the array with size, the result is almost the same as not reserving. It's a smidgeon faster but probably not enough to matter. So, it looks like reserve doesn't buy you much for some reason. Maybe the extra work for checking whether it needs to reallocate or whatever fancy logic it has to do in ~= dwarfs the extra cost of the reallocations? That certainly seems odd to me, but bizarrely, the evidence does seem to say that reserving doesn't help much. That should probably be looked into.
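One way to sanity-check whether reserve actually takes effect is to watch the array's pointer across appends. A minimal sketch, relying on the documented behavior that appends within reserved capacity do not reallocate:

```d
void main()
{
    int[] arr;
    arr.reserve(1000);              // ask the runtime for capacity up front
    assert(arr.capacity >= 1000);

    arr ~= 0;
    auto p = arr.ptr;
    foreach (n; 1 .. 1000)
        arr ~= n;
    assert(arr.ptr is p);           // appends stayed in the reserved block
    assert(arr.length == 1000);
}
```

So reserve does prevent reallocations; any remaining cost presumably comes from the per-append bookkeeping in ~= itself.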

In any case, from what I can see, if all you're doing is creating an array and then throwing away the Appender, it's faster to use Appender than to directly append to the array. Maybe that changes with structs? I don't know. It would be interesting to know what's different about your code that's causing Appender to be slower, but from what I can see, it's definitely faster to use Appender unless you know the target size of the array up front.

- Jonathan M Davis
August 15, 2014
> Quick test...

Ah, thanks a lot Jonathan. I kept telling myself I should probably test it
on a simple case.
OK, the good news is, Appender works in these cases (I mean, that's
good news for Phobos).
Now, I just have to find out why it's slower in my case :)



> ----------------
> import std.array;
> import std.datetime;
> import std.stdio;
>
> enum size = 1000;
>
> void test1()
> {
>     auto arr = appender!(int[])();
>     foreach(n; 0 .. size)
>         arr.put(n);
> }


I used ~= to append to an Appender. The examples use .put, but ~= is documented, so I assumed the operator was just syntactic sugar. I didn't look at its implementation.


> void test2()
> {
>     int[] arr;
>     foreach(n; 0 .. size)
>         arr ~= n;
> }
>
> void test3()
> {
>     auto arr = new int[](size);
>     foreach(n; 0 .. size)
>         arr[n] = n;
> }

This one is simple and interesting. In my case I don't know the length in advance (I'm doing some parsing, and predicting the size of the parse forest from the input length is somewhat tricky), but I can pre-allocate some, based on a simple heuristic.
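That pre-allocation approach might be sketched like this; `collect` and the 2x growth factor are hypothetical, standing in for the real parse-forest heuristic:

```d
// Sketch: pre-allocate from a guessed size, grow if the guess was
// short, and trim at the end.
int[] collect(size_t inputLen)
{
    auto buf = new int[](inputLen * 2);  // heuristic estimate
    size_t n = 0;
    foreach (i; 0 .. inputLen)
    {
        if (n == buf.length)
            buf.length = buf.length * 2; // estimate was too small: grow
        buf[n++] = cast(int) i;
    }
    return buf[0 .. n];                  // trim to the actual size
}
```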

> void test4()
> {
>     auto arr = uninitializedArray!(int[])(size);
>     foreach(n; 0 .. size)
>         arr[n] = n;
> }

Oh, I didn't know this one.


> With size being 1000, I get
>
> 178 ms, 229 μs, and 4 hnsecs
> 321 ms, 272 μs, and 8 hnsecs
> 27 ms, 138 μs, and 7 hnsecs
> 24 ms, 970 μs, and 9 hnsecs

Same here, good.
Using ~= n instead of .put(n) gives me consistently slightly slower
results for Appender (170 ms -> 190 ms, repeatedly, even going back and
forth between the two possibilities).
I created a test1prime to test that.


> With size being 100,000, I get
>
> 15 secs, 731 ms, 499 μs, and 1 hnsec
> 29 secs, 339 ms, 553 μs, and 8 hnsecs
> 2 secs, 602 ms, 385 μs, and 2 hnsecs
> 2 secs, 409 ms, 448 μs, and 9 hnsecs

Ditto. OK, that's good. Also, it's still slower using Appender ~= n
instead of Appender.put (18 s instead of 15 s, 20% slower).
So, kids: don't do that.

> So, it looks like using Appender to create an array of ints is about twice as fast as appending to the array directly, and unsurprisingly, creating the array at the correct size up front and filling in the values is far faster than either, whether the array's elements are default-initialized or not. And the numbers are about the same if it's an array of char rather than an array of int.

Perfect. That also means my go-to method will still be using standard
arrays, but with pre-allocation.
I just feel stupid writing that, because it's obvious that reserving
space up front should give better speed.


> Also, curiously, if I add a test which is the same as test2 (so it's just appending to the array) except that it calls reserve on the array with size, the result is almost the same as not reserving. It's a smidgeon faster but probably not enough to matter. So, it looks like reserve doesn't buy you much for some reason. Maybe the extra work for checking whether it needs to reallocate or whatever fancy logic it has to do in ~= dwarfs the extra cost of the reallocations? That certainly seems odd to me, but bizarrely, the evidence does seem to say that reserving doesn't help much. That should probably be looked into.

Yeah, I get a small boost of about 5% compared to not reserving at size
1000, which completely disappears for longer arrays
(no difference at size 100_000).


> In any case, from what I can see, if all you're doing is creating an array and then throwing away the Appender, it's faster to use Appender than to directly append to the array.

Indeed.

> Maybe that changes with structs? I don't know.

I'm using classes, because what I'm pushing into the Appender are graph nodes and I got fed up with tracking pointers to structs everywhere. Maybe I should change back to structs and see what that does.


> It would be interesting to know what's different about your code that's causing Appender to be slower, but from what I can see, it's definitely faster to use Appender unless you know the target size of the array up front.

Good conclusion. Thanks for your help. My takeaway is that I'll use arrays, but 'new T[](size)' them beforehand. If that makes heavy concatenation 10 times faster, it should have a positive effect (I'm not, of course, expecting anything near a 10x boost in my total computation time).

August 15, 2014
> I wonder if using plain `Array` instead may result in better performance where immutability is not needed.

Hmm, no:

module appendertest;

import std.array;
import std.datetime;
import std.stdio;
import std.container;

enum size = 1_000;

void test1()
{
    auto arr = appender!(int[])();
    foreach(n; 0 .. size)
        arr.put(n);
}

void test1prime()
{
    auto arr = appender!(int[])();
    foreach(n; 0 .. size)
        arr ~= n;
}

void test2()
{
    int[] arr;
    foreach(n; 0 .. size)
        arr ~= n;
}

void test2prime()
{
    int[] arr;
    arr.reserve(size);
    foreach(n; 0 .. size)
        arr ~= n;
}

void test3()
{
    Array!int arr;
    foreach(n; 0 .. size)
        arr ~= n;
}

void test3prime()
{
    Array!int arr;
    arr.reserve(size);
    foreach(n; 0 .. size)
        arr ~= n;
}

void test4()
{
    auto arr = new int[](size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void test5()
{
    auto arr = uninitializedArray!(int[])(size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void main()
{
    auto result = benchmark!(test1, test1prime,
                             test2, test2prime,
                             test3, test3prime,
                             test4,
                             test5
                            )(10_000);

    writeln("Appender.put       :", cast(Duration)result[0]);
    writeln("Appender ~=        :", cast(Duration)result[1]);
    writeln("Std array          :", cast(Duration)result[2]);
    writeln("Std array.reserve  :", cast(Duration)result[3]);
    writeln("Array              :", cast(Duration)result[4]);
    writeln("Array.reserve      :", cast(Duration)result[5]);
    writeln("new T[]()          :", cast(Duration)result[6]);
    writeln("uninitializedArray :", cast(Duration)result[7]);
}

Times:

Appender.put       :157 ms, 602 μs, and 3 hnsecs
Appender ~=        :182 ms, 807 μs, and 1 hnsec
Std array          :256 ms, 210 μs, and 7 hnsecs
Std array.reserve  :244 ms, 770 μs, and 4 hnsecs
Array              :336 ms, 207 μs, and 3 hnsecs
Array.reserve      :321 ms, 500 μs, and 6 hnsecs
new T[]()          :28 ms, 496 μs, and 6 hnsecs
uninitializedArray :26 ms and 620 μs

August 15, 2014
On Thu, Aug 14, 2014 at 11:33 PM, Joseph Rushton Wakeling via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
> On 14/08/14 19:16, Philippe Sigaud via Digitalmars-d-learn wrote:
>>
>> Do people here get good results from Appender? And if yes, how are you using it?
>
>
> An example where it worked for me: http://braingam.es/2013/09/betweenness-centrality-in-dgraph/
>
> (You will have to scroll down a bit to get to the point where appenders get
> introduced.)

I remember reading this and loving it! Any news on this? Do you have new algorithms?
August 15, 2014
On Friday, 15 August 2014 at 08:35:41 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
>> I wonder if using plain `Array` instead may result in better performance
>> where immutability is not needed.
>
> Hmm, no:
>
> ...

It is very different with a better compiler though:

$ ldmd2 -release -O a.d
$ ./appendertest
Appender.put       :378 ms, 794 μs, and 9 hnsecs
Appender ~=        :378 ms, 416 μs, and 3 hnsecs
Std array          :2 secs, 222 ms, 256 μs, and 2 hnsecs
Std array.reserve  :2 secs, 199 ms, 64 μs, and 5 hnsecs
Array              :349 ms and 250 μs
Array.reserve      :347 ms, 694 μs, and 1 hnsec
new T[]()          :37 ms, 989 μs, and 8 hnsecs
uninitializedArray :39 ms, 333 μs, and 3 hnsecs

(size 10_000)

I am still somewhat disappointed though, because I'd expect static Array to get close in performance to a pre-allocated array of exact size over many iterations, which doesn't happen. Need to investigate.
August 15, 2014
On Thursday, 14 August 2014 at 18:31:15 UTC, Dicebot wrote:
> I don't know much about Phobos appender implementation details but the key thing with reusable buffer is avoid freeing them. AFAIR Appender.clear frees the allocated memory but `Appender.length = 0` does not, making it possible to just overwrite stuff again and again.
>
> Won't promise you anything though!

Appender has no .length property, and .clear does rewind the write pointer without deallocating memory, thus allowing you to reuse the buffer.
August 15, 2014
> It is very different with a better compiler though:
>
> $ ldmd2 -release -O a.d
> $ ./appendertest
> Appender.put       :378 ms, 794 μs, and 9 hnsecs
> Appender ~=        :378 ms, 416 μs, and 3 hnsecs
> Std array          :2 secs, 222 ms, 256 μs, and 2 hnsecs
> Std array.reserve  :2 secs, 199 ms, 64 μs, and 5 hnsecs
> Array              :349 ms and 250 μs
> Array.reserve      :347 ms, 694 μs, and 1 hnsec
> new T[]()          :37 ms, 989 μs, and 8 hnsecs
> uninitializedArray :39 ms, 333 μs, and 3 hnsecs
>
> (size 10_000)

OK, same here, except std array gives me only 1.4 s, while the other
timings are about the same (0.14 alpha2: ldmd2 -O -inline).
Drat, that means testing on many different compilers. Oh well, let's
start small: pre-allocating, better algorithms, then I'll do real
speed instrumentation.

August 15, 2014
On Friday, 15 August 2014 at 10:31:59 UTC, Dicebot wrote:
> On Friday, 15 August 2014 at 08:35:41 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
>>> I wonder if using plain `Array` instead may result in better performance
>>> where immutability is not needed.
>>
>> Hmm, no:
>>
>> ...
>
> It is very different with a better compiler though:
>
> $ ldmd2 -release -O a.d
> $ ./appendertest
> Appender.put       :378 ms, 794 μs, and 9 hnsecs
> Appender ~=        :378 ms, 416 μs, and 3 hnsecs
> Std array          :2 secs, 222 ms, 256 μs, and 2 hnsecs
> Std array.reserve  :2 secs, 199 ms, 64 μs, and 5 hnsecs
> Array              :349 ms and 250 μs
> Array.reserve      :347 ms, 694 μs, and 1 hnsec
> new T[]()          :37 ms, 989 μs, and 8 hnsecs
> uninitializedArray :39 ms, 333 μs, and 3 hnsecs
>
> (size 10_000)

T[size] beats all of those on dmd head, though it is inarguably a
bit limiting.
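For reference, that variant in the style of the tests above would be something like this (a sketch; test6 is my own naming):

```d
enum size = 1000;

void test6()
{
    int[size] arr;               // static array: fixed-size, no GC allocation
    foreach (n; 0 .. size)
        arr[n] = n;
    assert(arr[$ - 1] == size - 1);
}
```

The limitation, of course, is that size must be known at compile time.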
August 15, 2014
On Fri, Aug 15, 2014 at 1:57 PM, Messenger via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:

> T[size] beats all of those on dmd head, though it is inarguably a bit limiting.

I confirm (even with 2.065). With ldc2 it's optimized out of the way,
so it gives 0 hnsecs :-)
Hmm, what about a sort of linked list of static arrays, that allocates
a new one when necessary?
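That idea, essentially an unrolled linked list of fixed-size chunks, might be sketched as follows; ChunkList and all its names are hypothetical, not an existing Phobos type:

```d
struct ChunkList(T, size_t chunkSize = 1024)
{
    static struct Chunk
    {
        T[chunkSize] data;   // static array: one allocation per chunk
        size_t used;
        Chunk* next;
    }

    Chunk* head, tail;
    size_t length;

    void put(T value)
    {
        if (tail is null || tail.used == chunkSize)
        {
            auto c = new Chunk;            // allocate a new chunk on demand
            if (tail) tail.next = c; else head = c;
            tail = c;
        }
        tail.data[tail.used++] = value;
        ++length;
    }

    // Copy out into a contiguous array when one is needed.
    T[] toArray()
    {
        auto result = new T[](length);
        size_t i = 0;
        for (auto c = head; c !is null; c = c.next)
        {
            result[i .. i + c.used] = c.data[0 .. c.used];
            i += c.used;
        }
        return result;
    }
}
```

Appends never move existing elements, at the cost of a copy if a contiguous slice is needed at the end.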