August 14, 2014 Re: Appender is ... slow
Posted in reply to Dicebot

> Thanks! Repeating what I have mentioned during the DConf talk - have you ever considered proposing Pegged for Phobos inclusion? It feels like an important bit of infrastructure to me.

At the time, it was considered (rightfully) far too slow and memory-hogging. I think having a generic lexer and a standard D parser in Phobos would already be a great step forward.
August 14, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

On Thursday, 14 August 2014 at 20:42:08 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> You mean by using the shrinkTo method? (Appender does not have a length, that's a bit of a bother btw.) I just did, it does not change anything.

Too bad.

> Heck, my code is simpler to read and use *and* faster by using a bog-standard D array. I'll keep my array for now :)

Oh crap, I had std.array.Array in mind, which does expose `length`. And Appender seems to indeed clear / shrink data in a horrible way:

https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2597
https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2611

Wow, this thing is real garbage!
August 14, 2014 Re: Appender is ... slow
Posted in reply to Dicebot

On Thursday, 14 August 2014 at 20:50:37 UTC, Dicebot wrote:
> Oh crap, I had std.array.Array in mind, which does expose `length`. And Appender seems to indeed clear / shrink data in a horrible way:
>
> https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2597
> https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2611
>
> Wow, this thing is real garbage!
OK, looks like Appender is written to be fully GC-compatible and usable with immutable data, which kind of explains such an implementation. But that makes it inherently slow compared to a plain `Array`, which owns its memory and gets it via malloc.
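The trade-off described here can be sketched in a few lines. This is an illustrative comparison, not code from the thread; it just contrasts the two containers mentioned above:

```d
import std.array : appender;
import std.container : Array;

void main()
{
    // Appender builds a GC-owned array: the result can safely escape
    // the builder, and can even be typed immutable (e.g. appender!string).
    auto app = appender!(int[])();
    foreach (i; 0 .. 1_000)
        app.put(i);
    int[] gcOwned = app.data;   // a slice of GC memory, no copy made

    // std.container.Array owns its memory (malloc/free via RAII),
    // so it skips GC bookkeeping, but its storage must not outlive it.
    Array!int owned;
    foreach (i; 0 .. 1_000)
        owned.insertBack(i);

    assert(gcOwned.length == owned.length);
}
```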
|
August 14, 2014 Re: Appender is ... slow
Posted in reply to Brad Anderson

On Thursday, 14 August 2014 at 19:47:33 UTC, Brad Anderson wrote:
> On Thursday, 14 August 2014 at 19:10:18 UTC, Jonathan M Davis wrote:
>> I've never really tried to benchmark it, but it was my understanding that the idea behind Appender was to use it to create the array when you do that via a lot of appending, and then you use it as a normal array and stop using Appender. It sounds like you're trying to use it as a way to manage reusing the array, and I have no idea how it works for that. But then again, I've never actually benchmarked it for just creating arrays via appending. I'd just assumed that it was faster than just using ~=, because that's what it's supposedly for. But maybe I just completely misunderstood what the point of Appender was.
>>
>> - Jonathan M Davis
>
> I too have trouble understanding what Appender does that supposedly makes it faster (at least from the documentation). My old, naive thought was that it was something like a linked list of fixed-size arrays, so that appends didn't have to move existing elements until you were done appending, at which point it would bake them into a regular dynamic array, moving each element only once (an std::deque with a copy into a vector, in C++ terms). Looking at the code, it appeared to be nothing like that.
>
> Skimming the code, it appears to be more focused on the much more basic "~= always reallocates" performance problem. It seems it boils down to doing essentially this (someone feel free to correct me) in the form of an output range:
>
>     auto a = /* some array */;
>     auto b = a;
>     a = a.array();
>     for (...)
>         b.assumeSafeAppend() ~= /* element */;

It was my understanding that that was essentially what it did, but I've never really studied its code, and I don't know if there are any other tricks that it's able to pull. It may be that it really doesn't do anything more than be a wrapper which handles assumeSafeAppend for you correctly.

But if that's the case, I wouldn't expect operating on arrays directly to be any faster. Maybe it would be _slightly_ faster, because there are no wrapper functions to inline away, but it wouldn't be much faster; it would just ensure that you used assumeSafeAppend correctly. If it's really as much slower as Philippe is finding, then I have no idea what's going on. Certainly, it merits a bug report and further investigation.

> (assumeSafeAppend's documentation doesn't say whether or not it'll reallocate when capacity is exhausted, I assume it does.)

All assumeSafeAppend does is tell the runtime that this particular array is the one farthest into the memory block (i.e. that any slices which referred to memory farther into the block don't exist anymore). So, the value in the runtime that keeps track of the farthest point into the memory block which has been referred to by an array is then set to the end of the array that you passed to assumeSafeAppend. And because it's now the last array in that block, it's safe to append to it without reallocating. But the appending then works the same way that it always does, and it'll reallocate when there's no more capacity. The whole idea is just to make it so that the runtime doesn't think that the memory after that array is unavailable for that array to expand into.

- Jonathan M Davis
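The behavior described above can be observed directly. This is a hedged sketch based on the documented runtime semantics of array capacity and assumeSafeAppend, not code from the thread:

```d
void main()
{
    int[] a = [1, 2, 3, 4, 5];
    auto p = a.ptr;

    a = a[0 .. 3];            // shrink: a no longer ends at the block's
    assert(a.capacity == 0);  // used space, so no appendable capacity

    a.assumeSafeAppend();     // promise: no other slice uses the tail
    assert(a.capacity >= a.length);

    a ~= 42;                  // appends in place instead of reallocating
    assert(a.ptr == p);
    assert(a == [1, 2, 3, 42]);
}
```

Without the assumeSafeAppend call, the `~=` would have reallocated to avoid stomping on the `[4, 5]` tail that some other slice might still reference.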
August 14, 2014 Re: Appender is ... slow
Posted in reply to Jonathan M Davis

IIRC it manages the capacity information manually instead of calling the runtime, which reduces appending overhead.
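The mechanism described here can be sketched as a toy builder. This is a deliberately simplified illustration of the idea (cache the capacity locally so the hot append path never calls into the runtime), not the real std.array.Appender; all names are made up:

```d
/// Toy append-only builder: tracks its own fill level instead of
/// asking the runtime for the block's capacity on every append.
struct MiniAppender(T)
{
    private T[] buf;      // the GC block we are filling
    private size_t used;  // elements written so far

    void put(T item)
    {
        if (used == buf.length)   // slow path: grow geometrically
        {
            auto grown = new T[](buf.length ? buf.length * 2 : 8);
            grown[0 .. used] = buf[0 .. used];
            buf = grown;
        }
        buf[used++] = item;       // hot path: plain store, no runtime call
    }

    inout(T)[] data() inout { return buf[0 .. used]; }
}

void main()
{
    MiniAppender!int app;
    foreach (i; 0 .. 100)
        app.put(i);
    assert(app.data.length == 100);
    assert(app.data[0] == 0 && app.data[99] == 99);
}
```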
August 14, 2014 Re: Appender is ... slow
Posted in reply to Jonathan M Davis

On Thursday, 14 August 2014 at 21:00:55 UTC, Jonathan M Davis wrote:
> On Thursday, 14 August 2014 at 19:47:33 UTC, Brad Anderson wrote:
>> On Thursday, 14 August 2014 at 19:10:18 UTC, Jonathan M Davis wrote:
>>> I've never really tried to benchmark it, but it was my understanding that the idea behind Appender was to use it to create the array when you do that via a lot of appending, and then you use it as a normal array and stop using Appender. It sounds like you're trying to use it as a way to manage reusing the array, and I have no idea how it works for that. But then again, I've never actually benchmarked it for just creating arrays via appending. I'd just assumed that it was faster than just using ~=, because that's what it's supposedly for. But maybe I just completely misunderstood what the point of Appender was.
>>>
>>> - Jonathan M Davis
>>
>> I too have trouble understanding what Appender does that supposedly makes it faster (at least from the documentation). My old, naive thought was that it was something like a linked list of fixed-size arrays, so that appends didn't have to move existing elements until you were done appending, at which point it would bake them into a regular dynamic array, moving each element only once (an std::deque with a copy into a vector, in C++ terms). Looking at the code, it appeared to be nothing like that.
>>
>> Skimming the code it appears to be more focused on the much more basic "~= always reallocates" performance problem. It seems it boils down to doing essentially this (someone feel free to correct me) in the form of an output range:
>>
>> auto a = /* some array */;
>> auto b = a;
>> a = a.array();
>> for(...)
>> b.assumeSafeAppend() ~= /* element */;
>
> It was my understanding that that was essentially what it did, but I've never really studied its code, and I don't know if there are any other tricks that it's able to pull. It may be that it really doesn't do anything more than be a wrapper which handles assumeSafeAppend for you correctly. But if that's the case, I wouldn't expect operating on arrays directly to be any faster. Maybe it would be _slightly_ faster, because there are no wrapper functions to inline away, but it wouldn't be much faster; it would just ensure that you used assumeSafeAppend correctly. If it's really as much slower as Philippe is finding, then I have no idea what's going on. Certainly, it merits a bug report and further investigation.
Okay, this actually makes no sense. You call assumeSafeAppend after you _shrink_ an array and then want to append to it, not when you're just appending to it. So assumeSafeAppend wouldn't help with something like:

    auto app = appender!string();
    // Append a bunch of stuff to app
    auto str = app.data;
So, it must be doing something else (though it may be using assumeSafeAppend in other functions). I'll clearly have to look over the actual code to have any clue about what it's actually doing.
Though in reference to your idea of using a linked list of arrays, IIRC, someone was looking at changing it to do something like that at some point, but it would drastically change what Appender's data property would do, so I don't know if it would be a good idea to update Appender that way rather than creating a new type. But I don't recall what became of that proposal.
- Jonathan M Davis
August 14, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

On 14/08/14 19:16, Philippe Sigaud via Digitalmars-d-learn wrote:
> Do people here get good results from Appender? And if yes, how are you using it?

An example where it worked for me: http://braingam.es/2013/09/betweenness-centrality-in-dgraph/

(You will have to scroll down a bit to get to the point where appenders get introduced.)
August 14, 2014 Re: Appender is ... slow
Posted in reply to safety0ff

On Thursday, 14 August 2014 at 21:11:51 UTC, safety0ff wrote:
> IIRC it manages the capacity information manually instead of calling the runtime which reduces appending overhead.
That would make some sense, though it must be completely avoiding ~= then, and it probably even GC-mallocs the array itself. Regardless, I clearly need to study the code if I want to know what it's actually doing.
- Jonathan M Davis
August 14, 2014 Re: Appender is ... slow
On 14/08/14 23:33, Joseph Rushton Wakeling via Digitalmars-d-learn wrote:
> An example where it worked for me:
> http://braingam.es/2013/09/betweenness-centrality-in-dgraph/
I should add that I don't think I ever explored the case of just using a regular array with assumeSafeAppend. Now that you've raised the question, I think I ought to try it :)
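For anyone wanting to try that experiment, the reuse pattern would presumably look something along these lines. This is a hypothetical sketch, not code from the linked post:

```d
void main()
{
    int[] buf;
    foreach (iteration; 0 .. 10)
    {
        buf.length = 0;          // keep the GC block, discard the contents
        buf.assumeSafeAppend();  // allow ~= to reuse the block in place
        foreach (i; 0 .. 1_000)
            buf ~= i;
        assert(buf.length == 1_000);
    }
}
```

After the first iteration fills the block, subsequent iterations should append into the same allocation instead of triggering fresh ones.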
August 14, 2014 Re: Appender is ... slow
Posted in reply to Jonathan M Davis

On Thursday, 14 August 2014 at 21:34:04 UTC, Jonathan M Davis wrote:
> On Thursday, 14 August 2014 at 21:11:51 UTC, safety0ff wrote:
>> IIRC it manages the capacity information manually instead of calling the runtime which reduces appending overhead.
>
> That would make some sense, though it must be completely avoiding ~= then and probably is even GC-mallocing the array itself. Regardless, I clearly need to study the code if I want to know what it's actually doing.
It looks like what it does is essentially to set the array's length to the capacity that the GC gave it and then manage the capacity itself (so, basically what you were suggesting), avoiding the runtime overhead of ~= by reimplementing ~=. Whether it does that more efficiently is an open question, and it also raises the question of why it would be cheaper to do it this way rather than in the GC. That's not at all obvious to me at the moment, especially because the code for ensureAddable and put in Appender are both fairly complicated.
So, I really have no idea how Appender fares in comparison to just using ~=, and I have to wonder why something similar can't be done in the runtime itself if Appender actually is faster. I'd have to spend a lot more time looking into it to figure that out.
- Jonathan M Davis
Copyright © 1999-2021 by the D Language Foundation