[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems. (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Issues » [Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems. (page 2)

December 02, 2011

[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems.

Posted by Rob Jacques
in reply to Rob Jacques

Rob Jacques

Posted in reply to Rob Jacques

http://d.puremagic.com/issues/show_bug.cgi?id=5813



--- Comment #10 from Rob Jacques <sandford@jhu.edu> 2011-12-02 12:35:48 PST ---
(In reply to comment #9)
> I did some benchmarks with this and some other variants.
> 
> At least for small strings, this code performs worse than the current appender. Judging by the description it's optimized for large blocks of data, but in the case of many small strings it can perform as bad as 50% worse than the current appender for typical cases (e.g. building a few KB of HTML).
>
> Here is my test code:
> 
> http://dump.thecybershadow.net/a05a2c4dc7cd2a8e21b3a447c7eff150/test2.d http://dump.thecybershadow.net/eff5c7ef81e18bf75d8462ffe16a52e4/appender.d
> 
> I was able to get 25% more performance for my test case from the standard appender by adding a method that accepts multiple arguments (which preallocates only once). My code is a hack, but it would be nice to see something like that in the official appender.

Thank you for the additional benchmarks. The primary purpose of my code was to optimize for memory usage, which in turn optimizes for computation/performance (i.e. number of main memory copies and memory allocations). And, I also optimized appender for use as a dynamic buffer. But this is all big-O stuff.

I did include a short string test in my benchmark set while I was optimizing the code; my first implementations (never posted) did have some major slow downs, which I fixed by re-working the put routine. So there still might still be some little-o issues there. And I will definitely add a void put(U...)(U items) overload.

I don't think it would affect this benchmark in any ways, but my appender is somewhat dependent on my patch to put (Issue 5233).

I've run your benchmarks and played with them a little bit. Running the benchmarks sequentially is definitely generating GC warm-up issues. I've recommend running them cold in the future, though the qualitative results are the same and cold runs are the ideal situation for the old appender.

I want to rewrite my __ctfe version appender, since Don added support for structs and new, so I'll also look at the short string performance of appender while I'm at it I'll do some profiling and look into this some more. I'd really like for appender to be the right solution for anything bigger than `hello` ~ `world`.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

December 03, 2011

[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems.

Posted by Rob Jacques
in reply to Rob Jacques

Rob Jacques

Posted in reply to Rob Jacques

http://d.puremagic.com/issues/show_bug.cgi?id=5813



--- Comment #11 from Rob Jacques <sandford@jhu.edu> 2011-12-02 21:10:23 PST ---
I did some more digging. I reworked the put routine itself and changed the
growth strategy to be as aggressive as the original appender. This got me down
from a 52% slowdown, to a 20% slowdown. I've still have more work to do, but to
put this into perspective, a 20% slowdown is equivalent to adding an extra if
conditional to the put routine.
Again, thanks Vladimir for these benchmarks.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

December 28, 2011

[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems.

Posted by Vladimir Panteleev
in reply to Rob Jacques

Vladimir Panteleev

Posted in reply to Rob Jacques

http://d.puremagic.com/issues/show_bug.cgi?id=5813



--- Comment #12 from Vladimir Panteleev <thecybershadow@gmail.com> 2011-12-28 14:56:17 PST ---
I spent some more time working on an optimized appender. Things I found:

1) Extra indirection levels are performance killers
2) I failed to create a chained appender (like the one in this patch) that was
faster than a copying one, for my test cases
3) At least on Windows and with short strings, simple slice copy trumps all
memcpy implementations I tried
4) You can write a nextCapacity function with no branches
5) It's better to store an "end" pointer than a "capacity" integer

I put all my attempts and benchmarking code here: https://github.com/CyberShadow/DAppenderResearch

The version I went with is based on fastappender2 in that repo. The final version is here: https://github.com/CyberShadow/ae/blob/master/utils/appender.d

For a chained appender, I would move the tail members into the main structure. My attempt at a fast chained appender is in fastappender4.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

December 29, 2011

[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems.

Posted by Andrei Alexandrescu
in reply to Rob Jacques

Andrei Alexandrescu

Posted in reply to Rob Jacques

http://d.puremagic.com/issues/show_bug.cgi?id=5813


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei@metalanguage.com


--- Comment #13 from Andrei Alexandrescu <andrei@metalanguage.com> 2011-12-29 11:58:44 PST ---
Great work! Why not a pull request?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

December 30, 2011

[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems.

Posted by Vladimir Panteleev
in reply to Rob Jacques

Vladimir Panteleev

Posted in reply to Rob Jacques

http://d.puremagic.com/issues/show_bug.cgi?id=5813



--- Comment #14 from Vladimir Panteleev <thecybershadow@gmail.com> 2011-12-29 22:28:34 PST ---
My code is more of a research project. It can't be safely copied, only supports items 1 byte in size, and doesn't care much in terms of UTF encoding. So, it can't be a drop-in replacement without a lot of hammering, and even then you have to decide whether you want it to be fast or copyable. I hope my research will be useful for others (like Rob) trying to improve the Phobos appender, though.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

January 02, 2012

[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems.

Posted by Rob Jacques
in reply to Rob Jacques

Rob Jacques

Posted in reply to Rob Jacques

http://d.puremagic.com/issues/show_bug.cgi?id=5813



--- Comment #15 from Rob Jacques <sandford@jhu.edu> 2012-01-02 12:02:40 PST ---
Vladimir, the code in
https://github.com/CyberShadow/ae/blob/master/utils/appender.d
seems to be under the MPL, which isn't Phobos compatible. What license is the
code in:
https://github.com/CyberShadow/DAppenderResearch
under? If it's not under a Boost compatible license, could you make it
available under one?

Thanks for all this work.

(In reply to comment #12)
> 1) Extra indirection levels are performance killers
I agree.

> 2) I failed to create a chained appender (like the one in this patch) that was
> faster than a copying one, for my test cases
The primary purpose of a chained appender is to minimize memory leaks, memory usage and maximizing large scale performance. That said, I did write a faster chained appender for your benchmarks; however I did this by initializing the appender with a page of memory, which definitely should not be the default behavior. That said, one viable strategy for appender would be to keep 1 page of memory around as a scratch pad.

> 3) At least on Windows and with short strings, simple slice copy trumps all
> memcpy implementations I tried
> 4) You can write a nextCapacity function with no branches
Good to know.

> 5) It's better to store an "end" pointer than a "capacity" integer
I'll look into this, but this you can not make this judgment call based on the performance of a char appender. The issue is that anything with a power of 2 T.sizeof performs integer multiplication/division using shift operations instead of the actual underlying instructions, both of which are very slow.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

January 02, 2012

[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems.

Posted by Vladimir Panteleev
in reply to Rob Jacques

Vladimir Panteleev

Posted in reply to Rob Jacques

http://d.puremagic.com/issues/show_bug.cgi?id=5813



--- Comment #16 from Vladimir Panteleev <thecybershadow@gmail.com> 2012-01-02 12:10:18 PST ---
(In reply to comment #15)
> Vladimir, the code in
> https://github.com/CyberShadow/ae/blob/master/utils/appender.d
> seems to be under the MPL, which isn't Phobos compatible. What license is the
> code in:
> https://github.com/CyberShadow/DAppenderResearch
> under? If it's not under a Boost compatible license, could you make it
> available under one?

Sure, consider my code in DAppenderResearch public domain. ae is mainly MPL because it was the least restrictive license other contributors agreed on.

> That said, I did write a faster
> chained appender for your benchmarks; however I did this by initializing the
> appender with a page of memory, which definitely should not be the default
> behavior. That said, one viable strategy for appender would be to keep 1 page
> of memory around as a scratch pad.

Can you elaborate on this (or share your code)?

> > 5) It's better to store an "end" pointer than a "capacity" integer
> I'll look into this, but this you can not make this judgment call based on the performance of a char appender. The issue is that anything with a power of 2 T.sizeof performs integer multiplication/division using shift operations instead of the actual underlying instructions, both of which are very slow.

I'm not following your explanation. I don't see how element size plays against my conclusion - in fact, as far as I can see, calculating a "position-after-append" pointer and comparing it to an "end" pointer is going to be faster, because you will need to update the position anyway.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

January 02, 2012

[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems.

Posted by Rob Jacques
in reply to Rob Jacques

Rob Jacques

Posted in reply to Rob Jacques

http://d.puremagic.com/issues/show_bug.cgi?id=5813



--- Comment #17 from Rob Jacques <sandford@jhu.edu> 2012-01-02 13:16:19 PST ---
> > That said, I did write a faster
> > chained appender for your benchmarks; however I did this by initializing the
> > appender with a page of memory, which definitely should not be the default
> > behavior. That said, one viable strategy for appender would be to keep 1 page
> > of memory around as a scratch pad.
> 
> Can you elaborate on this (or share your code)?

For part 1 (fastest possible 'chained' appender): Simply construct specifying a
large number of elements. i.e.
auto app = Appender!string(4096);
FastestAppender7 seems to do something similar (enum MIN_SIZE  = 4096;)

As for part 2 (fastest practical 'chained' appender) I haven't written it yet. In essence you'd have two TLS variables, a scratch node and memory page and an in use flag.

private static void* __Appender_scratch_node;
private static bool  __Appender_scratch_in_use;
Appender(T) {...}

Then when appender is constructed instead of creating a new node and a little bit of ram each time, a single node and 1 page of memory would be reused. The boolean flag would prevent a second appender from reusing the same scratch pad; (I figure 2+ appenders would be relatively rare). Then, when data is called a single copy would be made of the correct length (Appending after data should also be relatively rare).

> > > 5) It's better to store an "end" pointer than a "capacity" integer
> > I'll look into this, but this you can not make this judgment call based on the performance of a char appender. The issue is that anything with a power of 2 T.sizeof performs integer multiplication/division using shift operations instead of the actual underlying instructions, both of which are very slow.
> 
> I'm not following your explanation. I don't see how element size plays against my conclusion - in fact, as far as I can see, calculating a "position-after-append" pointer and comparing it to an "end" pointer is going to be faster, because you will need to update the position anyway.

I'm referring to the fact that ptr + i => cast(T*)(cast(size_t)ptr +
i*T.sizeof) and ptrA - ptrB => (cast(size_t)ptrA - cast(size_t)ptrB) /
T.sizeof. Which can lead to a subtle performance issues when T.sizeof != 1 or a
power of 2. But, my first code review doesn't reveal any worrying usages in the
primary code path; most cases of ptrA-ptrB seem to be behind rarely used if
conditionals.

P.S.
Regarding your previous assertion:
> 4) You can write a nextCapacity function with no branches
I'm having trouble finding this in the code. All I can find is:

auto newCapacity = newSize < MIN_SIZE ? MIN_SIZE : newSize * 2;

which contains a branch. (i.e. the ?: operator).

Also, I know understand better what you meant by
> 1) Extra indirection levels are performance killers
Unfortunately, your attempt to reduce the number of indirections has broken one of the invariants of Appender; specifically that it is a reference type. Putting cursors, etc. into the Appender struct will result in data stomping.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

January 02, 2012

[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems.

Posted by Vladimir Panteleev
in reply to Rob Jacques

Vladimir Panteleev

Posted in reply to Rob Jacques

http://d.puremagic.com/issues/show_bug.cgi?id=5813



--- Comment #18 from Vladimir Panteleev <thecybershadow@gmail.com> 2012-01-02 13:32:01 PST ---
(In reply to comment #17)
> For part 1 (fastest possible 'chained' appender): Simply construct specifying a
> large number of elements. i.e.
> auto app = Appender!string(4096);
> FastestAppender7 seems to do something similar (enum MIN_SIZE  = 4096;)

The 4th one does that too, albeit implicitly. MIN_SIZE is half a page, but it will always allocate a little over that; which will cause the GC to return a whole page. MIN_SIZE was chosen to be the smallest size that results in an expandable allocation.

Note that the 7th experiment is the least GC-friendly of the whole.

> As for part 2 (fastest practical 'chained' appender) I haven't written it yet. In essence you'd have two TLS variables, a scratch node and memory page and an in use flag.

Sounds like a nice idea.

> I'm referring to the fact that ptr + i => cast(T*)(cast(size_t)ptr +
> i*T.sizeof) and ptrA - ptrB => (cast(size_t)ptrA - cast(size_t)ptrB) /
> T.sizeof. Which can lead to a subtle performance issues when T.sizeof != 1 or a
> power of 2. But, my first code review doesn't reveal any worrying usages in the
> primary code path; most cases of ptrA-ptrB seem to be behind rarely used if
> conditionals.

Yes, that was my point. Only one multiplication by T.sizeof and one branch are necessary on the hot path when appending a single item (I see that my code doesn't follow this due to its usage of slice copies).

I wonder if putting the "cursorEnd > end" check inside the per-item loop in such cases would be faster - it would be exchanging a multiplication with a branch.

> Regarding your previous assertion:
> > 4) You can write a nextCapacity function with no branches
> I'm having trouble finding this in the code. All I can find is:
> 
> auto newCapacity = newSize < MIN_SIZE ? MIN_SIZE : newSize * 2;
> 
> which contains a branch. (i.e. the ?: operator).

The main idea is in fastappender2.d. The "if" at the start can be replaced with another bitwise or. It doesn't even matter, because that code will not be executed as often to make a significant difference; I stated it more as a curiosity.

> Also, I know understand better what you meant by
> > 1) Extra indirection levels are performance killers
> Unfortunately, your attempt to reduce the number of indirections has broken one of the invariants of Appender; specifically that it is a reference type. Putting cursors, etc. into the Appender struct will result in data stomping.

I know, I said so in my answer to Andrei's comment. This is fine for my uses.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

March 19, 2012

[Issue 5813] [patch] std.array.Appender has severe performance and memory leak problems.

Posted by Rob Jacques
in reply to Rob Jacques

Rob Jacques

Posted in reply to Rob Jacques

http://d.puremagic.com/issues/show_bug.cgi?id=5813



--- Comment #19 from Rob Jacques <sandford@jhu.edu> 2012-03-19 14:02:01 PDT ---
https://github.com/D-Programming-Language/phobos/pull/502

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation