February 22, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5623


Sean Kelly <sean@invisibleduck.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |sean@invisibleduck.org


--- Comment #10 from Sean Kelly <sean@invisibleduck.org> 2011-02-22 15:01:51 PST ---
I think the separation of pools for large and small allocations is a good
thing.  In fact, the current collector will return entirely free pools to the
OS at the end of a collection cycle, so the two are already logically separate.
I can't think of a case where performance would be worse than before, but I'll
give the patch a once-over to be sure.
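
To illustrate what that end-of-cycle release looks like, here is a minimal
sketch with hypothetical names (not the actual gcx.d code):

    struct Pool { void* baseAddr; size_t totalPages, freePages; }
    enum PAGESIZE = 4096;

    // Stand-in for munmap(base, len) on POSIX / VirtualFree on Windows.
    void releaseToOS(void* base, size_t len) { /* ... */ }

    // After sweeping, any pool whose pages are all free is handed back
    // to the OS instead of being kept around for reuse.
    void minimizePools(Pool[] pools)
    {
        foreach (ref pool; pools)
            if (pool.freePages == pool.totalPages)   // pool entirely free
                releaseToOS(pool.baseAddr, pool.totalPages * PAGESIZE);
    }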

February 24, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5623


Steven Schveighoffer <schveiguy@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy@yahoo.com


--- Comment #11 from Steven Schveighoffer <schveiguy@yahoo.com> 2011-02-23 20:10:03 PST ---
From a cursory glance at the patch, it looks like it won't affect array appending.

BTW, a while ago I had a very similar thought about storing a value that lets you jump back to find the start of a PAGEPLUS block, but I came up with a different method.

First, a Bins value is already stored for every page; it's an int, and we're using exactly 13 of the 4 billion possible values.  My idea was to remove B_PAGEPLUS from the enum: any Bins value outside the given enum members would mean "the number of pages to jump back, plus B_MAX".

This saves having to keep/update a separate array.

In addition, the fact that we only get 16 TB of space doesn't matter.  It just means the maximum *jump size* is 16 TB (with 4 KB pages, a 32-bit page count covers 2^32 * 4 KB = 16 TB).  That is, if a block exceeds 16 TB, you just store the maximum; the algorithm only has to be adjusted to jump back that amount, check the page at that location (which also knows how far to jump back), and continue on.
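
In code, the lookup would be roughly this (a sketch with hypothetical names;
the enum values are illustrative, not the real gcx.d ones):

    enum uint B_PAGE = 12, B_MAX = 13;  // assumed values for illustration

    // pagetable[i] holds the Bins value for page i.  Any value >= B_MAX
    // means "the block's B_PAGE start is (value - B_MAX) pages back"; a
    // saturated value simply lands on a page that jumps back further.
    size_t findBlockStart(const(uint)[] pagetable, size_t page)
    {
        while (pagetable[page] >= B_MAX)
            page -= pagetable[page] - B_MAX;
        assert(pagetable[page] == B_PAGE);  // first page of the block
        return page;
    }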

Can you imagine how awesome the performance of the current linear search would be on a system with a 16 TB block? ;)

I think this patch should be applied (will be voting shortly).

February 24, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5623


Steven Schveighoffer <schveiguy@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|nobody@puremagic.com        |sean@invisibleduck.org


--- Comment #12 from Steven Schveighoffer <schveiguy@yahoo.com> 2011-02-23 20:12:09 PST ---
For some reason this was marked as ASSIGNED, but not assigned to anyone.

I'm guessing you wanted it assigned to you, Sean, since you changed the bug to ASSIGNED?

February 24, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5623



--- Comment #13 from David Simcha <dsimcha@yahoo.com> 2011-02-23 20:23:23 PST ---
One logistical point: since rebuilding my mental model of how the GC works, I've come up with a few other small optimization ideas.  These don't have nearly the impact this patch does (each is only worth a few percent).  I think we should make this patch a high priority for the next release, since its effect is huge.  For the smaller optimizations, I'll fork the druntime git repo and commit them as I get around to testing and benchmarking, and we can merge them back later.

February 24, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5623



--- Comment #14 from David Simcha <dsimcha@yahoo.com> 2011-02-23 20:26:47 PST ---
(In reply to comment #11)
> In addition, the fact that we only get 16 TB of space doesn't matter.  It just means the maximum *jump size* is 16 TB.  That is, if a block exceeds 16 TB, you just store the maximum; the algorithm only has to be adjusted to jump back that amount, check the page at that location (which also knows how far to jump back), and continue on.
> 

I'd rather just assume that no one is going to allocate 16 TB in a single allocation in the near future.  In the distant future we can switch to size_t, though hopefully we'll have a better GC by the time anyone has enough RAM to allocate 16 TB in one allocation.

February 25, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5623



--- Comment #15 from David Simcha <dsimcha@yahoo.com> 2011-02-24 17:56:20 PST ---
https://github.com/dsimcha/druntime/wiki/Druntime-GC-Optimization-Fork

February 25, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5623



--- Comment #16 from Steven Schveighoffer <schveiguy@yahoo.com> 2011-02-24 18:45:01 PST ---
(In reply to comment #15)
> https://github.com/dsimcha/druntime/wiki/Druntime-GC-Optimization-Fork

From that wiki page:

> Also note that a Tree2 benchmark also exists, but it seems to run in either 12 or 0 seconds, randomly, no matter what patches are applied, for reasons I don't understand.

This pertains to bug 5650.

I have seen the same anomaly (although mine must be on slower hardware; it varies between 20+ seconds and 0.9 seconds).

My theory is that false pointers are keeping the elements in memory, so the GC never frees them.  It is *definitely* the GC's fullCollect that is causing the slowdown: while debugging (and printing out every 100 loops), you can ctrl-c to pause, and execution is always inside the collect cycle.

Basically, the program is so deterministic that the only thing I can think of that changes between good and bad runs is the address space given to the heap by the OS.

I sort of tested this theory by adding this line to the code:

writeln((new int[1]).ptr);

Here are the results of running a bunch of times (the heap address is printed
first, then the time output):

    Address    real        user        sys
    112E40     0m26.723s   0m26.554s   0m0.052s
    A50E40     0m0.911s    0m0.908s    0m0.000s
    26FE40     0m23.852s   0m23.737s   0m0.096s
    112E40     0m20.139s   0m20.065s   0m0.040s
    58BE40     0m19.932s   0m19.841s   0m0.080s
    EBDE40     0m0.897s    0m0.880s    0m0.012s
    724E40     0m25.801s   0m25.762s   0m0.024s
    3F2E40     0m0.907s    0m0.904s    0m0.000s
    AC9E40     0m0.891s    0m0.884s    0m0.000s
    DA4E40     0m0.906s    0m0.888s    0m0.016s
    26FE40     0m29.869s   0m29.770s   0m0.084s
    799E40     0m0.900s    0m0.896s    0m0.000s
    58DE40     0m39.999s   0m39.802s   0m0.152s
    138E40     0m34.000s   0m33.906s   0m0.032s
    65CE40     0m19.246s   0m19.201s   0m0.032s
    1B0E40     0m28.394s   0m28.350s   0m0.028s
    D62E40     0m0.910s    0m0.900s    0m0.008s
    AB6E40     0m0.904s    0m0.904s    0m0.000s
    26FE40     0m38.978s   0m38.834s   0m0.124s
    367E40     0m27.100s   0m27.010s   0m0.076s
    9DEE40     0m0.899s    0m0.888s    0m0.004s
    112E40     0m40.536s   0m40.419s   0m0.088s
    401E40     0m0.901s    0m0.896s    0m0.000s
    A18E40     0m0.911s    0m0.900s    0m0.004s
    7A1E40     0m0.908s    0m0.904s    0m0.004s
    112E40     0m26.441s   0m26.330s   0m0.100s
    611E40     0m23.135s   0m23.041s   0m0.068s
    3D7E40     0m0.905s    0m0.900s    0m0.000s
    138E40     0m38.311s   0m38.242s   0m0.044s
    112E40     0m24.372s   0m24.314s   0m0.028s
    270E40     0m34.142s   0m33.998s   0m0.092s
    9ACE40     0m0.911s    0m0.908s    0m0.004s
    C8DE40     0m0.898s    0m0.892s    0m0.000s
    284E40     0m20.744s   0m20.621s   0m0.096s
    3E0E40     0m0.910s    0m0.900s    0m0.004s
    386E40     0m20.044s   0m19.921s   0m0.108s

Most of the time, the smaller the address, the more likely the run is to be slow.  There are, however, some outliers.

What does this data mean?  It may mean nothing, but the run time does seem to correlate strongly with address selection.  I can't think of any other reason the behavior would differ so drastically from one run to the next.  Does this mean false pointers are the problem?  Not sure, but it's all I can think of for now.
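
For anyone following along, here is a contrived illustration (not Tree2
itself) of the false-pointer effect I mean:

    import core.memory;
    import std.stdio;

    void main()
    {
        auto big = new ubyte[](64 * 1024 * 1024);
        // To the conservative scanner, this integer is indistinguishable
        // from a real pointer into the block...
        auto falsePtr = cast(size_t) big.ptr;
        big = null;
        GC.collect();
        // ...so as long as falsePtr is live, the 64 MB block cannot be
        // reclaimed, even though no actual reference to it remains.
        writeln("false pointer value: ", falsePtr);
    }

Whether the heap lands at a low or high address changes which stack and
register values happen to look like heap pointers, which would explain the
bimodal timings above.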

August 27, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5623


David Simcha <dsimcha@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


--- Comment #17 from David Simcha <dsimcha@yahoo.com> 2011-08-27 07:29:44 PDT ---
I'm marking this as resolved.  My pull request was merged a long time ago, and Steve's issue is probably related to false pointers.
