July 23, 2010 [Issue 3463] Integrate Precise Heap Scanning Into the GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3463 --- Comment #40 from Leandro Lucarella <llucax@gmail.com> 2010-07-22 20:26:48 PDT --- Also, I think queryNoSync() subtract the bitmask size twice, because getInfo() already subtract it from the block size. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
July 23, 2010 [Issue 3463] Integrate Precise Heap Scanning Into the GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3463 nfxjfg@gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #692 is|0 |1 obsolete| | --- Comment #41 from nfxjfg@gmail.com 2010-07-23 11:33:38 PDT --- Created an attachment (id=695) D1 - patch for dmd for creating pointer bitmasks -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
July 23, 2010 [Issue 3463] Integrate Precise Heap Scanning Into the GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3463 nfxjfg@gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #693 is|0 |1 obsolete| | --- Comment #42 from nfxjfg@gmail.com 2010-07-23 11:35:23 PDT --- Created an attachment (id=696) D1 - patch for Tango's runtime to enable precise GC scanning - fix bugs mentioned by comments 38-40 - add a bitmask for moveable pointers -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
July 23, 2010 [Issue 3463] Integrate Precise Heap Scanning Into the GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3463 --- Comment #43 from nfxjfg@gmail.com 2010-07-23 11:39:19 PDT --- You're right, there seem to be some places where the bitmask size is added or substracted twice. I don't really know; I took that code over from dsimcha's patch without modification. I was just thankful that I hadn't to write this code. I fixed the three issues you found. It seems to work, though I'm not 100% sure if it's correct now. Regarding moving collectors: I think this would be an interesting experiment and worth trying. My patch now has a second bitmask, where each bit tells whether a word is a moveable pointer or not. Instead of a second bitmap, you could just have a per-memory block flag that tells whether a memory block is a void[] and/or contains unions with pointers. (If the flag is set, the GC won't change any pointers inside the block.) You could maintain that flag as additional bitmap (along NOSCAN etc). This way storing small bitmaps inline would be simpler. Unions or fixed-size void[] are probably seldom enough to justify this simplification. gcx.d can choose either way to implement it using the compiler generated bitmasks. Before actually implementing a moving GC, one should experiment how many pinned pointers the stack/manually added ranges/unmovable pointers/datasegment generate and how they would fragment the heap. Also note that some runtime parts are not ready yet for a moving GC, such as Object.toHash. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
July 23, 2010 [Issue 3463] Integrate Precise Heap Scanning Into the GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3463 --- Comment #44 from Leandro Lucarella <llucax@gmail.com> 2010-07-23 12:15:44 PDT --- (In reply to comment #43) > You're right, there seem to be some places where the bitmask size is added or substracted twice. I don't really know; I took that code over from dsimcha's patch without modification. I was just thankful that I hadn't to write this code. I fixed the three issues you found. It seems to work, though I'm not 100% sure if it's correct now. Thanks! > Regarding moving collectors: > > I think this would be an interesting experiment and worth trying. My patch now has a second bitmask, where each bit tells whether a word is a moveable pointer or not. > > Instead of a second bitmap, you could just have a per-memory block flag that tells whether a memory block is a void[] and/or contains unions with pointers. (If the flag is set, the GC won't change any pointers inside the block.) You could maintain that flag as additional bitmap (along NOSCAN etc). This way storing small bitmaps inline would be simpler. Unions or fixed-size void[] are probably seldom enough to justify this simplification. gcx.d can choose either way to implement it using the compiler generated bitmasks. Yes, that can be a nice optimization if void[] and unions are rare. > Before actually implementing a moving GC, one should experiment how many pinned pointers the stack/manually added ranges/unmovable pointers/datasegment generate and how they would fragment the heap. Yup! This patch opens that posibility too! Which is great. > Also note that some runtime parts are not ready yet for a moving GC, such as Object.toHash. Yes, this is just a step in the right direction, not a final solution. Thanks again! A bikeshed suggestion, maybe it would be better to call PointerMap methods this way: isPointerAt() -> mustScanWordAt() isMoveablePointerAt() -> isPointerAt() hasUnmoveablePointers() -> canUpdatePointers() Because the pointers are not "moveable", the memory they point at is the one moveable or not. The pointer, if really pointer, are updateable, and if they are not guaranteed to be pointers, we shouldn't call them that. That's why I think is clearer to call a "may be pointer" a word that should be scanned and a guaranteed pointer (which the GC could update) simply a pointer. By the same means, I'd call pointer_bits and mpointer_bits scan_bits and pointer_bits respectively. BTW, there is already a GC attribute called NO_MOVE, which follows this nomenclature (is supposed to be used for blocks that can't be moved, not for blocks which pointers can't be updated). -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
July 25, 2010 [Issue 3463] Integrate Precise Heap Scanning Into the GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3463 --- Comment #45 from Leandro Lucarella <llucax@gmail.com> 2010-07-24 20:08:30 PDT --- Well, I've made a little benchmark for the patch. I'm using the voronoi[1] benchmark, since I think is a good GC benchmark, because it exercises the GC a lot, but it does real work with the data (unlike the typical "tree" benchmark). I find it particularly interesting too because it seems to have a lot of false positives, making the runtime very bound to the addresses returned by mmap(), as shown in the post[2] mentioned earlier in this comments. This benchmarks confirmed that the high variance in runtime presented by this test is due to false positives. [1] http://codepad.org/xGDCS3KO [2] http://www.llucax.com.ar/blog/blog/post/-7a56a111 I've compared 3 binaries, one compiled without any of this patches (voronoi-dnp-tnp, i.e. Dmd Not Precise, to be honest, this is DMD 1.062 distributed by DigitalMars), one compiled with a patched DMD, but unpatched Tango to see if the PointerMaps created by the compiler has any effect on performance (voronoi-dp-tnp) and one compiled with both patched DMD and Tango (voronoi-dp-tp). Patched DMD is svn r580 (D1 branch), Tango is svn trunk r5505, and the patches are not the latest ones, are the previous (sorry, I was working on this before you came up with the new patch, I'll try the new patch when I have some time). Anyway, here are the result: $ for i in {1..10}; do /usr/bin/time -f%e ./voronoi-dnp-tnp -n 30000; done 3.88 3.71 4.28 3.89 3.78 3.63 3.81 4.36 3.82 3.89 min = 3.63 mean = 3.905, std = 0.234343053378 max = 4.36 $ for i in {1..10}; do /usr/bin/time -f%e ./voronoi-dp-tnp -n 30000; done 3.74 3.87 3.69 3.91 4.33 4.07 4.26 4.08 3.78 4.32 min = 3.69 mean = 4.005, std = 0.241994031148 max = 4.33 $ for i in {1..10}; do /usr/bin/time -f%e ./voronoi-dp-tp -n 30000; done 4.05 4.03 4.04 4.03 4.02 4.02 4.02 4.02 4.02 4.05 min = 4.02 mean = 4.03, std = 0.0124721912892 max = 4.05 The differences between the patched an unpatched DMD doesn't look relevant, specially taking into account the high variance and that they are not even the same DMD version. Unfortunately the patched GC fall a little (~3%) behind the unpatched one, mean-wise, and a little more if we look at the min (about 10%), but the max drops about the same (~7%) and the variance drops dramatically (~240%) with the patched GC. Considering the precise scanning patch is the first try, and not very optimized, it looks like a tradeoff that worth paying while trying to improve this timings. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
July 25, 2010 [Issue 3463] Integrate Precise Heap Scanning Into the GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3463 --- Comment #46 from David Simcha <dsimcha@yahoo.com> 2010-07-25 09:07:00 PDT --- As far as the (small) drop in mean performance, I say "who cares?". In practice, once false pointers start eating you alive and the heap gets unnecessarily huge, everything memory management-related becomes appallingly slow. This fact is probably not shown in the synthetic benchmarks. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
July 25, 2010 [Issue 3463] Integrate Precise Heap Scanning Into the GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3463 --- Comment #47 from Leandro Lucarella <llucax@gmail.com> 2010-07-25 13:43:10 PDT --- (In reply to comment #46) > As far as the (small) drop in mean performance, I say "who cares?". In practice, once false pointers start eating you alive and the heap gets unnecessarily huge, everything memory management-related becomes appallingly slow. This fact is probably not shown in the synthetic benchmarks. Yup, as I said in another comment: Running dil to generate the full Tango documentation can take from 50 to 80 seconds depending on the address range returned by the OS (I suspect because of false pointers; which I hope to prove trying this patch :) I'll try to do some testing with dil, but I don't think it will be in the near future. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
July 26, 2010 [Issue 3463] Integrate Precise Heap Scanning Into the GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3463 --- Comment #48 from Leandro Lucarella <llucax@gmail.com> 2010-07-25 17:04:56 PDT --- Well timings for dil are much worse :( This is dil generating the Tango docs, without precise scanning (dmd with the last patch, Tango unpatched): 52.01 53.73 52.21 51.79 50.58 52.62 53.01 48.49 51.51 49.68 min = 48.49 mean = 51.563, std = 1.58236847795 max = 53.73 And this is with precise scanning: 79.47 69.81 72.27 67.12 68.34 83.85 65.28 75.17 77.76 74.73 min = 65.28 mean = 73.38, std = 5.91698309013 max = 83.85 I can't explain why is even less stable with precise scanning, maybe is something to do with it being less cache-friendly. I don't know either why is so stable without precise scanning, when in other tests I've done it wasn't like that. Maybe the majority of false positives are from the static memory and the binary was placed in an address that make them less probable (the binary address doesn't change between runs). It would be nice if other people give this patches a try, specially with programs that have problems with false positives... Here are the profiling results (using perf[1]): Non-precise scanning: http://pastebin.com/zRKyc9mW Precise scanning: http://pastebin.com/jPJZLL8p It looks like findPool() might be used much more often than before? For example, I noticed the mixin code calls findPool() very early, so maybe it's being called in some situations where it's not necessary. Also, with the patch sizeOf() needs to call findPool(), but I don't think that should make a lot of difference (unless is used by the runtime very often for array manipulation). -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
July 26, 2010 [Issue 3463] Integrate Precise Heap Scanning Into the GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3463 --- Comment #49 from nfxjfg@gmail.com 2010-07-25 17:21:10 PDT --- @Leandro: Thanks for doing these tests! In D1, the runtime calls gc_query on _every_ array append operation. Though the cache in Gcx.getInfo() speeds it up a bit. But it doesn't look like gc_query/getInfo() needs to do more work because of this patch? You also could try to replace the code in the mark function by the old, non-precise one. This way you can see how much overhead the bitmask manipulation code causes. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
Copyright © 1999-2021 by the D Language Foundation