Thread overview
GC and big non-binary trees
Apr 07, 2005
Thomas Kuehne
Apr 07, 2005
Russ Lewis
Apr 08, 2005
Thomas Kuehne
Apr 07, 2005
Georg Wrede
Apr 08, 2005
Thomas Kuehne
April 07, 2005
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm faced with huge non-binary trees:
leafs: > 8000000
subleafs:  1-40
depth: > 17

# class Leaf{
#       Leaf parent;
#       Leaf[] members;
# }

Saddly the parent member is required and results in crossreferences. Occasinally some branches will be dropped.

Should I
1) simply drop the branch by removing it's root from the members list,
   (huge GC costs)

2) walk down the dropped branch and set parent to null,
   (medium dropping costs, medium GC costs)

3) or walk down the dropped branch and set parent and members to null?
   (huge dropping costs)

Tests till now indicate that (2) is the apropiate solution for my
current trees (~20000 leafs), but the tree size will significantly.

Any ideas how the GC will react or how to improve branch dropping and GC performance?

Thomas


-----BEGIN PGP SIGNATURE-----

iD8DBQFCVY6S3w+/yD4P9tIRAqRUAJ4iatJd2DarTOUyqsO7DyWsYvL1vwCePlNY
ZCpVDJxy7mjlMIhwFYFTqtA=
=Xps0
-----END PGP SIGNATURE-----
April 07, 2005
(NOTE: All this assumes the current GC, which is a mark-and-sweep GC).

I assume from what you're saying is that the branch is a substantial portion of the tree.

Remember that, when the GC scans, it only scans the *live* portion of the tree.  So, if you disconnect a large subset of the tree (even if that subset still has links back to the live data), then the GC will never scan any of the nodes of the branch.  So it doesn't matter whether you set the 'parent' parameters in them to null or not.

Concept:
The cost of a pass of the GC is proportional to the size of the live set at the time of the sweep.

On the other hand, having a lot of garbage lingering around means that you will more quickly run out of available memory, which causes the GC to run sooner.  So, if you are *certain* that the entire branch is garbage (no outside references into any node), then you could decide to recursively delete all of the nodes in the branch.  That would increase your 'dropping' costs, but would probably result in better long-term performance.  However, it might not be a good option if you're worried about the real-time performance of your drop operation.



If you're worried about the real-time performance of your 'dropping' operation, but don't want to have big GC sweeps either, then you could implement a delayed drop.  To do this, keep a global structure (perhaps as simple as a dynamic array of references) around which holds a list of dropped branches.  During idle time, you iterate over that array, and recursively delete all of the nodes in each dropped branch.  (Or, just zero the array, and kick off a GC pass.)  Thus, dropping a branch is as simple as removing it from its parent and storing a reference to the branch on this array.  If you want, you can also disable the GC during your critical windows, and re-enable it during your idle times.

The effect of this would be to have very fast real-time performance (drops are quick, and GC never runs during critical windows). Allocation in that window would also be pretty fast.  (Since the GC is disabled, it never stops to sweep memory; if it ever lacks memory, it just gets more from the OS.)  Yet, during idle times, you free up a lot of memory.

You would end up with a system where your total memory allocation is M+C, where M is the maximum amount of memory you would need (typically), and C is the amount of memory that is allocated/freed in a real-time window.  During the real-time portion of your program, the GC allocates from C; during the idle portion, you release objects (or the GC collects them), refilling C.



Hope this makes sense, and at least some of it applies to your work :)



Thomas Kuehne wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> I'm faced with huge non-binary trees:
> leafs: > 8000000 subleafs:  1-40
> depth: > 17
> 
> # class Leaf{
> #       Leaf parent;
> #       Leaf[] members;
> # }
> 
> Saddly the parent member is required and results in crossreferences.
> Occasinally some branches will be dropped.
> 
> Should I
> 1) simply drop the branch by removing it's root from the members list,
>    (huge GC costs)
> 
> 2) walk down the dropped branch and set parent to null,
>    (medium dropping costs, medium GC costs)
> 
> 3) or walk down the dropped branch and set parent and members to null?
>    (huge dropping costs)
> 
> Tests till now indicate that (2) is the apropiate solution for my
> current trees (~20000 leafs), but the tree size will significantly.
> 
> Any ideas how the GC will react or how to improve branch dropping
> and GC performance?
> 
> Thomas
>  
> 
> -----BEGIN PGP SIGNATURE-----
> 
> iD8DBQFCVY6S3w+/yD4P9tIRAqRUAJ4iatJd2DarTOUyqsO7DyWsYvL1vwCePlNY
> ZCpVDJxy7mjlMIhwFYFTqtA=
> =Xps0
> -----END PGP SIGNATURE-----
April 07, 2005
Can't you just make a tree of that size with dummy data, and see how it performs?

Thomas Kuehne wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> I'm faced with huge non-binary trees:
> leafs: > 8000000 subleafs:  1-40
> depth: > 17
> 
> # class Leaf{
> #       Leaf parent;
> #       Leaf[] members;
> # }
> 
> Saddly the parent member is required and results in crossreferences.
> Occasinally some branches will be dropped.
> 
> Should I
> 1) simply drop the branch by removing it's root from the members list,
>    (huge GC costs)
> 
> 2) walk down the dropped branch and set parent to null,
>    (medium dropping costs, medium GC costs)
> 
> 3) or walk down the dropped branch and set parent and members to null?
>    (huge dropping costs)
> 
> Tests till now indicate that (2) is the apropiate solution for my
> current trees (~20000 leafs), but the tree size will significantly.
> 
> Any ideas how the GC will react or how to improve branch dropping
> and GC performance?
> 
> Thomas
>  
> 
> -----BEGIN PGP SIGNATURE-----
> 
> iD8DBQFCVY6S3w+/yD4P9tIRAqRUAJ4iatJd2DarTOUyqsO7DyWsYvL1vwCePlNY
> ZCpVDJxy7mjlMIhwFYFTqtA=
> =Xps0
> -----END PGP SIGNATURE-----
April 08, 2005
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Georg Wrede schrieb am Fri, 08 Apr 2005 00:45:10 +0300:
> Can't you just make a tree of that size with dummy data, and see how it performs?

No, I haven't got enough RAM for that.

Thomas


-----BEGIN PGP SIGNATURE-----

iD8DBQFCVhxq3w+/yD4P9tIRAlxgAJ0ash9wrb3v5FQ2EHc61uP7ZKNd3wCgyjR/
hyhmBwR4s0Esf1kZc5KwEZA=
=hUu2
-----END PGP SIGNATURE-----
April 08, 2005
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Russ Lewis schrieb am Thu, 07 Apr 2005 13:34:46 -0700:
> (NOTE: All this assumes the current GC, which is a mark-and-sweep GC).
>
> I assume from what you're saying is that the branch is a substantial portion of the tree.
>
> Remember that, when the GC scans, it only scans the *live* portion of the tree.  So, if you disconnect a large subset of the tree (even if that subset still has links back to the live data), then the GC will never scan any of the nodes of the branch.  So it doesn't matter whether you set the 'parent' parameters in them to null or not.

Arg, why did I thought it was a reference counting GC??? Back to the drawing board!

> Concept:
> The cost of a pass of the GC is proportional to the size of the live set
> at the time of the sweep.
>
> On the other hand, having a lot of garbage lingering around means that you will more quickly run out of available memory, which causes the GC to run sooner.  So, if you are *certain* that the entire branch is garbage (no outside references into any node), then you could decide to recursively delete all of the nodes in the branch.  That would increase your 'dropping' costs, but would probably result in better long-term performance.  However, it might not be a good option if you're worried about the real-time performance of your drop operation.

It's not a realtime application, rather a small memory eating animal! As I know the branch size at dropping time I'll try to delete all leafs in big branches and leaf small branches to the GC.

<snip>

> Hope this makes sense, and at least some of it applies to your work :)

Thanks, you gave me some ideas worth playing around with.

Thomas


-----BEGIN PGP SIGNATURE-----

iD8DBQFCViBj3w+/yD4P9tIRAgCSAJ0c+ZihWUGp7s1mn9tWjJAPKHpdgwCglsY6
XGBF0GNp32znHoE1CKqATG8=
=GVWQ
-----END PGP SIGNATURE-----