| Thread overview | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
December 10, 2007 A smaller GC benchmark | ||||
|---|---|---|---|---|
| ||||
This tiny benchmark focus on GC performance and shows similar results to the "binary trees" benchmark I have shown here few weeks ago, but it's quite shorter, so it may be more useful. This code isn't meant to represent normal programs with thousands of classes, etc, it just shows something quite limited. This is an adaptation of the "Object Test" that you can find used and discussed here: http://www.twistedmatrix.com/users/glyph/rant/python-vs-java.html http://blog.snaplogic.org/?p=55 http://programming.reddit.com/info/24ynh/comments Note that I test the GC of Phobos only, I don't use Tango yet. Object Test timings (on PIII @ 500 MHz), best of 3 runs (seconds, approximate Mbytes memory used), n=1_000, m=10_000: seconds MB DMD class: 18.95 1.7 GDC class: 17.91 1.8 DMD struct: 11.77 1.7 GDC struct: 12.31 1.8 Python: 37.10 3.1 (ShedSkin version) Psyco: 15.68 3.5 ShedSkin 6.35 1.6 Java (1): 2.19 7.3 Java (2): 2.10 7.8 See below for the sources. You can see the Java is about 9 (~= 18.95 / 2.10) times faster than the program produced by DMD. Even Psyco (the JIT for the turtle-slow language Python) gives faster performance (and the Python GC is simple, it's just a reference count plus cycle detection). You can also see Java (as usual) uses much more RAM (7.8/1.7 ~= 4.6). Note that with some manual GC optimization the running time for the DMD class benchmark can become about one-two seconds smaller, but the memory used can be about 4 MB: import std.gc; class ObjectTest { ObjectTest next; } void main() { for (uint k = 0; k < 50; k++) { std.gc.disable(); for (uint i = 0; i < 20; i++) { auto root = new ObjectTest(); for (uint j = 0; j < 10_000; j++) { root.next = new ObjectTest(); root = root.next; } } std.gc.fullCollect(); } } ------------------- COMPILED/RUN WITH: gcc version 3.4.5 (mingw special) (gdc 0.24, using dmd 1.020) DMD v1.024 gcc version 3.4.2 (mingw-special) javac 1.6.0_03 java version "1.6.0_03" Python 2.5.1 ShedSkin 0.0.25 (shedskin.sourceforge.net , a Python to C++ compiler) ------------------- COMPILATION/RUNNING PARAMETERS: GDC used as: gdc -O3 -s -frelease -finline-functions -ffast-math -fomit-frame-pointer -funroll-loops -march=pentiumpro obj_test_class.d -o obj_test_class_gdc DMD used as: dmd -O -release -inline obj_test_class.d -ofobj_test_class_dmd.exe Java compiled with: javac ObjectTest.java Java (1) run with: java ObjectTest Java (2) run with: java -Xms20m -Xbatch ObjectTest SS and Psyco compiled and run without flags. ------------------- SOURCES: // obj_test_class.d class ObjectTest { ObjectTest next; } void main() { for (uint i = 0; i < 1_000; i++) { auto root = new ObjectTest(); for (uint j = 0; j < 10_000; j++) { root.next = new ObjectTest(); root = root.next; } } } ------------------- // obj_test_struct.d struct ObjectTest { ObjectTest* next; } void main() { for (uint i = 0; i < 1_000; i++) { auto root = new ObjectTest(); for (uint j = 0; j < 10_000; j++) { root.next = new ObjectTest(); root = root.next; } } } ------------------- // ObjectTest.java public class ObjectTest { public ObjectTest next; public static void main(String[] args) { for (int i = 0; i < 1000; i++) { ObjectTest root = new ObjectTest(); for (int j = 0; j < 10000; j++) { root.next = new ObjectTest(); root = root.next; } } } } ------------------- # obj_test.py from psyco.classes import __metaclass__ class ObjectTest: pass def main(): for i in xrange(1000): # N root = ObjectTest() for j in xrange(10000): # M root.next = ObjectTest() root = root.next import psyco; psyco.full() main() ------------------- # obj_test_ss.py class ObjectTest: pass def main(): for i in xrange(1000): # N root = ObjectTest() for j in xrange(10000): # M root.next = ObjectTest() root = root.next main() Bear hugs, bearophile | ||||
December 10, 2007 Re: A smaller GC benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Mon, 10 Dec 2007 18:56:10 +0200, bearophile <bearophileHUGS@lycos.com> wrote: > > Note that I test the GC of Phobos only, I don't use Tango yet. I did some tests with Tango (head SVN). Specs: Core Duo E6600 @ 2.40GHz per core, running Windows Phobos time: 3.140 seconds Tango time: 1.859 seconds Times are best of 3 runs, like your tests. I didn't measure memory usage, but it's pretty obvious that the Tango GC is significantly faster (in this case, at least) than the Phobos version. -- Best regards, Vladimir mailto:thecybershadow@gmail.com | |||
December 10, 2007 Re: A smaller GC benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | bearophile wrote:
> This tiny benchmark focus on GC performance and shows similar results to the "binary trees" benchmark I have shown here few weeks ago, but it's quite shorter, so it may be more useful. This code isn't meant to represent normal programs with thousands of classes, etc, it just shows something quite limited. This is an adaptation of the "Object Test" that you can find used and discussed here:
> http://www.twistedmatrix.com/users/glyph/rant/python-vs-java.html
> http://blog.snaplogic.org/?p=55
> http://programming.reddit.com/info/24ynh/comments
> Note that I test the GC of Phobos only, I don't use Tango yet.
>
>
> Object Test timings (on PIII @ 500 MHz), best of 3 runs (seconds, approximate Mbytes memory used), n=1_000, m=10_000:
> seconds MB
> DMD class: 18.95 1.7
> GDC class: 17.91 1.8
> DMD struct: 11.77 1.7
> GDC struct: 12.31 1.8
> Python: 37.10 3.1 (ShedSkin version)
> Psyco: 15.68 3.5
> ShedSkin 6.35 1.6
> Java (1): 2.19 7.3
> Java (2): 2.10 7.8
>
> See below for the sources. You can see the Java is about 9 (~= 18.95 / 2.10) times faster than the program produced by DMD. Even Psyco (the JIT for the turtle-slow language Python) gives faster performance (and the Python GC is simple, it's just a reference count plus cycle detection). You can also see Java (as usual) uses much more RAM (7.8/1.7 ~= 4.6).
>
> Note that with some manual GC optimization the running time for the DMD class benchmark can become about one-two seconds smaller, but the memory used can be about 4 MB:
Manually calling gc.collect() actually slowed down the app for me. And Tango performs about the same as Phobos, when using the Java test as a reference (ie. my quick test showed D being about 9 times as slow as Java using Tango).
Sean
| |||
December 10, 2007 Re: A smaller GC benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | Vladimir Panteleev wrote:
> On Mon, 10 Dec 2007 18:56:10 +0200, bearophile <bearophileHUGS@lycos.com> wrote:
>
>> Note that I test the GC of Phobos only, I don't use Tango yet.
>
> I did some tests with Tango (head SVN).
>
> Specs: Core Duo E6600 @ 2.40GHz per core, running Windows
> Phobos time: 3.140 seconds
> Tango time: 1.859 seconds
>
> Times are best of 3 runs, like your tests.
> I didn't measure memory usage, but it's pretty obvious that the Tango GC is significantly faster (in this case, at least) than the Phobos version.
>
Nifty.
Sean
| |||
December 10, 2007 Re: A smaller GC benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | Regarding that Java version, the HotSpot Java compiler FAQ warns to not use code like that: http://java.sun.com/docs/hotspot/HotSpotFAQ.html It says: if you insist on using/writing microbenchmarks like this, you can work around the problem by moving the body of main to a new method and calling it once from main to give the compiler a chance to compile the code, then calling it again in the timing bracket to see how fast HotSpot is. | |||
December 10, 2007 Re: A smaller GC benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | bearophile wrote:
> This tiny benchmark focus on GC performance and shows similar results to the "binary trees" benchmark I have shown here few weeks ago, but it's quite shorter, so it may be more useful. This code isn't meant to represent normal programs with thousands of classes, etc, it just shows something quite limited. This is an adaptation of the "Object Test" that you can find used and discussed here:
> http://www.twistedmatrix.com/users/glyph/rant/python-vs-java.html
> http://blog.snaplogic.org/?p=55
> http://programming.reddit.com/info/24ynh/comments
> Note that I test the GC of Phobos only, I don't use Tango yet.
>
>
> Object Test timings (on PIII @ 500 MHz), best of 3 runs (seconds, approximate Mbytes memory used), n=1_000, m=10_000:
> seconds MB
> DMD class: 18.95 1.7
> GDC class: 17.91 1.8
> DMD struct: 11.77 1.7
> GDC struct: 12.31 1.8
> Python: 37.10 3.1 (ShedSkin version)
> Psyco: 15.68 3.5
> ShedSkin 6.35 1.6
> Java (1): 2.19 7.3
> Java (2): 2.10 7.8
>
> See below for the sources. You can see the Java is about 9 (~= 18.95 / 2.10) times faster than the program produced by DMD. Even Psyco (the JIT for the turtle-slow language Python) gives faster performance (and the Python GC is simple, it's just a reference count plus cycle detection). You can also see Java (as usual) uses much more RAM (7.8/1.7 ~= 4.6).
>
> Note that with some manual GC optimization the running time for the DMD class benchmark can become about one-two seconds smaller, but the memory used can be about 4 MB:
I've been wanting for a while to implement a different GC for D taking into account some of the stuff talked about in a few papers I read on non-moving GCs. IMO, D's GCs (Tango & Phobos) are pretty basic, they don't do any sort of sexy trickery.
FWIW, I don't think non-moving GCs will ever be able to outperform a well-tuned generational collector. But I'm not exactly in the know about these things, so I may be wrong.
| |||
December 10, 2007 Re: A smaller GC benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Robert Fraser | On Mon, 10 Dec 2007 22:48:00 +0200, Robert Fraser <fraserofthenight@gmail.com> wrote: > FWIW, I don't think non-moving GCs will ever be able to outperform a well-tuned generational collector. But I'm not exactly in the know about these things, so I may be wrong. I'm not sure if generational collectors are moving GCs, but since you're comparing them to non-moving GCs I'll add that a moving GC needs full knowledge of type information to work properly. While confusing an integer for a pointer may be a forgiveable mistake for non-moving GCs, it's not for a moving GC. I think D isn't ready for that yet (especially considered that both Phobos and Tango maintainers/users insist on void[] being a type that's allowed to hold pointers). -- Best regards, Vladimir mailto:thecybershadow@gmail.com | |||
December 10, 2007 Re: A smaller GC benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | Vladimir Panteleev wrote:
> On Mon, 10 Dec 2007 22:48:00 +0200, Robert Fraser <fraserofthenight@gmail.com> wrote:
>
>> FWIW, I don't think non-moving GCs will ever be able to outperform a
>> well-tuned generational collector. But I'm not exactly in the know about
>> these things, so I may be wrong.
>
> I'm not sure if generational collectors are moving GCs, but since you're comparing them to non-moving GCs I'll add that a moving GC needs full knowledge of type information to work properly. While confusing an integer for a pointer may be a forgiveable mistake for non-moving GCs, it's not for a moving GC. I think D isn't ready for that yet (especially considered that both Phobos and Tango maintainers/users insist on void[] being a type that's allowed to hold pointers).
The void[] thing is a language decision, not a runtime decision. The runtimes for Phobos and Tango rely on TypeInfo.flags & 1 to determine whether a particular type contains pointers. More exact type info for precise scanning would likely require a compiler change as well, since this would have to be added to TypeInfo. All of this would allow for a moving GC, but the GC would still be somewhat conservative because type information would still not be available for the stack (unless the app contained debug information, perhaps). So basically, any blocks pointed to from a location whose type is unknown, the block would effectively be pinned.
Sean
| |||
December 10, 2007 Re: A smaller GC benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Robert Fraser | Robert Fraser: > I've been wanting for a while to implement a different GC for D taking into account some of the stuff talked about in a few papers I read on non-moving GCs. IMO, D's GCs (Tango & Phobos) are pretty basic, they don't do any sort of sexy trickery. I can see people in this newsgroup don't like to try to adapt already written code. Is this some bad from of NIH (Not Invented Here) syndrome? Maybe you can just take a 5-10-years old GC and adapt it, so we can avoid other 10 years of development and have a better GC soon (from your and other's comments I presume the Java GC can't be adapted to D, but other GCs may be found). Note that ShedSkin uses the Bohem GC: http://www.hpl.hp.com/personal/Hans_Boehm/gc/ From various tests it seems faster than the Phobos GC. Can't it be copied inside Phobos? It's well refined, well debugged, it doesn't leak, it has something like ten years of development, articles written on it, it's used often, etc. And it can be used from C too. Bye, bearophile | |||
December 10, 2007 Re: A smaller GC benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | Vladimir Panteleev wrote:
> On Mon, 10 Dec 2007 22:48:00 +0200, Robert Fraser <fraserofthenight@gmail.com> wrote:
>
>> FWIW, I don't think non-moving GCs will ever be able to outperform a
>> well-tuned generational collector. But I'm not exactly in the know about
>> these things, so I may be wrong.
>
> I'm not sure if generational collectors are moving GCs, but since you're comparing them to non-moving GCs I'll add that a moving GC needs full knowledge of type information to work properly. While confusing an integer for a pointer may be a forgiveable mistake for non-moving GCs, it's not for a moving GC. I think D isn't ready for that yet (especially considered that both Phobos and Tango maintainers/users insist on void[] being a type that's allowed to hold pointers).
>
How would you make a non-moving generational GC? Isn't the idea of the generational GC that long-lived objects are moved into a separate part of memory?
| |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply