June 10, 2008 Re: More embarrassing microbenchmarks for D's GC.
| ||||
Posted in reply to Leandro Lucarella

Leandro Lucarella wrote:
> Walter Bright wrote on June 9 at 17:12:
>> Leandro Lucarella wrote:
>>> But there are a few other results I can't explain:
>>> 1) Why is the D GC (disabled or not) version ~25% slower than the D version
>>>    that uses malloc when iterating the list? There shouldn't be any GC
>>>    activity in that part. Could it be some GC locality issue that yields
>>>    more cache misses?
>> There are so many factors that can influence this, it's hard to say without
>> spending a lot of time carefully examining it.
>
> But there is no GC activity there, right?

Instrument it to be sure.

>>> 2) Why is the D malloc version ~33% slower than the C version? I took a look
>>>    at the generated assembly and it's almost identical:
>>>    <_Dmain+198>: lea -0x20(%ebp),%eax
>>>    <_Dmain+201>: lea 0x0(%esi,%eiz,1),%esi
>>>    <_Dmain+208>: addl $0x1,0x8(%eax)
>>>    <_Dmain+212>: adcl $0x0,0xc(%eax)
>>>    <_Dmain+216>: mov (%eax),%eax
>>>    <_Dmain+218>: test %eax,%eax
>>>    <_Dmain+220>: jne 0x804a240 <_Dmain+208>
>>>    <main+248>: lea -0x1c(%ebp),%eax
>>>    <main+251>: nop
>>>    <main+252>: lea 0x0(%esi,%eiz,1),%esi
>>>    <main+256>: addl $0x1,0x4(%eax)
>>>    <main+260>: adcl $0x0,0x8(%eax)
>>>    <main+264>: mov (%eax),%eax
>>>    <main+266>: test %eax,%eax
>>>    <main+268>: jne 0x8048550 <main+256>
>>>    <main+270>: movl $0x0,0x4(%esp)
>>>    <main+278>: movl $0x8049800,(%esp)
>>> Tests attached.
>> Without running a profiler, there's no way to be sure about just where in the
>> code the time is being consumed.
>
> Attached is the output of the gprof program (for the list-test-d-gc with the
> GC running). I don't see any helpful information for point 2), but it shows
> clearly that most of the time is spent in garbage collection.

Break up your code into more functions to get better info from the profiler.
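Walter's "instrument it to be sure" can be done from inside the program. Below is a minimal sketch, assuming the present-day `core.memory` API (`GC.stats` did not exist in the 2008 Phobos GC) and a `Node` type that simply mirrors the benchmark's list node; it is not code from the thread:

```d
// Sketch only: sample GC heap counters around the traversal loop.
import core.memory : GC;
import std.stdio : writefln;

struct Node { long value; Node* next; }   // mirrors the benchmark's list node

void traverse(Node* head)
{
    auto before = GC.stats();             // heap counters before the loop
    for (auto p = head; p !is null; p = p.next)
        p.value++;                        // pure pointer chasing, no allocation
    auto after = GC.stats();

    // If the loop triggered no GC work, these numbers should not move.
    writefln("used: %s -> %s bytes, free: %s -> %s bytes",
             before.usedSize, after.usedSize, before.freeSize, after.freeSize);
}
```

If the counters stay put, the ~25% gap in the iteration phase is not collection work, which leaves locality (where the GC happened to place the nodes) as the main suspect.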
June 11, 2008 Re: More embarrassing microbenchmarks for D's GC.
| ||||
Posted in reply to Leandro Lucarella

Leandro Lucarella wrote:
>
> Considering that the version with GC disabled is almost as fast as the
> malloc version, and that the time grows exponentially (graph attached)
> with the number of nodes, it looks like the problem is the GC
> collection cycles to me. Is the GC algorithm exponential?
>
The graph may look exponential but that's often misleading (it may very well be a polynomial). To tell for sure, plot the data on a graph with a logarithmic Y axis and then on a graph with both axes logarithmic. If the plot is a straight line on the semilog graph then it's exponential. If it's a straight line on the log-log graph then it's polynomial.
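Marius's test can also be done numerically instead of graphically. A minimal D sketch with invented timings (not measurements from this thread): for runs at doubling sizes, log2 of the ratio of consecutive times is the apparent polynomial degree; if that number stays roughly flat the growth is polynomial, and if it keeps climbing the growth is worse than any polynomial.

```d
// Illustration only: the timing numbers below are made up.
import std.math : log2;
import std.stdio : writefln;

void main()
{
    // hypothetical wall-clock times (seconds) for 1e5, 2e5, 4e5, 8e5, 1.6e6 nodes
    double[] seconds = [0.9, 3.5, 13.8, 55.2, 220.0];

    foreach (i; 1 .. seconds.length)
        writefln("apparent degree between run %s and %s: %.2f",
                 i - 1, i, log2(seconds[i] / seconds[i - 1]));
    // prints roughly 2.0 each time here, i.e. quadratic growth, not exponential
}
```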
June 11, 2008 Re: More embarrassing microbenchmarks for D's GC.
| ||||
Posted in reply to Marius Muja

Marius Muja wrote on June 10 at 21:38:
> Leandro Lucarella wrote:
>> Considering that the version with GC disabled is almost as fast as the
>> malloc version, and that the time grows exponentially (graph attached) with
>> the number of nodes, it looks like the problem is the GC collection cycles
>> to me. Is the GC algorithm exponential?
>
> The graph may look exponential but that's often misleading (it may very well
> be a polynomial). To tell for sure, plot the data on a graph with a
> logarithmic Y axis and then on a graph with both axes logarithmic. If the
> plot is a straight line on the semilog graph then it's exponential. If it's
> a straight line on the log-log graph then it's polynomial.

You and Sean are right: the graph is somewhere in between linear and
exponential, but it is really far from linear, which I think is the worst case
for a GC algorithm...

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
June 11, 2008 Re: More embarrassing microbenchmarks for D's GC.
| ||||
Posted in reply to Walter Bright

Walter Bright wrote on June 10 at 14:13:
>> But there is no GC activity there, right?
>
> Instrument it to be sure.

It doesn't seem to be running. But it's weird that you can't tell for sure,
being the author :S

>>>> 2) Why is the D malloc version ~33% slower than the C version? I took a
>>>>    look at the generated assembly and it's almost identical:
>>>>    <_Dmain+198>: lea -0x20(%ebp),%eax
>>>>    <_Dmain+201>: lea 0x0(%esi,%eiz,1),%esi
>>>>    <_Dmain+208>: addl $0x1,0x8(%eax)
>>>>    <_Dmain+212>: adcl $0x0,0xc(%eax)
>>>>    <_Dmain+216>: mov (%eax),%eax
>>>>    <_Dmain+218>: test %eax,%eax
>>>>    <_Dmain+220>: jne 0x804a240 <_Dmain+208>
>>>>    <main+248>: lea -0x1c(%ebp),%eax
>>>>    <main+251>: nop
>>>>    <main+252>: lea 0x0(%esi,%eiz,1),%esi
>>>>    <main+256>: addl $0x1,0x4(%eax)
>>>>    <main+260>: adcl $0x0,0x8(%eax)
>>>>    <main+264>: mov (%eax),%eax
>>>>    <main+266>: test %eax,%eax
>>>>    <main+268>: jne 0x8048550 <main+256>
>>>>    <main+270>: movl $0x0,0x4(%esp)
>>>>    <main+278>: movl $0x8049800,(%esp)
>>>> Tests attached.
>>> Without running a profiler, there's no way to be sure about just where in
>>> the code the time is being consumed.
>> Attached is the output of the gprof program (for the list-test-d-gc with the
>> GC running). I don't see any helpful information for point 2), but it shows
>> clearly that most of the time is spent in garbage collection.
>
> Break up your code into more functions to get better info from the profiler.

Is this useful? (kcachegrind screenshots, using callgrind with --dump-instr=yes
and --trace-jump=yes):

http://llucax.com.ar/~luca/kcachegrind-d.png
http://llucax.com.ar/~luca/kcachegrind-c.png

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
June 25, 2008 Re: More embarrassing microbenchmarks for D's GC.
| ||||
Posted in reply to Leandro Lucarella

Leandro Lucarella wrote:
> I've done a very simple microbenchmark testing a singly linked list.
>
> What the test does is fill the list with long elements, and then iterate the
> list and increment the associated value. This test mostly tests allocation
> speed.
>
> I've done a plain C version, a C++ version (using the STL doubly linked list,
> so it has some disadvantage here), a D version without using the GC, a D
> version using the GC, and a Python version.
>
> Here are the results (Celeron ~2GHz, 1M elements, average of 3 runs):
>
>           C         C++       D no gc     D gc       D gc dis  Python
> Fill      0.136136  0.142705  0.131238    22.628577  0.242637  12.952816
> Inc       0.025134  0.038768  0.037456     0.050480  0.051545   3.765271
> Total     0.161270  0.181473  0.168694    22.679057  0.294182  16.718087
>
> Results are in seconds, compiled with gcc/g++/gdc (-O3 -finline, plus
> -frelease for D code). Using Phobos (gdc 0.25-20080419-4.1.2-22, with Debian
> patches).
>
> You can see Python is almost twice as fast as D at allocation (and the Python
> example has some intentional overhead to make it as close to the D code as
> possible; using more Pythonic code could yield better results).
Python used reference counting (~ Python 1.x) in the past, but I think
they switched to a tracing garbage collector in 2.x. Using reference
counting would explain the speedup IMHO. I think any mark-and-sweep
garbage collector would be somewhat slow in the case of a linked list,
as it has to traverse all elements to determine which memory is live and
which can be freed (so it's O(n)). You'd probably be better off using a
dynamic array here.
Regards,
Michael
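For reference, the benchmark Leandro describes boils down to roughly the following shape. This is a reconstruction in present-day D (`core.memory`, `std.datetime.stopwatch`), not the D1/Phobos test attached to the original post; uncommenting `GC.disable()` corresponds to the "D gc dis" column.

```d
// Reconstruction of the benchmark's shape, not the original attached source.
import core.memory : GC;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

struct Node
{
    long value;
    Node* next;
}

void main()
{
    enum N = 1_000_000;
    // GC.disable();                       // the "D gc dis" variant

    auto sw = StopWatch(AutoStart.yes);
    Node* head = null;
    foreach (i; 0 .. N)                    // fill: one GC allocation per node
    {
        auto n = new Node;
        n.value = i;
        n.next = head;
        head = n;
    }
    auto fill = sw.peek();

    sw.reset();
    for (auto p = head; p !is null; p = p.next)
        p.value++;                         // inc: pure traversal, no allocation
    auto inc = sw.peek();

    writefln("Fill: %s  Inc: %s", fill, inc);
}
```

With collections enabled, each `new Node` can trigger a collection whose mark phase walks every node built so far, which would explain how a loop of constant-size allocations turns into the strongly super-linear curve in the attached graph.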
June 25, 2008 Re: More embarrassing microbenchmarks for D's GC.
| ||||
Posted in reply to Michael Neumann

"Michael Neumann" <mneumann@ntecs.de> wrote in message news:4862675F.3020603@ntecs.de...
> Python used reference counting (~ Python 1.x) in the past, but I think they
> switched to a tracing garbage collector in 2.x. Using reference counting
> would explain the speedup IMHO. I think any mark-and-sweep garbage collector
> would be somewhat slow in the case of a linked list, as it has to traverse
> all elements to determine which memory is live and which can be freed (so
> it's O(n)). You'd probably be better off using a dynamic array here.

Maybe they just didn't change the API for backwards compatibility, but all the
Python 2.x source I've seen uses reference counting macros. Maybe it's a hybrid
memory manager? Or maybe the macros are now no-ops ;)
June 25, 2008 Re: More embarrassing microbenchmarks for D's GC.
| ||||
Posted in reply to Michael Neumann

Michael Neumann:
> Python used reference counting (~ Python 1.x) in the past, but I think they
> switched to a tracing garbage collector in 2.x.

Python 2.5.2 uses a reference counting GC plus logic to break reference cycles;
part of the code:
http://svn.python.org/view/python/trunk/Modules/gcmodule.c?rev=64048&view=markup

You may want to take a look at this page of the results too (Psyco uses the
same GC, it's not another language):
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=72520

Bye,
bearophile
June 25, 2008 Re: More embarrassing microbenchmarks for D's GC.
| ||||
Posted in reply to bearophile

"bearophile" <bearophileHUGS@lycos.com> wrote in message news:g3tqs0$57g$1@digitalmars.com...
> Michael Neumann:
>> Python used reference counting (~ Python 1.x) in the past, but I think they
>> switched to a tracing garbage collector in 2.x.
>
> Python 2.5.2 uses a reference counting GC plus logic to break reference
> cycles; part of the code:
> http://svn.python.org/view/python/trunk/Modules/gcmodule.c?rev=64048&view=markup

And there it is :)
September 15, 2010 Re: More embarrassing microbenchmarks for D's GC.
| ||||
Posted in reply to Leandro Lucarella

I found this posting after investigating a huge slowdown in a D program I'd
made using gprof, which showed the culprit to be a function called
_D3gcx3Gcx4markMFPvPvZv, in which the program spent 87% of its time. That
turned out to be the garbage collector.

The program takes 1.74 seconds to run with two particular input files in
Python. The garbage-collected D version takes 5.27 seconds. The
non-garbage-collected version takes only 0.68 seconds.

Here's a more detailed report:

http://rounin.livejournal.com/21815.html

Obviously, the compiled program should run at least as fast as Python, so the
garbage collector could use some improvement.
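For what it's worth, _D3gcx3Gcx4markMFPvPvZv demangles to roughly `void gcx.Gcx.mark(void*, void*)`, i.e. the mark phase of the collector. The usual workaround when the mark phase dominates a profile is to suspend collections during the allocation-heavy phase and collect once afterwards. Below is a hedged sketch against today's `core.memory` API (older compilers spell this `std.gc` instead); `loadFile` is a hypothetical stand-in for the program's real work, not anything from Rounin's code:

```d
// Sketch: suspend the GC while building the data set, then collect once.
import core.memory : GC;

void processInputs(string[] files)
{
    GC.disable();                  // no collections while we allocate heavily
    scope (exit)
    {
        GC.enable();
        GC.collect();              // one full collection instead of many
    }

    foreach (f; files)
        loadFile(f);
}

void loadFile(string path)
{
    // parse the file and build whatever data structures the program needs
}
```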
September 15, 2010 Re: More embarrassing microbenchmarks for D's GC.
| ||||
Posted in reply to Rounin

On Wed, 15 Sep 2010 15:53:38 -0400, Rounin <davidjo@student.matnat.uio.no> wrote:

> I found this posting after investigating a huge slowdown in a D program I'd
> made using gprof, which showed the culprit to be a function called
> _D3gcx3Gcx4markMFPvPvZv, in which the program spent 87% of its time. That
> turned out to be the garbage collector.
>
> The program takes 1.74 seconds to run with two particular input files in
> Python. The garbage-collected D version takes 5.27 seconds. The
> non-garbage-collected version takes only 0.68 seconds.
>
> Here's a more detailed report:
>
> http://rounin.livejournal.com/21815.html
>
> Obviously, the compiled program should run at least as fast as Python, so the
> garbage collector could use some improvement.

gdc is very old. You may want to try the same thing on the latest dmd. No
guarantee this is going to help, though.

Another thing you may want to try is dcollections' HashMap instead of the
builtin associative arrays, as it uses a free-list style allocator that helps
avoid GC collections.

I agree the GC is a major issue for performance; I hope that once D2 is a bit
more stable, we start working on improving it.

-Steve
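The free-list idea Steven refers to can be sketched in a few lines of D. This is not dcollections' actual allocator, just an illustration of why recycling nodes reduces GC pressure: most insertions never reach the GC allocator, so collections are triggered far less often.

```d
// Illustration only -- not the dcollections allocator.
struct Node { long value; Node* next; }

struct NodeFreeList
{
    private Node* free;                    // recycled nodes, chained via .next

    Node* acquire(long value)
    {
        Node* n;
        if (free !is null)
        {
            n = free;                      // reuse a recycled node
            free = n.next;
        }
        else
        {
            n = new Node;                  // fall back to the GC only when empty
        }
        n.value = value;
        n.next = null;
        return n;
    }

    void release(Node* n)
    {
        n.next = free;                     // push back for reuse instead of
        free = n;                          // handing the memory to the GC
    }
}
```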