September 16, 2010 Re: More embarrassing microbenchmarks for D's GC.
Posted in reply to Steven Schveighoffer

Thank you for that advice. I'm using GDC because it's available from Ubuntu Linux's package system, whereas DMD currently is not. (And the .deb posted on digitalmars.com is only for i386.) Hopefully, D will gain more popularity and more up-to-date tools will be made available.

By the way, today I re-compiled the program with a "std.gc.enable;" right before the final "return 0" statement, and it still runs in 0.68 seconds. So either those objects aren't marked as needing garbage collection, or it's really just an issue of keeping the garbage collector from activating too quickly.
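For concreteness, here is a minimal sketch of the disable/enable pattern being discussed, using the D1-era std.gc module that the post itself references; the program body is a hypothetical stand-in for the benchmark, not its actual code.

```d
import std.gc;  // D1 Phobos GC control, the module the "std.gc.enable;" call comes from

int main()
{
    std.gc.disable();  // keep automatic collections from firing during the
                       // allocation-heavy phase
    // ... read the .md5 files and build the data structures ...
    std.gc.enable();   // hand control back to the collector just before
                       // the final "return 0"
    return 0;
}
```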
September 16, 2010 Re: More embarrassing microbenchmarks for D's GC.
Posted in reply to Rounin

Rounin:

> Thank you for that advice. I'm using GDC because it's available from Ubuntu Linux's package system, whereas DMD currently is not. (And the .deb posted on digitalmars.com is only for i386.) Hopefully, D will gain more popularity and more up-to-date tools will be made available.

On a 32 bit system I find LDC to produce efficient programs, when possible (the GC is similar; in D2 it's essentially the same).

> By the way, today I re-compiled the program with a "std.gc.enable;" right before the final "return 0" statement, and it still runs in 0.68 seconds.

You may try to disable/enable the GC in the Python code too (on 32 bit systems there's also the very good Psyco). Your benchmark is not portable, so I can't help you find where the performance problem is. When you perform a benchmark it's better to give all the source code and all the data too, to allow others to reproduce the results and look for performance problems.

Keep in mind that D associative arrays are usually slower than Python dicts. You probably build data structures like associative arrays, and this slows down the GC. If you disable and enable the GC around that build phase, the program is probably fast (so I suggest you narrow the disable/enable span as much as possible, so you may see where the GC problem is). If you put an exit(0) at the end of the program (to kill the final collection) the D program may save more time.

In Python 2.7 they have added a GC optimization that may be used in D too: http://bugs.python.org/issue4074

> The garbage collector now performs better for one common usage pattern: when many objects are being allocated without deallocating any of them. This would previously take quadratic time for garbage collection, but now the number of full garbage collections is reduced as the number of objects on the heap grows. The new logic only performs a full garbage collection pass when the middle generation has been collected 10 times and when the number of survivor objects from the middle generation exceeds 10% of the number of objects in the oldest generation. (Suggested by Martin von Löwis and implemented by Antoine Pitrou; issue 4074.)

> while D's splitlines() wasn't quite working.
> int space = std.regexp.find(line, r"\s");

What do you mean? If there's a bug in splitlines() or split() it's better to add it to Bugzilla, possibly with an inlined string to split (no external file to read). splitlines() and split() are simple functions of a module, written in D, so if there's a problem it's usually not too hard to fix; they are not built-in methods written in C as in CPython.

if(!(path in oldpaths) && !(checksum in oldsums))

In D2 this may be written (unfortunately there is no "and" keyword, Walter doesn't get its usefulness yet):

if (path !in oldpaths && checksum !in oldsums)

Bye,
bearophile
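A small hedged sketch tying these suggestions together in D2: the narrowed disable/enable span, the !in test, and the exit(0) trick. The oldpaths/oldsums names and the inlined md5sum-style lines are placeholders, not the benchmark's real code.

```d
import core.memory : GC;            // D2 druntime GC control
import core.stdc.stdlib : exit;

void main()
{
    bool[string] oldpaths;
    bool[string] oldsums;

    GC.disable();                   // narrow the span: only around the AA build phase
    foreach (line; ["d41d8cd98f00b204e9800998ecf8427e  ./a.txt",
                    "9e107d9d372bb6826bd81d3542a419d6  ./b.txt"])
    {
        auto checksum = line[0 .. 32];   // md5sum format: 32 hex digits,
        auto path     = line[34 .. $];   // two spaces, then the path
        if (path !in oldpaths && checksum !in oldsums)
        {
            oldpaths[path] = true;
            oldsums[checksum] = true;
        }
    }
    GC.enable();

    exit(0);                        // skip the final collection, as suggested
}
```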
September 16, 2010 Re: More embarrassing microbenchmarks for D's GC.
Posted in reply to bearophile

> Keep in mind that D associative arrays are usually slower than Python dicts. You probably build data structures like associative arrays, and this slows down the GC. If you disable and enable the GC around that build phase, the program is probably fast (so I suggest you narrow the disable/enable span as much as possible, so you may see where the GC problem is). If you put an exit(0) at the end of the program (to kill the final collection) the D program may save more time.

This could be the case, as I started noticing the slowdown as I was writing either the second or third block, which is where I build a ton of data structures. Complicating the code by avoiding associative arrays isn't really an option, though, as the whole point of writing the program was comparing D to Python in terms of simplicity. Turning off the garbage collector during time-critical bits is simple enough, though, as long as one knows that it has to be done.

> What do you mean? If there's a bug in splitlines() or split() it's better to add it to Bugzilla, possibly with an inlined string to split (no external file to read). splitlines() and split() are simple functions of a module, written in D, so if there's a problem it's usually not too hard to fix; they are not built-in methods written in C as in CPython.

There's no bug in split() as far as I know, but during development of the program, splitlines() would split the data on things other than newlines. It could be a bug in D, but it could just be a temporary glitch in the code, so I'm not filing a bug.
September 16, 2010 Re: More embarrassing microbenchmarks for D's GC.
Posted in reply to Rounin

On Thu, 16 Sep 2010 06:41:14 -0400, Rounin <davidjo@student.matnat.uio.no> wrote:

> Thank you for that advice. I'm using GDC because it's available from Ubuntu Linux's package system, whereas DMD currently is not. (And the .deb posted on digitalmars.com is only for i386.) Hopefully, D will gain more popularity and more up-to-date tools will be made available.

I've heard that dmd is a bit tricky to set up on a 64-bit system (mine is 32-bit), but it will give more credence to your position to use the latest available tools.

> By the way, today I re-compiled the program with a "std.gc.enable;" right before the final "return 0" statement, and it still runs in 0.68 seconds. So either those objects aren't marked as needing garbage collection, or it's really just an issue of keeping the garbage collector from activating too quickly.

The GC runs are not scheduled. Essentially, you are probably encountering a syndrome where the GC runs more than it should. This usually occurs when incrementally allocating large amounts of memory without dropping any of it.

The GC collection runs when allocating memory cannot find any free memory. Most likely this is what's happening:

allocate -> No free memory available -> run GC collection, get nothing back -> allocate more memory from OS -> allocate...

Of course, the allocations will succeed for a bit after getting OS memory, but the GC collection is run way too often (as your profiling suggests).

Disabling it at the beginning, then enabling it at the end, means it's going to run at most once. Essentially, you take out the "run GC collection" step; everything else is the same. Given that 87% of the run time is spent in the collection cycle, it's pretty obvious why this reduced your runtime ;)

-Steve
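A tiny illustration of the syndrome described here; the loop and its count are invented, but the shape matches the benchmark: everything allocated stays live, so every collection is futile.

```d
void main()
{
    string[] lines;
    foreach (i; 0 .. 1_000_000)
        lines ~= "checksum  ./some/path";   // nothing ever becomes garbage
    // With the collector enabled throughout, each allocation that finds no
    // free memory triggers a collection over a growing heap, recovers nothing,
    // and then grabs more memory from the OS anyway. Wrapping the loop in
    // disable()/enable() removes only the futile collection step.
}
```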
September 16, 2010 Re: More embarrassing microbenchmarks for D's GC.
Posted in reply to Rounin

== Quote from Rounin (davidjo@student.matnat.uio.no)'s article

> Complicating the code by avoiding associative arrays isn't really an option, though, as the whole point of writing the program was comparing D to Python in terms of simplicity.

How about using a library defined associative array? As I've mentioned several times here before, D's current builtin associative array implementation interacts horribly with the GC. This is true both in terms of speed and memory usage. Neither D's GC nor its AA implementation is that bad per se, but the AA implementation seems to expose all the weaknesses in the GC, such that the result when using large AAs is laughably bad. I think eventually we need to completely redesign and rewrite it, or at least put a sealed implementation in std.container that's more geared towards huge AAs.

My RandAA can be several times faster than the builtin AA when dealing with huge arrays (purely due to memory management overhead), and some tweaking by others in the community has produced an even faster version. Furthermore, it doesn't create tons of heap fragmentation and false pointers for the GC. I've solved out of memory errors with the builtin AA in some small projects just by swapping in RandAA for it. (See http://dsource.org/projects/aa)
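RandAA's real interface lives at the dsource link above; what follows is only a hypothetical open-addressing sketch of the general idea described here: keep all entries in a few arrays allocated up front, so inserts allocate nothing and the collector has far fewer blocks (and potential false pointers) to scan.

```d
struct FlatMap(K, V)
{
    private K[] keys;
    private V[] values;
    private bool[] used;
    private size_t count;

    this(size_t capacity)                // all memory grabbed here, none per insert
    {
        keys   = new K[capacity];
        values = new V[capacity];
        used   = new bool[capacity];
    }

    void opIndexAssign(V value, K key)   // map[key] = value
    {
        assert(count < keys.length, "sketch omits rehashing");
        size_t i = typeid(K).getHash(&key) % keys.length;
        while (used[i] && keys[i] != key)     // linear probing
            i = (i + 1) % keys.length;
        if (!used[i]) ++count;
        keys[i]   = key;
        values[i] = value;
        used[i]   = true;
    }

    V* opBinaryRight(string op : "in")(K key)   // key in map, like the builtin AA
    {
        size_t i = typeid(K).getHash(&key) % keys.length;
        while (used[i])
        {
            if (keys[i] == key)
                return &values[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }
}

// usage: auto seen = FlatMap!(string, bool)(1 << 20);
//        seen["./a.txt"] = true;
//        if ("./a.txt" in seen) { /* ... */ }
```

Growth, deletion and hash quality are deliberately left out; the point is only that storage is a handful of large blocks rather than one heap node per entry.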
September 16, 2010 Re: More embarrassing microbenchmarks for D's GC.
Posted in reply to Steven Schveighoffer

On 16/09/2010 9:53 PM, Steven Schveighoffer wrote:
> On Thu, 16 Sep 2010 06:41:14 -0400, Rounin
> <davidjo@student.matnat.uio.no> wrote:
>
>> Thank you for that advice. I'm using GDC because it's available from
>> Ubuntu
>> Linux's package system, whereas DMD currently is not. (And the .deb
>> posted on
>> digitalmars.com is only for i386.) Hopefully, D will gain more
>> popularity and more
>> up-to-date tools will be made available.
>
> I've heard that dmd is a bit tricky to set up on a 64-bit system (mine
> is 32-bit), but it will give more credence to your position to use the
> latest available tools.
>
>>
>> By the way, today I re-compiled the program with a "std.gc.enable;"
>> right before
>> the final "return 0" statement, and it still runs in 0.68 seconds. So
>> either those
>> objects aren't marked as needing garbage collection, or it's really
>> just an issue
>> of keeping the garbage collector from activating too quickly.
>
> The GC runs are not scheduled. Essentially, you are probably
> encountering a syndrome where the GC runs more than it should. This
> usually occurs when incrementally allocating large amounts of memory
> without dropping any of it.
>
> The GC collection runs when allocating memory cannot find any free
> memory. Most likely this is what's happening:
>
> allocate -> No free memory available -> run GC collection, get nothing
> back -> allocate more memory from OS -> allocate...
>
> Of course, the allocations will succeed for a bit after getting OS
> memory, but the GC collection is run way too often (as your profiling
> suggests).
>
> Disabling it at the beginning, then enabling it at the end means it's
> going to run at most once. Essentially, you take out the "run GC
> collection" step, everything else is the same. Given that 87% of the run
> time is spent in the collection cycle, it's pretty obvious why this
> reduced your runtime ;)
>
> -Steve
The redux of this discussion is surely that no matter how "good" the semantics of a PL are, the proof of the pudding is in the GC. Given the more-or-less well-founded value vs. reference semantics in D, it might be useful to consider region-based memory management idioms and integrate these idioms into the language. Perhaps we might see this in D99 as a measure to circumvent traditional GC issues.
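For readers unfamiliar with the idiom, here is a rough hand-written sketch of what a region (arena) looks like in D; nothing like this is built into the language today, and the names are hypothetical.

```d
import core.stdc.stdlib : malloc, free;

// A region: objects are carved out of one big block and all released at once,
// so no per-object GC bookkeeping is needed for data with a known lifetime.
struct Region
{
    private ubyte* block;
    private size_t used;
    private size_t capacity;

    this(size_t capacity)
    {
        block = cast(ubyte*) malloc(capacity);
        this.capacity = capacity;
    }

    void* allocate(size_t size)
    {
        if (used + size > capacity)
            return null;          // a real region would chain in another block
        auto p = block + used;
        used += size;
        return p;
    }

    void release()                // frees every allocation in one call
    {
        free(block);
        block = null;
        used = capacity = 0;
    }
}
```

For a benchmark like this one, everything parsed from the input files could live in one region and be dropped in a single release() call, with no tracing at all.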
September 16, 2010 Re: More embarrassing microbenchmarks for D's GC.
Posted in reply to Rounin

On 9/16/10 5:41 CDT, Rounin wrote:
> Thank you for that advice. I'm using GDC because it's available from Ubuntu
> Linux's package system, whereas DMD currently is not. (And the .deb posted on
> digitalmars.com is only for i386.) Hopefully, D will gain more popularity and more
> up-to-date tools will be made available.
>
> By the way, today I re-compiled the program with a "std.gc.enable;" right before
> the final "return 0" statement, and it still runs in 0.68 seconds. So either those
> objects aren't marked as needing garbage collection, or it's really just an issue
> of keeping the garbage collector from activating too quickly.
I have noted this trigger-happiness of the GC since way back during my doctoral work, and conveyed it several times to Walter and Sean. They haven't gotten around to it, but I carry most of the blame because it was a pressing matter to me, not to them.
There are two issues: one is that the GC jumps in too early, and the other is that it jumps in too often. A program of mine had a pattern in which it would need a lot of memory for a long time, during which it would do the occasional small string allocation etc. There would still be a lot of collection attempts going on; the GC failed to notice that repeated collections do not recover much memory.
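Only the first half has an easy workaround today: for the "jumps in too early" part, D2's druntime lets a program reserve its expected working set up front so the first collections are not triggered against a nearly empty heap. A hedged sketch, with an assumed size:

```d
import core.memory : GC;

void main()
{
    GC.reserve(256 * 1024 * 1024);   // assumed working-set size; tune to the program
    // ... build the long-lived data structures, with the occasional small
    // string allocation sprinkled in; early allocations no longer start
    // collection attempts on an almost-empty heap ...
}
```

The "jumps in too often" part, noticing that repeated collections recover little, still needs collector-side heuristics along the lines of the Python 2.7 change bearophile linked.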
Thanks Rounin for the comparison. I think it's a great test bed for D's collector. If you could share the data too, that would be great.
Andrei
September 16, 2010 Re: More embarrassing microbenchmarks for D's GC.
Posted in reply to Andrei Alexandrescu

Thank you for those kind words! Hopefully, it'll become a prioritized issue whenever more pressing things are resolved. In the meantime, here are (some slightly anonymized versions of) the MD5 files in question. They're about 9-10 megabytes each, so no wonder it allocates a bit of memory!

http://folk.uio.no/davidjo/a.md5
http://folk.uio.no/davidjo/b.md5
September 16, 2010 Re: More embarrassing microbenchmarks for D's GC.
Posted in reply to dsimcha

dsimcha Wrote:
> == Quote from Rounin (davidjo@student.matnat.uio.no)'s article
> > Complicating the code by avoiding associative arrays isn't really an option, though, as the whole point of writing the program was comparing D to Python in terms of simplicity.
>
> How about using a library defined associative array? As I've mentioned several times here before, D's current builtin associative array implementation interacts horribly with the GC. This is true both in terms of speed and memory usage. Neither D's GC nor its AA implementation is that bad per se, but the AA implementation seems to expose all the weaknesses in the GC, such that the result when using large AAs is laughably bad. I think eventually we need to completely redesign and rewrite it, or at least put a sealed implementation in std.container that's more geared towards huge AAs.
The precise scanning patch would probably help tremendously here.
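Until precise scanning lands, druntime already exposes a coarser knob in the same spirit: blocks known to contain no pointers can be allocated with NO_SCAN so the conservative collector skips them entirely. A minimal sketch:

```d
import core.memory : GC;

void main()
{
    // Raw checksum bytes never contain pointers, so the collector does not
    // need to treat every word of this block as a potential pointer.
    auto buf = (cast(ubyte*) GC.malloc(1024, GC.BlkAttr.NO_SCAN))[0 .. 1024];
    buf[] = 0;
}
```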
September 16, 2010 Re: More embarrassing microbenchmarks for D's GC.
Posted in reply to Sean Kelly

== Quote from Sean Kelly (sean@invisibleduck.org)'s article
> dsimcha Wrote:
> > == Quote from Rounin (davidjo@student.matnat.uio.no)'s article
> > > Complicating the code by avoiding associative arrays isn't really an option, though, as the whole point of writing the program was comparing D to Python in terms of simplicity.
> >
> > How about using a library defined associative array? As I've mentioned several times here before, D's current builtin associative array implementation interacts horribly with the GC. This is true both in terms of speed and memory usage. Neither D's GC nor its AA implementation is that bad per se, but the AA implementation seems to expose all the weaknesses in the GC, such that the result when using large AAs is laughably bad. I think eventually we need to completely redesign and rewrite it, or at least put a sealed implementation in std.container that's more geared towards huge AAs.
> The precise scanning patch would probably help tremendously here.
Yes, it would, but even so, I don't think that a good AA implementation for a GC like D's should require an allocation on **every single insert**. This makes AAs basically unusable in parallel code.