Is Phobos's Garbage Collector utterly broken? (Phobos vs Tango)
August 01, 2007
Attached is a simple program which creates 326*100000 objects, and periodically prints out progress. I initially wrote it to try to find a memory leak in Tango's GC (which was actually fixed at some point).

The results of this program are quite interesting.
The Tango version runs in 65-75 seconds on my machine, and never seems to take more than 3MB of RAM.
The Phobos version, however, runs much more poorly - it starts to consume hundreds of MB of RAM immediately, and the fastest run from my tests was 265 seconds. It also behaves quite randomly (memory usage patterns differ each run).
Here's a description of one of the not-so-great runs: it starts off with almost instantly allocating 80 MB of memory, and after a while allocates about 30 more. It runs fine for a while, doing a 1000 in 2-3 seconds - until around 60000 it spontaneously starts eating a lot of memory. It goes on like this until it eats ~340 MB of RAM in all, where it plateaus. At that point, the program slows down to a crawl, doing a 1000 in a few minutes. I've attached a screenshot of the RAM history of this run - I had to stop the program to keep the run's history within view (don't think much would have changed had I left it running, and it could have taken half an hour or more).

What could be the cause of this seemingly random behavior? Is the GC deliberately not releasing the memory to the OS, on the estimate that the OS doesn't need it? Or is it a GC bug - and if so, are the warnings Valgrind outputs about Phobos's GC referencing and using undefined memory somehow related to this?

-- 
Best regards,
  Vladimir                          mailto:thecybershadow@gmail.com

August 01, 2007
On Wed, 01 Aug 2007 09:08:16 +0300, Vladimir Panteleev <thecybershadow@gmail.com> wrote:

> I initially wrote it to try to find a memory leak in Tango's GC (which was actually fixed at some point).

Turns out it's still there, and it's the old "binary data" issue with pointer-searching GCs, which was fixed in D/Phobos 1.001 by making the GC type-aware. Check out the attached sample programs for a simple example - the Tango version can't know there are no pointers in its GrowBuffer's data, and thus leaks like crazy, while the Phobos version stays at 13MB.

-- 
Best regards,
  Vladimir                          mailto:thecybershadow@gmail.com

August 01, 2007
Vladimir Panteleev wrote:
> On Wed, 01 Aug 2007 09:08:16 +0300, Vladimir Panteleev <thecybershadow@gmail.com> wrote:
> 
>> I initially wrote it to try to find a memory leak in Tango's GC (which was actually fixed at some point).
> 
> Turns out it's still there, and it's the old "binary data" issue with pointer-searching GCs, which was fixed in D/Phobos 1.001 by making the GC type-aware. Check out the attached sample programs for a simple example - the Tango version can't know there are no pointers in its GrowBuffer's data, and thus leaks like crazy, while the Phobos version stays at 13MB.

It turns out this is because GrowBuffer uses a void[] internally to store data.  The type should probably be changed to byte[].  I'll file a ticket for it.

Sean
August 01, 2007
Sean Kelly wrote:
> Vladimir Panteleev wrote:
>> Turns out it's still there, and it's the old "binary data" issue with pointer-searching GCs, which was fixed in D/Phobos 1.001 by making the GC type-aware. Check out the attached sample programs for a simple example - the Tango version can't know there are no pointers in its GrowBuffer's data, and thus leaks like crazy, while the Phobos version stays at 13MB.
> 
> It turns out this is because GrowBuffer uses a void[] internally to store data.  The type should probably be changed to byte[].  I'll file a ticket for it.
> 

Isn't this also a problem with all the IO stuff that uses void[] for everything? It starts right up there at IBuffer and IConduit.

The way I see it: void[] is for "pure memory" that you might want to access in
multiple different ways (akin to unions), ubyte[] is for the traditional model
of "arbitrary data", and byte[] is just weird, I've yet to figure out a use for it.

-- 
Remove ".doesnotlike.spam" from the mail address.
August 01, 2007
Deewiant wrote:
> Sean Kelly wrote:
>> Vladimir Panteleev wrote:
>>> Turns out it's still there, and it's the old "binary data" issue with
>>> pointer-searching GCs, which was fixed in D/Phobos 1.001 by making the
>>> GC type-aware. Check out the attached sample programs for a simple
>>> example - the Tango version can't know there are no pointers in its
>>> GrowBuffer's data, and thus leaks like crazy, while the Phobos version
>>> stays at 13MB.
>> It turns out this is because GrowBuffer uses a void[] internally to
>> store data.  The type should probably be changed to byte[].  I'll file a
>> ticket for it.
>>
> 
> Isn't this also a problem with all the IO stuff that uses void[] for everything?
> It starts right up there at IBuffer and IConduit.

Well, passing data around as void[] is fine, it just can't be stored as void[].  Buffer is one of the few objects that actually stores data in this way.

> The way I see it: void[] is for "pure memory" that you might want to access in
> multiple different ways (akin to unions), ubyte[] is for the traditional model
> of "arbitrary data", and byte[] is just weird, I've yet to figure out a use for it.

That sounds like a good distinction.  Feel free to add to the ticket if you think other uses of void[] should be changed to byte[] or ubyte[].


Sean
August 01, 2007
On Wed, 01 Aug 2007 17:48:26 +0300, Sean Kelly <sean@f4.ca> wrote:

> It turns out this is because GrowBuffer uses a void[] internally to store data.  The type should probably be changed to byte[].  I'll file a ticket for it.

Cheers, that indeed fixed it. And now it runs much faster than the Phobos version, too!

What's the reasoning behind scanning void[] - why would anyone keep pointers in a void[], since it's supposed to mean "binary nondescript data"?

-- 
Best regards,
  Vladimir                          mailto:thecybershadow@gmail.com
August 01, 2007
Vladimir Panteleev wrote:
> On Wed, 01 Aug 2007 17:48:26 +0300, Sean Kelly <sean@f4.ca> wrote:
> 
>> It turns out this is because GrowBuffer uses a void[] internally to
>> store data.  The type should probably be changed to byte[].  I'll file a
>> ticket for it.
> 
> Cheers, that indeed fixed it. And now it runs much faster than the Phobos version, too!
> 
> What's the reasoning behind scanning void[] - why would anyone keep pointers in a void[], since it's supposed to mean "binary nondescript data"?

I think the idea was that a void array may contain /anything/ including structs, in-place constructed classes, array references, etc, and it was easier specifying that void arrays be scanned than expecting the user to call gc.hasPointers() or whatever on the memory block every time a reallocation occurs.


Sean
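
To make the point above concrete, here is a toy model of a conservative ("pointer-searching") mark phase. It is written in Python rather than D so that it does not depend on any particular Phobos or Tango API; the Block class, the no_scan flag, and the addresses are all made up for illustration and are not taken from either runtime. A buffer of binary data whose words happen to equal heap addresses keeps unrelated garbage alive when it is scanned, and keeps nothing alive when it is flagged as pointer-free, which is roughly what changing the stored type from void[] to byte[] achieves.

```python
class Block:
    """A hypothetical heap block: an address plus its contents viewed as machine words."""

    def __init__(self, addr, words, no_scan=False):
        self.addr = addr
        self.words = words
        self.no_scan = no_scan  # True means "known to contain no pointers"


def mark(heap, roots):
    """Conservative mark: any word equal to a block address is treated as a pointer."""
    live = set()
    stack = list(roots)
    while stack:
        addr = stack.pop()
        if addr in live or addr not in heap:
            continue
        live.add(addr)
        block = heap[addr]
        if block.no_scan:
            continue  # type-aware path: contents are never searched
        stack.extend(w for w in block.words if w in heap)
    return live


def retained_garbage(buffer_no_scan):
    """Count unreachable blocks that survive only because of the buffer's contents."""
    heap = {}
    # 1000 blocks that nothing really references.
    garbage_addrs = [0x1000 + 16 * i for i in range(1000)]
    for addr in garbage_addrs:
        heap[addr] = Block(addr, [])
    # One live buffer of "binary data"; every 10th garbage address
    # happens to appear in it as an ordinary integer value.
    buf = Block(0x9000, garbage_addrs[::10], no_scan=buffer_no_scan)
    heap[buf.addr] = buf
    live = mark(heap, roots=[buf.addr])
    return len(live) - 1  # exclude the buffer itself


if __name__ == "__main__":
    print(retained_garbage(buffer_no_scan=False))  # scanned like void[]: prints 100
    print(retained_garbage(buffer_no_scan=True))   # pointer-free like byte[]: prints 0
```

In a real collector the false pointers come from arbitrary file or network data rather than a hand-picked list, so the amount of memory pinned this way varies from run to run.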
August 02, 2007
ok you convinced me... I'll switch to tango now -- or at least start heading that direction..

I use dmd 2.003 at the moment (it is fast enough for me, I guess... postgre is the slow one actually), and I noticed that it definitely doesn't work with 2.003 at all.

Another thing that's really awesome about phobos / d is that the documentation is 100% available offline. Personally, I have the world's worst internet connection on this planet. I would definitely be needing the documentation to be available offline to be able to use tango effectively. I see some doc stuff, but it looks like css and js files (I assume that the html is generated somehow) -- but what about all of the good information on the wiki?

Other than that, I was checking out the docs and tango looks really really good. I really like the threading stuff you've got. There are so many really good features -- I'm actually pretty excited.

So here's the questions:
1. when is dmd 2.0 support? (I guess I can downgrade to dmd 1.0 for some time)
2. will all the documentation be available offline?
3. what other features are way faster than the phobos equiv?

Last thing... I really wish I had buffers and GrowBuffers right now... I could really use them for some cool stuff... They look awesome! Dang... I've been missing out. When I get converted over to tango, I'll provide before and after benchmarks.

Good work guys.

Kenny

Vladimir Panteleev wrote:
> On Wed, 01 Aug 2007 17:48:26 +0300, Sean Kelly <sean@f4.ca> wrote:
> 
>> It turns out this is because GrowBuffer uses a void[] internally to store data.  The type should probably be changed to byte[].  I'll file a ticket for it.
> 
> Cheers, that indeed fixed it. And now it runs much faster than the Phobos version, too!
> 
> What's the reasoning behind scanning void[] - why would anyone keep pointers in a void[], since it's supposed to mean "binary nondescript data"?
> 
August 02, 2007
kenny wrote:
> ok you convinced me... I'll switch to tango now -- or at least start heading that direction..
> 
> I use dmd 2.003 at the moment (it is fast enough for me, I guess... postgre is the slow one actually), and I noticed that it definitely doesn't work with 2.003 at all.

Tango hasn't transitioned to D 2.0 yet.  Maintaining parallel versions of the library isn't a terribly appealing notion, though I suppose it may become necessary at some point.  Also, I had been worried that some of the existing features might change (though it seems like this probably won't actually happen), and because the work involved in a port to 2.0 will likely be fairly significant, I don't want to have to do it more than once.

> Another thing that's really awesome about phobos / d is that the documentation is 100% available offline. Personally, I have the world's worst internet connection on this planet. I would definitely be needing the documentation to be available offline to be able to use tango effectively. I see some doc stuff, but it looks like css and js files (I assume that the html is generated somehow) -- but what about all of the good information on the wiki?

I believe there is some work underway to generate offline information from the wiki, but I'm not sure how far along it is.  I agree that offline documentation is very useful, as I do most of my D programming on the train.

> Other than that, I was checking out the docs and tango looks really really good. I really like the threading stuff you've got. There are so many really good features -- I'm actually pretty excited.
> 
> So here's the questions:
> 1. when is dmd 2.0 support? (I guess I can downgrade to dmd 1.0 for some time)

Not sure.  After the conference I may start looking into it.

> 2. will all the documentation be available offline?

Yes, but no timetable yet.

> 3. what other features are way faster than the phobos equiv?

IO is the stand-out in terms of performance.  I think regex would be faster with a rewrite if we (or someone else) can find the time for it (we use the Phobos version right now).  And I'm not sure if it matters, but I've been told our threading implementation is more robust, and certain calls there are definitely faster than Phobos given the way each is implemented.


Sean
August 02, 2007
Vladimir Panteleev wrote:
> On Wed, 01 Aug 2007 17:48:26 +0300, Sean Kelly <sean@f4.ca> wrote:
> 
>> It turns out this is because GrowBuffer uses a void[] internally to
>> store data.  The type should probably be changed to byte[].  I'll file a
>> ticket for it.
> 
> Cheers, that indeed fixed it. And now it runs much faster than the Phobos version, too!
> 

I thought the Phobos and Tango GC's were basically the same -- has that changed over the last 1/2 year or so?

Thanks,

- Dave

> What's the reasoning behind scanning void[] - why would anyone keep pointers in a void[], since it's supposed to mean "binary nondescript data"?
> 