Thread overview
[D-runtime] Fix for array append cache holding data hostage
[D-runtime] [phobos] Fix for array append cache holding data hostage
[phobos] Fix for array append cache holding data hostage
Dec 29, 2010
Sean Kelly
December 23, 2010
Cross-posting to both phobos and druntime for a wider audience...

For those of you not familiar with how the array appending cache works, it essentially caches recent lookups of BlkInfo for non-shared array appends.

What this means is that appending to an unshared array does not have to take the global GC lock, and allows a lookup to happen almost immediately.

One of the things that the cache required was to keep that array in memory. Otherwise, there was a possibility that the array would be collected and reallocated.  An append to such an array would result in the stale cache entry being used as the block info, possibly resulting in the wrong start address, wrong size, or wrong flags, all of which could be fatal.  A good prior discussion for this issue is listed in this bug: http://d.puremagic.com/issues/show_bug.cgi?id=3929

In order to solve the problem, I implemented a GC collection hook which runs *after* the mark phase, but before all threads are resumed.  This hook tests all cache entries to see if any of them are about to be collected.  Any entries which are not marked (i.e. about to be collected) are removed from the cache.

I also had to do some funky stuff with the cache since TLS data is also scanned.  So the cache is now in a malloc'd block that is freed on thread exit (and lazily allocated on first append).

One of the best tests that showed this problem (or so I thought) was posted to the newsgroup here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=22715


However, my patch does *not* fix this problem.  I've found that the true reason for the memory being held is that other data is pointing at the block.  With a 200MB block, this is understandable.  In a 4GB address space, that's 1/20 of the valid address space. Any random aligned 4-byte sequence in the stack or TLS could point to it and prevent it from being collected.

The really bad thing about this is, once the true pointers to that block are gone, it could be lost forever.  The stack data which "points" to it may be very high up on the stack, which means it never gets collected.  It makes me wonder if we might want to provide an unsafe 'realloc' type function for appending that forcefully frees blocks that were used in an append loop.  A 'unique' type qualifier could go a long way here.

In any case, I still want to checkin this fix, because it at least removes the cache as a possible culprit (and fixes 3929).  I don't want to commit the patch until people here agree on the timing.  There is a possibility that it introduces memory corruption bugs (this happened the last time I was tinkering with array appending), so I don't want to destabilize everything if there is good reason.  Given that we just had a release, I think this would be a good time to do it.  I most likely won't get to it this weekend.

So anyone have any objections to me checking this in?

-Steve




December 29, 2010
Well, looks like there is no response to this, I'll commit.  I'll warn you however, that I have not tested on OSX.  Linux and Windows both pass unit tests.

-Steve



----- Original Message ----
> From: Steve Schveighoffer <schveiguy at yahoo.com>
> To: Phobos <phobos at puremagic.com>; d-runtime <d-runtime at puremagic.com>
> Sent: Thu, December 23, 2010 4:08:03 PM
> Subject: [phobos] Fix for array append cache holding data hostage
> 
> Cross-posting to both phobos and druntime for a wider audience...
> 
> For  those of you not familiar with how the array appending cache works, it essentially caches recent lookups of BlkInfo for non-shared array  appends.
> 
> What this means is that appending to an unshared array does not  have to take
>the
>
> global GC lock, and allows a lookup to happen almost  immediately.
> 
> One of the things that the cache required was to keep that  array in memory. Otherwise, there was a possibility that the array  would be collected and reallocated.  An append to such an array would  result in the stale cache entry
>
>
> being used as the block info, possibly  resulting in the wrong start address, wrong size, or wrong flags, all of  which could be fatal.  A good prior discussion for this issue is listed  in this bug: http://d.puremagic.com/issues/show_bug.cgi?id=3929
> 
> In  order to solve the problem, I implemented a GC collection hook which runs *after* the mark phase, but before all threads are resumed.  This hook  tests
>all
>
> cache entries to see if any of them are about to be  collected.  Any entries which are not marked (i.e. about to be  collected) are removed from the cache.
> 
> I also had to do some funky stuff  with the cache since TLS data is also scanned.  So the cache is now in  a malloc'd block that is freed on thread exit
>
>
> (and lazily allocated on first  append).
> 
> One of the best tests that showed this problem (or so I thought)  was posted to
>
>
> the newsgroup here:
>http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=22715
>
>5
> 
> 
> However,  my patch does *not* fix this problem.  I've found that the true
>reason
>
> for the memory being held is that other data is pointing at the block.   With a
>
>
> 200MB block, this is understandable.  In a 4GB address space,  that's 1/20 of
>the
>
> valid address space. Any random aligned 4-byte sequence  in the stack or TLS could point to it and prevent it from being  collected.
> 
> The really bad thing about this is, once the true pointers to  that block are gone, it could be lost forever.  The stack data which  "points" to it may be
>very
>
> high up on the stack, which means it never gets  collected.  It makes me wonder
>
>
> if we might want to provide an unsafe  'realloc' type function for appending
>that
>
> forcefully frees blocks that were  used in an append loop.  A 'unique' type qualifier could go a long way  here.
> 
> In any case, I still want to checkin this fix, because it at least  removes the
>
>
> cache as a possible culprit (and fixes 3929).  I don't want  to commit the
>patch
>
> until people here agree on the timing.  There is a  possibility that it introduces memory corruption bugs (this happened the  last time I was tinkering
>
>
> with array appending), so I don't want to  destabilize everything if there is good reason.  Given that we just had  a release, I think this would be a good time to do it.  I most likely  won't get to it this weekend.
> 
> So anyone have any objections to me  checking this in?
> 
> -Steve
> 
> 
> 
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
> 




December 29, 2010
Sorry, I've been busy.  I'll test... um... soon.

On Dec 29, 2010, at 11:42 AM, Steve Schveighoffer wrote:

> Well, looks like there is no response to this, I'll commit.  I'll warn you however, that I have not tested on OSX.  Linux and Windows both pass unit tests.
> 
> -Steve
> 
> 
> 
> ----- Original Message ----
>> From: Steve Schveighoffer <schveiguy at yahoo.com>
>> To: Phobos <phobos at puremagic.com>; d-runtime <d-runtime at puremagic.com>
>> Sent: Thu, December 23, 2010 4:08:03 PM
>> Subject: [phobos] Fix for array append cache holding data hostage
>> 
>> Cross-posting to both phobos and druntime for a wider audience...
>> 
>> For  those of you not familiar with how the array appending cache works, it essentially caches recent lookups of BlkInfo for non-shared array  appends.
>> 
>> What this means is that appending to an unshared array does not  have to take the
>> 
>> global GC lock, and allows a lookup to happen almost  immediately.
>> 
>> One of the things that the cache required was to keep that  array in memory. Otherwise, there was a possibility that the array  would be collected and reallocated.  An append to such an array would  result in the stale cache entry
>> 
>> 
>> being used as the block info, possibly  resulting in the wrong start address, wrong size, or wrong flags, all of  which could be fatal.  A good prior discussion for this issue is listed  in this bug: http://d.puremagic.com/issues/show_bug.cgi?id=3929
>> 
>> In  order to solve the problem, I implemented a GC collection hook which runs *after* the mark phase, but before all threads are resumed.  This hook  tests all
>> 
>> cache entries to see if any of them are about to be  collected.  Any entries which are not marked (i.e. about to be  collected) are removed from the cache.
>> 
>> I also had to do some funky stuff  with the cache since TLS data is also scanned.  So the cache is now in  a malloc'd block that is freed on thread exit
>> 
>> 
>> (and lazily allocated on first  append).
>> 
>> One of the best tests that showed this problem (or so I thought)  was posted to
>> 
>> 
>> the newsgroup here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=22715
>> 
>> 5
>> 
>> 
>> However,  my patch does *not* fix this problem.  I've found that the true reason
>> 
>> for the memory being held is that other data is pointing at the block.   With a
>> 
>> 
>> 200MB block, this is understandable.  In a 4GB address space,  that's 1/20 of the
>> 
>> valid address space. Any random aligned 4-byte sequence  in the stack or TLS could point to it and prevent it from being  collected.
>> 
>> The really bad thing about this is, once the true pointers to  that block are gone, it could be lost forever.  The stack data which  "points" to it may be very
>> 
>> high up on the stack, which means it never gets  collected.  It makes me wonder
>> 
>> 
>> if we might want to provide an unsafe  'realloc' type function for appending that
>> 
>> forcefully frees blocks that were  used in an append loop.  A 'unique' type qualifier could go a long way  here.
>> 
>> In any case, I still want to checkin this fix, because it at least  removes the
>> 
>> 
>> cache as a possible culprit (and fixes 3929).  I don't want  to commit the patch
>> 
>> until people here agree on the timing.  There is a  possibility that it introduces memory corruption bugs (this happened the  last time I was tinkering
>> 
>> 
>> with array appending), so I don't want to  destabilize everything if there is good reason.  Given that we just had  a release, I think this would be a good time to do it.  I most likely  won't get to it this weekend.
>> 
>> So anyone have any objections to me  checking this in?
>> 
>> -Steve
>> 
>> 
>> 
>> 
>> _______________________________________________
>> phobos mailing list
>> phobos at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/phobos
>> 
> 
> 
> 
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos