Thread overview
GC behavior and Allocators
  Brian Schott (Jul 03, 2014)
  safety0ff (Jul 03, 2014)
  safety0ff (Jul 04, 2014)
  Chris Cain (Jul 04, 2014)
  David Nadlinger (Jul 04, 2014)
  Brian Schott (Jul 06, 2014)
  safety0ff (Jul 06, 2014)
  safety0ff (Jul 04, 2014)
  Kagamin (Jul 04, 2014)
July 03, 2014 (Brian Schott)
Experience has shown that using allocators can drastically improve the execution time of D programs. It has also shown that the biggest issue with allocators is that, unless you are very careful, the GC will start freeing your live memory.

I think that we need to define the behavior of core.memory.GC.addRange more precisely. Consider the following pseudo-code:

auto a = allocate(aSize);      // allocate/reallocate stand in for a non-GC allocator
GC.addRange(a, aSize);
a = reallocate(a, aSize * 2);  // not a redeclaration; the block may have moved and grown
GC.addRange(a, aSize * 2);

What does this do? Well, if the proxy pointer inside the GC is null, it creates two entries for the same pointer. If it's not null and the default GC implementation is being used, the result is a single entry. That single entry will only contain the smaller size, leaving the GC free to collect memory pointed to by the second half of your array.

How about removing? Again, it depends. If the default GC is being used there can only be one entry per pointer, so removeRange does exactly what you expect. If the proxy pointer is null, gc_removeRange falls back to an implementation that scans through its ranges array, overwrites the first entry containing a matching pointer with the last entry in the array, and then decrements the array length. This reordering means that if you have multiple entries for the same pointer, you don't know which one is being removed when you call removeRange. This is fine if you like to practice luck-oriented programming.
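
For illustration, here is a minimal sketch of that fallback removal strategy; the struct and function names are hypothetical, not the actual druntime source:

struct Range { void* base; size_t size; }

void removeRangeFallback(ref Range[] ranges, void* p)
{
  foreach (i, r; ranges)
  {
    if (r.base == p)
    {
      ranges[i] = ranges[$ - 1];   // overwrite the match with the last entry
      ranges = ranges[0 .. $ - 1]; // shrink the array by one
      return;                      // entry order is no longer stable after this
    }
  }
}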

What about calling removeRange and then addRange immediately? This will only work in single-threaded code. In a multi-threaded program the GC can run between the calls to removeRange and addRange.
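
For example, the window looks like this (reusing the hypothetical allocate/reallocate names from the snippet above):

GC.removeRange(a);
// A collection triggered by another thread at this point can free GC memory
// that is reachable only through pointers stored in the unregistered block.
a = reallocate(a, aSize * 2);
GC.addRange(a, aSize * 2);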

I think that the only sane way to solve this is to define in the specs for core.memory that GC.addRange will only ever store one entry per pointer, and that the length will be the value of "sz" from the most recent call to addRange.
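
Concretely, the proposed rule would make a sequence like this well-defined (illustrative of the suggested behaviour, not of what the current implementation guarantees):

GC.addRange(p, 100);
GC.addRange(p, 200); // replaces the existing entry; the range for p now covers 200 bytes
GC.removeRange(p);   // no entry for p remains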

Thoughts?
July 03, 2014 (safety0ff)
On Thursday, 3 July 2014 at 21:53:25 UTC, Brian Schott wrote:
> I think that the only sane way to solve this is to define in the specs for core.memory that GC.addRange will only ever store one entry per pointer, and that the length will be the value of "sz" from the most recent call to addRange.
>
> Thoughts?

It's easy to change the behaviour of the 2.066 GC: just force an update instead of ignoring duplicates:
https://github.com/D-Programming-Language/druntime/blob/master/src/rt/util/container/treap.d#L138

I feel there are instances where core.memory is underspecified.
When I was implementing the change of addRange's bookkeeping from an array to a treap, it was considered "undefined behaviour" to call addRange twice with the same base pointer without removeRange'ing it in between. That's why the code currently ignores dups.

Boehm GC, on the other hand, uses an array similar to the previous version of the D2 GC, except that it has code for properly merging duplicates (though this has performance implications).
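
For comparison, here is a sketch of the "update instead of ignore" behaviour against a simple array of ranges; names and layout are illustrative only, not the actual treap or Boehm code:

struct RangeEntry { void* base; size_t size; }

void addOrUpdateRange(ref RangeEntry[] ranges, void* base, size_t size)
{
  foreach (ref e; ranges)
  {
    if (e.base == base) { e.size = size; return; } // update the existing entry in place
  }
  ranges ~= RangeEntry(base, size);                // otherwise append a new one
}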

Please keep sharing your insight into practical use of these APIs. :)
July 04, 2014 (safety0ff)
On Thursday, 3 July 2014 at 21:53:25 UTC, Brian Schott wrote:
>
> I think that the only sane way to solve this is to define in the specs for core.memory that GC.addRange will only ever store one entry per pointer, and that the length will be the value of "sz" from the most recent call to addRange.
>
> Thoughts?

I just thought a little more about this and you will always have a race.

Consider this code:
auto a = malloc(aSize);
GC.addRange(a, aSize);
auto b = realloc(a, aSize * 2);

If realloc moves the data (a != b) and the GC runs before you can fix up the ranges you can have invalid access during collection.
July 04, 2014 (Kagamin)
If you reallocate to double the size, such reallocs are likely to always move, so they should be equivalent to malloc+free, and your code can be:

mem2 = alloc(sz * 2);
memcpy(mem2, mem1, sz);     // copy the old contents (core.stdc.string.memcpy)
GC.addRange(mem2, sz * 2);  // register the new block before dropping the old one
GC.removeRange(mem1);
free(mem1);

If not, you need to lock the GC so that it won't interfere during the realloc.
July 04, 2014 (Chris Cain)
On Friday, 4 July 2014 at 10:36:03 UTC, safety0ff wrote:
> I just thought a little more about this and you will always have a race.
>
> Consider this code:
> auto a = malloc(aSize);
> GC.addRange(a, aSize);
> auto b = realloc(a, aSize * 2);
>
> If realloc moves the data (a != b) and the GC runs before you can fix up the ranges you can have invalid access during collection.

http://dlang.org/phobos/core_memory.html#.GC.enable

If GC.enable and GC.disable truly disallowed GC running (or alternative `GC.hard_disable`/`GC.hard_enable` existed that guaranteed such) then you could use that to make sure that the GC didn't collect in the middle of a pair of those calls.

On Friday, 4 July 2014 at 11:54:07 UTC, Kagamin wrote:
> If you reallocate doubling the size, it's likely such reallocs always move, so they should be equivalent to malloc+free, so your code can be
>
> mem2 = alloc(sz*2);
> mem2[] = mem1[];
> addRange(mem2);
> removeRange(mem1);
> free(mem1);
>
> if not, you need to lock the GC so that it won't interfere during realloc.

Is there a way to lock the GC currently?
July 04, 2014 (David Nadlinger)
On Friday, 4 July 2014 at 14:47:12 UTC, Chris Cain wrote:
> Is there a way to lock the GC currently?

There are critical regions in core.thread. While in such a region, your thread will never be suspended, effectively also precluding the GC from running. They are a rather dangerous tool, though; use them only if absolutely required and on small, controlled pieces of code.

David
July 04, 2014 (safety0ff)
On Friday, 4 July 2014 at 14:47:12 UTC, Chris Cain wrote:
>
> If GC.enable and GC.disable truly disallowed GC running (or alternative `GC.hard_disable`/`GC.hard_enable` existed that guaranteed such) then you could use that to make sure that the GC didn't collect in the middle of a pair of those calls.

Right, this should be sufficient in the current implementation.

On Friday, 4 July 2014 at 17:05:32 UTC, David Nadlinger wrote:
>
> There are critical regions in core.thread. While in such a region, your thread will never be suspended, effectively also precluding the GC from running. They are a rather dangerous tool, though; use them only if absolutely required and on small, controlled pieces of code.

I think this will deadlock on the GC mutex.
July 06, 2014 (Brian Schott)
On Friday, 4 July 2014 at 17:05:32 UTC, David Nadlinger wrote:
> On Friday, 4 July 2014 at 14:47:12 UTC, Chris Cain wrote:
>> Is there a way to lock the GC currently?
>
> There are critical regions in core.thread. While in such a region, your thread will never be suspended, effectively also precluding the GC from running. They are a rather dangerous tool, though; use them only if absolutely required and on small, controlled pieces of code.
>
> David

So something like this would work?

void* GCAwareRealloc(void* ptr, size_t size)
{
  import core.thread;
  import core.memory : GC;
  import core.stdc.stdlib : realloc;
  void* r;
  thread_enterCriticalRegion();
  scope (exit) thread_exitCriticalRegion();
  r = realloc(ptr, size);
  // Assuming that addRange performs an update of size
  GC.addRange(r, size);
  return r;
}
July 06, 2014 (safety0ff)
On Sunday, 6 July 2014 at 04:58:42 UTC, Brian Schott wrote:
>
> So something like this would work?
>
> void* GCAwareRealloc(void* ptr, size_t size)
> {
>   import core.thread;
>   void* r;
>   thread_enterCriticalRegion();
>   scope (exit) thread_exitCriticalRegion();
>   r = realloc(ptr, size);
>   // Assuming that addRange performs an update of size
>   GC.addRange(r, size);
>   return r;
> }

No, it can cause a deadlock:

Thread 1: thread_enterCriticalRegion();
Thread 2: Takes the GC lock.
Thread 1: Tries to take the GC lock to add the range, but has to wait.
Thread 2: Triggers a collection -> thread_suspendAll().

Thread 2 waits on thread 1 to exit the critical region, while thread 1 waits on thread 2 to release the GC lock.

So the best we can do at the moment is:
void* GCAwareRealloc(void* ptr, size_t size)
{
  import core.memory : GC;
  import core.stdc.stdlib : realloc;
  void* r;
  GC.disable;
  scope (exit) GC.enable;
  GC.removeRange(ptr);
  r = realloc(ptr, size);
  GC.addRange(r, size);
  return r;
}

With your proposed enhancement we get:
void* GCAwareRealloc(void* ptr, size_t size)
{
  import core.memory : GC;
  import core.stdc.stdlib : realloc;
  void* r;
  GC.disable;
  scope (exit) GC.enable;
  r = realloc(ptr, size);
  if (r != ptr)
    GC.removeRange(ptr);
  GC.addRange(r, size); // if r == ptr, only updates the size
  return r;
}

This executes a little faster because we sometimes do one fewer GC range operation (plus one fewer GC lock acquisition).