April 22, 2009
Leandro Lucarella, el 22 de abril a las 14:19 me escribiste:
> > Unfortunately, that is not efficient. The state used by the GC is in flux during the sweep phase, so there is no easy lockless way to access that data. Having to acquire a lock on every weakref dereference absolutely kills performance.
> 
> The sweep phase don't touch mark bits. I can't possible understand why you should synchronize access to a (temporarily) read-only variable.
> 
> Mark bits are only written when:
> 1) Clearing all the mark bits before marking (this is done with all
>    threads running now, and this is what it should be changed to be done
>    when all threads are suspended)
> 2) Marking (threads are suspended in the current implementation)
> 
> > I'm definitely open to hearing more ideas on how to efficiently implement.
> 
> The only performance penalty of my proposal would be incrementing the time
> the world is stooped a bit (adding the marks cleaning phase there).
> I think is a fair price to pay (being that mark bits are implemented using
> bitmaps, it should be pretty fast resetting all bits to 0) for an easy and
> safe weak references implementations.

I forgot to mention that at allocation time, the mark bit should be set to keep the invariant that all live data has the mark bit set. I don't think this should impose a very big performance penalty either.

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Ideas claras, libertad y patriotismo, para mejorar la calidad de vida;
claro que todo con mucha calidad, porque, joven, ideas claras, libertad
o patriotismo tiene cualquiera.
	-- Ricardo Vaporeso
April 22, 2009
Leandro Lucarella Wrote:

> Jason House, el 22 de abril a las 11:32 me escribiste:
> > Leandro Lucarella Wrote:
> > 
> > > Jason House, el 21 de abril a las 21:11 me escribiste:
> > > > Another alternative I came up with is notification when the GC clears or sets a mark on an object.  That would allow the weak ref to know that its information is suspect.  That approach would only required that the mark phase is done with all threads stopped.
> > > 
> > > I think instead of a notification mechanism access to the mark bit can be provided.
> > > 
> > > The mark phase is ran with all thread paused, so you don't need any notifications in that phase because nobody can ask the weak ref object for the underlaying pointer then.
> > > 
> > > When threads are resumed, the mark phase is complete, so the invariant of all live objects having the mark bit set and all the garbage having the bit unset should hold. If the GC guarantee that the mark bit is set *always* when a object is live (the mark bit clearing should be done with all threads stopped and that should be it), except only when the mark phase is running, then, a weak ref implementation can look at the mark bit to safely see if the reference is garbage or not (even before the sweep phase find it and call the finalizer).
> > > 
> > > I think this approach is both simple and efficient, it doesn't have races at all (unless I'm missing something, of course). The only problem is it's too tied to the current GC implementation. But I think any weak ref implementation will, so it looks like weak ref really belong to the GC.
> > 
> > Unfortunately, that is not efficient. The state used by the GC is in flux during the sweep phase, so there is no easy lockless way to access that data. Having to acquire a lock on every weakref dereference absolutely kills performance.
> 
> The sweep phase don't touch mark bits. I can't possible understand why you should synchronize access to a (temporarily) read-only variable.

I'm mostly regurgitating what Sean Kelly said. I know that if the sweep deallocates at an object, the bits marking it become invalid. I don't disagree that reading the bits without a lock should be possible, but I also trust Sean's statement that it isn't easy. I'm trying to find something that doesn't require reworking or rewriting large parts of a garbage collector.

> Mark bits are only written when:
> 1) Clearing all the mark bits before marking (this is done with all
>    threads running now, and this is what it should be changed to be done
>    when all threads are suspended)
> 2) Marking (threads are suspended in the current implementation)

Is that what it does? I realized recently that one could simple toggle a mark bit and avoid a clearing phase altogether.


> > I'm definitely open to hearing more ideas on how to efficiently implement.
> 
> The only performance penalty of my proposal would be incrementing the time
> the world is stooped a bit (adding the marks cleaning phase there).
> I think is a fair price to pay (being that mark bits are implemented using
> bitmaps, it should be pretty fast resetting all bits to 0) for an easy and
> safe weak references implementations.
> 
> -- 
> Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
> ----------------------------------------------------------------------------
> GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
> ----------------------------------------------------------------------------
> ELLA FUE INFIEL, PERO EX POLOLO PAGÓ
> 	-- TV CHILE

April 22, 2009
Jason House, el 22 de abril a las 13:31 me escribiste:
> > > Unfortunately, that is not efficient. The state used by the GC is in flux during the sweep phase, so there is no easy lockless way to access that data. Having to acquire a lock on every weakref dereference absolutely kills performance.
> > 
> > The sweep phase don't touch mark bits. I can't possible understand why you should synchronize access to a (temporarily) read-only variable.
> 
> I'm mostly regurgitating what Sean Kelly said. I know that if the sweep
> deallocates at an object, the bits marking it become invalid. I don't
> disagree that reading the bits without a lock should be possible, but
> I also trust Sean's statement that it isn't easy. I'm trying to find
> something that doesn't require reworking or rewriting large parts of
> a garbage collector.

Sean's idea was, if I recall correctly, to move the sweep phase into the stop-the-world. My idea is different, and it doesn't need rewriting any large part of the collector. I think the change is pretty small (and can be easily applied to D1/Tango too).

I'm looking at the Tango code right now, and you know what? The unmarking is already done when all threads are paused, so all you need for this to work is a method to expose the mark bit. This can be really easily done by extending the BlkAttr defined values (or using a reserved value if this should not be a standard attribute) and ensuring that the mark bit is set when new pools/bins are allocated.

This patch (Tango) should do that (untested):
------>8------>8------>8------>8------>8------>8------>8------>8------>8------
diff --git a/lib/gc/basic/gcx.d b/lib/gc/basic/gcx.d
index a760465..fd21d44 100644
--- a/lib/gc/basic/gcx.d
+++ b/lib/gc/basic/gcx.d
@@ -81,6 +81,7 @@ private
         NO_SCAN  = 0b0000_0010,
         NO_MOVE  = 0b0000_0100,
         ALL_BITS = 0b1111_1111
+        MARK_BIT = 0b1_0000_0000_0000_0000; // extension to query the mark bit
     }

     extern (C) void* rt_stackBottom();
@@ -519,7 +520,14 @@ class GC
             Pool *pool = gcx.findPool(p);
             assert(pool);

-            gcx.setBits(pool, cast(size_t)(p - pool.baseAddr) / 16, bits);
+           uint biti = cast(size_t)(p - pool.baseAddr) / 16;
+            gcx.setBits(pool, biti, bits);
+           // Set the mark bit to maintain the invariant that live objects
+           // always have the mark bit set, except when the collector is in
+           // the mark phase (useful for weak reference implementations)
+           // This is not done in setBits() to keep this bit read-only for
+           // the user.
+           pool.mark.set(biti)
         }
         return p;
     }
@@ -2576,6 +2584,8 @@ struct Gcx
 //        if (pool.nomove.nbits &&
 //            pool.nomove.test(biti))
 //            bits |= BlkAttr.NO_MOVE;
+        if (pool.mark.test(biti))
+            bits |= BlkAttr.MARK_BIT;
         return bits;
     }

------8<------8<------8<------8<------8<------8<------8<------8<------8<------

This implementation use an "extension" (according to Druntime specs) bit to return the mark bit when querying objects attributes. The mark bit is kept read-only, but I'm not sure about that (there are already plenty of ways to break the GC now, so I don't think security is a good enough reason to ban the user from doing nasty things ;)

If you try it, please tell me how it went.

> > Mark bits are only written when:
> > 1) Clearing all the mark bits before marking (this is done with all
> >    threads running now, and this is what it should be changed to be done
> >    when all threads are suspended)
> > 2) Marking (threads are suspended in the current implementation)
> 
> Is that what it does? I realized recently that one could simple toggle a mark bit and avoid a clearing phase altogether.

Yes, but that would mean to touch all the mark bits in the sweep phase, which can have bad cache behaviour. But being a bitmap that should be fairly cheap. But since this involve bit manipulation operations, and clearing the bitmap can be done very efficiently clearing complete words in one instruction, I don't see a clear gain in this.

I guess just benchmarks can have a final word on this, but I wouldn't expect much differences in performance for both approaches.


-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
More people die from a champagne-cork popping,
than from poison spiders
April 22, 2009
Leandro Lucarella, el 22 de abril a las 16:08 me escribiste:
> Jason House, el 22 de abril a las 13:31 me escribiste:
> > > > Unfortunately, that is not efficient. The state used by the GC is in flux during the sweep phase, so there is no easy lockless way to access that data. Having to acquire a lock on every weakref dereference absolutely kills performance.
> > > 
> > > The sweep phase don't touch mark bits. I can't possible understand why you should synchronize access to a (temporarily) read-only variable.
> > 
> > I'm mostly regurgitating what Sean Kelly said. I know that if the sweep
> > deallocates at an object, the bits marking it become invalid. I don't
> > disagree that reading the bits without a lock should be possible, but
> > I also trust Sean's statement that it isn't easy. I'm trying to find
> > something that doesn't require reworking or rewriting large parts of
> > a garbage collector.
> 
> Sean's idea was, if I recall correctly, to move the sweep phase into the stop-the-world. My idea is different, and it doesn't need rewriting any large part of the collector. I think the change is pretty small (and can be easily applied to D1/Tango too).
> 
> I'm looking at the Tango code right now, and you know what? The unmarking is already done when all threads are paused, so all you need for this to work is a method to expose the mark bit. This can be really easily done by extending the BlkAttr defined values (or using a reserved value if this should not be a standard attribute) and ensuring that the mark bit is set when new pools/bins are allocated.
> 
> This patch (Tango) should do that (untested):
> ------>8------>8------>8------>8------>8------>8------>8------>8------>8------
> diff --git a/lib/gc/basic/gcx.d b/lib/gc/basic/gcx.d
> index a760465..fd21d44 100644
> --- a/lib/gc/basic/gcx.d
> +++ b/lib/gc/basic/gcx.d
> @@ -81,6 +81,7 @@ private
>          NO_SCAN  = 0b0000_0010,
>          NO_MOVE  = 0b0000_0100,
>          ALL_BITS = 0b1111_1111
> +        MARK_BIT = 0b1_0000_0000_0000_0000; // extension to query the mark bit
>      }
> 
>      extern (C) void* rt_stackBottom();
> @@ -519,7 +520,14 @@ class GC
>              Pool *pool = gcx.findPool(p);
>              assert(pool);
> 
> -            gcx.setBits(pool, cast(size_t)(p - pool.baseAddr) / 16, bits);
> +           uint biti = cast(size_t)(p - pool.baseAddr) / 16;
> +            gcx.setBits(pool, biti, bits);
> +           // Set the mark bit to maintain the invariant that live objects
> +           // always have the mark bit set, except when the collector is in
> +           // the mark phase (useful for weak reference implementations)
> +           // This is not done in setBits() to keep this bit read-only for
> +           // the user.
> +           pool.mark.set(biti)
>          }
>          return p;
>      }
> @@ -2576,6 +2584,8 @@ struct Gcx
>  //        if (pool.nomove.nbits &&
>  //            pool.nomove.test(biti))
>  //            bits |= BlkAttr.NO_MOVE;
> +        if (pool.mark.test(biti))
> +            bits |= BlkAttr.MARK_BIT;
>          return bits;
>      }
> 
> ------8<------8<------8<------8<------8<------8<------8<------8<------8<------

Woops! I already spotted a bug there =). The mark bit was only set if the malloc was done passing bits. The solution has a performance penalty of a "extra" pool lookup for malloc where attr == 0.

This is the fixed patch:

------>8------>8------>8------>8------>8------>8------>8------>8------>8------
diff --git a/lib/gc/basic/gcx.d b/lib/gc/basic/gcx.d
index a760465..940bc71 100644
--- a/lib/gc/basic/gcx.d
+++ b/lib/gc/basic/gcx.d
@@ -81,6 +81,7 @@ private
         NO_SCAN  = 0b0000_0010,
         NO_MOVE  = 0b0000_0100,
         ALL_BITS = 0b1111_1111
+        MARK_BIT = 0b1_0000_0000_0000_0000; // extension to query the mark bit
     }

     extern (C) void* rt_stackBottom();
@@ -514,13 +515,18 @@ class GC
         sentinel_init(p, size);
         gcx.log_malloc(p, size);

-        if (bits)
-        {
-            Pool *pool = gcx.findPool(p);
-            assert(pool);
+        Pool *pool = gcx.findPool(p);
+        assert(pool);

-            gcx.setBits(pool, cast(size_t)(p - pool.baseAddr) / 16, bits);
-        }
+        uint biti = cast(size_t)(p - pool.baseAddr) / 16;
+        if (bits)
+            gcx.setBits(pool, biti, bits);
+        // Set the mark bit to maintain the invariant that live objects
+        // always have the mark bit set, except when the collector is in
+        // the mark phase (useful for weak reference implementations)
+        // This is not done in setBits() to keep this bit read-only for
+        // the user.
+        pool.mark.set(biti)
         return p;
     }

@@ -2576,6 +2582,8 @@ struct Gcx
 //        if (pool.nomove.nbits &&
 //            pool.nomove.test(biti))
 //            bits |= BlkAttr.NO_MOVE;
+        if (pool.mark.test(biti))
+            bits |= BlkAttr.MARK_BIT;
         return bits;
     }

------8<------8<------8<------8<------8<------8<------8<------8<------8<------

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Bulletproof... I Wish I Was
April 22, 2009
Leandro Lucarella wrote:
> I think instead of a notification mechanism access to the mark bit can be
> provided.
> 
> The mark phase is ran with all thread paused, so you don't need any
> notifications in that phase because nobody can ask the weak ref object for
> the underlaying pointer then.
> 
> When threads are resumed, the mark phase is complete, so the invariant of
> all live objects having the mark bit set and all the garbage having the
> bit unset should hold. If the GC guarantee that the mark bit is set
> *always* when a object is live (the mark bit clearing should be done with
> all threads stopped and that should be it), except only when the mark
> phase is running, then, a weak ref implementation can look at the mark bit
> to safely see if the reference is garbage or not (even before the sweep
> phase find it and call the finalizer).
> 
> I think this approach is both simple and efficient, it doesn't have races
> at all (unless I'm missing something, of course). The only problem is it's
> too tied to the current GC implementation. But I think any weak ref
> implementation will, so it looks like weak ref really belong to the GC.

I think you're missing something. Consider this sequence of events:
- Object A is collected by the GC. (And its mark bit M is cleared)
- Object B is allocated, and gets the memory formerly used by object A. Mark bit M is set.
- A weak reference to A gets used; it checks mark bit M, which is set...

In other words, the "this object was collected" state needs to be stored in the weak reference, not in the object that got collected (or anything tied to its memory location) because that memory may get reused.
April 22, 2009
Frits van Bommel Wrote:

> Leandro Lucarella wrote:
> > I think instead of a notification mechanism access to the mark bit can be provided.
> > 
> > The mark phase is ran with all thread paused, so you don't need any notifications in that phase because nobody can ask the weak ref object for the underlaying pointer then.
> > 
> > When threads are resumed, the mark phase is complete, so the invariant of all live objects having the mark bit set and all the garbage having the bit unset should hold. If the GC guarantee that the mark bit is set *always* when a object is live (the mark bit clearing should be done with all threads stopped and that should be it), except only when the mark phase is running, then, a weak ref implementation can look at the mark bit to safely see if the reference is garbage or not (even before the sweep phase find it and call the finalizer).
> > 
> > I think this approach is both simple and efficient, it doesn't have races at all (unless I'm missing something, of course). The only problem is it's too tied to the current GC implementation. But I think any weak ref implementation will, so it looks like weak ref really belong to the GC.
> 
> I think you're missing something. Consider this sequence of events:
> - Object A is collected by the GC. (And its mark bit M is cleared)
> - Object B is allocated, and gets the memory formerly used by object A. Mark bit
> M is set.
> - A weak reference to A gets used; it checks mark bit M, which is set...
> 
> In other words, the "this object was collected" state needs to be stored in the weak reference, not in the object that got collected (or anything tied to its memory location) because that memory may get reused.

That's where finalizers come in. They can mark the weak ref as invalid so that you don't care what was returned. In my (recent?) pseudo code, the weak ref was decoded, the mark bit was checked, and then the weak ref was decoded again. The final decoding is to solve the race with the finalizer.

Even with such precautions, a lock free implementation has one more problem: what happens if the memory block was released to the os? That's a segfault :( if other data is checked to avoid such situations, that's likely where gc locks come into the picture... I'm pretty sure Sean said querying the attribute bits requires a lock. It may be for this reason. I still have to look at the code though...
April 22, 2009
Jason House wrote:
> Frits van Bommel Wrote:
> 
>> Leandro Lucarella wrote:
>>> I think instead of a notification mechanism access to the mark bit can be
>>> provided.
>>>
>>> The mark phase is ran with all thread paused, so you don't need any
>>> notifications in that phase because nobody can ask the weak ref object for
>>> the underlaying pointer then.
>>>
>>> When threads are resumed, the mark phase is complete, so the invariant of
>>> all live objects having the mark bit set and all the garbage having the
>>> bit unset should hold. If the GC guarantee that the mark bit is set
>>> *always* when a object is live (the mark bit clearing should be done with
>>> all threads stopped and that should be it), except only when the mark
>>> phase is running, then, a weak ref implementation can look at the mark bit
>>> to safely see if the reference is garbage or not (even before the sweep
>>> phase find it and call the finalizer).
>>>
>>> I think this approach is both simple and efficient, it doesn't have races
>>> at all (unless I'm missing something, of course). The only problem is it's
>>> too tied to the current GC implementation. But I think any weak ref
>>> implementation will, so it looks like weak ref really belong to the GC.
>> I think you're missing something. Consider this sequence of events:
>> - Object A is collected by the GC. (And its mark bit M is cleared)
>> - Object B is allocated, and gets the memory formerly used by object A. Mark bit M is set.
>> - A weak reference to A gets used; it checks mark bit M, which is set...
>>
>> In other words, the "this object was collected" state needs to be stored in the weak reference, not in the object that got collected (or anything tied to its memory location) because that memory may get reused.
> 
> That's where finalizers come in. They can mark the weak ref as invalid so that you don't care what was returned. In my (recent?) pseudo code, the weak ref was decoded, the mark bit was checked, and then the weak ref was decoded again. The final decoding is to solve the race with the finalizer.

Will this work if B was allocated in another thread, right after it gets resumed (i.e. possibly before the finalizer has a chance to run)?

... oh, maybe the GC lock gets held even after other threads get resumed, so the allocation will wait for finalizers to run? (just thought of that)

> Even with such precautions, a lock free implementation has one more problem: what happens if the memory block was released to the os? That's a segfault :( if other data is checked to avoid such situations, that's likely where gc locks come into the picture... I'm pretty sure Sean said querying the attribute bits requires a lock. It may be for this reason. I still have to look at the code though...

You mean it would segfault on checking the mark bit? Can't the GC just say the mark bit was not set for any pointer not pointing into a currently-allocated pool?
But yeah, checking if that address is part of an allocated pool may require holding the GC lock...
April 22, 2009
Frits van Bommel, el 22 de abril a las 23:39 me escribiste:
> >>>I think this approach is both simple and efficient, it doesn't have races at all (unless I'm missing something, of course). The only problem is it's too tied to the current GC implementation. But I think any weak ref implementation will, so it looks like weak ref really belong to the GC.
> >>
> >>I think you're missing something. Consider this sequence of events:
> >>- Object A is collected by the GC. (And its mark bit M is cleared)
> >>- Object B is allocated, and gets the memory formerly used by object A. Mark bit M is set.
> >>- A weak reference to A gets used; it checks mark bit M, which is set...
> >>
> >>In other words, the "this object was collected" state needs to be stored in the weak reference, not in the object that got collected (or anything tied to its memory location) because that memory may get reused.
> >
> >That's where finalizers come in. They can mark the weak ref as invalid so that you don't care what was returned. In my (recent?) pseudo code, the weak ref was decoded, the mark bit was checked, and then the weak ref was decoded again. The final decoding is to solve the race with the finalizer.
> 
> Will this work if B was allocated in another thread, right after it gets resumed (i.e. possibly before the finalizer has a chance to run)?
> 
> ... oh, maybe the GC lock gets held even after other threads get resumed, so the allocation will wait for finalizers to run? (just thought of that)

That's exactly what happens. So I'm not seeing any flaws so far with my proposal.

> >Even with such precautions, a lock free implementation has one more problem: what happens if the memory block was released to the os? That's a segfault :( if other data is checked to avoid such situations, that's likely where gc locks come into the picture... I'm pretty sure Sean said querying the attribute bits requires a lock. It may be for this reason. I still have to look at the code though...
> 
> You mean it would segfault on checking the mark bit? Can't the GC just
> say the mark bit was not set for any pointer not pointing into
> a currently-allocated pool?  But yeah, checking if that address is part
> of an allocated pool may require holding the GC lock...

In the patch I posted this is already happening. The mark bit is returned
as part of the BlkAttr belonging to a memory block. If the pointer passed
to gc_getAttr() (or gc_query()) don't point to a GC allocated memory
block, 0 is returned.

I any case, I can't see how that could happen if there are still notifications before freeing the memory block in the sweep phase.

The sequence is like this:

0) All live cells has the mark bit set
1) Acquire the GC lock
2) Stop the world
3) Clear mark bits
4) Mark
5) Start the world
6) Sweep (calling finalizing notification callbacks)
7) Release the GC lock

Then, you are guaranteed to access to valid memory (mark bit or even
the entire memory block) until the notification comes. When the weak
reference receive the notification it invalidates the reference and don't
rely anymore in the mark bit.

But I just found a flaw in my patch (not the proposal), right now, as you
say, the gc_query() and gc_getAttr() functions are acquiring the GC lock.
Some non-locking interface should be exposed that calls to queryNoSync()
and getAttrNoSync(). It should be safe for this limited usage as long as
there are not concurrent calls to setAttr()/clrAttr() to that memory
block (I think).

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Instead of doing a wash, I just keep buying underwear. My goal is to have over
360 pair. That way I only have to do wash once a year.
	-- George Constanza
April 23, 2009
I'm finally looking at the gc source code.  From what I can tell, memory is allocated in large chunks (pools), and divided into 16 byte chunks. Attributes are not all packed together like the BlkInfo of the spec implies, so some so access to that stuff is safer than I would have thought.

Here's a weak ref implementation that I just whipped together under the assumption it goes into druntime/src/gc/basic/gcx.d.  I think it is sane.  I haven't scoured the gc's code to ensure everything is legit.  I declared it as a class mostly because value semantics don't make much sense and I'm too lazy/tired to do the sample code as a 100% correct struct definition.

My two biggest concerns:

1. Is the gc lock held while running a constructor? If not and it's needed to do what it needs, where can the hook be added to make the construction of a weak reference special?

2. Will the assignment to noscan get overwritten / hit a race condition?  It is possible to hide the pointer inside a size_t instead, but given the intimate tying to the GC such tricks just make the code tougher to read.

class weakRef(T) if (is(T==class))
{
    this(T _t){
        // set up tracking for the weakly referenced object
        t = _t;
        pool = findPool(t);
        rt_attachDisposeEvent(&t, &sweep);

        // tell the gc to not scan the weak reference
        weakRefPool = findPool(&this);
        weakRefPool.noscan[index(weakRefPool,&this)] = true;
    }

    T opCall(){
        T tmp = t; // force future conservative GC scan to find the pointer
        if ( (tmp is null) || (pool.mark[index] == false) ){
            t = null;
            return null;
        }
        tmp = t; // avoid race where t was replaced with new data
        return tmp;
    }

private:
    T t;
    Pool* pool;

    size_t index(Pool *p, byte *element){ return (element-p)/16; }
    size_t index(){ return index(pool,&t); }

    void sweep( Object o ){
        t = null;
        pool = null;
    }
}


Comments and criticism are welcome.







Leandro Lucarella wrote:

> Frits van Bommel, el 22 de abril a las 23:39 me escribiste:
>> >>>I think this approach is both simple and efficient, it doesn't have races at all (unless I'm missing something, of course). The only problem is it's too tied to the current GC implementation. But I think any weak ref implementation will, so it looks like weak ref really belong to the GC.
>> >>
>> >>I think you're missing something. Consider this sequence of events:
>> >>- Object A is collected by the GC. (And its mark bit M is cleared)
>> >>- Object B is allocated, and gets the memory formerly used by object A.
>> >>Mark bit M is set. - A weak reference to A gets used; it checks mark
>> >>bit M, which is set...
>> >>
>> >>In other words, the "this object was collected" state needs to be stored in the weak reference, not in the object that got collected (or anything tied to its memory location) because that memory may get reused.
>> >
>> >That's where finalizers come in. They can mark the weak ref as invalid so that you don't care what was returned. In my (recent?) pseudo code, the weak ref was decoded, the mark bit was checked, and then the weak ref was decoded again. The final decoding is to solve the race with the finalizer.
>> 
>> Will this work if B was allocated in another thread, right after it gets resumed (i.e. possibly before the finalizer has a chance to run)?
>> 
>> ... oh, maybe the GC lock gets held even after other threads get resumed, so the allocation will wait for finalizers to run? (just thought of that)
> 
> That's exactly what happens. So I'm not seeing any flaws so far with my proposal.
> 
>> >Even with such precautions, a lock free implementation has one more problem: what happens if the memory block was released to the os? That's a segfault :( if other data is checked to avoid such situations, that's likely where gc locks come into the picture... I'm pretty sure Sean said querying the attribute bits requires a lock. It may be for this reason. I still have to look at the code though...
>> 
>> You mean it would segfault on checking the mark bit? Can't the GC just
>> say the mark bit was not set for any pointer not pointing into
>> a currently-allocated pool?  But yeah, checking if that address is part
>> of an allocated pool may require holding the GC lock...
> 
> In the patch I posted this is already happening. The mark bit is returned
> as part of the BlkAttr belonging to a memory block. If the pointer passed
> to gc_getAttr() (or gc_query()) don't point to a GC allocated memory
> block, 0 is returned.
> 
> I any case, I can't see how that could happen if there are still notifications before freeing the memory block in the sweep phase.
> 
> The sequence is like this:
> 
> 0) All live cells has the mark bit set
> 1) Acquire the GC lock
> 2) Stop the world
> 3) Clear mark bits
> 4) Mark
> 5) Start the world
> 6) Sweep (calling finalizing notification callbacks)
> 7) Release the GC lock
> 
> Then, you are guaranteed to access to valid memory (mark bit or even
> the entire memory block) until the notification comes. When the weak
> reference receive the notification it invalidates the reference and don't
> rely anymore in the mark bit.
> 
> But I just found a flaw in my patch (not the proposal), right now, as you
> say, the gc_query() and gc_getAttr() functions are acquiring the GC lock.
> Some non-locking interface should be exposed that calls to queryNoSync()
> and getAttrNoSync(). It should be safe for this limited usage as long as
> there are not concurrent calls to setAttr()/clrAttr() to that memory
> block (I think).
> 


April 23, 2009
Jason House, el 22 de abril a las 21:26 me escribiste:
> I'm finally looking at the gc source code.  From what I can tell, memory is allocated in large chunks (pools), and divided into 16 byte chunks.
> 
> Attributes are not all packed together like the BlkInfo of the spec implies, so some so access to that stuff is safer than I would have thought.

I'm sorry you had to go through that, I think this could have save you
a lot of time =)
http://proj.llucax.com.ar/blog/dgc/blog/tag/understanding%20the%20current%20gc

> Here's a weak ref implementation that I just whipped together under the assumption it goes into druntime/src/gc/basic/gcx.d.  I think it is sane.  I haven't scoured the gc's code to ensure everything is legit.  I declared it as a class mostly because value semantics don't make much sense and I'm too lazy/tired to do the sample code as a 100% correct struct definition.
> 
> My two biggest concerns:
> 
> 1. Is the gc lock held while running a constructor? If not and it's needed to do what it needs, where can the hook be added to make the construction of a weak reference special?

I don't think it's held, I just can't see how it can be held because construction is not done in the GC.

If you need to acquire the GC lock I think you can just do:
synchronized (GC.gcLock) {
	// protected stuff
}
The GC lock is a static member of the GC struct in gcx.d.

> 2. Will the assignment to noscan get overwritten / hit a race condition?

I don't think that should happen except if someone calls
gc_setAttr/gc_getAttr().

> It is possible to hide the pointer inside a size_t instead, but given the intimate tying to the GC such tricks just make the code tougher to read.

Agree. Maybe you can overload the operator new and pass the flags directly at allocation time, but I don't know very well how this is plugged in the runtime/GC, but since it's a regular gc_malloc() call it should be thread-safe.

Something like:

class weakRef(T) if (is(T==class))
{
	new(size_t size)
	{
		return gc_malloc(size, FINALIZE | NO_SCAN);
	}
	this(T _t)
	{
		// set up tracking for the weakly referenced object
		t = _t;
		pool = findPool(t);
		rt_attachDisposeEvent(&t, &sweep);
	}
	// ...
}


> class weakRef(T) if (is(T==class))
> {
>     this(T _t){
>         // set up tracking for the weakly referenced object
>         t = _t;
>         pool = findPool(t);
>         rt_attachDisposeEvent(&t, &sweep);
> 
>         // tell the gc to not scan the weak reference
>         weakRefPool = findPool(&this);
>         weakRefPool.noscan[index(weakRefPool,&this)] = true;

You can do this by calling setAttr(&this, NO_SCAN).

>     }
> 
>     T opCall(){
>         T tmp = t; // force future conservative GC scan to find the pointer
>         if ( (tmp is null) || (pool.mark[index] == false) ){

pool.mark.test(index) is used in the GC code, I don't know if the array syntax works. And to use the mark bit trick, I think something in the lines of my patch is needed to ensure the mark bit is set for all live objects.

>             t = null;
>             return null;
>         }
>         tmp = t; // avoid race where t was replaced with new data

I don't understand exactly what kind of race are you trying to avoid with this (and how this assignation can avoid any race if there is one).

>         return tmp;
>     }
> 
> private:
>     T t;
>     Pool* pool;
> 
>     size_t index(Pool *p, byte *element){ return (element-p)/16; }
>     size_t index(){ return index(pool,&t); }
> 
>     void sweep( Object o ){
>         t = null;
>         pool = null;
>     }
> }



-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Lo último que hay que pensar es que se desalinea la memoria
Hay que priorizar como causa la idiotez propia
Ya lo tengo asumido
	-- Pablete, filósofo contemporáneo desconocido