June 05, 2006
Frank Benoit wrote:

> 
>> * The cost of removing the last pointer to an object is unbounded
> delete can be unbound? I didn't know that.

The reason is that a delete might trigger an unknown additional amount of decrefs leading to more deletes.

>> * It has a substantial space overhead, making it less useful for smaller heaps
> In codesize yes, but in heap size?
> I though the tracing GC does need much more heap because it should be
> called very seldom.

Hmm, might have paraphrased somewhat wildly there, but the space overhead _is_ considered a drawback, and reducing it will most likely lead to more computation overhead instead.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
June 05, 2006
Frank Benoit wrote:
>> This fails if the passed reference 'escapes'. It can escape by being
>> assigned to a global, being assigned to a class member, being inserted
>> into some data structure that lives longer than the function, and being
>> returned by the function.
> Yes, and the compiler can catch these cases. With incr/decr the
> parameter reference you win nothing. So it should be possible to
> optimize them.

It can only do it if it has full source available. It also won't work for virtual functions, as it is not known at compile time what code will be called.

> So here comes the more ugly solution:
> Every pointer is allways a struct with the pointer to the data and a
> pointer to the refcounter.

It's the wrapper approach, with two allocations per allocation.


>>>> 7) Fixing the compiler to hide all this stuff from the programmer will
>>>> make it difficult to interface cleanly with C.
>>> hm, ok. Here is some more manual work necessary.
>>> If you give the last reference to a C lib, isn't this a problem for the
>>> MS GC also?
>> I don't know anything about the MS GC.
> 
> you do :) MS=mark&sweep

I thought you meant MS=Microsoft <g>. No, it isn't a problem, as it scans the C data segment and stack, too.


>>>> 9) Ref counting does not eliminated latency problems, it just reduces
>>>> them.
>>> Where do you mean is the latency with RC? The collecting work is done in
>>> the smallest pieces all the time. Only the allocation of new memory
>>> needs an unbound time to complete.
>> That's the same with gc. Only upon an allocation is the time unbound.
> And additional the time is unbound for the GC fullcollect...

The collector only runs during an allocation request - which means you can completely control when a collection can happen.


>>> But it does not need to stop all
>>> other threads. e.g. hardware interrupts don't need to be disabled.
>> GC doesn't require disabling hardware interrupts. It does not need to
>> stop any threads that do not hold references to GC allocated data.
> If you want to have you ISR written in D, you have two choices:
> a) be sure that you do not move a reference (refb=refa; refa=null;).
> This can end up in GC deleting a living object. If you cannot guarantee
> this, than
> b) disable the interrupting code also while the GC is running. But this
> is not acceptable, so go back to a) or have another GC.

c) code the ISR so it does not reference any gc allocated data. For example, use malloc() for dynamic data the ISR needs.

d) make sure any gc references used by the ISR have another reference to them that the gc will scan (such as in the global data segment).

June 05, 2006
Lars Ivar Igesund wrote:
> Frank Benoit wrote:
> 
>>> * The cost of removing the last pointer to an object is unbounded
>> delete can be unbound? I didn't know that.
> 
> The reason is that a delete might trigger an unknown additional amount of
> decrefs leading to more deletes.

And some allocators colaesce adjacent free blocks aggressively (ie. during delete).  Not to mention the fact that the allocator data is probably protected by a mutex, etc.  In general, if the allocator documentation makes no guarantees about progress then you may at least be stuck on a mutex while other (potentially low-priority) threads are making an (unbounded) allocation.


Sean
June 05, 2006
For what it's worth, incremental GC is similar to refcounting in that the cost is distributed across pointer use to avoid the need for a costly mark/sweep phase.  I've even seen hard-realtime incremental GC implementations, so it's a long-term solution worth considering.  That said, I would like to be able to create some sort of smart pointer in D for tracking valuable resources, so it's good to hear that Walter is considering this as well.

Frank Benoit wrote:
>
>> 3) in a multithreaded app, the incs and decs must be synchronized
> 
> Isn't a atomic inc/dec enough? (Don't know anything about the
> performance of the asm "lock" instruction)

Depending on the platform and the synch. requirement, a "lock" type instruction may be necessary.  It typically is for non-x86 platforms, but I think you could get away without using one in some cases on x86. The best place to check for optimizations would be the Boost shared_ptr code.  IIRC they don't use the Interlocked calls but there are spin locks in some places.

The cost of a "lock" instruction is variable based on the hardware, number of CPUs, whether the cache line is shared, etc, but the most reliable estimate I've seen averages locked ops at around 70ns.

>> 9) Ref counting does not eliminated latency problems, it just reduces them.
> 
> Where do you mean is the latency with RC? The collecting work is done in
> the smallest pieces all the time. Only the allocation of new memory
> needs an unbound time to complete. But it does not need to stop all
> other threads. e.g. hardware interrupts don't need to be disabled.

I've seen analyses that suggested reference counting is actually slower than mark/sweep when measured over the run of the program.  The advantage is the avoidance of the unbounded mark/sweep phase, with far more expensive pointer modifications instead.  As above, I'd almost rather see incremental GC support in D (it typically requires additional ode generation on pointer modifications, and may be tricky to sort out for C routines like memset).


Sean
June 05, 2006
On 06/04/2006 05:40 AM, Daniel Keep wrote:

> "GC, the simple solution"

[snip]

> * inability to free cycles
> * inability to even *run* destructors where cycles are involved, because
> there is no order in which they can logically be executed and still be valid


I think someone else in the thread mentioned this as a programmer error.
However, assuming the programmer intended this, then this programmer
must expect that the MS GC would break the cycle somewhere before
calling the destructors and he must not care where the cycle was
broken, because otherwise he would have used a weak pointer
to close the cycle where he wanted the break to occur.

Anyway, code for doing this GC breaking of cycles in RC
is at:

  http://tinyurl.com/reuzl

See code after comment:

 *  Breaks all arcs in cycle_arcs.

As mentioned in the code comments, this is Christopher's
method for collecting cycles and suffers a delay since it
keeps all smart ptrs in a list and processes the entire list.
Other methods don't keep this list but do a local mark-sweep
at each rc decrement with obvious time penalties.  A compromise
just uses a queue which, when it reaches a certain size,
plays the same role as Christopher's list of all smart pointers.

>
> Those last two are the main problem with reference counting, along with
> the others and slightly added complexity in the compiler.
>
> Python, however, does not suffer the cycle problem.  Know why?
>
> Because they have a *mark & sweep* garbage collector to clean up the cycles.

Then it must not automatically call destructors since then it would
suffer the 2nd problem above.

>
[snip]

> As it stands, you can implement reference counting yourself: you just
> need to write an allocator, and call the incref and decref functions
> manually.  However, I do think that having built-in *support* for
> Reference counting would be fantastic.  The less that stuff like this is
> left in the hands of the programmer, the better.


Smart pointers would be a great help here, but since
D doesn't allow user definition of the assignment operator:

  http://www.digitalmars.com/d/archives/digitalmars/D/learn/984.html

that's not easy.  I'm a newbie, so what's the best way to implement
something like a smart pointer without having to manually:

  increment rc1;
  decrement rc2;
  assign the pointer to its target location;
June 07, 2006
Sean Kelly wrote:
> For what it's worth, incremental GC is similar to refcounting in that the cost is distributed across pointer use to avoid the need for a costly mark/sweep phase.  I've even seen hard-realtime incremental GC implementations, so it's a long-term solution worth considering.

My _only_ significant beef with D so far as been the unbound time for a garbage collection; therefore, if an incremental garbage collector built into D could guarantee me a set maximum time would be taken for each garbage collection (which I would manually induce as often as possible) then unless the efficiency hit is TERRIBLE I would be all over it.

What I'm really curious about, however, is how important this is to _other_ people - especially Walter. Would it be reasonable to say that it seems D has been designed largely without long-term real-time performance in mind? Sure, there are workarounds, but I haven't seen anything elegant yet that will let me use D for such applications without making my memory management more complicated than it would have been in C++. Is anything in the works?
June 07, 2006
In article <e60vmu$1b6t$1@digitaldaemon.com>, Walter Bright says...
>
..
>>>>> 9) Ref counting does not eliminated latency problems, it just reduces them.
>>>> Where do you mean is the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete.
>>> That's the same with gc. Only upon an allocation is the time unbound.
>> And additional the time is unbound for the GC fullcollect...
>
>The collector only runs during an allocation request - which means you can completely control when a collection can happen.

Correct me if I'm wrong, but while this is true for M + S, doesn't RC /kind of/ have the opposite problem?  In refcounting systems, allocation would be mostly predictable, a la malloc, but dereferences can trigger a long chain of other dereferences.

I say 'kind of' because it is more deterministic than GC - you can always dig deeper to see what is going to be freed by a given dereference.

For instance, if you are the last person to let go of a hash table, you have to wait for the whole thing to unravel, along with any objects stored inside.

I guess a real-time RC implementation could be made by keeping a 'recycling bin' and only taking apart N objects at a time, i.e. incremental freeing.  This would un-guarantee the prompt running of destructors though (which seems to be one of the big rationales for RC in D, right, at least for non-memory objects?)

Kevin


June 07, 2006
Walter Bright wrote:
>> Only if the reference is stored,
>> the ref counter has to be incremented. This is only possible if this is
>> done by the compiler. No smart pointer can do this :)
>>
>>> 3) in a multithreaded app, the incs and decs must be synchronized
>>
>> Isn't a atomic inc/dec enough?
> 
> That requires synchronizing.
> 

Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
"In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?




-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
June 07, 2006
Bruno Medeiros wrote:
> Walter Bright wrote:
>>> Only if the reference is stored,
>>> the ref counter has to be incremented. This is only possible if this is
>>> done by the compiler. No smart pointer can do this :)
>>>
>>>> 3) in a multithreaded app, the incs and decs must be synchronized
>>>
>>> Isn't a atomic inc/dec enough?
>>
>> That requires synchronizing.
>>
> 
> Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
> "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?

These are synchronizing operations.


Sean
June 08, 2006
Sorry for the late (and short) reply: exams breathing down my neck and
all :)

Larry Evans wrote:
> On 06/04/2006 05:40 AM, Daniel Keep wrote:
> 
>> "GC, the simple solution"
> 
> [snip]
> 
>> * inability to free cycles
>> * inability to even *run* destructors where cycles are involved, because
>> there is no order in which they can logically be executed and still be
> valid
> 
> 
> I think someone else in the thread mentioned this as a programmer error.
> However, assuming the programmer intended this, then this programmer
> must expect that the MS GC would break the cycle somewhere before
> calling the destructors and he must not care where the cycle was
> broken, because otherwise he would have used a weak pointer
> to close the cycle where he wanted the break to occur.
> 
> Anyway, code for doing this GC breaking of cycles in RC
> is at:
> 
>   http://tinyurl.com/reuzl
> 
> See code after comment:
> 
>  *  Breaks all arcs in cycle_arcs.
> 
> As mentioned in the code comments, this is Christopher's
> method for collecting cycles and suffers a delay since it
> keeps all smart ptrs in a list and processes the entire list.
> Other methods don't keep this list but do a local mark-sweep
> at each rc decrement with obvious time penalties.  A compromise
> just uses a queue which, when it reaches a certain size,
> plays the same role as Christopher's list of all smart pointers.
> 
>>
>> Those last two are the main problem with reference counting, along with the others and slightly added complexity in the compiler.
>>
>> Python, however, does not suffer the cycle problem.  Know why?
>>
>> Because they have a *mark & sweep* garbage collector to clean up the
> cycles.
> 
> Then it must not automatically call destructors since then it would suffer the 2nd problem above.

I haven't read the above link since I really don't need to be distracted any more than I already am at the moment.  I'll read that link later on once I'm not quite so swamped :)

However, I will say this: if you have a cycle in Python, it will clean it up normally PROVIDED destructors are not involved.  If they are, then it just moves the whole cycle to a dead object list, since there is no safe way to break the cycle.  At that point, it's up to the programmer to deal with it.

> 
>>
> [snip]
> 
>> As it stands, you can implement reference counting yourself: you just need to write an allocator, and call the incref and decref functions manually.  However, I do think that having built-in *support* for Reference counting would be fantastic.  The less that stuff like this is left in the hands of the programmer, the better.
> 
> 
> Smart pointers would be a great help here, but since
> D doesn't allow user definition of the assignment operator:
> 
>   http://www.digitalmars.com/d/archives/digitalmars/D/learn/984.html
> 
> that's not easy.  I'm a newbie, so what's the best way to implement something like a smart pointer without having to manually:
> 
>   increment rc1;
>   decrement rc2;
>   assign the pointer to its target location;

Not really, sorry.  Your best bet might be with a preprocessor.  I've had plans to write an AST-based preprocessor for a while, but that's off in the far, far future.  Maybe give Build 3.0 a look?  In that case, you'd want to write a macro like:

    rc2 #= rc1;
==>
    (rc1).incref(); (rc2).decref(); rc2 = (rc1);

Again, sorry for the terse reply.

	-- Daniel

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/