November 19, 2010
Should I make RefCounted and data structures that use it configurable, such that they're not atomic by default, but can be trivially made atomic?  The use of non-atomic ref counting makes it impossible to store ref counted data structures in classes for now, and makes it impossible to share them across threads forever, even if you synchronize manually.  In cases where they are not being passed around much, I don't think atomic reference counting is unreasonably expensive.  I personally would love a version of std.container.Array with a few issues I've mentioned previously fixed and with atomic ref counting.

If people like the idea of making this configurable, any input on how?  I see three ways:

1.  Make RefCounted into a class instead of a struct, and use virtual functions and interfaces.  (Probably a bad idea, but I'll throw it out there anyhow).

2.  Store a bool in it and use an if statement.  (I like this.  It allows runtime flexibility and the perf. cost should be negligible because the branch will be so predictable.)

3.  Use a template parameter and handle it at compile time.  (I don't like the idea of atomic and non-atomic ref counted versions of the same container being different types.  OTOH, this is the most efficient at runtime.)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20101119/1909c4a0/attachment.html>
November 19, 2010
Ok, so now that I have time, I did some benchmarks.  Atomic ref counting is somewhat expensive, but not obscenely so.  Here's my benchmark:

import std.stdio, std.datetime, std.container;

import std.stdio, std.datetime, std.container;

void main() {
     Array!uint arr;
     arr.length = 4;

     auto sw = StopWatch(autoStart);
     foreach(i; 0..10_000_000) {
         auto arr2 = arr;
     }

     writeln(sw.peek.milliseconds);
}

Here's a diff of std.typecons:

Index: typecons.d ===================================================================
--- typecons.d    (revision 2181)
+++ typecons.d    (working copy)
@@ -2328,7 +2328,16 @@
      this(this)
      {
          if (!RefCounted.isInitialized) return;
-        ++RefCounted._store._count;
+        //++RefCounted._store._count;
+        auto p = &(RefCounted._store._count);
+        asm {
+            push EAX;
+            mov EAX, p;
+            lock;
+            inc dword ptr[EAX];
+            pop EAX;
+        }
+
          debug(RefCounted) if (RefCounted.debugging)
                   writeln(typeof(this).stringof,
                  "@", cast(void*) RefCounted._store, ": bumped refcount
to ",
@@ -2345,7 +2354,23 @@
      {
          if (!RefCounted._store) return;
          assert(RefCounted._store._count > 0);
-        if (--RefCounted._store._count)
+
+        // Can't just do lock; dec; here because I need the value
afterwords.
+        size_t oldVal = void;
+        auto p = &(RefCounted._store._count);
+        asm {
+            push EAX;
+            push EBX;
+            mov EAX, size_t.max;  // + size_t.max is equivalent to - 1.
+            mov EBX, p;
+            lock;
+            xadd [EBX], EAX;
+            mov oldVal, EAX;
+            pop EBX;
+            pop EAX;
+        }
+
+        if (oldVal > 1)
          {
              debug(RefCounted) if (RefCounted.debugging)
                       writeln(typeof(this).stringof,

Note the direct use of ASM.  Between function call overhead and the fact that core.atomic uses a CAS loop, using it instead would have about doubled the overhead.  I'm not sure how we'll factor this when/if this hits production code.

Anyhow, the timings are:

Atomic:  258 milliseconds total (258 nanoseconds per iteration)
Non-atomic:  88 milliseconds  (88 nanoseconds per iteration)

I think this is a small enough overhead that we should just make all reference counting atomic.  This would make the race condition pointed out by Michael completely moot IIUC.  It would also make it possible to safely share reference counted containers across threads in shared/synchronized wrappers, etc.

Furthermore, the cost of atomic reference counting can be mitigated by the standard tricks for dealing with expensive to copy objects, though it shouldn't be necessary except in the most extreme conditions.

--David Simcha

On 11/19/2010 2:13 PM, David Simcha wrote:
> Should I make RefCounted and data structures that use it configurable, such that they're not atomic by default, but can be trivially made atomic?  The use of non-atomic ref counting makes it impossible to store ref counted data structures in classes for now, and makes it impossible to share them across threads forever, even if you synchronize manually.  In cases where they are not being passed around much, I don't think atomic reference counting is unreasonably expensive.  I personally would love a version of std.container.Array with a few issues I've mentioned previously fixed and with atomic ref counting.
>
> If people like the idea of making this configurable, any input on how?  I see three ways:
>
> 1.  Make RefCounted into a class instead of a struct, and use virtual functions and interfaces.  (Probably a bad idea, but I'll throw it out there anyhow).
>
> 2.  Store a bool in it and use an if statement.  (I like this.  It allows runtime flexibility and the perf. cost should be negligible because the branch will be so predictable.)
>
> 3.  Use a template parameter and handle it at compile time.  (I don't like the idea of atomic and non-atomic ref counted versions of the same container being different types.  OTOH, this is the most efficient at runtime.)