View mode: basic / threaded / horizontal-split · Log in · Help
March 19, 2013
[Issue 9763] New: @contended and @contended("groupName")
http://d.puremagic.com/issues/show_bug.cgi?id=9763

          Summary: @contended and @contended("groupName")
          Product: D
          Version: D2
         Platform: All
       OS/Version: All
           Status: NEW
         Severity: enhancement
         Priority: P2
        Component: DMD
       AssignedTo: nobody@puremagic.com
       ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2013-03-19 16:39:52 PDT ---
See for more info:
http://openjdk.java.net/jeps/142
https://blogs.oracle.com/dave/entry/java_contented_annotation_to_help

The idea is to introduce some way in D to specify that some fields in an
struct/object are probably highly contended across processor cores, so the
compiler can pad (or arrange, in objects) to not share cache lines with other
fields, or other structs, that are likely to be independently accessed.

- - - - - - - - - - - - - - - - - - - -

This is from:
http://mail.openjdk.java.net/pipermail/hotspot-dev/2012-November/007309.html


> A. Marking the class as contended:
> 
>     @Contended
>     public static class ContendedTest2 {
>         private Object plainField1;
>         private Object plainField2;
>         private Object plainField3;
>         private Object plainField4;
>     }
> 
> ...makes the entire field block to be padded from the both sides:
> (below is the output of new tracing -XX:+PrintFieldLayout)
> 
>   TestContended$ContendedTest2: field layout
>     Entire class is marked contended
>      @140 --- instance fields start ---
>      @140 "plainField1" Ljava.lang.Object;
>      @144 "plainField2" Ljava.lang.Object;
>      @148 "plainField3" Ljava.lang.Object;
>      @152 "plainField4" Ljava.lang.Object;
>      @288 --- instance fields end ---
>      @288 --- instance ends ---
> 
> Note that we use 128 bytes, twice the cache line size on most hardware
> to adjust for adjacent sector prefetchers extending the false sharing
> collisions to two cache lines.
> 
> B. Marking the field as contended:
> 
>     public static class ContendedTest1 {
>         @Contended
>         private Object contendedField1;
>         private Object plainField1;
>         private Object plainField2;
>         private Object plainField3;
>         private Object plainField4;
>     }
> 
> ...pushes the field out of dense block and effectively applies padding:
> 
>    TestContended$ContendedTest1: field layout
>      @ 12 --- instance fields start ---
>      @ 12 "plainField1" Ljava.lang.Object;
>      @ 16 "plainField2" Ljava.lang.Object;
>      @ 20 "plainField3" Ljava.lang.Object;
>      @ 24 "plainField4" Ljava.lang.Object;
>      @156 "contendedField1" Ljava.lang.Object; (contended, group = 0)
>      @288 --- instance fields end ---
>      @288 --- instance ends ---
> 
> C. Marking multiple fields makes each field padded:
> 
>     public static class ContendedTest4 {
>         @Contended
>         private Object contendedField1;
> 
>         @Contended
>         private Object contendedField2;
> 
>         private Object plainField3;
>         private Object plainField4;
>     }
> 
> ...pushes both fields with individual padding for each:
> 
>    TestContended$ContendedTest4: field layout
>      @ 12 --- instance fields start ---
>      @ 12 "plainField3" Ljava.lang.Object;
>      @ 16 "plainField4" Ljava.lang.Object;
>      @148 "contendedField1" Ljava.lang.Object; (contended, group = 0)
>      @280 "contendedField2" Ljava.lang.Object; (contended, group = 0)
>      @416 --- instance fields end ---
>      @416 --- instance ends ---
> 
> *** IV. Contention groups
> 
> There are cases where you want to separate the *group* of fields that
> are experiencing contention with everything else but not pairwise. This
> is the usual thing for some of the code updating two fields at once.
> While marking both with @Contended would be sufficient, we can optimize
> the memory footprint by not applying padding between them. In order to
> demarcate these groups, we have the parameter in the annotation
> describing the equivalence class for contention group.
> 
> So that:
> 
>     public static class ContendedTest5 {
>         @Contended("updater1")
>         private Object contendedField1;
> 
>         @Contended("updater1")
>         private Object contendedField2;
> 
>         @Contended("updater2")
>         private Object contendedField3;
> 
>         private Object plainField5;
>         private Object plainField6;
>     }
> 
> ...is laid out as:
> 
>    TestContended$ContendedTest5: field layout
>      @ 12 --- instance fields start ---
>      @ 12 "plainField5" Ljava.lang.Object;
>      @ 16 "plainField6" Ljava.lang.Object;
>      @148 "contendedField1" Ljava.lang.Object; (contended, group = 12)
>      @152 "contendedField2" Ljava.lang.Object; (contended, group = 12)
>      @284 "contendedField3" Ljava.lang.Object; (contended, group = 15)
>      @416 --- instance fields end ---
>      @416 --- instance ends ---
> 
> Note $contendedField1 and $contendedField2 are padded from everything
> else, but still densely packed with each other.

- - - - - - - - - - - - - - - - - - - -

A design for D is similar. For structs the problem is solved with padding
between fields or between structs, while in class instances it's solved with
padding and/or field reordering. The syntax is similar:


struct Test1 {
   @contended int field1;
   int field2;
   @contended int field3;
}
@contended final class Test2 {
   @contended int field1;
   int field2;
   @contended int field3;
}

@contended struct Test3 {
   int field1;
   int field2;
}
@contended final class Test4 {
   private Object field1;
   private Object field2;
}

- - - - - - - - - - - - - - - - - - - -

Contention groups in D:


struct Test5 {
   @contended("group1") int field1;
   @contended("group1") int field2;

   @contended("group2") int field3;

   private int field4;
   private int field5;
}

- - - - - - - - - - - - - - - - - - - -

I think the D compiler should ignore @contended for const/immutable fields:


alias T = const(int);
struct Test6 {
   @contended T field1;
   int field2;
}
static assert(Test6.sizeof == 8);

- - - - - - - - - - - - - - - - - - - -

An alternative syntax is to use align:

align(cache_line)
align(cache_line, "group1")


Like this:


struct Test7 {
   align(cache_line) int field1;
   int field2;
   align(cache_line) int field3;
}
align(cache_line) class Test8 {
   private Object field1;
   private Object field2;
}
struct Test9 {
   align(cache_line, "group1") int field1;
   align(cache_line, "group1") int field2;

   align(cache_line, "group2") int field3;

   private int field4;
   private int field5;
}


But @contended is more explicit in its purpose.

- - - - - - - - - - - - - - - - - - - -

Cache lines contention is a well known phenomenon. This is a small example. The
code is adapted from:
http://rosettacode.org/wiki/Atomic_updates#D



// start atomic_updates.d code.
import std.stdio: writeln;
import std.conv: text;
import std.random: uniform, Xorshift;
import std.algorithm: min, max;
import std.parallelism: task;
import core.thread: Thread;
import core.sync.mutex: Mutex;
import core.time: dur;

__gshared uint transfersCount;

final class Buckets(size_t nBuckets) if (nBuckets > 0) {
   alias TBucketValue = uint;

   private static struct Bucket {
       TBucketValue value;
       Mutex mtx;
       //uint[14] _voidPadding = void;
       alias value this;
   }

   private Bucket[nBuckets] buckets;
   private bool running;

   public this() {
       this.running = true;
       foreach (ref b; buckets)
           b = Bucket(uniform(0, 100), new Mutex());
   }

   public TBucketValue opIndex(in size_t index) const pure nothrow {
       return buckets[index];
   }

   public void transfer(in size_t from, in size_t to,
                        in TBucketValue amount) {
       immutable low  = min(from, to);
       immutable high = max(from, to);
       buckets[low].mtx.lock();
       buckets[high].mtx.lock();

       scope(exit) {
           buckets[low].mtx.unlock();
           buckets[high].mtx.unlock();
       }

       immutable realAmount = min(buckets[from].value, amount);
       buckets[from] -= realAmount;
       buckets[to  ] += realAmount;
       transfersCount++;
   }

   @property size_t length() const pure nothrow {
       return this.buckets.length;
   }

   void toString(in void delegate(const(char)[]) sink) {
       TBucketValue total = 0;
       foreach (ref b; buckets) {
           b.mtx.lock();
           total += b;
       }

       scope(exit)
           foreach (ref b; buckets)
               b.mtx.unlock();

       sink(text(buckets));
       sink(" ");
       sink(text(total));
   }
}

void randomize(size_t N)(Buckets!N data) {
   immutable maxi = data.length - 1;
   auto rng = Xorshift(1);

   while (data.running) {
       immutable i = uniform(0, maxi, rng);
       immutable j = uniform(0, maxi, rng);
       immutable amount = uniform(0, 20, rng);
       data.transfer(i, j, amount);
   }
}

void equalize(size_t N)(Buckets!N data) {
   immutable maxi = data.length - 1;
   auto rng = Xorshift(1);

   while (data.running) {
       immutable i = uniform(0, maxi, rng);
       immutable j = uniform(0, maxi, rng);
       immutable a = data[i];
       immutable b = data[j];
       if (a > b)
           data.transfer(i, j, (a - b) / 2);
       else
           data.transfer(j, i, (b - a) / 2);
   }
}

void display(size_t N)(Buckets!N data) {
   foreach (immutable _; 0 .. 10) {
       writeln(transfersCount, " ", data);
       transfersCount = 0;
       Thread.sleep(dur!"msecs"(1000));
   }
   data.running = false;
}

void main() {
   writeln("N. transfers, buckets, buckets sum:");
   auto data = new Buckets!20();
   task!randomize(data).executeInNewThread();
   task!equalize(data).executeInNewThread();
   task!display(data).executeInNewThread();
}
// end atomic_updates.d code.



Timings by nazriel on IRC, done with dmd on an Intel i5 2,4ghz, arch linux
x86_64 with:
-release -inline -O -noboundscheck



Without padding in struct Bucket:

N. transfers, buckets, buckets sum:
85 [0, 5, 0, 51, 5, 25, 89, 0, 147, 93, 47, 2, 76, 43, 76, 69, 0, 62, 67, 51]
908
6246828 [35, 51, 38, 25, 48, 88, 11, 23, 39, 54, 92, 13, 69, 48, 29, 48, 52,
26, 68, 51] 908
6322759 [60, 40, 43, 55, 44, 39, 49, 61, 38, 44, 51, 42, 18, 37, 34, 44, 50,
48, 60, 51] 908
6103135 [29, 63, 74, 44, 44, 48, 44, 53, 35, 62, 35, 44, 44, 44, 34, 47, 31,
40, 42, 51] 908
6567254 [36, 39, 42, 49, 70, 36, 10, 28, 51, 39, 55, 42, 35, 72, 42, 65, 35,
42, 69, 51] 908
6274262 [55, 63, 27, 40, 31, 40, 41, 32, 20, 44, 45, 95, 42, 54, 45, 38, 76,
40, 29, 51] 908
6171490 [39, 24, 61, 56, 42, 51, 39, 56, 62, 43, 72, 39, 56, 13, 32, 67, 45,
48, 12, 51] 908
6211137 [29, 39, 36, 70, 16, 41, 48, 35, 74, 20, 65, 59, 48, 48, 43, 59, 54,
30, 43, 51] 908
6202907 [51, 16, 80, 36, 25, 42, 43, 58, 48, 41, 41, 35, 80, 49, 54, 24, 40,
44, 50, 51] 908
6350466 [50, 35, 37, 66, 46, 65, 38, 48, 52, 47, 51, 46, 20, 37, 34, 46, 51,
38, 50, 51] 908


With padding in struct Bucket:

N. transfers, buckets, buckets sum:
112 [22, 41, 41, 54, 54, 58, 52, 32, 56, 53, 49, 64, 59, 65, 56, 74, 54, 59,
66, 55] 1064
6698768 [56, 58, 54, 29, 38, 54, 58, 55, 54, 72, 26, 57, 83, 51, 49, 45, 56,
57, 57, 55] 1064
8371235 [53, 53, 53, 53, 52, 53, 53, 54, 53, 54, 53, 52, 53, 54, 53, 54, 53,
53, 53, 55] 1064
7680510 [73, 71, 16, 53, 20, 55, 84, 57, 55, 53, 49, 55, 57, 50, 52, 68, 53,
35, 53, 55] 1064
6675895 [45, 32, 79, 46, 48, 29, 65, 79, 13, 79, 0, 79, 61, 65, 40, 63, 65, 61,
60, 55] 1064
7865702 [31, 63, 58, 76, 54, 43, 49, 51, 51, 63, 54, 29, 47, 66, 54, 62, 54,
56, 48, 55] 1064
8068515 [51, 69, 49, 54, 54, 59, 51, 58, 44, 46, 58, 56, 48, 57, 59, 48, 38,
56, 54, 55] 1064
6718293 [40, 55, 52, 55, 77, 51, 40, 57, 87, 37, 56, 56, 49, 69, 65, 17, 13,
54, 79, 55] 1064
6512280 [53, 34, 21, 69, 47, 69, 43, 25, 61, 43, 63, 52, 97, 52, 50, 52, 49,
78, 51, 55] 1064
7408234 [51, 44, 54, 56, 31, 55, 34, 59, 72, 69, 56, 50, 58, 70, 45, 67, 48,
41, 49, 55] 1064


The numbers in the first column like 6274262 represent the number of updated in
a second. With the padding there is less cache lines contention between two
cores, so it performs more atomic updated in a second, like 7408234.


That manual padding in Bucket is similar to:

   @contended private static struct Bucket {
       TBucketValue value;
       Mutex mtx;
       alias value this;
   }


One difference is that in the Java example the padding is on both sides, and
it's twice larger, as explained:

> Note that we use 128 bytes, twice the cache line size on most hardware
> to adjust for adjacent sector prefetchers extending the false sharing
> collisions to two cache lines.

- - - - - - - - - - - - - - - - - - - -

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 20, 2013
[Issue 9763] @contended and @contended("groupName")
http://d.puremagic.com/issues/show_bug.cgi?id=9763



--- Comment #1 from bearophile_hugs@eml.cc 2013-03-20 11:53:59 PDT ---
A current workaround is to use align (with a value, because of Issue 9766 ):


align(128) struct Test3 {
   int field1;
   int field2;
}
pragma(msg, "Test3:");
pragma(msg, Test3.field1.offsetof);
pragma(msg, Test3.field2.offsetof);
pragma(msg, "Total size:");
pragma(msg, Test3.sizeof);
pragma(msg, "");


The print shows there is trailing padding (no leading padding):

Test3:
0u
4u
Total size:
128u


Adding align(128) on some fields of struct/object allows to introduce
intermediate padding, but it's tricky to get all the padding right. But
@contended adapts automatically the padding needed on different CPUs and makes
the creation of spaces and groups simpler.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Top | Discussion index | About this forum | D home