Thread overview
Global thread-local free-list allocator
Mar 16, 2023
Witold
Mar 16, 2023
Witold
Mar 17, 2023
Herbie Melbourne
Mar 18, 2023
Witold
March 16, 2023

I ported a classic DeltaBlue benchmark to D, and for allocations I use a mix of malloc/free (for structs with some trailing inline arrays), and new for some other structs and arrays.

After running it, it felt kind of slow (I did not compare it to any other language yet, so no idea if it is actually slower, just feels slow), so I did some profiling (perf / hotspot mostly), and noticed that substantial amount of time is spent in gc (both allocation and collection). This is on Linux, and benchmark indeed creates a lot of garbage, as I rerun and rerun it in a loop.

In the past I used custom per-type allocator using D1 style allocator methods, mostly together with hand coded free lists (was working fine and was short).

Now I tried to use new std.experimental.allocator, and have trouble understanding how to use them.

import std.experimental.allocator : allocatorObject, ThreadLocal;
import std.experimental.allocator.building_blocks.free_list : FreeList;
import std.experimental.allocator.gc_allocator : GCAllocator;
import std.experimental.allocator.mallocator : Mallocator;
import std.experimental.allocator.common : unbounded;
// static auto variableAllocator = allocatorObject(FreeList!(GCAllocator, Variable.sizeof)());
// static auto variableAllocator = allocatorObject(FreeList!(GCAllocator, Variable.sizeof, unbounded)());
// static auto variableAllocator = allocatorObject(FreeList!(Mallocator, Variable.sizeof, unbounded)());
alias VariableAllocator = ThreadLocal!(FreeList!(Mallocator, Variable.sizeof, unbounded));

// VariableAllocator variableAllocator;
// this() {
//   variableAllocator = VariableAllocator.instance;
//}

struct Variable {
  long value;
  List!Constraint* constraints;
  Constraint* determinedBy;
  long mark;
  Strength walkStrength;
  bool stay;
  // char[10] name;
  string name;

  static Variable* Create(string name, long initialValue) {
    // auto new_ = new Variable();
    auto new_ = VariableAllocator.instance.make!Variable();
    // auto new_ = variableAllocator.make!Variable();
    new_.value = initialValue;  // CRASH
    ...
    return new_;
  }
  static void Destroy(Variable* v) {
    assert(v.constraints !is null, "bad Variable struct; already freed?");
    List!Constraint.Destroy(v.constraints);
    v.constraints = null;
    // free(v);
    // destroy(v);
    VariableAllocator.instance.dispose(v);
  }

  ...
  ...
}

It crashes at new_.value, as new_ is null

Reading symbols from ./t1_gdc_debug...
(gdb) r
Starting program: /home/user/benchy/t1_gdc_debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x000055555557c86a in deltablue.Variable.Create(immutable(char)[], long) (name=..., initialValue=0) at /home/user/benchy/deltablue.d:248
248	    new_.value = initialValue;
(gdb) bt
#0  0x000055555557c86a in deltablue.Variable.Create(immutable(char)[], long) (name=..., initialValue=0) at /home/user/benchy/deltablue.d:248
#1  0x000055555557f14e in deltablue.ChainTest(int) (n=100) at /home/user/benchy/deltablue.d:901
#2  0x000055555557f88c in deltablue.deltablue(out ulong, out ulong) (iters=@0x7fffffffd9b8: 0, flops=@0x7fffffffd9c0: 0)
    at /home/user/benchy/deltablue.d:999
#3  0x00005555555e2b3e in b (__capture=0x7fffffffdba0, name=..., expectedResult=0) at /home/user/benchy/main.d:60
#4  0x00005555555e2417 in D main (args=...) at /home/user/benchy/main.d:91
(gdb) q

I want to use FreeList with parent allocator being Mallocator. Never return allocations back to Mallocator, and for each instance to be thread-local, AND for accessing thread-local instance to not require ANY locks.

Documentation for std.experimental.allocator is kind of vague, complicated and not very prescriptive.

Help would be appreciated.

March 16, 2023

On Thursday, 16 March 2023 at 20:53:28 UTC, Witold wrote:

>

It crashes at new_.value, as new_ is null

It looks like this:

alias VariableAllocator = ThreadLocal!(FreeList!(Mallocator, Variable.sizeof, unbounded));

should be this instead:

alias VariableAllocator = ThreadLocal!(FreeList!(Mallocator, Variable.sizeof, Variable.sizeof));

or

alias VariableAllocator = ThreadLocal!(FreeList!(Mallocator, 0, Variable.sizeof));

Otherwise if the list is empty (which it is at the start), the free list will attempt to allocate from Mallocator.instance, but not at requested size but rather at maxSize of the free list allocator (that makes sense), which is unbound, which is size_t.max, or about 18446 PiB. malloc will return null for such crazy amount.

Also using Mallocator as a parent might not be a good idea.

The reason is that while it will allocate memory, it will not tell GC to scan that area for pointers to other structs. So if some objects are allocated using normal GC allocator, and only reference is from Mallocator/FreeList allocated struct, GC will think this object is unreachable, and clean it up.

Using GCAllocator works. I made this handy alias template:

alias ListAllocator(Type) = ThreadLocal!(FreeList!(GCAllocator, Type.sizeof, Type.sizeof));

Then just in a struct:

struct MyStruct(T) {
  alias allocator = ListAllocator!(MyStruct!T);  // or typeof(this)

  ...
  auto m = allocator.instance.make!(MyStruct!T)();
  ...

  allocator.instance.dispose(m);



}

March 17, 2023

On Thursday, 16 March 2023 at 20:53:28 UTC, Witold wrote:

>

I ported a classic DeltaBlue benchmark to D, and for allocations I use a mix of malloc/free (for structs with some trailing inline arrays), and new for some other structs and arrays.

After running it, it felt kind of slow (I did not compare it to any other language yet, so no idea if it is actually slower, just feels slow), so I did some profiling (perf / hotspot mostly), and noticed that substantial amount of time is spent in gc (both allocation and collection). This is on Linux, and benchmark indeed creates a lot of garbage, as I rerun and rerun it in a loop.

care to share some numbers?
did you try the forking GC on Linux, too ?

March 18, 2023

On Friday, 17 March 2023 at 07:56:28 UTC, Herbie Melbourne wrote:

>

On Thursday, 16 March 2023 at 20:53:28 UTC, Witold wrote:

>

I ported a classic DeltaBlue benchmark to D, and for allocations I use a mix of malloc/free (for structs with some trailing inline arrays), and new for some other structs and arrays.

After running it, it felt kind of slow (I did not compare it to any other language yet, so no idea if it is actually slower, just feels slow), so I did some profiling (perf / hotspot mostly), and noticed that substantial amount of time is spent in gc (both allocation and collection). This is on Linux, and benchmark indeed creates a lot of garbage, as I rerun and rerun it in a loop.

care to share some numbers?

What numbers you are interested in? I have numbers, but code is still being tuned here and there, and numbers from my machine would be arbitrary.