December 11, 2017
I just had a discussion with Walter, Andrei and Ali about open methods. While Andrei is not a great fan of open methods, he likes the idea of improving D to better support libraries that extend the language - of which my openmethods library is just an example. Andrei, correct me if I misrepresented your opinion in this paragraph.

Part of the discussion was about a mechanism to add user-defined per-object or per-class metadata (there's another part that I will discuss in another thread).

Andrei's initial suggestion is to put it in the vtable. If we know the initial size of the vtable, we can grow it to accommodate new slots. In fact we can already do something along those lines...sort of:

import std.stdio;

class Foo {
  abstract void report();
}

class Bar : Foo {
  override void report() { writeln("I'm fine!"); }
}

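void main() {
  // Build a copy of Bar's vtable with one extra slot at the end.
  void*[] newVtbl;
  auto initVtblSize = Bar.classinfo.vtbl.length;
  newVtbl.length = initVtblSize + 1;
  newVtbl[0..initVtblSize] = Bar.classinfo.vtbl[];
  newVtbl[initVtblSize] = cast(void*) 0x123456; // the user-defined slot
  // Copy Bar's initializer image and point its first word (the vptr)
  // at the extended vtable.
  byte[] newInit = Bar.classinfo.m_init.dup;
  *cast(void***) newInit.ptr = newVtbl.ptr;
  Bar.classinfo.m_init = newInit;
  // With dmd and gdc, new objects are created from the patched image.
  Foo foo = new Bar();
  foo.report(); // I'm fine!
  writeln((*cast(void***)foo)[initVtblSize]); // 123456
}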

This works with dmd and gdc, but not with ldc2. Still, it gives an idea of what the extension would look like.

A variant of the idea is to allocate the user slots *before* the vtable and access them via negative indices. It would be faster.
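
The negative-index layout can be sketched without touching any live object. What follows is only an illustration of the pointer arithmetic: `userSlotCount` and `userSlot` are hypothetical names, and the block is never installed as a real vptr. One plausible reason for the speed advantage is that a slot's offset from the vptr becomes a constant that does not depend on the length of each class's vtable.

```d
import std.stdio;

class Foo {
  void report() { writeln("I'm fine!"); }
}

// Hypothetical: number of user slots reserved in front of every vtable.
enum userSlotCount = 2;

// User slot `index` lives at vptr[-1 - index], whatever the vtable length.
ref void* userSlot(void** vptr, size_t index) {
  return *(vptr - 1 - index);
}

void main() {
  auto vtbl = Foo.classinfo.vtbl;

  // One block: user slots first, then a copy of the original vtable.
  auto block = new void*[](userSlotCount + vtbl.length);
  block[userSlotCount .. $] = vtbl[];

  // The would-be vptr points past the user slots, at the first virtual entry.
  void** vptr = block.ptr + userSlotCount;

  userSlot(vptr, 0) = cast(void*) 0x1111;
  userSlot(vptr, 1) = cast(void*) 0x2222;

  // Virtual entries are still found at non-negative indices.
  assert(vptr[0 .. vtbl.length] == vtbl);
  // User slots sit just before them.
  assert(block[1] == cast(void*) 0x1111);
  assert(block[0] == cast(void*) 0x2222);
}
```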

Of course we would need a thread-safe facility that libraries would call to obtain (and release) slots in the extended vtable; it would return the index of the allocated slot(s). Thus a library would call one API to (globally) reserve a new slot, then another to grow the vtables of the classes it targets. Automatically finding and growing all the vtables is infeasible because nested classes are not locatable via ModuleInfo.
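
A minimal sketch of what the slot-reservation half of such a facility might look like. Nothing like this exists in druntime; `reserveSlot`, `releaseSlot`, `slotLock` and `slotInUse` are hypothetical names, and growing the vtables themselves is left out.

```d
import core.sync.mutex : Mutex;

// Hypothetical global registry of user slots, shared by all threads.
private __gshared Mutex slotLock;
private __gshared bool[] slotInUse;

shared static this() {
  slotLock = new Mutex;
}

// Reserve a slot and return its index, reusing a released slot if there is one.
size_t reserveSlot() {
  slotLock.lock();
  scope (exit) slotLock.unlock();
  foreach (i, used; slotInUse) {
    if (!used) {
      slotInUse[i] = true;
      return i;
    }
  }
  slotInUse ~= true;
  return slotInUse.length - 1;
}

// Give a slot back so that another library can reuse it.
void releaseSlot(size_t index) {
  slotLock.lock();
  scope (exit) slotLock.unlock();
  assert(index < slotInUse.length && slotInUse[index], "slot is not reserved");
  slotInUse[index] = false;
}

void main() {
  auto a = reserveSlot();
  auto b = reserveSlot();
  assert(a != b);
  releaseSlot(a);
  // The freed index is handed out again.
  assert(reserveSlot() == a);
}
```

An explicit Mutex is used rather than a synchronized statement so that the sketch does not itself depend on the '__monitor' field.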

Walter also reminded me of the __monitor field so I played with it too. Here is a prototype of what per-instance user-defined slots could look like.

import std.stdio;

class Foo {
}

void main() {
  // Copy Foo's initializer image.
  byte[] init;
  init.length = Foo.classinfo.m_init.length;
  init[] = Foo.classinfo.m_init[];
  // Word 0 is the vptr; word 1 is the space reserved for the hidden __monitor field.
  (cast(void**) init.ptr)[1] = cast(void*) 0x1234;
  Foo.classinfo.m_init = init;
  Foo foo = new Foo();
  writeln((cast(void**) foo)[1]); // 1234 with dmd and gdc, null with ldc2
}

This works with dmd and gdc but not with ldc2.

This may be useful for implementing reference-counting schemes, Observers, etc.

In both cases I use the undocumented 'm_init' field in ClassInfo. The books and docs do talk about the 'init' field that is used to initialize structs, but I have found no mention of 'm_init' for classes. Perhaps we could document it and make it mandatory that an implementation uses its content to pre-initialize objects.

Also, here I am using the space reserved for the hidden '__monitor' field. This is a problem because (1) it will go away some day and (2) it is only one word. Granted, that word could store a pointer to a vector of words, where user-defined slots would live; but that would be at the cost of performance.
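
The cost mentioned here is one extra pointer dereference per slot access. A minimal sketch, using a mock header instead of a real object so that it does not depend on any compiler's object layout; `MockHeader`, `userSlots` and `readSlot` are hypothetical names.

```d
// Hypothetical stand-in for the first two words of an object.
// This is not a druntime type.
struct MockHeader {
  void** vptr;      // word 0: pointer to the vtable
  void** userSlots; // word 1: the single spare word, pointing to a slot vector
}

// Reading a user slot takes two loads: the spare word, then the slot itself.
void* readSlot(MockHeader* header, size_t index) {
  return header.userSlots[index];
}

void main() {
  auto slots = new void*[](3);
  slots[2] = cast(void*) 0x42;

  auto header = MockHeader(null, slots.ptr);
  assert(readSlot(&header, 2) == cast(void*) 0x42);
  assert(readSlot(&header, 0) is null);
}
```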

Finally, note that if you have per-instance user slots and a way of automatically initializing them when an object is created, then you also have per-class user-defined metadata: just allocate a slot in the object, and put a pointer to the data in it.

Please send in comments, especially if you are a library author and have encountered a need for this kind of thing. Eventually the discussion may lead to the drafting of a DIP.



December 11, 2017
I realize that I focused too much on the how, and not enough on the why.

By "metadata" I mean the data that is "just there" in any object, in addition to user-defined fields.

An example of per-class metadata is the pointer to the virtual function table. It is installed by the compiler or the runtime as part of object creation. It is the same for all the instances of the same class.

Just like virtual functions, my openmethods library uses "method tables" and needs a way of finding the method table relevant to an object depending on its class. I want the library to work with objects of any class, without requiring modifications to existing classes. Thus, there is a need to add that information to any object, in an orthogonal manner. Openmethods has two ways of doing this (one actually hijacks the deprecated 'deallocator' field in ClassInfo) but could profit from the ability to plant pointers right inside objects.
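
For comparison, here is a portable baseline that needs no support from the compiler or runtime: a side table keyed by the address of the object's ClassInfo. This is only an illustration of the problem, not a description of how openmethods is implemented; `MethodTable`, `registerClass` and `methodTableOf` are hypothetical names. Every lookup pays for a hash probe, which a pointer planted in the object or the vtable would avoid.

```d
class Animal {}
class Dog : Animal {}

alias MethodTable = void*[];

// Hypothetical side table: ClassInfo address -> method table for that class.
// Thread-local here, to keep the sketch short.
MethodTable[const(void)*] methodTables;

void registerClass(ClassInfo ci, MethodTable table) {
  methodTables[cast(const(void)*) ci] = table;
}

// Find the method table for the dynamic class of obj, or null if none was registered.
MethodTable methodTableOf(Object obj) {
  auto p = (cast(const(void)*) typeid(obj)) in methodTables;
  return p is null ? null : *p;
}

void main() {
  auto dogTable = new void*[](2);
  registerClass(Dog.classinfo, dogTable);

  Animal a = new Dog;
  // The lookup uses the dynamic type (Dog), not the static type (Animal).
  assert(methodTableOf(a) is dogTable);
  // Animal itself was never registered.
  assert(methodTableOf(new Animal) is null);
}
```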

Examples of per-object metadata could be: a reference count, a time stamp, an allocator, or the database an object was fetched from.