Hi all,
I have been arguing for quite some time about the fact that there is a well of improvement to make to D that are not necessarily breaking changes or languages addition - and that, in fact, D having more feature than most language, adding feature is clearly not where the impact is.
Today I would like to provide a concrete example of things that could be dramatically improved, with no breakage except at the ABI level - which we already do not guarantee. Let's go over these changes.
1/ Downcast to final classes.
class A {}
final class B : A {}
auto foo(A a) {
return cast(B) a;
}
This generates a call into the runtime, making things completely opaque to the optimizer. The algorithm used by the runtime is linear. But there is a dumb constant time solution. because B is final, a is either an instance of B, or it is not (as in, it cannot be an instance of a subclass of B). Therefore, we expect the codegen to look like the following instead:
auto foo(A a) {
return typeid(a) is typeid(B) ? a : null;
}
This obviously won't compile but you get the idea. Not only this is constant time, but very simple too, and visible to the optimizer, which means the check can be folded by the opitimizer after other transforms, for instance inlining.
2/ Inline ClassInfo with the vtbl.
The current layout of objects is as follow:
+-----------+ +-----------+ +-----------+
| __vtbl +--->| typeid +--->| ClassInfo |
+-----------+ +-----------+ | |
| __monitor | | method0 | | ... |
+-----------+ +-----------+ | |
| field0 | | method1 | +-----------+
+-----------+ +-----------+
| field1 | | ... |
+-----------+ +-----------+
| ... |
+-----------+
This causes a ton of extra indirections that are not necessary. instead the following layout ought to be used:
+-----------+ +-----------+
| __vtbl +--->| ClassInfo |
+-----------+ | |
| __monitor | | ... |
+-----------+ | |
| field0 | +-----------+
+-----------+ | method0 |
| field1 | +-----------+
+-----------+ | method1 |
| ... | +-----------+
+-----------+ | ... |
+-----------+
Alternatively, it is possible to have the pointer point at the first method and subtract to get the typeid. The important part is that there is only one indirection now.
3/ Downcast.
Currently, we use a linear algorithm to downcast and we reach this algorithm via a call int he runtime. This prevent the optimizer from doing anything about it, in addition to be slow. The problem is similar to 1/. The whole check can be made trivial if we let the compiler prebake some data for us, namely an array of primary parents. Let's see what it looks like in the first example, when b is not final.
class A {}
class B : A {}
auto foo(A a) {
return cast(B) a;
}
Which we can do as follow:
auto foo(A a) {
// The primaries generated by the compiler for us.
auto tidA = typeid(A);
assert(tidA.primaries = [tidA]);
auto tidB = typeid(B);
assert(tidB.primaries = [tidA, tidB]);
auto t = typeid(a);
auto depth = tidB.primaries.length - 1;
if (t.primary.length <= depth) {
return null;
}
if (t.primary[depth] == tidB) {
return a;
}
return null;
}
This is starting to look good, now we have general downcast in a form that is not only really fast to perform, but also completely transparent to the optimizer.
4/ Downcast to interfaces.
This is getting a bit more tricky, so I won't expand this one fully, but it is also possible to do much better on this front as well. The obvious complication that that interfaces allow for multiple inheritance.
The first obvious optimization is to consider the chain of leftmost (or alternatively the longest chain, which allows to cull more candidates faster) inheritance the primaries for the interface and eliminate the whole branch at once using the same strategy as 3/.
Now we are left with possible secondaries match. Here, no possibility to remain O(1), we'll have to loop over the other interfaces and repeat the matching process. Thankfully very branch hierarchy are uncommon in practice, but even then, we can use this one weird trick to dramatically cull out the search space: bloom filters.
Make the bloom filter 64 bits and simply cull via (targetBloom & aggregateBloom) == targetBloom
and you can eliminate a good chunk of the search right there.
I hope this is the kind of change we can see in D in the future. This is the kind of changes that would make D better in practice, not the next cool feature, but that the current, existing feature, have an quality of implementation that is in line with what can be expected in a modern language.