For the past few years I've been writing a programming language entirely in D.
The website https://crow-lang.org/ explains the language itself, so here I thought I'd include some comments on my experience writing a medium-sized project in D.
Pros
- Debug builds with DMD in under 5 seconds.
- LDC produces very fast optimized code (at the cost of long compile times). It can also compile to WASM, which makes it possible to run code on the website.
- Metaprogramming was useful in the interpreter to generate specialized code for various operations, e.g. operations for reading N bytes from a pointer for various values of N.
- I like how you generally get a compile error instead of the code doing something surprising. I've added new features and had them work correctly the first time thanks to purity and strong typing.
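As a rough illustration of the metaprogramming point above, here is a minimal sketch of a "read N bytes" helper specialized at compile time. The name readBytes and the little-endian layout are assumptions made for this example, not crow's actual code.

```d
// Hypothetical helper: read `n` bytes from `p` as a little-endian integer.
// `n` is a compile-time parameter, so every instantiation is its own
// specialized function.
ulong readBytes(size_t n)(const(ubyte)* p)
{
    static assert(n >= 1 && n <= 8, "a ulong holds at most 8 bytes");
    ulong result = 0;
    // `static foreach` is expanded at compile time, so the generated code
    // contains one shift-and-or per byte instead of a runtime loop.
    static foreach (i; 0 .. n)
        result |= cast(ulong) p[i] << (8 * i);
    return result;
}

void main()
{
    immutable ubyte[4] bytes = [0x01, 0x02, 0x03, 0x04];
    assert(readBytes!1(bytes.ptr) == 0x01);
    assert(readBytes!2(bytes.ptr) == 0x0201);
    assert(readBytes!4(bytes.ptr) == 0x04030201);
}
```

An interpreter could instantiate this once per supported size and store the resulting functions in its operation table.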
Cons
- I run into https://issues.dlang.org/show_bug.cgi?id=22944 a lot. This is annoying when calling a function that takes many delegates. A single error in one delegate causes spurious @nogc errors in every one.
- Having to write @safe @nogc pure nothrow all the time. It needs a way to make that the default and mark specific things as not-safe or not-pure.
Unions
I used a TaggedUnion mixin. It looks like:
    immutable struct ParamsAst {
        immutable struct Varargs {
            DestructureAst param;
        }
        mixin TaggedUnion!(SmallArray!DestructureAst, Varargs*);
    }
This is like a DestructureAst[] | Varargs*.
Normally that would be 192 bits: 64 for the array length, 64 for the pointer, 1 for the tag, and 63 for alignment.
But this uses a SmallArray, which packs the pointer and length together, and also has some room for the tag. So ParamsAst only takes up 64 bits.
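To make the packing idea concrete, here is a minimal sketch of one way a pointer, a length, and a tag could share 64 bits. The layout (48-bit pointer, 15-bit length, 1-bit tag) and the name PackedUnion are assumptions for illustration; the real SmallArray may be laid out differently. It also assumes a platform where addresses fit in 48 bits.

```d
// Hypothetical layout (an assumption, not necessarily crow's):
//   bits 0..47  : pointer
//   bits 48..62 : array length (at most 32767 elements)
//   bit  63     : tag, 0 = array, 1 = pointer
// Note: the GC cannot see pointers hidden in an integer, so the referenced
// memory must be kept alive by other means.
struct PackedUnion(T, U)
{
    private ulong bits;

    static PackedUnion fromArray(T[] arr)
    {
        immutable ptr = cast(ulong) arr.ptr;
        assert(ptr < (1UL << 48) && arr.length < (1UL << 15));
        return PackedUnion(ptr | (cast(ulong) arr.length << 48));
    }

    static PackedUnion fromPointer(U* p)
    {
        immutable ptr = cast(ulong) p;
        assert(ptr < (1UL << 48));
        return PackedUnion(ptr | (1UL << 63));
    }

    bool isPointer() const { return (bits >> 63) != 0; }

    T[] asArray()
    {
        assert(!isPointer);
        auto ptr = cast(T*) (bits & ((1UL << 48) - 1));
        immutable len = cast(size_t) ((bits >> 48) & 0x7FFF);
        return ptr[0 .. len];
    }

    U* asPointer()
    {
        assert(isPointer);
        return cast(U*) (bits & ((1UL << 48) - 1));
    }
}

// The whole union fits in a single 64-bit word.
static assert(PackedUnion!(int, long).sizeof == 8);

void main()
{
    int[3] storage = [10, 20, 30];
    auto a = PackedUnion!(int, long).fromArray(storage[]);
    assert(!a.isPointer && a.asArray() == [10, 20, 30]);

    long value = 7;
    auto p = PackedUnion!(int, long).fromPointer(&value);
    assert(p.isPointer && *p.asPointer() == 7);
}
```

The trade-off is a hard limit on array length and a dependency on the platform's address width, in exchange for a union that is one third of the naive size.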
I implemented pattern matching through a generated match function that takes a delegate for each type. A pattern matching syntax for D could make this prettier.
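For readers who haven't seen this style, here is a hand-written sketch of what a match function with one delegate per type can look like. In the project it is generated by the mixin; the type IntOrString and the function describe below are made up for this example.

```d
import std.conv : to;

// Hypothetical two-member tagged union, written by hand for illustration.
struct IntOrString
{
    private bool isString;
    private union { int i; string s; }

    this(int value) { isString = false; i = value; }
    this(string value) { isString = true; s = value; }

    // One delegate per member type; exactly one of them is called,
    // depending on which member is currently stored.
    R match(R)(scope R delegate(int) onInt, scope R delegate(string) onString)
    {
        return isString ? onString(s) : onInt(i);
    }
}

string describe(IntOrString u)
{
    return u.match!string(
        (int i) => "int " ~ i.to!string,
        (string s) => "string " ~ s);
}

void main()
{
    assert(describe(IntOrString(42)) == "int 42");
    assert(describe(IntOrString("hi")) == "string hi");
}
```

Because every case must be supplied as an argument, forgetting to handle a member is a compile error rather than a silent fall-through.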
Tail calls
Using tail calls makes a big difference to interpreter performance. Unfortunately there's no way to specify that a call must be a tail call. It only happens in optimized builds, so I pass --d-version=TailRecursionAvailable in those builds only, and other builds use a less efficient method to call the next operation.
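The following is a minimal sketch of how such a version switch might be structured. Only the version identifier TailRecursionAvailable comes from the text above; Interpreter, Operation, continueWith, and the two operations are hypothetical names for this example.

```d
struct Interpreter
{
    Operation[] code; // the program: a list of operations
    size_t pc;        // index of the next operation to run
    long acc;         // a single accumulator register
}

alias Operation = void function(ref Interpreter interp);

// Called at the end of every operation.
void continueWith(ref Interpreter interp)
{
    version (TailRecursionAvailable)
    {
        // Optimized builds: call the next operation directly and rely on
        // the optimizer turning this call in tail position into a jump.
        if (interp.pc < interp.code.length)
        {
            auto next = interp.code[interp.pc];
            interp.pc++;
            next(interp);
        }
    }
    // Other builds: just return; the loop in `run` picks the next operation.
}

void opAddOne(ref Interpreter interp) { interp.acc += 1; continueWith(interp); }
void opDouble(ref Interpreter interp) { interp.acc *= 2; continueWith(interp); }

long run(Operation[] code)
{
    auto interp = Interpreter(code);
    version (TailRecursionAvailable)
    {
        continueWith(interp); // operations chain into each other
    }
    else
    {
        // Fallback: return to this loop after every operation.
        while (interp.pc < interp.code.length)
        {
            auto op = interp.code[interp.pc];
            interp.pc++;
            op(interp);
        }
    }
    return interp.acc;
}

void main()
{
    // ((0 + 1) * 2) + 1
    assert(run([&opAddOne, &opDouble, &opAddOne]) == 3);
}
```

If the version is enabled in a build where the calls are not actually turned into jumps, the stack grows with every operation, which is why the flag is only safe to pass in optimized builds.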
Immutability
Almost everything in the compiler is immutable.
The AST is immutable, so instead of updating it with semantic information, the type checker returns a "model".
This has the advantage of allowing several different AST types to compile to the same model type; a lot of different-looking things are just function calls.
In the IDE, when a file changes, the compiler updates the AST of only the affected code, then updates the model for that module and any modules that depend on it.
Late (logical variables)
Sometimes a field of an immutable entity can't be written immediately.
For example, the type checker first builds a model for the signature of every function, and only then checks function bodies (since that involves looking at the signatures of other functions).
To accomplish this I have a Late type. This starts off uninitialized. Attempting to read it while it's uninitialized is an assertion error. Once it's initialized, it can't be written again. Thus it's logically immutable from the reader's perspective since it will never read two different values.
This requires using unsafe code to write the late value (since you can't normally write to an immutable value). This apparently works, though I wonder if some day a compiler will optimize away lateSet since it's pure, takes immutable inputs, and returns nothing.
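Here is a minimal sketch of a write-once value along these lines. The field names and the lateGet helper are assumptions for this example, and the cast that writes through an immutable reference is formally undefined behavior in D, which is exactly the worry described above.

```d
// Hypothetical write-once cell, not crow's actual definition.
struct Late(T)
{
    private bool isSet;
    private T value;
}

// Reading before initialization is an assertion error.
immutable(T) lateGet(T)(ref immutable Late!T late)
{
    assert(late.isSet, "Late value read before it was set");
    return late.value;
}

// Writes through an immutable reference by casting immutability away.
// The language does not sanction this; it relies on compilers not
// exploiting immutability here.
@trusted void lateSet(T)(ref immutable Late!T late, immutable T value)
{
    assert(!late.isSet, "Late value written twice");
    auto mutableLate = cast(Late!T*) &late;
    mutableLate.value = cast(T) value;
    mutableLate.isSet = true;
}

void main()
{
    // Heap-allocated so the cell is not a compile-time constant.
    auto late = new immutable(Late!int);
    lateSet(*late, 42);
    assert(lateGet(*late) == 42);
}
```

The two assertions enforce the "never read two different values" property at run time: a read can only succeed after the single permitted write.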
Purity
The compiler part of the code (everything but the interpreter) is completely pure. It essentially implements the LSP (Language Server Protocol), and the LSP client is the one doing all the I/O. Thus the I/O implementation can be different for desktop, IDE, and web.
One annoyance with pure code is having to pass AllSymbols, the symbol (interned string) table, to any function that needs to create a symbol or un-intern it. I think using this through a global variable could be considered pure, since a caller of symbolOfString can't tell whether the symbol has already been added, and the result of stringOfSymbol never changes. But I'm not sure if that's actually safe or how to tell D to allow a global variable in pure code.
Scope
I've used scope and in wherever possible with -preview=dip1000 -preview=in. I often need to cast away scope using a function castNonScope. This feels like it needs a language intrinsic or at least a standard library function.
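For illustration, one way such a helper could be written for pointers is sketched below. This is an assumption about the general shape, not the project's actual castNonScope; Node, Holder, and makeHolder are made-up names.

```d
// Hypothetical sketch: returns the same pointer without the `scope`
// restriction. Round-tripping through an integer hides the pointer's
// origin from the escape checker, so the caller takes responsibility
// for the lifetime.
@trusted T* castNonScope(T)(scope T* x)
{
    immutable bits = cast(size_t) x;
    return cast(T*) bits;
}

struct Node { int value; }
struct Holder { Node* node; }

@safe Holder makeHolder(scope Node* node)
{
    // With -preview=dip1000, `return Holder(node);` would be expected to be
    // rejected in @safe code because a scope pointer would escape.
    return Holder(castNonScope(node));
}

void main()
{
    auto n = new Node(5);
    auto h = makeHolder(n);
    assert(h.node is n);
    assert(h.node.value == 5);
}
```

Since the helper disables the very check that scope provides, each call site is effectively a promise that the pointee outlives the result.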