Thread overview
Crow programming language
Feb 15
andy
Feb 15
IchorDev
Feb 15
andy
Feb 16
andy
Feb 16
andy
Feb 17
IchorDev
February 15

For the past few years I've been writing a programming language entirely in D.
The website https://crow-lang.org/ explains the language itself, so here I thought I'd include some comments on my experience writing a medium-sized project in D.

Pros

  • Debug builds with DMD in under 5 seconds.
  • LDC produces very fast optimized code (at the cost of long compile times). Compiling to WASM lets code run on the website.
  • Metaprogramming was useful in the interpreter to generate specialized code for various operations, e.g. operations for reading N bytes from a pointer for various values of N.
  • I like how you generally get a compile error instead of the code doing something surprising. I've added new features and had them work correctly the first time thanks to purity and strong typing.
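As a rough illustration of the metaprogramming point above, here is a sketch of how a template parameter plus static foreach can stamp out one specialized reader per byte width. The names readN, readDynamic and Widths are hypothetical (not taken from the Crow codebase), and the sketch assumes little-endian byte order.

```d
import std.meta : AliasSeq;

// Read N bytes starting at p into the low bytes of a ulong, assuming
// little-endian order. N is a template parameter, so each width gets its
// own instantiation with the loop fully unrolled at compile time.
ulong readN(size_t N)(const(ubyte)* p)
{
    static assert(N >= 1 && N <= 8, "width must fit in a ulong");
    ulong result = 0;
    static foreach (i; 0 .. N)
        result |= cast(ulong) p[i] << (8 * i);
    return result;
}

// Hypothetical set of widths the interpreter supports.
alias Widths = AliasSeq!(1, 2, 4, 8);

// Map a runtime width to the matching compile-time instantiation.
// static foreach generates one case label per entry in Widths.
ulong readDynamic(size_t n, const(ubyte)* p)
{
    switch (n)
    {
        static foreach (W; Widths)
        {
            case W:
                return readN!W(p);
        }
        default:
            assert(0, "unsupported width");
    }
}

void main()
{
    immutable ubyte[8] bytes = [1, 2, 3, 4, 5, 6, 7, 8];
    assert(readN!2(bytes.ptr) == 0x0201);
    assert(readDynamic(4, bytes.ptr) == 0x0403_0201);
    assert(readDynamic(8, bytes.ptr) == 0x0807_0605_0403_0201);
}
```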

Cons

  • I run into https://issues.dlang.org/show_bug.cgi?id=22944 a lot. This is annoying when calling a function that takes many delegates. A single error in one delegate causes spurious @nogc errors in every one.
  • Having to write @safe @nogc pure nothrow all the time. D needs a way to make that the default and mark specific things as not-safe or not-pure.

Unions

I used a TaggedUnion mixin. It looks like:

immutable struct ParamsAst {
	immutable struct Varargs {
		DestructureAst param;
	}
	mixin TaggedUnion!(SmallArray!DestructureAst, Varargs*);
}

This is like a DestructureAst[] | Varargs*.
Normally that would be 192 bits: 64 for the array length, 64 for the pointer (shared by the array and the Varargs*), 1 for the tag, and 63 bits of padding for alignment.
But this uses a SmallArray, which packs the pointer and length together, and also has some room for the tag. So ParamsAst only takes up 64 bits.

I implemented pattern matching through a generated match function that takes a delegate for each type. A pattern matching syntax for D could make this prettier.
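For comparison, Phobos's std.sumtype provides a similar generated match that takes one handler per member type. This is a sketch with DestructureAst and Varargs reduced to hypothetical stand-ins; it does not do the SmallArray packing, so the value occupies more than 64 bits.

```d
import std.sumtype : SumType, match;

// Hypothetical stand-ins for the post's AST types.
struct DestructureAst { string name; }
struct Varargs { DestructureAst param; }

// Roughly "DestructureAst[] | Varargs*", but without packing the tag
// into the pointer/length, so it is larger than 64 bits.
alias ParamsAst = SumType!(DestructureAst[], Varargs*);

// match takes one handler per member type and calls the one that applies.
size_t paramCount(ParamsAst params)
{
    return params.match!(
        (DestructureAst[] list) => list.length,
        (Varargs* v) => size_t(1)
    );
}

void main()
{
    auto fixed = ParamsAst([DestructureAst("a"), DestructureAst("b")]);
    auto varargs = ParamsAst(new Varargs(DestructureAst("rest")));
    assert(paramCount(fixed) == 2);
    assert(paramCount(varargs) == 1);
    assert(ParamsAst.sizeof >= 16);
}
```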

Tail calls

Using tail calls makes a big difference to interpreter performance. Unfortunately there's no way to specify that a call must be a tail call. It only happens in optimized builds, so I pass --d-version=TailRecursionAvailable in those builds only, and other builds use a less efficient method to call the next operation.
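A minimal sketch of the two dispatch strategies, selected by the same version identifier the post uses (pass --d-version=TailRecursionAvailable to LDC, or -version=TailRecursionAvailable to DMD). The Instr layout and the addImmediate/halt operations are hypothetical; the tail-call branch only avoids stack growth if the optimizer actually emits the calls as jumps, which D does not guarantee.

```d
// Hypothetical instruction: an operation plus one immediate argument.
struct Instr
{
    Op fn;
    long arg;
}

version (TailRecursionAvailable)
{
    // Each operation ends by calling the next one. An optimizer may turn
    // that call into a jump, but the language does not require it.
    alias Op = void function(const(Instr)* ip, ref long acc);

    void addImmediate(const(Instr)* ip, ref long acc)
    {
        acc += ip.arg;
        ip[1].fn(ip + 1, acc); // call the next instruction in tail position
    }

    void halt(const(Instr)* ip, ref long acc) {}

    long run(const(Instr)* code)
    {
        long acc = 0;
        code.fn(code, acc);
        return acc;
    }
}
else
{
    // Fallback: each operation returns the next instruction (null to stop)
    // and a loop dispatches it, so the stack never grows.
    alias Op = const(Instr)* function(const(Instr)* ip, ref long acc);

    const(Instr)* addImmediate(const(Instr)* ip, ref long acc)
    {
        acc += ip.arg;
        return ip + 1;
    }

    const(Instr)* halt(const(Instr)* ip, ref long acc) => null;

    long run(const(Instr)* code)
    {
        long acc = 0;
        for (const(Instr)* ip = code; ip !is null; ip = ip.fn(ip, acc)) {}
        return acc;
    }
}

void main()
{
    const Instr[3] code = [Instr(&addImmediate, 2), Instr(&addImmediate, 40), Instr(&halt, 0)];
    assert(run(code.ptr) == 42);
}
```

The loop version pays for an extra return and indirect call per operation, which is the "less efficient method" trade-off described above.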

Immutability

Almost everything in the compiler is immutable.
The AST is immutable, so instead of updating it with semantic information, the type checker returns a "model".
This has the advantage of allowing several different AST types to compile to the same model type; a lot of different-looking things are just function calls.
In the IDE, when a file changes, it updates the AST of only the affected code, and updates the model for the module and any modules that depend on it.
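A small sketch of the "several AST types, one model type" idea: two hypothetical immutable AST nodes, a call and a binary operator, are both checked into the same call model, and the AST itself is never modified. None of these names come from Crow.

```d
// Hypothetical AST nodes for two different-looking pieces of syntax.
immutable struct CallAst { string fnName; long[] args; }          // f(1, 2)
immutable struct BinaryAst { string op; long left; long right; }  // 1 + 2

// Hypothetical model node that both of them check into.
immutable struct CallModel { string fnName; long[] args; }

// The checker only reads the AST and returns a fresh model value.
pure nothrow @safe CallModel check(CallAst ast) => CallModel(ast.fnName, ast.args);

// An operator becomes "just a function call" once it reaches the model.
pure nothrow @safe CallModel check(BinaryAst ast) => CallModel(ast.op, [ast.left, ast.right]);

void main() @safe
{
    auto fromCall = check(CallAst("+", [1, 2]));
    auto fromOperator = check(BinaryAst("+", 1, 2));
    // Different syntax, identical model.
    assert(fromCall == fromOperator);
}
```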

Late (logical variables)

Sometimes a field of an immutable entity can't be written immediately.
For example, the type checker first builds a model for the signature of every function, and only then checks function bodies (since that involves looking at the signatures of other functions).
To accomplish this I have a Late type. This starts off uninitialized. Attempting to read it while it's uninitialized is an assertion error. Once it's initialized, it can't be written again. Thus it's logically immutable from the reader's perspective since it will never read two different values.
This requires using unsafe code to write the late value (since you can't normally write to an immutable value). This apparently works, though I wonder if some day a compiler will optimize away lateSet since it's pure, takes immutable inputs, and returns nothing.
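A sketch of the Late semantics described above: unset until written, an assertion failure on an early read, and an assertion failure on a second write. It is hypothetical and deliberately keeps the cell in mutable storage; it does not reproduce the cast that writes through immutable, which later replies in this thread describe as undefined behavior.

```d
// A write-once cell: reading before set, or setting twice, fails an assert.
struct Late(T)
{
    private T value_;
    private bool isSet_;

    bool isSet() const => isSet_;

    const(T) get() const
    {
        assert(isSet_, "Late value read before it was set");
        return value_;
    }

    void set(T value)
    {
        assert(!isSet_, "Late value set twice");
        value_ = value;
        isSet_ = true;
    }
}

// Hypothetical two-pass use: the signature exists first, the body later.
struct FunDecl
{
    string name;
    Late!string bodyText; // filled in by the second pass
}

void main()
{
    auto f = FunDecl("main");
    assert(!f.bodyText.isSet);
    f.bodyText.set("return 0;");
    assert(f.bodyText.get == "return 0;");
}
```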

Purity

The compiler part of the code (basically everything but the interpreter) is completely pure. It basically implements an LSP (Language Server Protocol) server, and the LSP client does all the I/O. Thus the I/O implementation can be different for desktop, IDE, and web.

One annoyance with pure code is having to pass AllSymbols, the symbol (interned string) table, to any function that needs to create a symbol or un-intern it. I think accessing it through a global variable could be considered pure, since a caller of symbolOfString can't tell whether the symbol was already added, and the result of stringOfSymbol never changes. But I'm not sure whether that's actually safe, or how to tell D to allow a global variable in pure code.
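For reference, the explicit-passing style looks roughly like this. AllSymbols, Symbol, symbolOfString and stringOfSymbol are the names from the post, but their layout here (a linear scan over a string array) is a hypothetical simplification.

```d
// Hypothetical symbol handle: an index into the interned-string table.
struct Symbol { size_t index; }

// Hypothetical table layout: interned strings in insertion order.
struct AllSymbols
{
    private string[] strings;
}

// Pure because the table arrives as a parameter instead of a global.
Symbol symbolOfString(ref AllSymbols symbols, string s) pure nothrow @safe
{
    foreach (i, existing; symbols.strings)
    {
        if (existing == s)
            return Symbol(i);
    }
    symbols.strings ~= s;
    return Symbol(symbols.strings.length - 1);
}

// Read-only lookup: the string for a given symbol never changes.
string stringOfSymbol(ref const AllSymbols symbols, Symbol sym) pure nothrow @safe @nogc
{
    return symbols.strings[sym.index];
}

void main() @safe
{
    AllSymbols symbols;
    Symbol foo = symbolOfString(symbols, "foo");
    Symbol bar = symbolOfString(symbols, "bar");
    assert(symbolOfString(symbols, "foo") == foo);
    assert(foo != bar);
    assert(stringOfSymbol(symbols, bar) == "bar");
}
```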

Scope

I've used scope and in wherever possible with -preview=dip1000 -preview=in. I often need to cast away scope using a function castNonScope. This feels like it needs a language intrinsic or at least a standard library function.

February 15

On Thursday, 15 February 2024 at 04:32:27 UTC, andy wrote:

> Having to write @safe @nogc pure nothrow all the time. It needs a way to make that the default and mark specific things as not-safe or not-pure.

You can make a scope with nothrow, @nogc, etc.:

nothrow @nogc pure @safe{
void fn1(){}
void fn2(){}
void fn3(){}
}

> A pattern matching syntax for D could make this prettier.

I think Walter has a draft DIP for "sumtype"s with pattern matching.
I really wish this would be added soon.

> One annoyance with pure code is having to pass AllSymbols, the symbol (interned string) table, to any function that needs to create a symbol or un-intern it. I think using this through a global variable could be considered pure, since a caller to symbolOfString can't tell whether the symbol has been added or not, and the stringOfSymbol never changes. But I'm not sure if that's actually safe or how to tell D to allow a global variable in pure code.

If you make global variables immutable, you can access them in pure functions.
pure functions are not really meant to access global mutable data.

> I often need to cast away scope using a function castNonScope. This feels like it needs a language intrinsic or at least a standard library function.

I think you're not meant to cast away scope?? scope is meant to guarantee that a variable doesn't escape the given scope; casting it away breaks that guarantee, so why use it? If you're using it for memory allocation, be careful... it's not meant for that.

February 15

On Thursday, 15 February 2024 at 15:24:37 UTC, IchorDev wrote:

> You can make a scope with nothrow, @nogc, etc.:

I've been setting @safe @nogc pure nothrow: at the top of (almost) every module, but then I still have to do it at the top of each struct in the module (if it has functions) and after each delegate type.

> If you make global variables immutable, you can access them in pure functions.

Is it as simple as that? I'd have to cast away the immutable when adding a new interned string though. Is that still the correct way to do it?

> I think you're not meant to cast away scope?? scope is meant to guarantee that a variable doesn't escape the given scope; casting it away breaks that guarantee, so why use it? If you're using it for memory allocation, be careful... it's not meant for that.

I declare a parameter scope whenever it's true (the memory isn't retained anywhere), even if I can't prove that to the compiler, so it needs to be trusted instead. This comes up a lot because scope only applies one level deep, so if I need a pointer to something scope, I'm forced to cast away the scopeness of the pointee. This happens if I need to put it in a struct since structs can't contain refs, only pointers.

@safe @nogc:

void main() {
    int i = 3;
    scope S s = S(&i);
    foo(s);
}

struct S { int* ptr; }

struct Ctx {
    private S* s;
}

void foo(scope ref S s) {
    scope Ctx ctx = Ctx(ptrTrustMe(s));
    bar(ctx);
}

void bar(scope ref Ctx ctx) {}

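// Round-trips the address through size_t so the result is no longer tied to
// the scope parameter; @trusted because the caller must ensure it doesn't escape.
@trusted T* ptrTrustMe(T)(scope ref T x) {
    size_t res = cast(size_t) &x;
    return cast(T*) res;
}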

February 16

On 16/02/2024 12:46 PM, andy wrote:
>> If you make global variables immutable, you can access them in pure functions.
>
> Is it as simple as that? I'd have to cast away the immutable when adding a new interned string though. Is that still the correct way to do it?

No.

It was never correct.

Immutable is a very strong guarantee that the memory will never change.

The compiler in such a case is free to put it into read only memory and as a result crash if you tried to write to it.

You can use const instead which doesn't have any such guarantees and it'll work with a pure function :)

February 16

On Friday, 16 February 2024 at 01:26:42 UTC, Richard (Rikki) Andrew Cattermole wrote:

> You can use const instead which doesn't have any such guarantees and it'll work with a pure function :)

It still seems to be considered mutable?

pure void main() {
    // a.d(2): Error: `pure` function `D main` cannot access mutable static data `strings`
    auto mut = cast(string[]) strings;
    mut ~= "foo";
}

const string[] strings;

February 16

On 16/02/2024 4:21 PM, andy wrote:
> On Friday, 16 February 2024 at 01:26:42 UTC, Richard (Rikki) Andrew Cattermole wrote:
> 
>> You can use const instead which doesn't have any such guarantees and it'll work with a pure function :)
> 
> It still seems to be considered mutable?
> 
>      pure void main() {
>          // a.d(2): Error: `pure` function `D main` cannot access mutable static data `strings`
>          auto mut = cast(string[]) strings;
>          mut ~= "foo";
>      }
> 
>      const string[] strings;
> 

That would be stored in TLS, add static to get it out.

February 16

On Friday, 16 February 2024 at 03:21:48 UTC, andy wrote:

> It still seems to be considered mutable?

I got this working using a function pointer:

@safe:

void main() {
	string a = "a";
	string ab = "ab";
	string ab2 = a ~ "b";
	assert(ab.ptr != ab2.ptr);
	assert(internString(ab).ptr == internString(ab2).ptr);
}

@trusted pure string internString(string s) =>
	(cast(string function(string) pure) &internString_impure)(s);

string internString_impure(string a) {
	foreach (string x; strings) {
		if (x == a)
			return x;
	}
	strings ~= a;
	return a;
}
string[] strings;

February 16

On Thursday, 15 February 2024 at 23:46:10 UTC, andy wrote:

> On Thursday, 15 February 2024 at 15:24:37 UTC, IchorDev wrote:
>
>> You can make a scope with nothrow, @nogc, etc.:
>
> I've been setting @safe @nogc pure nothrow: at the top of (almost) every module, but then I still have to do it at the top of each struct in the module (if it has functions) and after each delegate type.

@safe permeates into structs, the others do not.

>> If you make global variables immutable, you can access them in pure functions.
>
> Is it as simple as that? I'd have to cast away the immutable when adding a new interned string though. Is that still the correct way to do it?

No, this is not correct.

What you are doing is something that is logically immutable, but not actually immutable.

What you need to do is to section this off into its own module, and then use language tricks to lie to the compiler. For instance, cast a function pointer that is not pure to a pure function. Then you need to carefully review the module for correctness from an API standpoint.

The language does something similar with memory allocation, which uses a global data structure to allocate memory, but effectively is giving you back memory that is unique while it is valid.

-Steve

February 17

On Thursday, 15 February 2024 at 23:46:10 UTC, andy wrote:

> Is it as simple as that? I'd have to cast away the immutable when adding a new interned string though. Is that still the correct way to do it?

Oh no, you should never cast away immutable; that might lead to undefined behaviour (immutable objects may be placed in ROM).
Pure should not be able to read any global mutable data, either way…

> I declare a parameter scope whenever it's true (the memory isn't retained anywhere), even if I can't prove that to the compiler, so it needs to be trusted instead. This comes up a lot because scope only applies one level deep, so if I need a pointer to something scope, I'm forced to cast away the scopeness of the pointee. This happens if I need to put it in a struct since structs can't contain refs, only pointers.

Oh, interesting. I’ve never had this exact issue.