Inline sumtype


June 20

Inline sumtype

Posted by Richard (Rikki) Andrew Cattermole

I've gone back to the drawing board on sumtypes, and I had some ideas yesterday based upon feedback from the last couple of years.

Unlike the other designs that have been proposed, this one is written inline as part of a type rather than as a separate declaration. It also gives enum declarations without a value (currently an error) a type, one that is not unique to the declaration.

Matching is not added here, but my previous DIP for them could be made to work for it.

Elements

A sumtype contains zero or more elements.

Each element may have a name.

An element type + name pair must be unique in the element list, and a name may only appear once.

Valid: alias S = sumtype (int, int i);

Invalid: alias S = sumtype(int i, float i);

Any element whose type is an enum type will also have its name set to the enum's identifier.

Set ops

Two sumtypes may be merged together using the + operator.

alias A = sumtype(int);
alias B = sumtype(float);
alias C = A + B;

And subtracted from with -:

alias C = sumtype(int, float);
alias B = sumtype(float);
alias A = C - B;

The normal restrictions on a sumtype's elements apply both before and after a set op has occurred.

Combine with alias assignment for fine grained control:

alias Result = sumtype();

static foreach(Type; Input) {
	Result += sumtype(Type);
}

Duplicates are ignored during merging.
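
For comparison, the merge and subtract behaviour described above can be approximated today on top of std.sumtype. This is only a sketch: `Merge` and `Subtract` are hypothetical helper names, not part of the proposal or of Phobos, and they stand in for the proposed `+` and `-` operators.

```d
import std.meta : AliasSeq, Filter, NoDuplicates, staticIndexOf;
import std.sumtype : SumType;

// Hypothetical helper: union of two member lists, duplicates dropped.
template Merge(A, B)
{
    alias Merge = SumType!(NoDuplicates!(AliasSeq!(A.Types, B.Types)));
}

// Hypothetical helper: the members of A that are not members of B.
template Subtract(A, B)
{
    enum bool notInB(T) = staticIndexOf!(T, B.Types) == -1;
    alias Subtract = SumType!(Filter!(notInB, A.Types));
}

alias A = SumType!int;
alias B = SumType!float;
alias C = Merge!(A, B);

static assert(is(C == SumType!(int, float)));
static assert(is(Merge!(C, A) == C));    // duplicates are ignored
static assert(is(Subtract!(C, B) == A));
```

Unlike the proposed `sumtype()`, std.sumtype's `SumType` needs at least one member, so this sketch cannot express the empty starting point used with `static foreach` above.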

Construction

A sumtype may be constructed in one of four situations:

- Sumtype initialization syntax: Type(Expression) or Type(name: Expression).
- Variable declaration: Type var = Expression;
- Return: return Expression;
- Function call: func(1); where void func(sumtype(int) param)

For function calls, argument-to-parameter matching will use a conversion, which is considered a worse match than an exact one.

Assignment

A sumtype may be assigned to another that has a comparable element list.

alias A = sumtype(int);
alias B = sumtype(int | string);

B b = A(5);

Assigning a sumtype to another will have preference over initialization.

alias A = sumtype(int);
alias B = sumtype(int, A);

B initialization = 8;
B assignment = A(9);
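
As a point of reference, assigning a narrower sumtype to a wider one has to be spelled out by hand with std.sumtype. The sketch below shows that manual step; `widen` is a hypothetical helper name, and the proposal would make the conversion implicit instead.

```d
import std.sumtype : SumType, match;

alias A = SumType!int;
alias B = SumType!(int, string);

// Hypothetical helper: re-wrap whatever `a` holds as the wider type B.
B widen(A a)
{
    return a.match!(value => B(value));
}

void main()
{
    B b = widen(A(5));
    assert(b == B(5));
}
```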

Enum Type

An enum without a value is given a non-unique type based upon its identifier.

enum None;
pragma(msg, typeof(None)); // __enumtype("None")
static assert(is(typeof(None) == __enumtype));

An enum type may be used as its type, when the grammar requires a type:

None none = None;

As its size is zero, a variable of such a type is a dummy that does not contribute to field layouts, and can replace the existing practice of declaring void[0] storage;

An assignment of true will succeed, although it will be a no-op.

None none = true;

The enum type __enumtype("None") will have an instance in object.d.

The mangling of an enum type does not include the module it is in.

Comparison

To check the type of a sumtype against a known type, use an is expression.

assert(sumtype(int).init is int);

Other comparisons, such as ==, are done by a compiler hook that matches on both operands and compares the values if the tags match.
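
For readers who know std.sumtype, the two kinds of comparison look roughly like this there. It is a sketch only: `holds` is a hypothetical helper that plays the role of the proposed `is` check, and `==` is std.sumtype's own equality, which compares the held values when both sides hold the same type.

```d
import std.sumtype : SumType, match;

alias S = SumType!(int, string);

// Hypothetical helper: true when `s` currently holds a value of type T.
bool holds(T)(S s)
{
    return s.match!(
        (T _) => true,
        _ => false
    );
}

void main()
{
    S s = 42;
    assert(holds!int(s));
    assert(!holds!string(s));

    assert(s == S(42));   // same type held, same value
    assert(s != S("42")); // different type held
}
```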

Casting

Casting a sumtype results in a read barrier to check the tag matches the requested type on read. There is no read barrier on write.

The result of a cast cannot be passed around by ref, nor can a pointer to it be taken.
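
The read barrier can be pictured with a hand-written tagged union. This is a sketch under assumptions, not the proposed lowering: `IntOrFloat`, `get` and `set` are made-up names, and throwing via `enforce` is just one possible reaction to a tag mismatch.

```d
import std.exception : enforce;

struct IntOrFloat
{
    private enum Kind { isInt, isFloat }
    private Kind kind;
    private union
    {
        int i;
        float f;
    }

    this(int value) { kind = Kind.isInt; i = value; }
    this(float value) { kind = Kind.isFloat; f = value; }

    // Read: the tag is checked before the storage is reinterpreted.
    T get(T)() const
    {
        static if (is(T == int))
        {
            enforce(kind == Kind.isInt, "tag does not match int");
            return i;
        }
        else static if (is(T == float))
        {
            enforce(kind == Kind.isFloat, "tag does not match float");
            return f;
        }
        else
            static assert(false, "not an element type: " ~ T.stringof);
    }

    // Write: no check, the old value and tag are simply replaced.
    void set(T)(T value) { this = IntOrFloat(value); }
}

void main()
{
    import std.exception : assertThrown;

    auto v = IntOrFloat(5);
    assert(v.get!int() == 5);

    v.set(1.5f);
    assert(v.get!float() == 1.5f);
    assertThrown(v.get!int()); // the tag now says float
}
```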

Properties

A sumtype has the following properties:

- tag: the tag of the tagged union.
- storage: the block of storage (@system), typed as void[X].
- types: a sequence of the element types.
- names: a sequence of strings holding the element names.
- copyconstructor: the function pointer for the copy constructor (@system).
- destructor: the function pointer for the destructor (@system).

All properties are assignable in non-@safe code except names and types.

Layout

The layout of a sumtype is variable length. It is as follows:

- size_t tag
- void function(ref new_, ref old_) Copy constructor
- void function(ref old_) Destructor
- void[X] Storage

The tag is a hash of the fully qualified name of the type + name.

The copy constructor and destructor will work, as long as their arguments are pointing at storage. In practice the compiler will need to inject a null check before calling. The calling convention of the functions matches that of methods.

Attributes used on the copy constructor and destructor will be the common denominator of all the elements that have copy constructors and destructors respectively.

These are optional if none of the element types use a copy constructor or destructor.
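
A rough picture of that layout as an ordinary struct, for plain-data elements only. Everything here is an assumption made for illustration: the names `Layout`, `tagOf`, `store` and `load`, the use of `void*` in place of the untyped `ref` parameters, and the use of druntime's `hashOf` as the hash.

```d
import std.traits : fullyQualifiedName;

struct Layout(size_t X)
{
    size_t tag;                                            // hash of type + name
    void function(void* new_, void* old_) copyConstructor; // null when not needed
    void function(void* old_) destructor;                  // null when not needed
    void[X] storage = void;                                // raw bytes of the element
}

// Assumed tag scheme: hash of the fully qualified type name plus element name.
size_t tagOf(T)(string name = "")
{
    return hashOf(fullyQualifiedName!T ~ name);
}

// Store a plain-data value; the function pointers stay null.
Layout!X store(size_t X, T)(T value, string name = "")
{
    static assert(T.sizeof <= X, "storage too small");
    Layout!X result;
    result.tag = tagOf!T(name);
    *cast(T*) result.storage.ptr = value;
    return result;
}

// Read the value back after checking the tag.
T load(T, size_t X)(ref Layout!X s, string name = "")
{
    assert(s.tag == tagOf!T(name), "tag mismatch");
    return *cast(T*) s.storage.ptr;
}

void main()
{
    auto s = store!16(42, "answer");
    assert(s.copyConstructor is null && s.destructor is null);
    assert(load!int(s, "answer") == 42);
}
```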

Grammar

BasicType:
+    SumType

TypeSpecialization:
+    sumtype
+    __enumtype

+ SumType:
+    sumtype ( SumTypeElements|opt )

+ SumTypeElements:
+    SumTypeElements '|' SumTypeElement
+    SumTypeElement

+ SumTypeElement:
+    Type Identifier
+    Type

Example

enum None;
alias Animal = sumtype(None | Cat | Dog dog);

struct Cat {
}

struct Dog {
}

void main() {
	Animal animal = Animal(Cat());
	
	animal.dog = Dog();
	cast(Cat)animal = Cat();

	writeln(cast(Cat)animal);

	animal.None = true;
	assert(animal is None);
}

void someFunc(Animal animal) {
	import std.stdio;
	writeln(animal.tag); // some hash number
}

June 20

Re: Inline sumtype

Posted by monkyyy
in reply to Richard (Rikki) Andrew Cattermole

On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew Cattermole wrote:

alias A = sumtype(int);
alias B = sumtype(float);
alias C = A + B;

It seems silly to me to consume alias operators on an untested and unimplemented concept when it's not like it couldn't be used elsewhere

why not: ?

alias seq(T...)=T;
alias a=seq!int;
alias b=seq!float;
alias c=a+b;

alias a=seq!();
a+=int;
a+=float;

June 20

Re: Inline sumtype

Posted by monkyyy
in reply to Richard (Rikki) Andrew Cattermole

On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew Cattermole wrote:

size_t tag

Nonsense, why would someone need 64 bits of types? That won't ever compile.
enums are at least only ints https://dlang.org/spec/enum.html#enum_properties
(tho they should also just be bytes that upgrade to shorts if you ever go longer than 255 elements)

June 21

Re: Inline sumtype

Posted by Richard (Rikki) Andrew Cattermole
in reply to monkyyy

On 21/06/2025 6:27 AM, monkyyy wrote:
> On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> ```d
>> alias A = sumtype(int);
>> alias B = sumtype(float);
>> alias C = A + B;
>> ```
> 
> It seems silly to me to consume alias operators on an untested and unimplemented concept when it's not like it couldn't be used elsewhere
> 
> why not: ?
> 
> ```d
> alias seq(T...)=T;
> alias a=seq!int;
> alias b=seq!float;
> alias c=a+b;
> ```
> 
> ```d
> alias a=seq!();
> a+=int;
> a+=float;
> ```

The operators are defined on the sumtype, nothing else.

Alias is not a typedef; it does not have its own unique type in the type system, so there is nothing for the operators to attach to in the language.

The alias itself disappears from the type system. The type system can only see what it has been aliased to; it's a direct replacement, and yes there is at least one bug due to this.

Have I explained this well enough?
I get the feeling that you should have come across this by now, and therefore this explanation won't be enough.

June 21

Re: Inline sumtype

Posted by Richard (Rikki) Andrew Cattermole
in reply to monkyyy

On 21/06/2025 6:43 AM, monkyyy wrote:
> On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> - ``size_t`` tag
> 
> Nonsense, why would someone need 64 bits of types? That won't ever compile.
> enums are at least only ints https://dlang.org/spec/enum.html#enum_properties
> (tho they should also just be bytes that upgrade to shorts if you ever go longer than 255 elements)

Hashes in D are typically size_t or ulong.

The use of size_t makes sense as it takes up a full register, and we may as well use all of it to get more accuracy. The rest of the layout won't merge into it.

https://github.com/dlang/dmd/blob/f0541d65ba777e6f03499bcb5c0c59da8ce94050/druntime/src/object.d#L137

https://github.com/dlang/dmd/blob/f0541d65ba777e6f03499bcb5c0c59da8ce94050/druntime/src/core/internal/hash.d#L132

June 20

Re: Inline sumtype

Posted by monkyyy
in reply to Richard (Rikki) Andrew Cattermole

On Friday, 20 June 2025 at 18:46:38 UTC, Richard (Rikki) Andrew Cattermole wrote:
> 
> Hashes in D are typically size_t or ulong.
>
why would it be a hash?

snar's being overly fancy here but this will be a ubyte most of the time: https://github.com/dlang/phobos/blob/832cc465998b1ea77051cd3fd014b544442a4f8c/std/sumtype.d#L288

in my versions I think I used the enum as is, meaning it would've been an int

in an ideal world:
```d
struct sumtype(T...){
  enum Tag=enum{ static foreach(A;T){...
  union Union{ static foreach(A;T){...
  Tag tag;
  Union myunion;
  ...
}
```
It's not a hash, it's an enum paired with a union; then you can iterate T while having a tag with `static foreach(I,A;T){ if(I==tag){...` ; simple and sane

Ideally enums with fewer than 255 elements would be ubytes so you get to make simplicity vs idealness tradeoffs; but 64 bits is insane; it breaks web APIs if a client is 32 bit, also insane. size_t is awful in general. The D compiler breaks down with deeply nested template hell; how do you plan on generating 2^64 types when there's a 100 depth recursion limit and they all have to be in memory?

June 20

Re: Inline sumtype

Posted by monkyyy
in reply to monkyyy

On Friday, 20 June 2025 at 19:17:33 UTC, monkyyy wrote:
>  The D compiler breaks down with deeply nested template hell; how do you plan on generating 2^64 types when there's a 100 depth recursion limit and they all have to be in memory?

Also, isn't uniqueness filtering an O(n^2) algorithm? At n of 2^64, you're not finishing that filter in the runtime of the universe.

June 21

Re: Inline sumtype

Posted by Richard (Rikki) Andrew Cattermole
in reply to monkyyy

On 21/06/2025 7:17 AM, monkyyy wrote:
> On Friday, 20 June 2025 at 18:46:38 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>
>> Hashes in D are typically size_t or ulong.
>>
> why would it be a hash?

When you assign one sumtype to another, where the second has a different element set, you have to match and then translate the previous tag to the new one.

If you are doing this often, that is a lot of unnecessary work. A hash will work in both sumtypes and is therefore a direct copy.

This was suggested to me by Jacob Carlborg in the context of value type exceptions and is a brilliant way to minimize the cost.
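
To make the difference concrete, here is a small sketch of the two tag schemes. The names `offsetTag` and `hashTag` are invented for this illustration, and druntime's `hashOf` stands in for whichever hash the compiler would use.

```d
import std.meta : AliasSeq, staticIndexOf;
import std.traits : fullyQualifiedName;

alias ElementsA = AliasSeq!(int);
alias ElementsB = AliasSeq!(string, int);

// Offset-style tag: the position of T within one particular element list.
enum offsetTag(T, Elements...) = staticIndexOf!(T, Elements);

// Hash-style tag: depends only on the element, not on the list around it.
size_t hashTag(T)()
{
    return hashOf(fullyQualifiedName!T);
}

// `int` sits at different positions, so copying from a sumtype over
// ElementsA into one over ElementsB would have to translate the tag.
static assert(offsetTag!(int, ElementsA) == 0);
static assert(offsetTag!(int, ElementsB) == 1);

void main()
{
    // The hash tag never mentions an element list, so it can be copied as is.
    assert(hashTag!int() == hashOf("int"));
}
```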

> snar's being overly fancy here but this will be a ubyte most of the time: https://github.com/dlang/phobos/blob/832cc465998b1ea77051cd3fd014b544442a4f8c/std/sumtype.d#L288
> 
> in my versions I think I used the enum as is, meaning it would've been an int
> 
> in an ideal world:
> ```d
> struct sumtype(T...){
>    enum Tag=enum{ static foreach(A;T){...
>    union Union{ static foreach(A;T){...
>    Tag tag;
>    Union myunion;
>    ...
> }
> ```
> It's not a hash, it's an enum paired with a union; then you can iterate T while having a tag with `static foreach(I,A;T){ if(I==tag){...` ; simple and sane
> 
> Ideally enums with fewer than 255 elements would be ubytes so you get to make simplicity vs idealness tradeoffs; but 64 bits is insane; it breaks web APIs if a client is 32 bit, also insane. size_t is awful in general. The D compiler breaks down with deeply nested template hell; how do you plan on generating 2^64 types when there's a 100 depth recursion limit and they all have to be in memory?

I'm not.

You are presupposing that the tag is an offset, not a hash.

If it was an offset I would indeed make it variable sized, so only the amount needed would be used.

The benefit of using an offset is branch tables will have a better chance to work. But I'm not convinced that it is worth it over getting the cheaper copies.

June 21

Re: Inline sumtype

Posted by MrSmith33
in reply to Richard (Rikki) Andrew Cattermole

On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew Cattermole wrote:

The tag is a hash of the fully qualified name of the type + name.

What happens in the case of collision?
Can collision be detected at compile-time?
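
One way such a check could look, as a sketch only: since every tag is known at compile time, the element list can be scanned for equal tags during CTFE. The proposal does not name a hash function, so 64-bit FNV-1a is used here purely as a stand-in, and `tagOf`, `tagsOf` and `hasDuplicates` are hypothetical names.

```d
import std.meta : staticMap;
import std.traits : fullyQualifiedName;

// Stand-in hash (64-bit FNV-1a); the real hash is not specified in the thread.
ulong fnv1a(string s) pure nothrow @nogc @safe
{
    ulong h = 0xcbf29ce484222325UL;
    foreach (char c; s)
    {
        h ^= c;
        h *= 0x100000001b3UL;
    }
    return h;
}

// Tag of a single element type, computed at compile time.
enum ulong tagOf(T) = fnv1a(fullyQualifiedName!T);

// Tags of a whole element list.
enum ulong[] tagsOf(Types...) = [staticMap!(tagOf, Types)];

// Quadratic scan; element lists are expected to be short.
bool hasDuplicates(const ulong[] tags) pure nothrow @nogc @safe
{
    foreach (i, a; tags)
        foreach (b; tags[i + 1 .. $])
            if (a == b)
                return true;
    return false;
}

// A collision between distinct elements would fail this check at compile time.
static assert(!hasDuplicates(tagsOf!(int, float, string)));

// Sanity check: identical tags are reported.
static assert(hasDuplicates([tagOf!int, tagOf!int]));
```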

June 21

Re: Inline sumtype

Posted by Lance Bachmeier
in reply to Richard (Rikki) Andrew Cattermole

On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew Cattermole wrote:

I've gone back to the drawing board on sumtypes, and I had some ideas yesterday based upon feedback from the last couple of years.

It would be useful to include a comparison with std.sumtype. Explicitly state the new functionality this enables and the improvement in cases it overlaps.

void main() {
	Animal animal = Animal(Cat());

	animal.dog = Dog();
	cast(Cat)animal = Cat();

	writeln(cast(Cat)animal);

	animal.None = true;
	assert(animal is None);
}

void someFunc(Animal animal) {
	import std.stdio;
	writeln(animal.tag); // some hash number
}

Could it do this?

double fun(Dog d) {
	// Operations specific to Dog
}

double fun(Cat c) {
	// Operations specific to Cat
}


auto a1 = Animal(Cat());
fun(a1);

auto a2 = Animal(Dog());
fun(a2);

Just one example where this is useful is with dates. You might have an int, two ints, or a string. Handling those cases with templates or structs is less than an optimal experience in terms of verbosity, ugliness, and complexity.
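
As a baseline for that comparison, std.sumtype can already dispatch to an overload set, although the call has to go through `match` rather than being a plain `fun(a1)`. A minimal sketch, with placeholder return values standing in for the type-specific operations:

```d
import std.sumtype : SumType, match;

struct Cat {}
struct Dog {}
alias Animal = SumType!(Cat, Dog);

double fun(Dog d) { return 1.0; } // placeholder for Dog-specific work
double fun(Cat c) { return 2.0; } // placeholder for Cat-specific work

void main()
{
    auto a1 = Animal(Cat());
    assert(a1.match!fun == 2.0); // picks fun(Cat)

    auto a2 = Animal(Dog());
    assert(a2.match!fun == 1.0); // picks fun(Dog)
}
```

Whether the inline sumtype would allow the shorter `fun(a1)` form is exactly the open question in this post; the thread does not answer it.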
