Sum Types - first draft - D Programming Language Discussion Forum

Index » DIP Development » Sum Types - first draft

Sum Types - first draft
Sep 10 Walter Bright
Sep 10 Richard (Rikki) Andrew Cattermole
Sep 10 Richard (Rikki) Andrew Cattermole
Sep 10 Walter Bright
Sep 10 monkyyy
Sep 10 Richard (Rikki) Andrew Cattermole
Sep 10 Dennis
Sep 10 Walter Bright
Sep 11 Dukc
Sep 11 Paul Backus
Sep 11 Dukc
Sep 10 Paul Backus
Sep 10 Walter Bright
Sep 10 Paul Backus
Sep 10 Richard (Rikki) Andrew Cattermole
Sep 10 Paul Backus
Sep 10 Walter Bright
Sep 10 Paul Backus
Sep 10 Walter Bright
Sep 11 Paul Backus
Sep 10 jmh530
Sep 10 monkyyy
Sep 11 Dukc
Sep 11 Per Nordlöw
Sep 15 IchorDev
Sep 19 Quirin Schroll

September 09

Sum Types - first draft

Posted by Walter Bright

https://github.com/WalterBright/documents/blob/96bca2f9f3520cf53ed5c4dec8e5e2d855e64e66/sumtype.md


I wrote that back in November 2022. The idea is to have a sumtypes proposal, followed by a match proposal.

Previous discussions:

https://www.digitalmars.com/d/archives/digitalmars/D/sumtypes_for_D_366242.html

https://www.digitalmars.com/d/archives/digitalmars/D/Sum_type_the_D_way_366389.html

https://www.digitalmars.com/d/archives/digitalmars/D/draft_proposal_for_Sum_Types_for_D_366307.html

September 10

Re: Sum Types - first draft

Posted by Richard (Rikki) Andrew Cattermole
in reply to Walter Bright

I'll ignore the formatting issues with the headings.

1. While it is a little like an enum, it is only the tag that is the enum, not the elements.
2. Use a hash of the tag name + tag type; this allows values to be transferred between different sumtype instances by copy alone. This was suggested by Jacob Carlborg for my design ages ago, and it is brilliant.
3. Sumtypes need to support copy constructors, postblits and destructors. Otherwise, you limit their usability too significantly. This requires a variable-size virtual table that goes along with it.
4. Null does not need special casing: either ``typeof(null)`` is in the element set or it's not.
5. Do not use this ``?`` syntax for determining what the tag type is. That will prevent us doing the nullability operators, which people including myself want. Matching takes over on this; it is not needed.
6. For reading, I would recommend against doing it in ``@safe`` code. If you want it to be safe, use matching.
7. There is no alternative syntax proposed; there are two different syntaxes with different tradeoffs, one of which is significantly more familiar to people who have been using sumtypes for 50 years. Both need to exist.
8. While placing the tag name using a single-identifier syntax appears to be a good idea (it keeps things nice and simple, after all), it does require that a tagged union give each and every element a name. Note that no existing D implementation of sumtypes has this requirement; it is too restrictive and does not match the community's understanding of the concept. Use my member-of-operator syntax for providing this functionality. It was due to be up next... but I'm working on escape set.

September 10

Re: Sum Types - first draft

Posted by Richard (Rikki) Andrew Cattermole
in reply to Richard (Rikki) Andrew Cattermole

On 10/09/2024 5:15 PM, Richard (Rikki) Andrew Cattermole wrote:
> I'll ignore the formatting issues with the headings.
> 
> 1. While it is a little like an enum, it is only the tag that is the enum, not the elements.
> 2. Use a hash of the tag name + tag type; this allows values to be transferred between different sumtype instances by copy alone. This was suggested by Jacob Carlborg for my design ages ago, and it is brilliant.
> 3. Sumtypes need to support copy constructors, postblits and destructors. Otherwise, you limit their usability too significantly. This requires a variable-size virtual table that goes along with it.
> 4. Null does not need special casing: either ``typeof(null)`` is in the element set or it's not.
> 5. Do not use this ``?`` syntax for determining what the tag type is. That will prevent us doing the nullability operators, which people including myself want. Matching takes over on this; it is not needed.
> 6. For reading, I would recommend against doing it in ``@safe`` code. If you want it to be safe, use matching.
> 7. There is no alternative syntax proposed; there are two different syntaxes with different tradeoffs, one of which is significantly more familiar to people who have been using sumtypes for 50 years. Both need to exist.
> 8. While placing the tag name using a single-identifier syntax appears to be a good idea (it keeps things nice and simple, after all), it does require that a tagged union give each and every element a name. Note that no existing D implementation of sumtypes has this requirement; it is too restrictive and does not match the community's understanding of the concept. Use my member-of-operator syntax for providing this functionality. It was due to be up next... but I'm working on escape set.

The reason I have not redone my proposal for sumtypes and posted it here is that I'm waiting on the member-of-operator AND matching syntax. The first matching proposal is in the DIP queue for Walter's and Atila's review.

Without both, you end up with points 4, 5, 6, and 8.

September 10

Re: Sum Types - first draft

Posted by Walter Bright
in reply to Richard (Rikki) Andrew Cattermole

On 9/9/2024 10:15 PM, Richard (Rikki) Andrew Cattermole wrote:
> I'll ignore the formatting issues with the headings.

Sorry about that. Reading it over again reveals some other problems I'll address shortly.

September 10

Re: Sum Types - first draft

Posted by monkyyy
in reply to Walter Bright

On Tuesday, 10 September 2024 at 04:06:16 UTC, Walter Bright wrote:
>
> A pattern matching statement suitable for accessing SumTypes is the subject of another DIP.

is it tho? It's the ~~only~~ most important part

September 10

Re: Sum Types - first draft

Posted by Richard (Rikki) Andrew Cattermole
in reply to monkyyy

On 10/09/2024 10:15 PM, monkyyy wrote:
> On Tuesday, 10 September 2024 at 04:06:16 UTC, Walter Bright wrote:
>>
>> A pattern matching statement suitable for accessing SumTypes is the subject of another DIP.
> 
> is it tho? It's the ~~only~~ most important part

Certainly important.

The best part is that matching on types just went to W&A for review!

September 10

Re: Sum Types - first draft

Posted by Dennis
in reply to Walter Bright

On Tuesday, 10 September 2024 at 04:06:16 UTC, Walter Bright wrote:

> https://github.com/WalterBright/documents/blob/96bca2f9f3520cf53ed5c4dec8e5e2d855e64e66/sumtype.md

The evaluation of std.sumtype needs adjustment, because it's full of wrong assumptions.

> std.sumtype cannot optimize the tag out of existence

It could add that special case if it wanted.

> cannot produce compile time error if not all the arms are accounted for in a pattern match rather than a thrown exception

It can, and it does, if you use `match` instead of `tryMatch`:

```d
import std.sumtype;
void main()
{
    SumType!(string, float) s;
    s.match!((float x) => 1); // std/sumtype.d(2018): Error: static assert:  "No matching handler for types `(string)`"
}
```
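A slightly fuller sketch of the same distinction, using only std.sumtype's `match`, `tryMatch` and `MatchException`. The `Value` alias and `describe` function are illustrative names, not part of Phobos: `match` must cover every member type, while `tryMatch` compiles with a partial handler set and reports the gap at run time.

```d
import std.sumtype;

// Illustrative alias, not part of Phobos.
alias Value = SumType!(string, float);

// `match` needs a handler for every member type;
// removing either lambda turns this into a compile-time error.
string describe(Value v)
{
    return v.match!(
        (string s) => "string " ~ s,
        (float f) => "float"
    );
}

unittest
{
    import std.exception : assertThrown;

    assert(describe(Value("hi")) == "string hi");
    assert(describe(Value(1.5f)) == "float");

    // `tryMatch` accepts a partial handler set and throws
    // MatchException at run time when the held type is not handled.
    assert(Value(2.0f).tryMatch!((float f) => f) == 2.0f);
    assertThrown!MatchException(Value("oops").tryMatch!((float f) => f));
}
```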

> an int and a pointer cannot both be in a sumtype and be safe

It can be, and it is; that's why DIP1035 added `@system` variables.

```d
import std.sumtype;

void main() @safe
{
    SumType!(int, int*) s = new int;
}
```

September 10

Re: Sum Types - first draft

Posted by Paul Backus
in reply to Walter Bright

On Tuesday, 10 September 2024 at 04:06:16 UTC, Walter Bright wrote:

> https://github.com/WalterBright/documents/blob/96bca2f9f3520cf53ed5c4dec8e5e2d855e64e66/sumtype.md

Summary of comments

* Special cases are bad.
* New capabilities should ideally be general-purpose, not sumtype-specific.
* Sumtype syntax should be modeled after unions, not enums.

Re: std.sumtype limitations

> std.sumtype cannot include regular enum members

True, but you can get equivalent semantics using empty structs. For example, this enum:

```d
enum Foo : ubyte { a, b }
```

...could be translated to this SumType:

```d
import std.sumtype;
struct A {}
struct B {}
alias Foo = SumType!(A, B);
```

Currently, the SumType occupies more storage space than the enum, because it is forced to allocate 1 byte of storage to give the empty struct objects a unique address. If D had a feature like C++'s [[no_unique_address]] attribute [1], these two representations could be made completely identical.
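A minimal sketch of that translation, with `match` playing the role a `final switch` plays for the enum. The names `FooEnum`, `FooSum` and `name` are hypothetical (renamed so both versions can coexist). Since the exact size of a SumType is an implementation detail, only the inequality Paul describes is asserted.

```d
import std.sumtype;

// The enum version.
enum FooEnum : ubyte { a, b }

// The empty-struct translation.
struct A {}
struct B {}
alias FooSum = SumType!(A, B);

// Matching on the empty structs stands in for switching on the enum.
string name(FooSum f)
{
    return f.match!(
        (A _) => "a",
        (B _) => "b"
    );
}

unittest
{
    assert(name(FooSum(A())) == "a");
    assert(name(FooSum(B())) == "b");

    // The enum is one byte; the SumType stores one byte for its empty
    // members plus a tag, so it is strictly larger.
    static assert(FooEnum.sizeof == 1);
    static assert(FooSum.sizeof > FooEnum.sizeof);
}
```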

> std.sumtype cannot optimize the tag out of existence, for example, when having:
> ```
> enum Option { None, int* Ptr }
> ```

A built-in sum type would not be able to do this either, because in D, every possible sequence of 4 bytes is a potentially-valid int* value.

The reason Rust is able to perform this optimization is that Rust has non-nullable reference types [2]. If D had non-nullable pointer types, then std.sumtype could perform the same optimization using reflection and static if.
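A sketch of what such a library-level optimization could look like, assuming a non-nullable pointer type existed. `NonNull` and `Option` below are hypothetical stand-ins (D has no built-in non-nullable pointer, and this is not how std.sumtype is implemented); the point is only that an `is` expression plus `static if` is enough to drop the tag when the payload has a spare "null" value.

```d
// Hypothetical non-nullable pointer wrapper; nothing like it exists in Phobos.
struct NonNull(T)
{
    private T* ptr;

    @disable this();  // no default (null) state

    this(T* p)
    {
        assert(p !is null, "NonNull requires a non-null pointer");
        ptr = p;
    }

    T* get() { return ptr; }
}

// Hypothetical Option type that chooses its layout by reflection.
struct Option(T)
{
    static if (is(T == NonNull!U, U))
    {
        // A NonNull is never null, so null can encode "none": no tag needed.
        private U* ptr;

        this(T value) { ptr = value.get; }
        bool isSome() const { return ptr !is null; }
        T get() { assert(isSome); return T(ptr); }
    }
    else
    {
        // General case: payload plus an explicit tag.
        private T payload;
        private bool hasValue;

        this(T value) { payload = value; hasValue = true; }
        bool isSome() const { return hasValue; }
        T get() { assert(isSome); return payload; }
    }
}

unittest
{
    int x = 7;
    auto some = Option!(NonNull!int)(NonNull!int(&x));
    Option!(NonNull!int) none;

    assert(some.isSome && *some.get.get == 7);
    assert(!none.isSome);

    // The NonNull specialization is exactly one pointer wide.
    static assert(Option!(NonNull!int).sizeof == (int*).sizeof);
    static assert(Option!int.sizeof > int.sizeof);
}
```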

> cannot produce compile time error if not all the arms are accounted for in a pattern match rather than a thrown exception
> [...]
> an int and a pointer cannot both be in a sumtype and be safe

Dennis has already addressed these, and his responses are correct.

Re: Description

> Member functions of field declarations are restricted the same way union member functions are.
> [...]
> Members of sumtypes cannot have copy constructors, postblits, or destructors.

std.sumtype does not have these limitations, and having built-in sumtypes limited like this would be a significant step backwards.

If you want to start with a proof-of-concept -preview implementation that lacks these features, that's fine; I did the same with the sumtype dub package. Support for members with postblits was added in v0.5.0, and support for copy constructors took all the way until v1.0.0. But the DIP should make clear that these limitations will only be temporary.
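For reference, a small sketch showing std.sumtype running a member's destructor when the SumType goes out of scope. `Guard` is a hypothetical type used only to observe this; copy constructors and postblits are not exercised here. Because temporaries may already be destroyed during construction, the count is recorded before scope exit rather than assumed to be zero.

```d
import std.sumtype;

// Hypothetical member type whose destructor counts how often it runs.
struct Guard
{
    int* drops;
    ~this() { if (drops !is null) ++*drops; }
}

unittest
{
    int drops;
    int beforeScopeExit;
    {
        auto s = SumType!(int, Guard)(Guard(&drops));
        beforeScopeExit = drops;
    }
    // Leaving the scope destroys `s`, which runs the held Guard's destructor.
    assert(drops > beforeScopeExit);
}
```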

> A special case of sumtypes will enable use of non-null pointers.

Unprincipled special cases like this are bad language design. Non-null pointers are a generally-useful language feature, even outside of sumtypes. If they're worth doing, they're worth doing properly.

> A new expression, QueryExpression, is introduced to enable querying a sumtype to see if it contains a specified member.

Is this really necessary if we're already planning to add pattern matching?
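For comparison, such a query can already be layered on top of matching. `holds` below is a hypothetical helper, not a Phobos function; it uses only std.sumtype's `match` and `isSumType`, with a typed handler for `T` and a generic fallback for every other member type.

```d
import std.sumtype;

// Hypothetical helper: does `s` currently hold a value of type T?
bool holds(T, S)(auto ref S s)
    if (isSumType!S)
{
    return s.match!(
        (T _) => true, // chosen when the held value is a T
        _ => false     // generic fallback for the other member types
    );
}

unittest
{
    alias V = SumType!(int, string);
    assert(holds!int(V(1)));
    assert(!holds!string(V(1)));
    assert(holds!string(V("x")));
}
```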

> SumTypeBody:
>    `{` SumTypeMembers `}`
>
> [...]
>
> sumtype Option(T) { None, Some(T) }

Using enum-style syntax here is a big mistake, IMO. Sumtypes should use the same AggregateBody syntax as structs and unions.

Advantages of AggregateBody:

* It's amenable to metaprogramming. Inside an AggregateBody, you can use static if, static foreach, mixin, and so on. With enum-style syntax, your options are greatly reduced.
* It would allow sumtypes to have user-defined member functions, including operator overloads. (This is a limitation of std.sumtype that I have personally received several complaints about.)
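The usual workaround today is to wrap the SumType in a struct and forward to `match`. The sketch below uses hypothetical `Shape`, `Square` and `Rect` types to show the boilerplate that a sumtype with its own member functions would make unnecessary.

```d
import std.sumtype;

struct Square { double side; }
struct Rect { double w, h; }

// Hypothetical wrapper: a SumType cannot be given member functions,
// so a struct holds one and forwards to `match`.
struct Shape
{
    SumType!(Square, Rect) value;

    this(T)(T shape) { value = SumType!(Square, Rect)(shape); }

    double area() const
    {
        return value.match!(
            (const Square s) => s.side * s.side,
            (const Rect r) => r.w * r.h
        );
    }
}

unittest
{
    assert(Shape(Square(2)).area == 4);
    assert(Shape(Rect(2, 3)).area == 6);
}
```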

The only disadvantage is that you lose the ability to mix named integer values (like None, above) with typed members (like Some(T)).

However, there is a simple solution to this, which is to allow the programmer to declare fields of type void:

```
sumtype Option(T)
{
    void none;
    T some;
}
```

This does not have to be a special-case feature of sumtypes; see the abandoned "Give unit type semantics to void" DIP [3] for a detailed description of how this could work as a general language feature.
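With today's std.sumtype, the role of a `void none;` field is usually played by an empty struct. A sketch of the library counterpart of the `Option(T)` above; `None`, `Option` and `getOr` are hypothetical names, and `getOr` is kept non-generic for simplicity.

```d
import std.sumtype;

// Empty struct standing in for a unit-typed `void none;` member.
struct None {}
alias Option(T) = SumType!(None, T);

int getOr(Option!int o, int fallback)
{
    return o.match!(
        (None _) => fallback,
        (int value) => value
    );
}

unittest
{
    Option!int empty;     // default-initialized to the first member, None
    Option!int full = 5;
    assert(getOr(empty, -1) == -1);
    assert(getOr(full, -1) == 5);
}
```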

> The most pragmatic approach for now is to simply disallow taking the address of or a reference to a member of a SumType in @safe code.

This is one valid approach. The other is to make writing to a sumtype value that contains pointers or references @system.

Keep in mind that merely calling a member function of a struct or class instance requires taking a reference to it, since the `this` parameter is passed by reference. So this limitation is actually quite severe.
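A sketch of the second approach as std.sumtype applies it. As I understand its documentation, assigning to a SumType is `@system` when a member other than the one being assigned contains pointers, while reading through by-value `match` handlers stays `@safe`. `IntOrPtr`, `read` and `overwrite` are hypothetical names, and the `@trusted` wrapper is only sound under the stated assumption that no references into `s` are outstanding.

```d
import std.sumtype;

alias IntOrPtr = SumType!(int, int*);

// Reading through by-value handlers is allowed in @safe code.
@safe int read(IntOrPtr s)
{
    return s.match!(
        (int n) => n,
        (int* p) => *p
    );
}

// Assignment could overwrite an int* that is still referenced elsewhere,
// so it is not @safe; this wrapper assumes callers hold no such references.
@trusted void overwrite(ref IntOrPtr s, int n)
{
    s = n;
}

unittest
{
    int x = 5;
    IntOrPtr s = &x;
    assert(read(s) == 5);
    overwrite(s, 9);
    assert(read(s) == 9);
}
```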

> But since a sumtype with only enum members can be implemented as an enum, the compiler should do that rewrite. Similarly, a SumType with only one field declaration should be rewritten as a struct (and the tag can be omitted). Furthermore, a sumtype with an enum member with a value of 0 and a field declaration that is a pointer can be rewritten as just a pointer.

Again, special cases like this are bad language design, especially in a language like D with powerful reflection and metaprogramming.

It's also inconsistent with existing language features. For example, if I declare a type like this:

```d
union Example { int n; }
```

...the compiler does not magically rewrite it as a struct, even though it's functionally equivalent to one.

References

[1] [[no_unique_address]]: https://en.cppreference.com/w/cpp/language/attributes/no_unique_address
[2] Non-nullable references: https://doc.rust-lang.org/std/primitive.reference.html
[3] Give unit type semantics to void: https://github.com/dkorpel/DIPs/blob/dc1495cc2239729adb270012995c76809fe7f08c/DIPs/DIP1NNN-DK.md

September 10

Re: Sum Types - first draft

Posted by Walter Bright
in reply to Paul Backus

Thanks for your detailed response. Let me address just one for the moment:

On 9/10/2024 9:20 AM, Paul Backus wrote:
>> * std.sumtype cannot optimize the tag out of existence, for example, when having:
>>
>>       enum Option { None, int* Ptr }
> 
> A built-in sum type would not be able to do this either, because in D, every possible sequence of 4 bytes is a potentially-valid int* value.
> 
> The reason Rust is able to perform this optimization is that Rust has non-nullable reference types [2]. If D had non-nullable pointer types, then std.sumtype could perform the same optimization using reflection and `static if`.

I was approaching it from the other way around. Isn't a non-nullable pointer a sumtype? Why have both non-nullable types and sumtypes?

September 10

Re: Sum Types - first draft

Posted by Paul Backus
in reply to Walter Bright

On Tuesday, 10 September 2024 at 17:05:49 UTC, Walter Bright wrote:

> Thanks for your detailed response. Let me address just one for the moment:
>
> On 9/10/2024 9:20 AM, Paul Backus wrote:
>>> std.sumtype cannot optimize the tag out of existence, for example, when having:
>>>
>>>     enum Option { None, int* Ptr }
>>
>> A built-in sum type would not be able to do this either, because in D, every possible sequence of 4 bytes is a potentially-valid int* value.
>
> I was approaching it from the other way around. Isn't a non-nullable pointer a sumtype? Why have both non-nullable types and sumtypes?

You have it exactly backwards. A nullable pointer type is the sum of a non-nullable pointer type and typeof(null).

A non-nullable pointer type is a pointer type with its range of valid values restricted. You could think of it as a "difference type": if you take `T*` and subtract `typeof(null)` from it (i.e., take the set difference [1] of their values), you get a non-nullable pointer type.

[1] https://en.wikipedia.org/wiki/Complement_(set_theory)#Relative_complement
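The set relationship above can be mirrored with std.sumtype. In this sketch, `NonNull` is a hypothetical wrapper (D has no built-in non-nullable pointer) and an empty `Null` struct stands in for `typeof(null)`; `fromPointer` and `toPointer` show that an `int*` round-trips through the sum of the two.

```d
import std.sumtype;

// Hypothetical non-nullable pointer. A real one would also forbid
// default construction; that is omitted so it can live in a SumType.
struct NonNull(T)
{
    private T* ptr;

    this(T* p)
    {
        assert(p !is null, "NonNull requires a non-null pointer");
        ptr = p;
    }

    T* get() { return ptr; }
}

// Empty marker standing in for typeof(null).
struct Null {}

// A nullable pointer modeled as the sum of Null and a non-nullable pointer.
alias NullablePtr = SumType!(Null, NonNull!int);

NullablePtr fromPointer(int* p)
{
    return p is null ? NullablePtr(Null()) : NullablePtr(NonNull!int(p));
}

int* toPointer(NullablePtr n)
{
    return n.match!(
        (Null _) => cast(int*) null,
        (NonNull!int nn) => nn.get
    );
}

unittest
{
    int x = 3;
    assert(toPointer(fromPointer(&x)) is &x);
    assert(toPointer(fromPointer(null)) is null);
}
```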
