On Tuesday, 10 September 2024 at 04:06:16 UTC, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/96bca2f9f3520cf53ed5c4dec8e5e2d855e64e66/sumtype.md
Summary of comments
- Special cases are bad.
- New capabilities should ideally be general-purpose, not sumtype-specific.
- Sumtype syntax should be modeled after unions, not enums.
Re: std.sumtype limitations
> - std.sumtype cannot include regular enum members
True, but you can get equivalent semantics using empty structs. For example, this enum:
enum Foo : ubyte { a, b }
...could be translated to this SumType:
struct A {}
struct B {}
alias Foo = SumType!(A, B);
Currently, the SumType occupies more storage space than the enum, because it is forced to allocate 1 byte of storage to give the empty struct objects a unique address. If D had a feature like C++'s [[no_unique_address]] attribute [1], these two representations could be made completely identical.
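To make the size difference concrete, here is a minimal sketch of the translation above. The names FooEnum and name are mine, made up for illustration (the enum is renamed only so both versions fit in one module), and the size checks assume nothing beyond the 1-byte empty struct mentioned above plus at least one byte of tag.

```d
// Run with: dmd -unittest -main -run example.d
import std.sumtype;

// The original enum, renamed so it can coexist with the SumType.
enum FooEnum : ubyte { a, b }

// The SumType translation: one empty struct per enum member.
struct A {}
struct B {}
alias Foo = SumType!(A, B);

// Hypothetical helper showing that the members are still distinguishable.
string name(Foo foo)
{
    return foo.match!(
        (A _) => "a",
        (B _) => "b"
    );
}

static assert(FooEnum.sizeof == 1);
static assert(A.sizeof == 1);               // empty structs still occupy 1 byte
static assert(Foo.sizeof > FooEnum.sizeof); // storage byte plus tag

unittest
{
    Foo x = A();
    Foo y = B();
    assert(name(x) == "a");
    assert(name(y) == "b");
}
```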
> - std.sumtype cannot optimize the tag out of existence, for example, when having:
>   enum Option { None, int* Ptr }
A built-in sum type would not be able to do this either, because in D, every possible pointer-sized sequence of bytes, including null, is a potentially-valid int* value, so no bit pattern is left over to represent None.
The reason Rust is able to perform this optimization is that Rust has non-nullable reference types [2]. If D had non-nullable pointer types, then std.sumtype could perform the same optimization using reflection and static if.
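As a rough sketch of what that could look like in library code: NonNull and Option below are hypothetical types written for this post, not anything in Phobos. NonNull is only a library-level stand-in (NonNull!T.init still holds a null pointer, which is exactly why real language support would be needed), but it is enough to show the static if selecting a tag-free layout.

```d
// Run with: dmd -unittest -main -run example.d

// Hypothetical stand-in for a non-nullable pointer type.
struct NonNull(T)
{
    private T* ptr;

    @disable this();

    this(T* p)
    {
        assert(p !is null, "NonNull requires a non-null pointer");
        ptr = p;
    }
}

// Hypothetical option type that picks its layout by reflection.
struct Option(T)
{
    static if (is(T == NonNull!U, U))
    {
        // The payload can never be null, so null is free to mean "none".
        private U* raw;
        this(T value) { raw = value.ptr; }
        bool isSome() const { return raw !is null; }
    }
    else
    {
        // General case: null may be a real value, so a separate tag is needed.
        private bool tag;
        private T payload;
        this(T value) { payload = value; tag = true; }
        bool isSome() const { return tag; }
    }
}

static assert(Option!(NonNull!int).sizeof == (int*).sizeof); // no tag
static assert(Option!(int*).sizeof > (int*).sizeof);         // tag required

unittest
{
    int x = 7;
    auto some = Option!(NonNull!int)(NonNull!int(&x));
    Option!(NonNull!int) none;
    assert(some.isSome);
    assert(!none.isSome);

    // With a plain int*, a null payload is still "some".
    auto p = Option!(int*)(null);
    assert(p.isSome);
}
```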
> - cannot produce compile time error if not all the arms are accounted for in a pattern match rather than a thrown exception
> [...]
> - an int and a pointer cannot both be in a sumtype and be safe
Dennis has already addressed these, and his responses are correct.
Re: Description
> Member functions of field declarations are restricted the same way union member functions are.
> [...]
> Members of sumtypes cannot have copy constructors, postblits, or destructors.
std.sumtype does not have these limitations, and having built-in sumtypes limited like this would be a significant step backwards.
If you want to start with a proof-of-concept -preview implementation that lacks these features, that's fine--I did the same with the sumtype dub package. Support for members with postblits was added in v0.5.0, and support for copy constructors took all the way until v1.0.0. But the DIP should be clear that these limitations will only be temporary.
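For reference, this is roughly what std.sumtype handles today. Resource, Value, and the two counters are hypothetical names made up for this sketch. The checks use >= rather than exact counts, because the number of intermediate copies and temporaries is an implementation detail I would not want to rely on.

```d
// Run with: dmd -unittest -main -run example.d
import std.sumtype;

int copies;    // incremented by Resource's copy constructor
int destroyed; // incremented by Resource's destructor

// Hypothetical member type with a copy constructor and a destructor.
struct Resource
{
    int id;

    this(int id) { this.id = id; }

    this(ref return scope const Resource rhs)
    {
        id = rhs.id;
        ++copies;
    }

    ~this() { ++destroyed; }
}

alias Value = SumType!(int, Resource);

unittest
{
    {
        Value a = Resource(1);
        Value b = a; // copies the stored Resource via its copy constructor
        assert(copies >= 1);
    }
    // Both SumType values have gone out of scope, so the Resource
    // destructor has run for each of them.
    assert(destroyed >= 2);
}
```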
> A special case of sumtypes will enable use of non-null pointers.
Unprincipled special cases like this are bad language design. Non-null pointers are a generally-useful language feature, even outside of sumtypes. If they're worth doing, they're worth doing properly.
> A new expression, QueryExpression, is introduced to enable querying a sumtype to see if it contains a specified member.
Is this really necessary if we're already planning to add pattern matching?
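To illustrate why I ask: with std.sumtype, a membership query is already a two-line pattern match. The holds helper below is a hypothetical name of my own, not part of std.sumtype, and None and Option are defined here only for the example.

```d
// Run with: dmd -unittest -main -run example.d
import std.sumtype;

struct None {}
alias Option = SumType!(None, int);

// Hypothetical helper: does the SumType currently hold a value of type T?
bool holds(T, S)(S s)
if (isSumType!S)
{
    return s.match!(
        (T _) => true, // the member being queried
        _ => false     // any other member
    );
}

unittest
{
    Option a = 42;
    Option b = None();

    assert(holds!int(a));
    assert(!holds!None(a));
    assert(holds!None(b));
    assert(!holds!int(b));
}
```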
> SumTypeBody:
> `{` SumTypeMembers `}`
> [...]
> sumtype Option(T) { None, Some(T) }
Using enum-style syntax here is a big mistake, IMO. Sumtypes should use the same AggregateBody syntax as structs and unions.
Advantages of AggregateBody:
- It's amenable to metaprogramming. Inside an AggregateBody, you can use static if, static foreach, mixin, and so on. With enum-style syntax, your options are greatly reduced.
- It would allow sumtypes to have user-defined member functions, including operator overloads. (This is a limitation of std.sumtype that I have personally received several complaints about.)
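Since the sumtype keyword is only a proposal, here is a sketch using a union, whose body is an AggregateBody today. Payload, needsTag, count, and the generated field names are all hypothetical. The point is just that member generation, conditional declarations, and member functions all work inside the braces.

```d
// Run with: dmd -unittest -main -run example.d
import std.conv : to;

// Hypothetical union whose members are generated from a type list.
union Payload(Ts...)
{
    // One member per type: field0, field1, ...
    static foreach (i, T; Ts)
    {
        mixin("T field" ~ i.to!string ~ ";");
    }

    // Conditional declarations work too.
    static if (Ts.length > 1)
        enum bool needsTag = true;
    else
        enum bool needsTag = false;

    // So do user-defined member functions.
    static size_t count() { return Ts.length; }
}

alias P = Payload!(int, double);

static assert(__traits(hasMember, P, "field0"));
static assert(__traits(hasMember, P, "field1"));
static assert(P.sizeof == double.sizeof); // members overlap, as in any union
static assert(P.needsTag);
static assert(!Payload!int.needsTag);

unittest
{
    P p;
    p.field1 = 2.5;
    assert(p.field1 == 2.5);
    assert(P.count() == 2);
}
```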
The only disadvantage is that you lose the ability to mix named integer values (like None, above) with typed members (like Some(T)).
However, there is a simple solution to this, which is to allow the programmer to declare fields of type void:
sumtype Option(T)
{
void none;
T some;
}
This does not have to be a special-case feature of sumtypes; see the abandoned "Give unit type semantics to void" DIP [3] for a detailed description of how this could work as a general language feature.
> The most pragmatic approach for now is to simply disallow taking the address of or a reference to a member of a SumType in @safe code.
This is one valid approach. The other is to make writing to a sumtype value that contains pointers or references @system.
Keep in mind that merely calling a member function of a struct or class instance requires taking a reference to it, since the this parameter is passed by reference. So this limitation is actually quite severe.
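A small sketch of what I mean, in @system code. Counter, bump, self, and Value are hypothetical names for this example. The first check shows that this refers to the original object, and the second shows that calling a member function on a value stored in a std.sumtype SumType means handing out a reference into the SumType's storage.

```d
// Run with: dmd -unittest -main -run example.d
import std.sumtype;

struct Counter
{
    int n;

    // Mutates the object in place through the implicit `this` reference.
    void bump() { ++n; }

    // Exposes the address that `this` refers to.
    Counter* self() { return &this; }
}

alias Value = SumType!(Counter, string);

unittest
{
    Counter c;
    assert(c.self() is &c); // `this` is a reference to c, not a copy

    Value v = Counter(0);
    v.match!(
        (ref Counter counter) => counter.bump(), // reference into v's storage
        (ref string s) {}
    );

    // The stored Counter itself was modified.
    assert(v.match!(
        (Counter counter) => counter.n,
        (string s) => -1
    ) == 1);
}
```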
> But since a subtype with only enum members can be implemented as an enum, the compiler should do that rewrite. Similarly, a SumType with only one field declaration should be rewritten as a struct (and the tag can be omitted). Furthermore, a subtype with an enum member with a value of 0 and a field declaration that is a pointer can be rewritten as just a pointer.
Again, special cases like this are bad language design--especially in a language like D with powerful reflection and metaprogramming.
It's also inconsistent with existing language features. For example, if I declare a type like this:
union Example { int n; }
...the compiler does not magically rewrite it as a struct, even though it's functionally equivalent to one.
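Both halves of this can be checked in a few lines. The first part verifies that the union stays a union. The second is a sketch of how a library could do this kind of rewrite explicitly with static if, for users who opt in. Compact is a hypothetical template name, not an existing Phobos symbol.

```d
// Run with: dmd -unittest -main -run example.d
import std.sumtype;

union Example { int n; }

// The single-member union is still a union, not a struct.
static assert(is(Example == union));
static assert(!is(Example == struct));
static assert(Example.sizeof == int.sizeof);

// Hypothetical opt-in "rewrite" done in library code:
// a single member needs no tag, so use the member type directly.
template Compact(Ts...)
{
    static if (Ts.length == 1)
        alias Compact = Ts[0];
    else
        alias Compact = SumType!Ts;
}

static assert(is(Compact!int == int));
static assert(is(Compact!(int, string) == SumType!(int, string)));

unittest
{
    Compact!int a = 1; // plain int
    Compact!(int, string) b = "two";
    assert(a == 1);
    assert(b.match!((int i) => false, (string s) => s == "two"));
}
```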
References
- [1] [[no_unique_address]]: https://en.cppreference.com/w/cpp/language/attributes/no_unique_address
- [2] Non-nullable references: https://doc.rust-lang.org/std/primitive.reference.html
- [3] Give unit type semantics to void: https://github.com/dkorpel/DIPs/blob/dc1495cc2239729adb270012995c76809fe7f08c/DIPs/DIP1NNN-DK.md