On Tuesday, 29 November 2022 at 06:26:20 UTC, Walter Bright wrote:
> Go ahead, Make My Day! Destroy!
https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).md
Feedback
- In Alternative Syntax the following is not supported by the grammar you provided:
sumtype Option(T) { None, Some(T) }
It should probably be
sumtype Option(T) { None, T Some }
- Maybe you should mention anonymous sumtype (cf. anonymous `struct` and anonymous `union`).
- As for a keyword, you could circumvent the problem by re-using existing keywords. `enum union` would be a candidate.
- “Members of sumtypes cannot have copy constructors, postblits, or destructors.” This limits its application severely. Andrei had a talk explaining how destruction of `std::variant` in C++ could be done efficiently using `static foreach` – if C++ had it. I don’t see how a copy constructor/postblit is evil.
- “Sumtypes can be copied if all its constituent types can be copied.”
Discussion
It seems it tries to be too many things at once and that is confusing. It tries to be:
1. An algebraic union with a tagged union as its special case
2. An optional type
3. Some form of non-nullable annotation.
Let’s talk about 3. first: The recognition of a magical pattern is bad; it is backwards and leads to surprises. Were reference types non-nullable from day one, `enum union { typeof(null), int* }` would be a great candidate for a nullable pointer to `int`.
The same way `typeof(return)` is special-cased in the grammar (the token `return` not being an expression), you could at least special-case `null` (the keyword) as a possible case of a D sum type. However, this is still a confusing design because it looks like you’re adding a case, but actually it removes a value from the overall type. I’ve heard of negative types in type theory before, but to be honest, I don’t know much about them. Better than seemingly adding `null` would be to add a (visually) negative `null`: Allow the pseudo-member `!null` or `-null`. It’s still weird and maybe even hard to teach, but at least we can tell people: “The same way you can add something to 2 to make it 1, namely −1, you can add `-typeof(null)` to `int*` to get an `int*` that cannot be `null`. For your convenience, instead of `-typeof(null)` you can write `-null`.”
A sum type consisting of at least one reference type (pointers, classes, AAs) may include `-null` as a special member that makes the nullable options non-nullable.
We can even add syntax for simple non-nullable reference types: `int*-null` or `int*\null` or `int*!` or whatever you like. Likely, it’d be a lowering to a template in object.d named `nonnull` or something similar.
As for 1. and 2., those make sense. Adding a single, distinct value to a type makes this an optional type. A template `optional` in Phobos would become a vocabulary type, so people would no longer implement their own optionals, and it could itself be implemented using a sum type. This could even be added to object.d with syntax.
I still think that D would be a better and safer language if it had non-null reference types: In such a hypothetical D, an `int*` always points to an `int`, but an `int*?` might be `null`. For value types, something like `int?` can be done in two ways: Either reserve an otherwise invalid value as the null value (possibly −2³¹, i.e. `int.min`) or make `int?` an alias for `enum union NullableInt { typeof(null), int }` – the first option is a non-starter because of backward compatibility, but an entirely new language could do that. User-defined types can have a compile-time constant `opNull` that tells the compiler: “Not all combinations of values for my members signify a valid object; use this as the ‘null’ object instead of adding a boolean tag when I’m combined with `?`.” It’s probably worth exploring how much of that can be done and at what cost.
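To make the two strategies concrete with what Phobos offers today: `std.typecons.Nullable` comes in a tagged flavor and a sentinel flavor. The sketch below is only an illustration of the trade-off, not the DIP's design; `valueOr` is a hypothetical helper I made up to stand in for the proposed `v ?? d`, and using `int.min` as the reserved value is my assumption.

```d
import std.typecons : Nullable;

// Tagged optional: an int plus a separate "is null" flag.
alias TaggedInt = Nullable!int;

// Sentinel optional: int.min is reserved to mean "no value".
alias SentinelInt = Nullable!(int, int.min);

// The sentinel flavor needs no extra storage; the tagged one does.
static assert(SentinelInt.sizeof == int.sizeof);
static assert(TaggedInt.sizeof > int.sizeof);

// Hypothetical stand-in for `v ?? d`; the fallback is evaluated lazily.
int valueOr(N)(N v, lazy int fallback)
{
    return v.isNull ? fallback : v.get;
}

void main()
{
    TaggedInt a;   // starts out null
    SentinelInt b; // starts out null (stores int.min internally)
    assert(a.isNull && b.isNull);

    a = 42;
    b = 7;
    assert(valueOr(a, -1) == 42);
    assert(valueOr(b, -1) == 7);

    b.nullify();
    assert(valueOr(b, -1) == -1);
}
```

The sentinel flavor is essentially what a user-defined `opNull` would generalize: the type itself names the value that means "nothing here", so no boolean tag is needed.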
Why all this talk about optional types? Because they play well with sum types. Asking for a possible variant of a sum type is inherently returning an optional value. The first step to getting sum types right is getting optional types right. That doesn’t mean optional types cannot be a special case of sum types. In that sense, optional types are the most relevant application of sum types.
  EnumUnionDeclaration:
      `enum` `union` Identifier EnumUnionBody
      EnumUnionTemplateDeclaration
+     AnonymousEnumUnionDeclaration
+
+ AnonymousEnumUnionDeclaration:
+     `enum` `union` EnumUnionBody

  EnumUnionTemplateDeclaration:
      `enum` `union` Identifier TemplateParameters Constraint (opt) EnumUnionBody

  EnumUnionBody:
      `{` EnumUnionMembers `}`

  EnumUnionMembers:
      EnumUnionMember
      EnumUnionMember `,`
      EnumUnionMember `,` EnumUnionMembers

  EnumUnionMember:
+     `null`
+     `!null`
      EnumMemberAttributes EnumUnionMember
      EnumMember
      FieldDeclaration

  EnumMember:
-     Identifier
-     Identifier = AssignExpression
+     case Identifier
+     case Identifier = AssignExpression

  FieldDeclaration:
-     Type Identifier
-     Type Identifier = AssignExpression
+     Type Identifier(opt)
+     Type Identifier(opt) = AssignExpression

  QueryExpression:
      `?` PostfixExpression `.` Identifier
I mentioned `null` and `!null` above. `null` is a shorthand for `typeof(null)`. Omitting the `Identifier` is allowed only when the `EnumMember` is the only one with its type and the declaration is not anonymous.
Basically we have these categories of sum types:
- Optionals: 1 member (usually unnamed) plus `null`.
- Non-nulls: 1 member (usually unnamed) of reference type minus `null` (= plus `!null`).
- Commutative: ≥ 2 members (usually unnamed) with different types.
- Homogeneous: ≥ 2 named members with the same type.
- Algebraic: All members named, potentially with repeated types, potentially self-referential.
- Non-null commutative: ≥ 2 members (usually unnamed) with different types among which are reference types plus `!null`.
- Non-null algebraic: All members named, potentially with repeated types among which are reference types (potentially self-referential) plus `!null`.
A member can be an untyped named constant. Those are equivalent to members of unique unit type. But if types are optional and identifiers are optional, how do we know which is which? We don’t. We need something to disambiguate. I went with `case` for something that’s maybe acceptable. Here, `case` can be read as a type: “Make a unique case type for this.” The `case` in `case x = init` is `Case_x` defined as `struct Case_x { typeof(init) value = init; }`. Because the type is a unit type, its value need not be stored in the instance.
If all members are `case` members, the type is effectively a plain-old `enum`. The union (not the tag field) can be elided as the tag field suffices to retrieve the values. The only difference from a regular `enum` I see is that the DIP proposes that for the tag field, the smallest available integral type be used, whereas `enum` by default has `int` values.
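For reference, this is how the size difference looks with today's `enum`. It only shows current behavior; the names `Color` and `SmallColor` are made up for the example, and the explicit `ubyte` base type is my stand-in for the "smallest available integral type" the DIP would pick automatically.

```d
import std.traits : OriginalType;

// A plain enum defaults to int as its base type.
enum Color { red, green, blue }
static assert(is(OriginalType!Color == int));
static assert(Color.sizeof == int.sizeof);

// An explicit base type is needed today to get a one-byte representation.
enum SmallColor : ubyte { red, green, blue }
static assert(is(OriginalType!SmallColor == ubyte));
static assert(SmallColor.sizeof == 1);

void main()
{
    // Both enums carry the same three cases; only the storage differs.
    assert(Color.blue == 2);
    assert(SmallColor.blue == 2);
}
```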
I call the (usually unnamed) type-distinguished sum types commutative because there’s no reason whatsoever to treat `int + string` and `string + int` differently. There is the `int` member and the `string` member. It’s the same type spelled differently, the same way the ordering of type constructors or function attributes (on e.g. function pointer types) not only makes no difference, but creates the very same type. One does not simply reorder a struct, but one can reorder a union, and commutative sum types are close enough. Algebraic sum types are too struct-like to be reordered (naming matters; knowing the type to extract does not suffice).
The member querying syntax is okay, though I guess it can be improved. While `value.member?` reads best, it meaningfully conflicts with the ternary operator, so the second best would be `value.?member`. If a sum type is an optional type, i.e. it is `enum union X {null, T}` for some `T`, it’d be great to have some C#-esque `?.` and `??` operators as well: `optionalcat?.name` returns an optional string: `null` if there is no cat and the cat’s name if there is a cat. This conflicts with the ternary operator as well, but a `?` followed by the module-scope `.` without a space should be virtually non-existent. For a nullable value `v`, `v ?? d` returns `v` if it’s not `null` (but typed as not null) and `d` (default, lazily evaluated) if it is. A `.` could implicitly be `.?` if the member after it is followed up by `??` or `?.`, i.e. `value.member?.toString()` is actually `value.?member?.toString()` and `yourpet.cat ?? mycat` is actually `yourpet.?cat ?? mycat` because you might not have a cat.
Querying for a commutative sum type would usually be done via the type: The sum type converts implicitly to an optional of any of its constituent types:
// object.d
enum union SumType(Ts...) { Ts }
enum union NotNull(Ts...) if (/* any is reference type */) { !null, Ts }
enum union Nullable(T) if (/* T is not reference type and has no opNull */) { null, T }
// main.d
alias StringOrInt = SumType!(string, int);
StringOrInt stringOrInt;
if (string? s = stringOrInt) { }
else if (int? i = stringOrInt) { }
else assert(0);
Also, `stringOrInt is int` and `stringOrInt !is int` would work.
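For comparison, here is roughly how the "convert to an optional of a constituent type" idea can be approximated with the library `std.sumtype` that exists today. This is a sketch under my own assumptions, not what the DIP specifies: `tryGet` is a hypothetical helper, and `Nullable!T` plays the role of the proposed `T?`.

```d
import std.sumtype : SumType, match;
import std.typecons : Nullable, nullable;

alias StringOrInt = SumType!(string, int);

// Hypothetical helper: view the sum type as an optional of one of its types.
Nullable!T tryGet(T)(StringOrInt v)
{
    return v.match!(
        (T t) => nullable(t),   // the requested type is the active one
        _ => Nullable!T.init    // any other member: yield "null"
    );
}

void main()
{
    StringOrInt a = "hello";
    StringOrInt b = 42;

    // Stand-in for `if (string? s = a) { ... }`.
    auto s = tryGet!string(a);
    assert(!s.isNull && s.get == "hello");
    assert(tryGet!int(a).isNull);

    // Stand-in for `b is int` and `b !is string`.
    assert(!tryGet!int(b).isNull);
    assert(tryGet!string(b).isNull);
    assert(tryGet!int(b).get == 42);
}
```

Querying by type only works because the members have different types, which is exactly the commutative category; for algebraic sum types with repeated types the query has to go by name.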
For an associative array, `aa[key]` can return an optional value instead of throwing a `RangeError`. That would enable `aa[key] ?? d`; and `aa[key] ??= value` can set the value only if `key` is not present already.
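Today's associative arrays already have library-level equivalents for these operations, which gives an idea of what the operators would lower to. The mapping in the comments is my reading, not something the DIP states; `ages` is just example data.

```d
void main()
{
    int[string] ages = ["alice": 30];

    // `key in aa` yields a pointer that is null when the key is absent,
    // which is the closest thing to an optional lookup today.
    int* p = "bob" in ages;
    assert(p is null);
    assert(*("alice" in ages) == 30);

    // Roughly `ages[key] ?? d`: the default argument is lazy,
    // so it is only evaluated when the key is missing.
    assert(ages.get("bob", -1) == -1);
    assert(ages.get("alice", -1) == 30);

    // Roughly `ages[key] ??= value`: inserts only if the key is absent.
    ages.require("bob", 25);
    ages.require("alice", 99);
    assert(ages["bob"] == 25);
    assert(ages["alice"] == 30); // the existing entry is left untouched
}
```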
For a (nullable) pointer `ptr`, the dereference expression `*ptr` can be treated like an optional value if it is followed by `??` or `?.`, and `ptr?.member`
I see one problem with `?.` vs. `.?`, but the rule is easy: The question mark is where the optional is. For `.?`, the left operand is a concrete value and the member on the right side might or might not be present. If optional sum types are a common thing, maybe we need `?.?`: `lhs?.?member` is: “Give me the member’s value if `lhs` isn’t `null` and `member` is there; otherwise give me `null`.”
Sorry for the long post.