TL;DR: If we make `TypeCtor` optional in the production rule `BasicType → TypeCtor ( Type )`, the D type grammar can be improved, and we have a way to solve the 14-year-old issue 2753.
What is an expression syntax?
In D, as in most programming languages, there is the concept of a primary expression, which, simply put, lets you put an arbitrary expression in parentheses and get an expression again. Without it, `(a + b) * c` wouldn’t even be expressible.
The scheme is like this:
```
Expression        → PrimaryExpression
PrimaryExpression → ( Expression )
```
Imagine you could only use parentheses where they’re needed, so that `(a * b) + c` would be an error, since it means exactly the same as `a * b + c`. This is how D’s types behave. The type grammar is quite a mouthful, and I reworked it in the past to make it somewhat understandable for an outsider.
D types almost have an expression syntax
There’s one particular interaction that makes D’s types almost have a primary expression:
```
Type      → BasicType
BasicType → TypeCtor ( Type )
```
This means a `Type` can be (among other options) just a `BasicType`, and a `BasicType` can be (among other options) a `TypeCtor` followed by a `Type` in parentheses. If we make the `TypeCtor` optional, we get a first-class type expression syntax. We should do this today and, taking advantage of it, do even more. (If you have experience with the parser, please let me know if this would be a difficult change. To me, it doesn’t seem like it would.)
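To see where today’s grammar stands, the following compile-time checks exercise the `TypeCtor ( Type )` form (a sketch; the empty `main` only makes it a complete program). The form nests like a parenthesized expression, and the parentheses delimit exactly what the constructor applies to. A bare `( Type )` is left out because it does not parse yet.

```d
// `TypeCtor ( Type )` nests, much like `( Expression )` does:
static assert(is(const(const(int)) == const(int)));
static assert(is(shared(const(int)) == const(shared(int))));

// The parentheses decide what the constructor covers:
// const(int)[] is a mutable slice of const ints, const(int[]) is a const slice.
static assert(!is(const(int)[] == const(int[])));

void main() {}
```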
Does it solve anything?
Yes. This isn’t just an academic, puritan, inner-monk-pleasing exercise. D’s type syntax doesn’t let you express types that are 100 % valid and useful, and it doesn’t let you clarify your intentions! Have you ever taken the address of a function that returns by reference? Normally, a function pointer type is written the same as a function declaration, just with the function name replaced by the `function` keyword:
```d
bool isEven (int i) => i % 2 == 0;
bool function(int i) isEvenPtr = &isEven; // ok

ref int refId (ref int i) => i;
ref int function(ref int i) refIdPtr = &refId; // Doesn’t parse!
```
You can declare `refIdPtr` with `auto`, because the type of `&refId` is 100 % well-formed; the problem is purely one of spelling it out in code. If you `pragma(msg, typeof(refIdPtr))`, you get:

```
int function(ref int i) ref
```
Interesting where the `ref` is, isn’t it? Go ahead, try using that instead of `auto`. It doesn’t parse! And frankly, it shouldn’t; it’s confusing to read.
The reason is that the grammar works by max munch, and we don’t have the type in isolation; it’s part of a declaration. The `ref` is parsed as a storage class for the declaration: it makes `refIdPtr` a reference to an object of type `int function(ref int)`, or rather, it would if it could. In this context, references aren’t allowed. Additionally, the type and value category of `&refId` don’t fit the declaration, but the parser doesn’t even get there.
One way to do it is to use an alias:
```d
alias FP = ref int function(ref int);
FP refIdPtr = &refId;
```
Why, then, does the alias definition of `FP` parse? Essentially because the alias declaration rules boil down to this:
```
AliasDeclaration → alias Identifier = ref Type
```
Simply put, alias declaration rules accept it as a special case.
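Here is a minimal sketch of the workaround in today’s D. The variable names are illustrative, and `refId` is written with a block body instead of `=>` so that it also compiles with compilers that predate the shortened function syntax. It checks that `auto` and the alias produce the same type, and that calling through the pointer really yields a reference.

```d
import std.stdio : writeln;

ref int refId(ref int i) { return i; } // returns its argument by reference

// Spelling the type out in a variable declaration does not parse,
// but the alias declaration accepts a leading `ref`:
alias FP = ref int function(ref int);

void main()
{
    auto viaAuto = &refId; // type inferred from the expression
    FP viaAlias = &refId;  // type spelled out through the alias
    static assert(is(typeof(viaAuto) == FP));

    int x = 1;
    viaAlias(x) = 42;      // the call returns a reference, so this assigns to x
    writeln(x);            // 42
}
```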
We can use `auto`, so what’s the deal? The deal is that there are places where `auto` cannot be used, e.g. in function parameter lists. A function with a parameter whose type is a by-reference-returning function pointer like `FP` cannot be declared without an alias. Consider the obvious attempt:
```d
void takesFP(ref int function(int) funcPtr) { pragma(msg, typeof(funcPtr)); }
```
This compiles, but doesn’t work as intended: the parameter `funcPtr` is of type `int function(int)` and is taken by reference. Max munch reads `ref` and sees a `ParameterStorageClass`, then it sees the `Type` `int function(int)`. That’s perfectly valid, and one could want that.
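The following sketch shows what the compiler makes of `takesFP` today; `twice`, `thrice`, and `takesRefFP` are hypothetical helpers. The parameter is an ordinary `int function(int)` passed by reference, so assigning to it rebinds the caller’s variable. A parameter whose type is a by-reference-returning function pointer still needs an alias such as `FP`.

```d
import std.stdio : writeln;

int twice(int i)  { return 2 * i; }
int thrice(int i) { return 3 * i; }

// Parsed as: `ref` parameter storage class + type `int function(int)`.
void takesFP(ref int function(int) funcPtr)
{
    static assert(is(typeof(funcPtr) == int function(int)));
    funcPtr = &thrice; // rebinds the caller's variable
}

ref int refId(ref int i) { return i; }
alias FP = ref int function(ref int);

// Today, a by-ref-returning function pointer parameter needs the alias.
void takesRefFP(FP funcPtr)
{
    int x;
    funcPtr(x) = 7; // assigns to x through the returned reference
    writeln(x);     // 7
}

void main()
{
    int function(int) fp = &twice;
    takesFP(fp);
    writeln(fp(10)); // 30: fp now points to thrice
    takesRefFP(&refId);
}
```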
Here’s the catch: we can solve a lot of syntax issues if we not only make `TypeCtor` optional (as suggested initially), but also allow `ref` as the initial part of a `Type` when followed by an appropriate `TypeSuffix`, namely the `function` and `delegate` ones. (Here is the precise grammar change.)
This means that not only can you put types in parentheses to clarify your intent, but doing so also meaningfully affects parsing:
```d
void takesFP((ref int function(int)) funcPtr) { } // NEW! Doesn’t work yet.
```
Now, `ref` cannot be interpreted as a parameter storage class! It must be the first token of a `Type`, which requires a function or delegate type, and that’s indeed what we have.
This also applies to return types:
```d
ref int function(int) makesFPbyRef() { }
(ref int function(int)) makesByRefFP() { }
```
According to max munch parsing, the first function returns an object of type `int function(int)` by reference, i.e. a function pointer that returns by value.
The second function returns an object of type `ref int function(int)` by value, i.e. a function pointer that returns by reference. As soon as the parser sees the opening parenthesis, it must parse a type.
The first of those should be deprecated in favor of this:
```d
ref (int function(int)) makesFPbyRef() { }
```
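The difference between the two return types can already be checked in today’s D, as long as the by-reference function pointer type is spelled through an alias. In this sketch, `IntFn`, `RefIntFn`, `stored`, `cell`, and `setCell` are hypothetical names; the checks use `functionAttributes` and `ReturnType` from Phobos’ `std.traits`.

```d
import std.stdio : writeln;
import std.traits : FunctionAttribute, functionAttributes, ReturnType;

alias IntFn    = int function(int);     // function pointer returning by value
alias RefIntFn = ref int function(int); // function pointer returning by reference

int twice(int i) { return 2 * i; }

int cell;
ref int setCell(int i) { cell = i; return cell; } // returns a reference to `cell`

IntFn stored;

// Returns an `int function(int)` by reference (this form parses today).
ref int function(int) makesFPbyRef() { return stored; }

// Returns a `ref int function(int)` by value (needs the alias today).
RefIntFn makesByRefFP() { return &setCell; }

static assert(functionAttributes!makesFPbyRef & FunctionAttribute.ref_);
static assert(!(functionAttributes!makesByRefFP & FunctionAttribute.ref_));
static assert(functionAttributes!(ReturnType!makesByRefFP) & FunctionAttribute.ref_);

void main()
{
    makesFPbyRef() = &twice; // assigns to `stored` through the returned reference
    writeln(stored(5));      // 10

    makesByRefFP()(3) += 1;  // setCell(3) sets cell to 3, then += 1 goes through the reference
    writeln(cell);           // 4
}
```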
The same goes for parameters:
```d
void takesFP(ref int function(int) funcPtr) // Make this an error …
void takesFP(ref (int function(int)) funcPtr) // … and require this!
```
This is in the same spirit as disallowing the nested-lambda syntax `=> { }`. Along the same lines, we should deprecate applying type constructors to function and delegate types without clarification:
```d
const Object function() f0; // Make this an error …
const (Object function()) f1; // … and require this!
(const Object) function() f2; // To differentiate from this.
const(Object) function() f3; // (Same type as f2)
```
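What the compiler does with `f0` and `f3` today can be checked with `typeof`. In this sketch, `makeConst` is a hypothetical helper, and the declarations sit at module scope so the checks can refer to them. For `f0`, `const` acts as a storage class and covers the whole function pointer, not just the returned `Object`.

```d
// Hypothetical helper returning a read-only object.
const(Object) makeConst() { return new Object; }

const Object function() f0 = null; // `const` applies to the whole pointer type
const(Object) function() f3;       // `const` applies only to the returned Object

static assert(is(typeof(f0) == const(Object function())));
static assert(is(typeof(f3) == const(Object) function()));

void main()
{
    f3 = &makeConst;    // fine: f3 itself is a mutable pointer
    // f0 = &makeConst; // would not compile: f0 itself is const
    assert(f3() !is null);
}
```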
We should do the same for type constructors used as storage classes on non-static member functions when they appear in front of the declaration:
```d
struct S
{
    const void action() { } // Make this an error …
    void action() const { } // … and require this!
}
```
D requires `ref` at the front, so why should we have an issue with requiring that type constructors go at the end?
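Why the front position is confusing for member functions can be checked directly. In this sketch (the struct and its members are made up for illustration), a leading `const` does not make the return type `const`; it qualifies the hidden `this`, exactly like the trailing form.

```d
import std.stdio : writeln;
import std.traits : FunctionAttribute, functionAttributes, ReturnType;

struct S
{
    int x;
    const int leading() { return x; } // `const` qualifies `this`, not the return type
    int trailing() const { return x; } // the same meaning, spelled unambiguously
}

// Both are const member functions...
static assert(functionAttributes!(S.leading) & FunctionAttribute.const_);
static assert(functionAttributes!(S.trailing) & FunctionAttribute.const_);
// ...and both return a plain, mutable int.
static assert(is(ReturnType!(S.leading) == int));
static assert(is(ReturnType!(S.trailing) == int));

void main()
{
    const S s = S(3);
    writeln(s.leading(), s.trailing()); // both callable on a const object
}
```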
Are there unintended side effects?
There would be another way to express `const(int)`: `(const int)`. Because `const(int)` is everywhere, it cannot be deprecated, and that’s fine. In my opinion, `(const int)` is better in every regard. A newcomer would probably guess correctly that `const(int)[]` is a mutable slice of read-only integers, but it’s nowhere near as clear as `(const int)[]`. If we imagine D some years in the future, when everyone uses “modern-style types,” i.e. `(const int)[]`, seeing `const(int)[]` would probably look weird: `const` normally applies to everything that trails it, but here, because `const` is followed by an opening parenthesis, it applies precisely to what is inside, nothing more.
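For reference, this is how `const(int)[]` behaves in today’s D: the slice can be reassigned and appended to, while its elements are read-only. The function name `shrinkAndGrow` is hypothetical, and the `(const int)[]` spelling is not used because it does not parse yet.

```d
import std.stdio : writeln;

const(int)[] shrinkAndGrow()
{
    const(int)[] s = [1, 2, 3]; // mutable slice of read-only ints
    s = s[1 .. $];              // ok: the slice itself can be reassigned
    s ~= 4;                     // ok: appending produces a longer slice
    static assert(!__traits(compiles, s[0] = 42)); // the elements are read-only
    return s;
}

void main()
{
    writeln(shrinkAndGrow()); // [2, 3, 4]
}
```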