D needs a type expression syntax
May 04, 2023
Quirin Schroll
May 05, 2023
zjh
May 06, 2023
Quirin Schroll
May 07, 2023
Walter Bright
May 08, 2023
Quirin Schroll
May 13, 2023
Nick Treleaven
May 17, 2023
Nick Treleaven
May 17, 2023
Quirin Schroll
May 17, 2023
Quirin Schroll
May 07, 2023
Mike Parker
May 05, 2023
Basile B.
May 05, 2023
Paul Backus
May 07, 2023
Walter Bright
May 06, 2023
Quirin Schroll
May 07, 2023
Walter Bright
May 07, 2023
Timon Gehr
May 11, 2023
Timon Gehr
May 11, 2023
Dukc
May 11, 2023
Basile B.
May 12, 2023
Nick Treleaven
May 12, 2023
Joao Lourenço
May 13, 2023
Max Samukha
May 13, 2023
Max Samukha
May 12, 2023
Quirin Schroll
May 09, 2023
Basile B.
May 10, 2023
Quirin Schroll
May 07, 2023
Timon Gehr
May 05, 2023
Nick Treleaven
May 05, 2023
Basile B.
May 06, 2023
Quirin Schroll
May 06, 2023
Quirin Schroll
May 10, 2023
Nick Treleaven
May 10, 2023
Nick Treleaven
May 07, 2023
Walter Bright
May 07, 2023
David Gileadi
May 10, 2023
Walter Bright
Re: [OT] D needs a type expression syntax
May 10, 2023
David Gileadi
May 09, 2023
Quirin Schroll
May 09, 2023
Tim
May 10, 2023
Quirin Schroll
May 10, 2023
Tim
May 04, 2023

TL;DR: If we make TypeCtor optional in the production rule BasicType → TypeCtor(Type), the D type grammar can be improved and we have a way to solve the 14-year-old issue 2753.


What is an expression syntax?

In D, as in most programming languages, there is the concept of a primary expression, that, simply put, lets you put an arbitrary expression in parentheses giving you an expression again. Without it, (a + b) * c wouldn’t even be expressible.

The scheme is like this:
Expression → PrimaryExpression
PrimaryExpression → (Expression)

Imagine you could only use parentheses where they’re needed and (a * b) + c would be an error, since a * b + c is in no way different. This is how D’s types behave. The type grammar is quite a mouthful and I reworked it in the past to make it somewhat understandable for an outsider.

D types almost have an expression syntax

There’s one particular interaction that makes D’s types almost have a primary expression:
Type → BasicType
BasicType → TypeCtor(Type)

This means a Type can be (among other options) just a BasicType, and a BasicType can be (among other options) a TypeCtor followed by a Type in parentheses. If we make the TypeCtor optional, we get first-class type expression syntax. We should do this today and – taking advantage of it – do even more. (If you have experience with the parser, please let me know if this would be a difficult change. To me, it doesn’t seem like it would.)

Does it solve anything?

Yes. This isn’t just an academic, puritan, inner-monk-pleasing exercise. D’s type syntax doesn’t let you express types that are 100 % valid and useful and doesn’t let you clarify your intentions! Have you ever taken the address of a function that returns by reference? Normally, the function pointer type is written the same as a function declaration, just with the function name replaced by the function keyword:

bool isEven  (int i) => i % 2 == 0;
bool function(int i) isEvenPtr = &isEven; // ok

ref int refId   (ref int i) => i;
ref int function(ref int i) refIdPtr = &refId; // Doesn’t parse!

You can declare refIdPtr with auto because the type of &refId is 100 % well-formed, it’s just a syntax issue spelling it out in code; if you pragma(msg, typeof(refIdPtr)) you get:

int function(ref int i) ref

Interesting where the ref is, isn’t it? Go ahead, try using that instead of auto. It doesn’t parse! And frankly, it shouldn’t; it’s confusing to read.

The reason is that the grammar works by max munch and we don’t have the type in isolation, it’s part of a declaration. The ref is parsed as a storage class for the declaration: It makes refIdPtr a reference to an object of type int function(ref int) – or, better, it would if it could. In this context, references aren’t allowed. Additionally, the type and value category of &refId don’t fit the declaration, but the parser doesn’t even get there.

One way to do it is to use an alias:

alias FP = ref int function(ref int);
FP refIdPtr = &refId;

Why, then, does the alias definition of FP parse? Essentially because the alias declaration rules can boil down to this:
AliasDeclaration → alias Identifier = ref Type
Simply put, alias declaration rules accept it as a special case.
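
As a minimal sketch of the alias workaround with today's compilers: the helper aliasesArgument below is a hypothetical name introduced only for illustration, and the function body uses braces instead of => so that older compilers accept it. It should build with something like dmd -unittest -main -run example.d.

```d
// By-reference identity function, as in the post.
ref int refId(ref int i) { return i; }

// The alias declaration grammar accepts a leading `ref`.
alias FP = ref int function(ref int);

// Hypothetical helper: checks that the pointer hands back its argument
// by reference (same address), not a copy.
bool aliasesArgument(FP fp)
{
    int x = 1;
    return &fp(x) is &x;
}

unittest
{
    auto viaAuto = &refId; // type inferred
    FP viaAlias = &refId;  // type spelled out through the alias
    static assert(is(typeof(viaAuto) == FP));
    assert(aliasesArgument(viaAlias));
}
```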

We can use auto, so what’s the deal? The deal is that there are cases where auto cannot be used, e.g. in function parameter lists. A function with a function pointer parameter of type FP cannot be declared without an alias:

void takesFP(ref int function(int) funcPtr) { pragma(msg, typeof(funcPtr)); }

This compiles, but doesn’t work as intended: The parameter funcPtr is of type int function(int) and taken by reference. Max munch reads ref and sees a ParameterStorageClass, then it sees the Type int function(int). That’s perfectly valid and one could want that.
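
To see the current reading in action, here is a small sketch; the names twice, thrice and rebind are made up for this example. Because ref is taken as the parameter storage class, the callee can rebind the caller's function pointer:

```d
int twice(int x) { return 2 * x; }
int thrice(int x) { return 3 * x; }

// `ref` binds to the parameter, so funcPtr is an `int function(int)`
// passed by reference.
void rebind(ref int function(int) funcPtr)
{
    funcPtr = &thrice; // modifies the caller's variable
}

// The compiler reports `ref` as a storage class of parameter 0.
static assert(__traits(getParameterStorageClasses, rebind, 0)[0] == "ref");

unittest
{
    int function(int) fp = &twice;
    rebind(fp);
    assert(fp(2) == 6); // fp now points to thrice
}
```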

Here’s the catch: We can solve a lot of syntax issues if we not only make TypeCtor optional (as suggested initially), but also allow ref as the initial part of a Type if followed by an appropriate TypeSuffix: the function and delegate ones. (Here is the precise grammar change.)

This means, not only can you put types in parentheses to clarify your intent, it meaningfully affects parsing:

void takesFP((ref int function(int)) funcPtr) { } // NEW! Doesn’t work yet.

Now, ref cannot be interpreted as a parameter storage class! It must be the first token of a Type, which necessitates a function or delegate type, but that’s what we indeed have.

This also applies to return types:

 ref int function(int)  makesFPbyRef() { }
(ref int function(int)) makesByRefFP() { }

According to max munch parsing, the first function returns an object of type int function(int) by reference, which is a function pointer that returns by value.
The second function returns an object of type ref int function(int) by value, which is a function pointer that returns by reference. As soon as the parser sees the opening parenthesis, it must parse a type.
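
Only the first form parses today. A sketch of what it means in practice, with made-up names (stored, slot, inc, dec): the function returns a reference to a function pointer variable, so the call result can be assigned to.

```d
int inc(int x) { return x + 1; }
int dec(int x) { return x - 1; }

// Module-level (thread-local) function pointer variable.
int function(int) stored;

// `ref` applies to the function: it returns `stored` by reference.
ref int function(int) slot() { return stored; }

// The returned pointer itself returns int by value.
static assert(is(typeof(slot()) == int function(int)));

unittest
{
    slot() = &inc; // assigns through the returned reference
    assert(stored(1) == 2);
    slot() = &dec;
    assert(stored(1) == 0);
}
```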

The first of those should be deprecated in favor of this:

ref (int function(int)) makesFPbyRef() { }

The same goes for parameters:

void takesFP(ref  int function(int)  funcPtr) // Make this an error …
void takesFP(ref (int function(int)) funcPtr) // … and require this!

This is in the same spirit as disallowing the nested lambdas => { }. Together with that, we should deprecate applying type constructors to function and delegate types without clarification:

 const Object  function()  f0; // Make this an error …
const (Object  function()) f1; // … and require this!
(const Object) function()  f2; // To differentiate from this.
 const(Object) function()  f3; // (Same type as f2)

We should do the same for type constructors as storage classes for non-static member functions when used in front of the declaration:

struct S
{
    const void action() { } // Make this an error …
    void action() const { } // … and require this!
}

D requires ref on the front, why should we have an issue with requiring that type constructors go to the end?
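
For illustration, a sketch of how current compilers treat the two spellings; the member names leading and trailing are invented here so that both can coexist in one struct. In both cases const qualifies the implicit this parameter and the return type stays plain int:

```d
struct S
{
    int value;

    const int leading() { return value; }  // const applies to `this`
    int trailing() const { return value; } // same meaning, clearer
}

unittest
{
    const S s = S(7);
    // Both are callable on a const object ...
    assert(s.leading() == 7);
    assert(s.trailing() == 7);
    // ... and neither returns const(int).
    static assert(is(typeof(s.leading()) == int));
    static assert(is(typeof(s.trailing()) == int));
}
```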

Are there unintended side effects?

There would be another way to express const(int): (const int). Because const(int) is everywhere, it cannot be deprecated, and that’s fine. In my opinion, (const int) is better in every regard. A newcomer would probably guess correctly that const(int)[] is a mutable slice of read-only integers, but it’s nowhere near as clear as (const int)[]. If we imagine D some years in the future, when everyone uses “modern-style types,” i.e. (const int)[], seeing const(int)[] will probably look weird to you: const normally applies to everything that trails it, but here, because const is followed by an opening parenthesis, it applies precisely to what is in there, nothing more.
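
A quick sketch of the difference in today's syntax, where const followed by parentheses covers only the element type, while a bare leading const covers the whole slice:

```d
unittest
{
    // Mutable slice of read-only ints.
    const(int)[] a = [1, 2, 3];
    a = a[1 .. $];                                // reslicing is fine
    assert(a[0] == 2);
    static assert(!__traits(compiles, a[0] = 5)); // elements are const

    // Bare `const` applies to everything that trails it.
    const int[] b = [1, 2, 3];
    static assert(is(typeof(b) == const(int[])));
    static assert(!__traits(compiles, b = b[1 .. $])); // slice is const too
}
```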

May 05, 2023

On Thursday, 4 May 2023 at 15:40:20 UTC, Quirin Schroll wrote:

>

TL;DR: If we make TypeCtor optional in the production rule BasicType → TypeCtor(Type), the D type grammar can be improved and we have a way to solve the 14-year-old issue 2753.

That's a great idea. You need a DIP, and of course, it's best to have a tool that can automatically modify incorrect formats. It's also best to have a backup.

May 05, 2023

On Thursday, 4 May 2023 at 15:40:20 UTC, Quirin Schroll wrote:

>

TL;DR: If we make TypeCtor optional in the production rule BasicType → TypeCtor(Type), the D type grammar can be improved and we have a way to solve the 14-year-old issue 2753.

[...]

Wouldn't making ref a TypeCtor solve the issue more simply?

While using parens is the natural way to disambiguate two different constructs, it seems that the only case that causes a problem is that we cannot express "return by ref" in function types.

With ref as a TypeCtor:

// a function that returns a function ptr that returns an int by ref
ref(int) function(int) makesFPbyRef();
// function ptr as a parameter type that returns an int by ref
void takesFP(ref(int) function() funcPtr)
// function ptr as a ref parameter type that returns an int by ref
void takesFP(ref ref(int) function() funcPtr)
// normal variable type, can be a NOOP or a semantic error
struct S { ref(int) member; }

That being said, I understand that your proposal is not only about the ref issue but to make the grammar for Type nicer.

What follows is a bit off-topic, but I'd like to bring up the fact that it might be desirable to keep parens for builtin tuples, although it's been a while since the topic was discussed here. In particular, a TypeTuple made of a single Type would be hard to express properly.

May 05, 2023

On Friday, 5 May 2023 at 02:30:32 UTC, Basile B. wrote:

>

On Thursday, 4 May 2023 at 15:40:20 UTC, Quirin Schroll wrote:

>

TL;DR: If we make TypeCtor optional in the production rule BasicType → TypeCtor(Type), the D type grammar can be improved and we have a way to solve the 14-year-old issue 2753.

[...]

Wouldn't making ref a TypeCtor solve the issue more simply ?

The main problem with making ref a TypeCtor is that it would allow semantically-invalid types like ref(int)[] and Tuple!(ref(int), ref(int)) to parse.

May 05, 2023

On Thursday, 4 May 2023 at 15:40:20 UTC, Quirin Schroll wrote:

>

ref int refId (ref int i) => i;
ref int function(ref int i) refIdPtr = &refId; // Doesn’t parse!

You can declare refIdPtr with auto because the type of &refId is 100 % well-formed, it’s just a syntax issue spelling it out in code; if you pragma(msg, typeof(refIdPtr)) you get:

int function(ref int i) ref

Interesting where the ref is, isn’t it? Go ahead, try using that instead of auto. It doesn’t parse! And frankly, it shouldn’t; it’s confusing to read.

It could be trailing return type syntax:

function(ref int i) -> ref int refIdPtr;

Then we don't need the alias A = ref T rule, and we can keep parentheses for expressions (including tuples in future).

Trailing return is much easier to read: when you read a function signature, the return type is not as salient as the name or the parameter list (particularly as there's no overloading by return type). The return type gets in the way, especially when it's a complex type:

some_package.Foo!(const(ElementType!R)[]) oh_hi(long parameter, list following)

Some people also recommend putting attributes after the parameter list rather than before the declaration for this reason.

Imagine in an IDE seeing a list of function overloads, if they were shown with trailing return syntax it would be much easier to find the overload you want.

>
 const Object  function()  f0; // Make this an error …
const (Object  function()) f1; // … and require this!
(const Object) function()  f2; // To differentiate from this.
 const(Object) function()  f3; // (Same type as f2)

I don't get what the issue is with f0 and f3, aren't they clear enough?

>

We should do the same for type constructors as storage classes for non-static member function when used in front of the declaration:

struct S
{
    const void action() { } // Make this an error …
    void action() const { } // … and require this!
}

D requires ref on the front, why should we have an issue with requiring that type constructors go to the end?

Amen, we should have done this long ago IMO. A needless hindrance for those learning D.

May 05, 2023

On Friday, 5 May 2023 at 15:11:50 UTC, Nick Treleaven wrote:

>

On Thursday, 4 May 2023 at 15:40:20 UTC, Quirin Schroll wrote:

>

[...]

It could be trailing return type syntax:

function(ref int i) -> ref int refIdPtr;

Then we don't need the alias A = ref T rule, and we can keep parentheses for expressions (including tuples in future).

[...]

IMO the big problem here, with const, is that const cannot be put on a hidden parameter...

That's essentially why this strange syntax exists.

May 06, 2023

On Friday, 5 May 2023 at 02:30:32 UTC, Basile B. wrote:

>

On Thursday, 4 May 2023 at 15:40:20 UTC, Quirin Schroll wrote:

>

TL;DR: If we make TypeCtor optional in the production rule BasicType → TypeCtor(Type), the D type grammar can be improved and we have a way to solve the 14-year-old issue 2753.

[...]

Wouldn't making ref a TypeCtor solve the issue more simply?

Technically, the answer is yes, but practically and realistically, the answer isn’t just no, it is: absolutely not.

>

While using parens is the natural way to disambiguate two different constructs, it seems that the only case that causes a problem is that we cannot express "return by ref" in function types.

That was the starter. I looked at various ways you could express that. I’m convinced I once wrote a forum post asking for opinions on a syntax to propose, but now I cannot find it anymore (and that makes me doubt my memory). At some point, it randomly occurred to me that making D’s type syntax an expression syntax might naturally solve this – and working it out, it turns out it does.

>

With ref as a TypeCtor:

// a function that returns a function ptr that returns an int by ref
ref(int) function(int) makesFPbyRef();
// function ptr as a parameter type that returns an int by ref
void takesFP(ref(int) function() funcPtr)
// function ptr as a ref parameter type that returns an int by ref
void takesFP(ref ref(int) function() funcPtr)
// normal variable type, can be a NOOP or a semantic error
struct S { ref(int) member; }

That being said, I understand that your proposal is not only about the ref issue but to make the grammar for Type nicer.

It has ref as its main motivation.

First, there’s a difference between making ref a TypeCtor and making it a type constructor. The first is grammar/syntax-only and the other is a semantic language construct. I’ll assume you mean both; listing it as a TypeCtor without actually following up semantically would definitely be possible, but very confusing.

I’m proposing a grammar change that is mere addition. The deprecations I suggested are entirely optional and not required to make the syntax work. Also, I’m proposing a grammar change, no semantics change.

Looking at struct S { ref(int) member; }, it really seems you propose that ref be a full-fledged type constructor: Look no further than C++ – they have a reference type constructor and in C++, essential (core!) language and library constructs don’t work with references: Pointer to a reference and (as a logical consequence) an array of references are not well-formed as a type. A non-compositional type system is the last thing D needs.

I’ve thought long about this. ref(Type) as a syntax construct doesn’t get you far.

void takesFP(ref ref(int) function() funcPtr)

This is akin to member functions written like this:

const const(int) f() { … }

It’s not a nice read.


>

What follows is a bit off-topic, but I'd like to bring up the fact that it might be desirable to keep parens for builtin tuples, although it's been a while since the topic was discussed here. In particular, a TypeTuple made of a single Type would be hard to express properly.

This is no issue at all. With parentheses for tuple syntax, the 1-tuple for values must be expressed as (value,) because (value) is a plain expression, so it would just be consistent to require the trailing comma with types as well: (int) is int, but (int,) is a 1-tuple with an int component.

For the record: I’ve objected to parentheses for tuples from the beginning. Parentheses are for grouping, not for constructing. My take on the tuples front was to look at static arrays as the special case of tuples that are homogeneous (the same type iterated), and generalize them for heterogeneous components: int[2] is a shorthand for [int, int]; and [int, string] is a heterogeneous tuple. You build such a tuple like you build an array: [1, "abc"]. Indexing with a run-time index is supported by Design by Introspection: A [int, immutable int] can give you ref const(int) access, a [int, long] can give you long access by value, but no ref access.

May 06, 2023

On Friday, 5 May 2023 at 01:23:15 UTC, zjh wrote:

>

On Thursday, 4 May 2023 at 15:40:20 UTC, Quirin Schroll wrote:

>

TL;DR: If we make TypeCtor optional in the production rule BasicType → TypeCtor(Type), the D type grammar can be improved and we have a way to solve the 14-year-old issue 2753.

That's a great idea. You need a DIP, and of course, it's best to have a tool that can automatically modify incorrect formats. It's also best to have a backup.

I have a very stupid question now: Why does a minor grammar change require a DIP? How do people determine what is an enhancement and what requires a DIP?

May 07, 2023
On 07/05/2023 3:55 AM, Quirin Schroll wrote:
> I have a very stupid question now: Why does a minor grammar change require a DIP? How do people determine what is an enhancement and what requires a DIP?

Minor grammar change != minor change.

The implications, as suggested in this thread, are that it's a lot more involved than a tiny adjustment, which means a DIP in this case.
May 06, 2023

On Friday, 5 May 2023 at 15:11:50 UTC, Nick Treleaven wrote:

>

On Thursday, 4 May 2023 at 15:40:20 UTC, Quirin Schroll wrote:

>
ref int refId   (ref int i) => i;
ref int function(ref int i) refIdPtr = &refId; // Doesn’t parse!

You can declare refIdPtr with auto because the type of &refId is 100 % well-formed, it’s just a syntax issue spelling it out in code; if you pragma(msg, typeof(refIdPtr)) you get:

int function(ref int i) ref

Interesting where the ref is, isn’t it? Go ahead, try using that instead of auto. It doesn’t parse! And frankly, it shouldn’t; it’s confusing to read.

It could be trailing return type syntax:

function(ref int i) -> ref int refIdPtr;

Then we don't need the alias A = ref T rule, and we can keep parentheses for expressions (including tuples in future).

I agree that trailing return types (TRT) are very readable – in a language designed around them, which D isn’t. I find it irritating that the -> Type is followed by the variable name. In languages that have them, variables are declared by a keyword, then follows the name, then a separator and then comes the type. In that order, TRT makes sense.

>

Trailing return is much easier to read: when you read a function signature, the return type is not as salient as the name or the parameter list (particularly as there's no overloading by return type). The return type gets in the way, especially when it's a complex type:

some_package.Foo!(const(ElementType!R)[]) oh_hi(long parameter, list following)

Some people also recommend putting attributes after the parameter list rather than before the declaration for this reason.

I do that. I’ll never understand why people use them in front, nor why unittest blocks don’t work with trailing attributes. It’s inconsistent and annoying.

I put parts of a declaration in their own line when they’re getting big: Big return type → own line. 1 big (template) parameter or lots of them → each its own line. Constraints and contracts → own line (always). It’s another thing what a documentation generator makes of them.

some_package.Foo!(const(ElementType!R)[])
oh_hi(R, T)(
    T parameter,
    ...
)
const @safe pure
if (T.sizeof > 0)
in (true)
out(r; true)
{
    …
}
>

Imagine in an IDE seeing a list of function overloads, if they were shown with trailing return syntax it would be much easier to find the overload you want.

Could be. When looking through overloads, the important part is the parameters.

TRTs are a much greater language change, though, compared to making a token at a very specific place optional. Their grammar must really be implemented and maintained separately. Also, we get the same as C++: Two ways to declare a function.

> >
 const Object  function()  f0; // Make this an error …
const (Object  function()) f1; // … and require this!
(const Object) function()  f2; // To differentiate from this.
 const(Object) function()  f3; // (Same type as f2)

I don't get what the issue is with f0 and f3, aren't they clear enough?

With f0: Does const apply to the return value or the variable f0? I’m quite adept with D and I’m only 95% sure it applies to f0. 95 is not 100, and frankly, when it comes to syntax, it should be 100.
With f3: Less of an issue, but basically the same. To someone new to D, it might be odd that const applies to something different if a parenthesis follows.
f1 and f2 are beyond misapprehension. If you show any programmer what const applies to (return type or the declared variable), they’ll assume it’s a trick question because the answer is plain obvious.
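
For anyone who wants to settle the f0 question with a compiler rather than from memory, here is a sketch using only syntax that parses today (f1 and f2 are omitted because they need the proposed grammar). It assumes the declarations sit at module scope:

```d
const Object function() f0;  // const applies to the variable f0
const(Object) function() f3; // const applies to the return type only

static assert(is(typeof(f0) == const(Object function())));
static assert(is(typeof(f3) == const(Object) function()));
static assert(!is(typeof(f0) == typeof(f3)));

unittest
{
    // f0 itself is const and cannot be reassigned; f3 can.
    static assert(!__traits(compiles, f0 = null));
    static assert(__traits(compiles, f3 = null));
}
```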
