Thread overview
Third and Hopefully Last Draft: Primary Type Syntax
Sep 22
IchorDev
Sep 22
Tim
September 21

The obligatory permalink and latest draft

September 22
I recommend that you put it through Grammarly prior to Mike getting it, it'll lessen his workload.

I.e. ``excpetion``

Otherwise it is looking pretty good, and good job on doing the implementation!
September 21

On Saturday, 21 September 2024 at 13:29:05 UTC, Richard (Rikki) Andrew Cattermole wrote:

> I recommend that you put it through Grammarly prior to Mike getting it, it'll lessen his workload.
>
> I.e. ``excpetion``
>
> Otherwise it is looking pretty good, and good job on doing the implementation!

I gave it to two people to proofread. One probably just didn't do it (he said it's good); the other sent me a revised version, which did contain some style suggestions. It's not like I didn't try something.

I'll try Grammarly. Haven't used it in ages.

The implementation has some workarounds that I'd hope won't make it into the compiler. But as Walter pointed out in the Monthly Meeting, it's not obvious the grammar changes won't lead to weird parsings. Therefore, I hope the implementation can give people like you, Paul Backus, and Timon Gehr the opportunity to find holes or, hopefully, find none, which might be enough for Walter to dispel his concerns.

September 22
On 22/09/2024 4:01 AM, Quirin Schroll wrote:
> On Saturday, 21 September 2024 at 13:29:05 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> I recommend that you put it through Grammarly prior to Mike getting it, it'll lessen his workload.
>>
>> I.e. ``excpetion``
>>
>> Otherwise it is looking pretty good, and good job on doing the implementation!
> 
> I gave it to two people to proofread. One probably just didn't do it (he said it's good); the other sent me a revised version, which did contain some style suggestions. It's not like I didn't try something.

Yeah, you did good, it's just that tools are guaranteed to catch stuff like this :)

> The implementation has some workarounds that I'd hope won't make it into the compiler. But as Walter pointed out in the Monthly Meeting, it's not obvious the grammar changes won't lead to weird parsings. Therefore, I hope the implementation can give people like you, Paul Backus, and Timon Gehr the opportunity to find holes or, hopefully, find none, which might be enough for Walter to dispel his concerns.

Grammar stuff like this isn't where I shine, as long as it passes buildkite I'm happy. The text shows you've done your research and put in the effort.

Ideally we'd throw a fuzzer at the parser to verify that it works as expected.

https://llvm.org/docs/LibFuzzer.html

https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html
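
For orientation, a libFuzzer harness in D has roughly the following shape. This is only a sketch: `parseStub` is a hypothetical stand-in that merely checks parenthesis balance, and a real harness would call the compiler's parser at that point instead.

```d
/// Hypothetical stand-in for the parser under test (assumption, not DMD code).
/// It only checks that parentheses in the input are balanced.
bool parseStub(const(char)[] source) nothrow @nogc
{
    long depth = 0;
    foreach (char c; source)
    {
        if (c == '(')
            ++depth;
        else if (c == ')')
        {
            --depth;
            if (depth < 0)
                return false; // closing paren without an opener
        }
    }
    return depth == 0;
}

/// Entry point that libFuzzer calls once per generated input.
extern (C) int LLVMFuzzerTestOneInput(const(ubyte)* data, size_t size)
{
    // Reinterpret the raw bytes as source text and feed them to the parser.
    parseStub(cast(const(char)[]) data[0 .. size]);
    return 0; // libFuzzer expects 0 from this callback
}
```

The fuzzer then looks for inputs that crash or hang the code reached from the entry point, which is the kind of check that could expose unexpected parses.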
September 22

On Saturday, 21 September 2024 at 01:01:22 UTC, Quirin Schroll wrote:

> The obligatory permalink and latest draft

Not sure what was wrong with the other two drafts, but this one seems equally great. This feature would represent a massive improvement to string mixin code generation, and general language cohesion. Much like how you’re always allowed to use trailing commas in comma-separated lists.

September 22

On Saturday, 21 September 2024 at 01:01:22 UTC, Quirin Schroll wrote:

> The obligatory permalink and latest draft

The grammar changes look good. I found some new ambiguities, but the implementation seems to always prefer the old meaning, so it should be no problem.

Attributes with optional parens

// deprecated (size_t) x1 = 1; // Syntax error
// align (size_t) x2 = 1; // Syntax error
// package (size_t) x3 = 1; // Syntax error
// extern (size_t) x4 = 1; // Syntax error
struct UDA{}
// @UDA (size_t) x5 = 1; // Syntax error

The attributes deprecated, align, package and extern as well as
UDAs can be followed with optional arguments in parens, like the
deprecation message. These parens are now ambiguous with a basic type in
parens.

The implementation seems to always try to parse the parens as arguments
for the attribute, so it remains backward compatible.

Maybe this could be confusing for the user, when a declaration uses a
type in parens and later an attribute is added.

Scope guards

alias exit = Object;
Object x1;
void main()
{
    scope (exit) x1 = new Object(); // Still a scope guard
    // scope (Object) x2 = new Object(); // Syntax error
    // scope (int) x3 = 3; // Syntax error
    @0 scope (exit) x4 = new Object(); // Declares variable with type exit
}

The first statement is a scope guard with the current grammar. With the
new grammar it could also be a variable declaration of type exit and
storage class scope. The implementation still parses it as a scope
guard, so it remains backward compatible.

The next line could also be a variable declaration, but it is still
parsed as a scope guard. DMD then prints an error, because Object
is not a valid scope identifier. The line with x3 is a syntax error
for the same reason.

The last statement is parsed as a variable declaration, because scope
guards can't have UDAs.

Function literals

auto test1 = function (float){return 0;};
// auto test2 = function (float)(int){return 0;}; // Syntax error

Function literals have an optional return type and optional parameters.
The type float for test1 could be a parameter or a return type in
parens. The implementation always parses the parens as parameters,
so it remains backward compatible.

The second function literal has both a return type and parameters, but
it results in a syntax error, because the parens are parsed as
parameters and no other parens are expected after that.

Anonymous classes

void main()
{
    auto o1 = new class (Object) {};
}

The parens could be constructor arguments or a basic type in
AnonBaseClassList?. The implementation always tries to parse
constructor arguments, which should be fine.

September 23

On Sunday, 22 September 2024 at 10:58:55 UTC, Tim wrote:

> On Saturday, 21 September 2024 at 01:01:22 UTC, Quirin Schroll wrote:
>> The obligatory permalink and latest draft
>
> The grammar changes look good. I found some new ambiguities, but the implementation seems to always prefer the old meaning, so it should be no problem.

In general, ambiguities are resolved by Maximum Munch: if the next token can be parsed as part of the entity the grammar is currently building, it will be; only if it can’t is the entity closed, or else it’s an error.

> Attributes with optional parens
>
> // deprecated (size_t) x1 = 1; // Syntax error
> // align (size_t) x2 = 1; // Syntax error
> // package (size_t) x3 = 1; // Syntax error
> // extern (size_t) x4 = 1; // Syntax error
> struct UDA{}
> // @UDA (size_t) x5 = 1; // Syntax error

Those all fall under Maximum Munch: A parenthesis following any of these attributes constitutes their optional arguments. Attributes with optional arguments are greedy.

I wasn’t even aware of align without argument.

The biggest one is extern because it’s realistically used with the new parsing. If you have a class C, extern (C) is ambiguous – except for Maximum Munch.
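
A minimal sketch of that case in current D (the names `C` and `viaLinkage` are made up for illustration): even with a class `C` in scope, the parentheses after `extern` are taken as the linkage argument.

```d
// A class that happens to share its name with a linkage identifier.
class C {}

// Greedy parsing: (C) is the argument of extern, i.e. C linkage,
// not a parenthesized basic type naming the class above.
extern (C) int viaLinkage() { return 1; }

// The function really has C linkage.
static assert(__traits(getLinkage, viaLinkage) == "C");
```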

> The attributes deprecated, align, package and extern as well as
> UDAs can be followed with optional arguments in parens, like the
> deprecation message. These parens are now ambiguous with a basic type in
> parens.
>
> The implementation seems to always try to parse the parens as arguments
> for the attribute, so it remains backward compatible.

Yes, and it follows MM, which is generally something programmers can rely on.

What can be done about those? For one:

attribute
{
    declaration;
}

Always works at declaration scope, but for statement scope, that’s not possible. Here, I thought one could use an empty UDA list @(), but those are expressly illegal, so one has to resort to using a dummy UDA like @(""). Not nice, but if you insist on expressing something at statement scope in one swath, I guess we can ask the programmer for some concessions.
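
The block form can be sketched in current D as follows. The name `x1` is borrowed from the examples above; the commented-out line uses the proposed syntax and is an assumption about the draft, not something a released compiler accepts.

```d
// The braces end the attribute, so whatever starts the inner declaration
// cannot be mistaken for an attribute argument.
deprecated
{
    size_t x1 = 1;
    // Proposed syntax (assumption, needs the DIP implementation):
    // (size_t) x2 = 1;
}

// The attribute still applies to the declaration inside the block.
static assert(__traits(isDeprecated, x1));
```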

> Maybe this could be confusing for the user, when a declaration uses a type in parens and later an attribute is added.

There’s unfortunately little that can be done about it. A better implementation can possibly backtrack and re-interpret what used to be an attribute’s argument as a basic type, but to be honest, that is a lot of work.

> Scope guards
>
> alias exit = Object;
> Object x1;
> void main()
> {
>     scope (exit) x1 = new Object(); // Still a scope guard
>     // scope (Object) x2 = new Object(); // Syntax error
>     // scope (int) x3 = 3; // Syntax error
>     @0 scope (exit) x4 = new Object(); // Declares variable with type exit
> }

The big issue with these is, basically, that IMO this must work:

scope (ref void function())* fpp = null;

And it doesn’t.

> The first statement is a scope guard with the current grammar. With the
> new grammar it could also be a variable declaration of type exit and
> storage class scope. The implementation still parses it as a scope
> guard, so it remains backward compatible.

IIRC, I ran into this and implemented a look-ahead to handle scope guards correctly. Scope guards use magic identifiers, and unlike __traits or pragma, scope can also appear without arguments.

> The next line could also be a variable declaration, but it is still
> parsed as a scope guard. DMD then prints an error, because Object
> is not a valid scope identifier. The line with x3 is a syntax error
> for the same reason.

I just fixed that because it was fairly easy to do so. My implementation now looks ahead to see if it’s scope(exit/success/failure), and if it’s not, it tries to parse it as a scope attribute.
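
A rough sketch of such a look-ahead, written over a pre-split token list rather than the real lexer. The function `isScopeGuard` and the string-token representation are hypothetical and only illustrate the decision; this is not the code in the implementation.

```d
/// Hypothetical helper: true if the tokens start with
/// scope ( exit|success|failure ), i.e. a scope guard.
/// Anything else after `scope` would be tried as the scope attribute.
bool isScopeGuard(const(string)[] tokens) pure nothrow @nogc @safe
{
    if (tokens.length < 4)
        return false;
    if (tokens[0] != "scope" || tokens[1] != "(" || tokens[3] != ")")
        return false;
    // Only the three magic identifiers make a scope guard.
    return tokens[2] == "exit"
        || tokens[2] == "success"
        || tokens[2] == "failure";
}
```

With this rule, `scope (exit) x1 = ...` stays a scope guard, while `scope (int) x3 = 3` falls through to the attribute interpretation.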

> The last statement is parsed as a variable declaration, because scope
> guards can't have UDAs.

This is interesting. It’s unlikely that something like that is going to be a real-world problem, though, as it requires two unlikely things: Someone naming a type exit and putting parentheses around it and using a UDA on statement scope.

My fix from above doesn’t change that, but again, it’s really unlikely to be in code anyways.

> Function literals
>
> auto test1 = function (float){return 0;};
> // auto test2 = function (float)(int){return 0;}; // Syntax error
>
> Function literals have an optional return type and optional parameters.
> The type float for test1 could be a parameter or a return type in
> parens. The implementation always parses the parens as parameters,
> so it remains backward compatible.

Yes, for backwards compatibility, it must be done that way. However, this is a MM violation and must be mentioned in the DIP.

> The second function literal has both a return type and parameters, but
> it results in a syntax error, because the parens are parsed as
> parameters and no other parens are expected after that.

The second one should be allowed; otherwise some things aren’t expressible. This should work because there’s no valid reason why it can’t:

auto fp = function (ref int function()) () => null;

However, this currently works and must keep behavior:

auto fp = function (ref int function()) => null;
static assert(is(typeof(fp) : typeof(null) function(ref int function())));

The implementation will do a look-ahead to figure out if it’s seeing (Params) FunctionLiteralBody or (Type)(params) FunctionLiteralBody.

It might be noteworthy that this is not a MM violation. There is no other way to parse (Type)(Parameters) FunctionLiteralBody.
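
The look-ahead described above can be sketched over a plain token list. Both functions and the string-token representation are hypothetical and chosen for illustration; the actual implementation works on the compiler's token stream.

```d
/// Returns the index just past the parenthesized group that starts at
/// tokens[start], or 0 if there is no balanced group there.
size_t skipParens(const(string)[] tokens, size_t start) pure nothrow @nogc @safe
{
    if (start >= tokens.length || tokens[start] != "(")
        return 0;
    size_t depth = 0;
    foreach (i; start .. tokens.length)
    {
        if (tokens[i] == "(")
            ++depth;
        else if (tokens[i] == ")")
        {
            --depth;
            if (depth == 0)
                return i + 1;
        }
    }
    return 0; // unbalanced
}

/// tokens: everything after the `function` keyword.
/// True if the first group is a return type, i.e. a second group follows.
bool startsWithReturnType(const(string)[] tokens) pure nothrow @nogc @safe
{
    const end = skipParens(tokens, 0);
    return end != 0 && end < tokens.length && tokens[end] == "(";
}
```

A single parenthesized group keeps its current meaning as the parameter list; only a second group directly after it turns the first one into a return type.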

> Anonymous classes
>
> void main()
> {
>     auto o1 = new class (Object) {};
> }
>
> The parens could be constructor arguments or a basic type in
> AnonBaseClassList?. The implementation always tries to parse
> constructor arguments, which should be fine.

I’m going to look into this. Probably this is low-priority because a base class or interface name following new class never requires parens. But it should not be an error either. Probably I’ll do the same as with function literals: Look ahead and see if there’s another set of parens. If yes, it’s new class (Type)(Arguments) {}. If not, it’s new class /*implicit Object*/(Arguments) {} because of backwards compatibility.

I’ll commit my stuff probably tomorrow. I can’t do it now, unfortunately.

September 24

On Monday, 23 September 2024 at 19:03:47 UTC, Quirin Schroll wrote:

> I’ll commit my stuff probably tomorrow. I can’t do it now, unfortunately.

Done. And I updated the DIP draft to include the new Maximum Munch exceptions.

I did everything as suggested in my post, except for the anonymous class stuff. There, I was mistaken. The constructor arguments go first, then the base class / interfaces follow:

new class ConstructorArgs? AnonBaseClassList? AggregateBody

This means there is no real issue. If someone writes new class (Object), that’s a compile error today (if Object refers to a type, which it usually does) as parsing takes (Object) as the argument list, and it will stay one. Someone who wants to surround the first base class / interface with parentheses has to use an explicit empty argument list, e.g. new class () (Object) {}.
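
For reference, the order of constructor arguments and base class looks like this in current D. The names `Base` and `makeAnon` are made up for the example.

```d
class Base
{
    int value;
    this(int value) { this.value = value; }
}

Base makeAnon()
{
    // (42) is the constructor argument list; Base is the base class list.
    return new class (42) Base
    {
        // The argument list is passed to the anonymous class's constructor.
        this(int v) { super(v); }
    };
}
```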


Please review the latest draft here.