D Language Foundation October 2024 Monthly Meeting Summary

March 07

Posted by Mike Parker

The D Language Foundation's monthly meeting for October 2024 took place on Friday the 11th. It lasted about an hour and twenty minutes, not counting a long discussion at the end. I was unable to attend, so Razvan ran the meeting and Dennis recorded it for me. Quirin Schroll attended to discuss his Primary Type Syntax DIP.

The Attendees

The following people attended:

Walter Bright
Rikki Cattermole
Jonathan M. Davis
Timon Gehr
Martin Kinkelin
Dennis Korpel
Mathias Lang
Átila Neves
Razvan Nitu
Quirin Schroll
Adam Wilson

The Summary

Primary Type Syntax DIP

Quirin had previously joined our August monthly meeting to discuss his Primary Type Syntax DIP. At the time, it was in its second round of feedback in the DIP Development forum and he had a working implementation, but he wasn't sure if it was ready to move forward with Formal Assessment.

In that meeting, Walter had raised concerns about potential grammar ambiguities related to the proposed use of parentheses. He didn’t want D to suffer from a problem that C++ had, where in some contexts something could be either an expression or a type.

At the time of this meeting, the DIP was on its fourth draft.

As part of the forum feedback, someone had tried their best to find parsing problems and had uncovered a few in the implementation. For the most part, the fix was to do nothing. The problems didn't cause anything weird to happen; they resulted in parse errors, which meant the programmer would have to express what they wanted differently.

One example was scope. We have both the standalone attribute and the scope guard, which is the keyword followed by parentheses. He mentioned align and extern as other examples. In each of these cases, you could just rearrange the keywords to resolve the error.
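
For readers less familiar with the two forms of scope, here is a minimal sketch in current D (no DIP syntax involved) that uses the keyword both ways. The names `events` and `demo` are made up for illustration.

```d
import std.stdio : writeln;

// Illustrative only: records when the scope guard fires.
string[] events;

int demo()
{
    // `scope` followed by parentheses: a scope guard. The statement runs
    // when control leaves demo(), after the return value has been computed.
    scope(exit) events ~= "guard ran";

    int x = 42;
    // `scope` with no parentheses: the storage class on a declaration,
    // stating that `p` must not escape this function.
    scope int* p = &x;
    return *p;
}

void main()
{
    writeln(demo()); // prints 42
    writeln(events); // prints ["guard ran"]
}
```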

The sort of weirdness that happened in C++, where you could put parentheses around the identifier that was being declared, couldn't happen here. It would never parse in D.

Walter said there was an ambiguity in the C grammar where (identifier) could initiate two completely different parses.

Quirin said that was impossible in D because when the parser tried to match a declaration, it preferred to parse it as a type. If you wanted the parser to treat it as an identifier, it couldn't be in parentheses. Whatever was in parentheses could never be the declared object.

Walter said that one ambiguity required a symbol table to resolve in C and C++. He avoided it in D by requiring the cast keyword to disambiguate it.

Quirin said his DIP didn't touch any of that. It did touch cast indirectly in that it affected types, and you could have a type in a cast, but it was completely inside the parentheses of the cast.

He said cast was actually pretty nice because you could have not only a basic type but a general type inside the parentheses. There was no problem with that. The only parsing issues were with attributes that had both a standalone form and a form with parentheses.

For scope guards, he implemented a lookahead that recognized success, failure, and exit. If the parser saw one of those in the parentheses, it knew it was dealing with a scope guard rather than a type. If it saw anything else in the parentheses, it parsed it as a type, since it couldn't be a scope guard in that case.
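
The lookahead rule can be modeled in a few lines. This is a hypothetical illustration, not the code from Quirin's implementation: tokens are plain strings rather than real lexer tokens, and `ScopeParse` and `classifyAfterScope` are invented names.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Hypothetical outcomes of parsing whatever follows the `scope` keyword.
enum ScopeParse { storageClass, guard, typeDeclaration }

// Classify the tokens after `scope` (simplified: tokens are strings).
ScopeParse classifyAfterScope(const(string)[] t)
{
    // No opening parenthesis: `scope` is the plain storage class.
    if (t.length == 0 || t[0] != "(")
        return ScopeParse.storageClass;

    // `( identifier )` naming one of the three guard kinds is a scope guard,
    // even if a type with that name happens to exist.
    static immutable kinds = ["exit", "success", "failure"];
    if (t.length >= 3 && t[2] == ")" && kinds.canFind(t[1]))
        return ScopeParse.guard;

    // Anything else in parentheses is parsed as a type; semantic analysis
    // decides later whether that type actually exists.
    return ScopeParse.typeDeclaration;
}

void main()
{
    writeln(classifyAfterScope(["(", "exit", ")", "foo", ";"])); // guard
    writeln(classifyAfterScope(["(", "ref", "int", "function", "(", ")", ")", "fp"])); // typeDeclaration
    writeln(classifyAfterScope(["int", "*", "p"])); // storageClass
}
```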

Walter asked what would happen if there were a type named exit. Quirin said the parser would prefer to treat scope(exit) as a scope guard in that case.

He said that in the current implementation, an opening parenthesis after scope was always treated as the start of a scope guard, and any unexpected identifier inside the parentheses resulted in a parse error. His implementation instead did a lookahead before deciding it was a scope guard, because with this DIP it could instead be a type. So it was more lenient.

He added that if you had a type called exit in current D, that wasn't a declaration. In his implementation, it looked like a declaration, but because it was one of the three possible identifiers in a scope guard, it remained a scope guard.

Walter said it still sounded ambiguous to him.

Quirin agreed it was ambiguous, but reiterated that his implementation resolved the ambiguity by treating it as a scope guard if it found success, failure, or exit in parentheses following scope. If it wasn't one of those, then hopefully it was a type. The semantic analysis would find out.

Walter suggested another way to deal with it would be to look beyond the closing parenthesis to see what came after. Quirin agreed the implementation could be smarter. Walter said it could see if the rest of it parsed as a declaration or an expression. Quirin said that was much more work and more expensive. Walter agreed.

Quirin said his implementation wasn't trying to be perfect. It was like a proof of concept. Walter said he wasn't faulting him for it. He was just trying to think of ways to resolve the ambiguities.

Timon said you couldn't resolve the ambiguities just by looking further ahead. In the chat, he gave the example of scope(exit) foo, which could be a paren-free call to foo. You could always parse it as a declaration and do something else if the semantic analysis figured out that it wasn't.

Quirin asked if there was something that could parse as a declaration and then turn out not to be one. Timon said there were so many kinds of declarations that the answer was probably "yes". The foo example, without Quirin's disambiguation, could parse as a declaration but was actually a scope guard.

Quirin said choosing to look ahead only for the scope guard identifiers was the right thing to do. If the parser instead checked to see if exit was a valid type, that would change the meaning of existing D code. And that was not okay.

If you wanted your type named exit in parentheses, you could write something between it and scope and it would be okay. It was that easy.

Walter said these ambiguous cases should be clearly identified in the DIP. Quirin said he didn't put it in the DIP because the DIP specified maximal munch. He thought it wasn't noteworthy whenever maximal munch did a great job at disambiguation because the default behavior did what was intended. Some maximal munch exceptions were described in detail in the DIP.

Walter didn't think maximal munch solved this. The implementation was just deciding it was a scope guard when it saw exit in the parentheses.

Rikki noted that when trying to solve something like this, it also still had to work with a parser for a text editor or an IDE. If they couldn't do it, then it wasn't a good solution. Basically, just limit the cleverness.

Quirin said he had included something about syntax highlighting in one of the drafts, but couldn't remember if it was still there. He talked a bit about how some requirements for highlighters were difficult to meet in D without semantic analysis, and gave some examples involving simple and complex highlighters. In short, in answer to Rikki's question, he said simple syntax highlighters should have no problem. They would recognize the scope keyword and the parentheses, and then do what they did.

He then mentioned extern. For its use as a linkage attribute, e.g., extern(C), the specification only required C. Everything else, including Windows, was implementation-defined. Walter said the idea was that any identifier could appear within the parentheses. Quirin said that arbitrary tokens could also be between them.

There was then some discussion about the specific details of what the parser currently accepted and rejected with regard to extern. Martin said he would like to see a requirement that, for a linkage attribute, the opening parenthesis immediately follow extern with nothing in between, to more clearly distinguish it from the extern storage class.

Quirin mentioned an earlier DIP draft had included text that made whitespace between a type constructor and the opening parenthesis significant. He ended up removing it because it was too hard for simple syntax highlighters, as they didn't like semantic whitespace. He then went into some detail about it to make the point that potential problems that existed with const, whitespace, parentheses, and types didn't really exist with extern.

He said the issue with parentheses starting a basic type was not specific to the Primary Types DIP. Any DIP proposing tuples had the same issue. Timon disagreed because tuples were signified by commas, so his tuples proposal didn't have that problem. He said it could be solved in the way Quirin was describing, but his proposal didn't do that.

Walter said that scope and extern could be specified such that they couldn't be followed by a type. Dennis thought that was best. Using exit, failure, and success to disambiguate would mean we couldn't add a new kind of scope guard without breaking any code that used its name as a type name.

Walter agreed and said the syntax of scope and extern was specifically designed to allow anything in the parentheses so that we could extend them in the future. He suggested the implementation be changed so that when it sees an opening parenthesis after extern or scope, then it should never treat anything in the parentheses as a type. That would then be forward-compatible. He thought it a reasonable solution.

Quirin said that scope and extern were different in that regard. A scope guard was always four tokens long, with three of the tokens nailed down, and the remaining token was always one of three possible identifiers: exit, success, and failure. For all intents and purposes, a scope guard was effectively a single token. It was easy to recognize and distinguish from something that wasn't a scope guard. It would only ever have an identifier in the parentheses.

Walter repeated that the syntax was intended to allow anything in the parentheses to enable future extensions. Quirin gave the following example:

scope (ref int function()) fp = null;

He asked how he would then distinguish this from a scope guard.

Walter repeated that it was all about allowing for future extensions. Quirin said he couldn't imagine wanting anything in a scope guard other than an identifier.

Walter said he couldn't think of an example at the moment, but that wasn't the point. He didn't think it was a terrible limitation to say that a scope guard that didn't have a type in the parentheses was its own separate entity. He couldn't see how that harmed Quirin's proposal or would compromise it.

Quirin said it probably wouldn't.

Timon said the following did not work in his unpacking branch:

extern (a,b) = tuple(1,2);

He had no problem with scope, only extern, because it just ate everything.

Quirin said the question to answer was what the programmer could do when they wanted an extern or scope variable instead of a linkage attribute or a scope guard. He thought the typical solution with scope would just be to rely on inference. You could alias the type, or you could put something in between the scope and the opening parenthesis. Even a comment or a UDA would work. The scope would then be unambiguous because it was no longer followed by an opening parenthesis. But it would only work in declaration scope, not statement scope.

He asked if you could have meaningful extern variables in statement scope. Martin said you definitely couldn't initialize those. They were for forward declarations. Quirin said he didn't know how to resolve it for local variables. Martin said he had no idea if it was feasible to declare a global extern variable in function scope.

Jonathan said that made no sense semantically. extern should be at the global level. Quirin said it was allowed. He had tried it. Maybe we should disallow it. Jonathan said at the very least the parser should ignore it. It made no sense.

Martin agreed that made no sense. He said he wouldn't like to see a special case for scope here. Quirin's current approach to disambiguate the scope guard was simple. It should be fine. If we did need to extend scope guards in the future, we could then change the implementation to disambiguate differently as needed.

Walter preferred the simpler way: just have scope( mean it was a scope guard.
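
For comparison, the rule Walter preferred can be sketched the same way. Again, the names and the string-token representation are hypothetical. The point is that the parser never looks inside the parentheses, so adding a new guard kind later would not change how existing code parses; checking the guard kind is assumed here to happen in a separate, later step.

```d
import std.stdio : writeln;

// Hypothetical outcomes under the simpler rule: no type is ever parsed
// directly after `scope(`.
enum ScopeParse { storageClass, guard }

// `scope` immediately followed by `(` always starts a scope guard.
ScopeParse classifyAfterScope(const(string)[] t)
{
    return (t.length > 0 && t[0] == "(") ? ScopeParse.guard
                                         : ScopeParse.storageClass;
}

// Assumed later validation step: only the existing kinds are accepted.
// Adding a kind would change this check, not the parse above.
bool isKnownGuardKind(string kind)
{
    return kind == "exit" || kind == "success" || kind == "failure";
}

void main()
{
    // `scope (ref int function()) fp = null;` is always treated as a guard...
    writeln(classifyAfterScope(["(", "ref", "int", "function", "(", ")", ")", "fp"])); // guard
    // ...and then rejected, because `ref` is not a guard kind.
    writeln(isKnownGuardKind("ref")); // false
    // The alias workaround `scope X fp = null;` parses as a declaration.
    writeln(classifyAfterScope(["X", "fp", "=", "null", ";"])); // storageClass
}
```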

Quirin came back to his previous example:

scope (ref int function()) fp = null;

He said a good compiler would decide this wasn't a scope guard. It wasn't explicitly allowed, but it could be parsed. The programmer could then get an error, where the compiler was saying, "Hey, I know what this is, I know what you want, but because we want flexibility and extendability with scope guards, I need you to explicitly do this to let me know you really didn't want to write a scope guard."

He asked Walter what the "this" in that message should be. What should the programmer put between the scope and the opening parenthesis?

Walter said the solution was to use an alias:

alias X = ref int function();
scope X fp = null;

Átila agreed. Jonathan said that as ugly as it was, there were already cases where we were forced to do that. Quirin said the one solution he had found was to add a UDA between scope and the type:

scope @0 (ref int function()) fp = null;

The UDA could have any meaning, so it wasn't 100% equivalent.

Átila noted that the opening of the DIP said the goal was that "every type expressible by D’s type system also has a representation as a sequence of D tokens". But he didn't see anything about function types anywhere in the DIP.

Quirin said he hadn't put much thought into function types, as there weren't many places where you could actually use them. Walter said he thought the whole point of the DIP was function types. Quirin clarified that it was function pointer types and delegate types that return a reference or have a linkage that isn't extern(D).

He said the whole point was that you could write (ref int function(args) @someAttributes), and this was the type. If you had something like this as parameter or return types, it should be a fully formed type. But if you saw it in an error message, you couldn't copy-paste it into the code because it wouldn't parse. He said that function types, on the other hand, were a weird artifact of how they were implemented in the compiler. That was how he saw them, anyway.

Átila said they could probably be inferred in templates as well. The reason he was talking about function types was because they were in C++, but hardly anyone used them except in templates. He and Quirin then had a bit of a discussion about how function types were used in C++.

Quirin said he could try to extend the DIP to include function types. He didn't know how hard it would be. He talked about some of the difficulties he'd had with linkage in the current draft. He and Átila talked a bit about what the implementation might look like.

Walter said the DIP needed a full enumeration of all the ambiguous cases and how they were resolved. Other people wanting to implement the language would need a precise guide for how the disambiguation was handled.

Quirin said he could do that. We'd actually want a simpler, less lenient disambiguation protocol if we wanted to stay more flexible for the future. A smart implementation that could diagnose errors would be more complicated.

Walter said a simpler implementation was definitely something to consider for people who were going to write their own formatters and things that required parsing. Simplicity of disambiguation was important.

Razvan suggested Quirin wrap this discussion up in the interest of time. He asked if Quirin had reached any conclusions about the next steps.

Quirin said that Walter had convinced him on the scope guard stuff, but he still wanted the implementation to recognize what the user intended and suggest a way to rewrite ambiguous code.

Copy constructor generation

Razvan said that Walter had proposed in a forum thread a kind of extra syntax for move constructors to distinguish them from normal constructors. Though it seemed on the surface that the new syntax shouldn't affect copy constructors, it actually did. We should want the constructor syntax to be consistent. It would be weird if the copy constructor looked like a normal constructor, but the move constructor had some kind of extra syntax requiring a UDA or additional tokens.

The forum discussion had gotten sidetracked a bit to talk about a couple of limitations of copy constructors. One was that if you had a templated constructor that was supposed to be a copy constructor, there was no way for the compiler to know that without instantiating it. The second was that if you had a struct A that had a field of struct B, and A had no copy constructor defined but B did, the compiler would then define an inout copy constructor that was basically useless and wouldn't be callable.

When Razvan had written the copy constructor DIP, his initial approach to copy constructor generation was to look at all of the fields to see which copy constructors they defined and take the intersection: generate each of those copy constructors that type checked, and disable the ones that didn't.

At the time, Walter had been against that approach as being too complex. It was easier to generate a single copy constructor. Razvan said people had been reporting issues with that approach, and seeing that inout copy constructors were useless, he had decided to go ahead and implement his original approach and had submitted a PR for it.

Razvan thought that approach matched what people expected. He knew that Walter still didn't like it, but he suspected most people in the meeting would prefer it. He wasn't sure though, so that was why he wanted to bring it up.

Walter said he'd posted some new comments on the (at the time, Bugzilla) issue about inout copy constructor generation the night before the meeting.

He then emphasized that this issue was completely orthogonal to move constructors and he didn't understand why it kept coming up in that forum thread. It should be in a separate thread.

Jonathan said that move constructors would have the same issue. Walter agreed, and because it affected both copy and move constructors, it was a separate issue from either of them.

Timon noted that a move constructor modified the object it was coming from, so there was a question of what to do with the qualifiers there. Walter said that a copy constructor with a non-const from argument could modify it.

Timon agreed but said we were now talking about qualified arguments. It was a similar problem with move constructors because it also applied to shared. He thought it was a separate discussion in terms of what should be done there.

Walter agreed, then went back to the reported issue. He said he'd boiled the example down to something much simpler that made the problem much clearer. He agreed that generating an inout constructor was just wrong, but if you went through the fields and found that some of them were mutable, you could generate a copy constructor with a mutable rvalue. He didn't see why multiple copy constructors needed to be generated.

Another thing he didn't like was what happened with shared. That was a weird beast, because when you did things with shared, you didn't do things as you did with normal code. Trying to make a shared struct work like a normal struct seemed like something that wasn't going to work anyway, so why worry about the shared thing? All you were really dealing with was const and immutable. The thing to do, then, was to look through the fields for const and immutable, or for mutable initializations, and generate a single copy constructor with a const, immutable, or mutable argument as required. He didn't see a reason to have a combinatorial collection of constructors.

Martin said that was what he would expect, too: check through all the fields and if any required a mutable reference, then the parent aggregate's copy constructor would require a mutable reference, too. Check for immutable, then const, then mutable, then we should be finished.

Walter said if copy constructors for the fields were inout, then we'd need to generate an inout copy constructor in that case. He didn't see any issues with it.

Razvan said the issue was that he didn't know how people were defining their copy constructors. If you had fields with different kinds of copy constructors, he wasn't sure that generating just one copy constructor was sufficient to cover all the cases. And it was technically possible to have a copy constructor that was shared, so what would you do when a field had that?

Jonathan said that as soon as you had multiple members that were shared, there was no way a generated shared copy constructor could be valid in terms of what it needed to do. He thought it was no problem, in that case, to just say, sorry, but we're not going to generate anything for you because we're not sure we can do the semantically correct thing with shared. Even if your member variables each handled shared correctly individually, the semantics changed once you were dealing with the type as a whole. We couldn't just do that automatically. So it was fine to say we're not going to do this for you, you have to write one yourself.

Walter agreed. If you were going to mess around with shared, you should be explicit and write your own copy constructor to do what you wanted it to do.

He said the problem with having multiple copy constructors was that, in practice, you really wouldn't have them. The compiler would just end up picking one, and that's what it would always pick. So there was only one copy constructor you had to worry about.

Razvan wanted clarification that when we had multiple fields and only one was shared, then a copy constructor shouldn't be generated. Jonathan confirmed yes because there was no correct way to define it.
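
The single-constructor scheme can be sketched as a small function over the source qualifiers that the fields' copy constructors require. This is a hypothetical model, not compiler code: it follows Martin's description (a field needing a mutable source forces a mutable source), Walter's point about inout, and the agreement that a shared field means nothing is generated. Immutable-only sources are left out, and `SourceQual` and `generatedCopyCtor` are invented names.

```d
import std.algorithm.searching : all, any;
import std.stdio : writeln;
import std.typecons : Nullable, nullable;

// Hypothetical: the qualifier a field's copy constructor requires on its source.
enum SourceQual { Mutable, Const, Inout, Shared }

// Returns the source qualifier of the single generated copy constructor,
// or null when none should be generated.
Nullable!SourceQual generatedCopyCtor(const(SourceQual)[] fields)
{
    // Only fields that define copy constructors are listed. With none, there
    // is nothing to generate; with a shared one, the user must write their own.
    if (fields.length == 0 || fields.any!(q => q == SourceQual.Shared))
        return Nullable!SourceQual.init;
    // One field needing a mutable source forces the aggregate to need one too.
    if (fields.any!(q => q == SourceQual.Mutable))
        return nullable(SourceQual.Mutable);
    // Every field inout: generate an inout copy constructor.
    if (fields.all!(q => q == SourceQual.Inout))
        return nullable(SourceQual.Inout);
    // Otherwise a const source is enough.
    return nullable(SourceQual.Const);
}

void main()
{
    with (SourceQual)
    {
        writeln(generatedCopyCtor([Const, Mutable]).get);   // Mutable
        writeln(generatedCopyCtor([Inout, Inout]).get);     // Inout
        writeln(generatedCopyCtor([Inout, Const]).get);     // Const
        writeln(generatedCopyCtor([Const, Shared]).isNull); // true
    }
}
```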

Walter said you couldn’t automatically know what the user intended since dealing with shared would likely require calling atomic operations and such. The whole concept of copy construction with a shared type seemed problematic to him.

Jonathan added you'd need locks and stuff internally to deal with it properly, but you had to make sure you'd written your code specifically to deal with it properly. The compiler would have no idea how to do that.

Razvan said these rules sounded simple, but thinking out loud, you could have all kinds of weird cases. If you had a field that had a mutable to mutable copy constructor, but then you also had one with a const to mutable copy constructor...

Walter said that when you mixed const and mutable, then you got const. And if the fields were mutable, then you generated a mutable copy constructor. If anyone could come up with a case where this was wrong and he'd overlooked something, he invited them to let him know, but he thought this approach would work.

He added that although a mutable copy constructor was legitimate, he thought a lot of people wrote them by default when they should really be writing a const one by default.

Jonathan said for a case he had, he would like to be able to say you can't copy immutable. Walter said there were legitimate reasons for that, but they were unusual.

This led to a digression about copying immutable fields and the transitiveness of const. Then Walter asked Razvan to revise his PR to try the other scheme they had discussed and see if there was something they hadn't thought of. If it turned out to be wrong, they could revisit it.

Conclusion

With the agenda items covered and having gone about an hour and 20 minutes, Razvan asked if anyone had anything else to discuss. Martin said he hadn't followed the forum thread on move constructors and it was too big to dig into now. This launched a surprisingly long discussion that got deep into the weeds of move constructors, the C++ implementation, Weka's use case for the old opPostMove DIP, and more. It's a difficult conversation to follow, and I don't see that anything actionable came out of it, so I'll skip the summary.

Our next monthly meeting took place on November 8th.

If you have something you'd like to discuss with us in one of our monthly meetings, feel free to reach out and let me know.
