D frontend

Solving the spurious forward/cyclic reference errors in DMD

Jul 17, 2017

Jul 20, 2017

Jul 21, 2017

As there has been some interest at DConf, I have pushed the experimental D frontend I have been hacking on to github under the boost licence (for compatibility with DMD). https://github.com/tgehr/d-compiler It is not yet feature-complete (e.g. constructors, destructors, exception handling,... is missing), but it handles e.g. large portions of templates and CTFE. (CTFE uses a byte code interpreter.) One major goal of this project is to figure out a good way to handle non-trivial compile-time dependency structures (i.e. solving the "forward reference error" problem). As it does explicit dependency tracking on AST nodes, it can possibly be used for fine-grained (AST-level) incremental compilation in the future. Unfortunately the only version of DMD that builds the project is 2.060 (due to forward reference errors with newer versions). It is not exactly well documented at the moment, so feel free to contact me if something does not make sense. (There are some really ugly hacks working around the lack of UDAs in 2.060, for example.)

On 5/6/16 11:34 AM, Timon Gehr wrote: > As there has been some interest at DConf, I have pushed the experimental > D frontend I have been hacking on to github under the boost licence (for > compatibility with DMD). > > https://github.com/tgehr/d-compiler > > It is not yet feature-complete (e.g. constructors, destructors, > exception handling,... is missing), but it handles e.g. large portions > of templates and CTFE. (CTFE uses a byte code interpreter.) > > One major goal of this project is to figure out a good way to handle > non-trivial compile-time dependency structures (i.e. solving the > "forward reference error" problem). As it does explicit dependency > tracking on AST nodes, it can possibly be used for fine-grained > (AST-level) incremental compilation in the future. > > Unfortunately the only version of DMD that builds the project is 2.060 > (due to forward reference errors with newer versions). > > It is not exactly well documented at the moment, so feel free to contact > me if something does not make sense. (There are some really ugly hacks > working around the lack of UDAs in 2.060, for example.) Great news! I think an excellent first step would be to get you or another contributor to reduce the forward reference matter to a small size. That would help a fix. -- Andrei

On Friday, 6 May 2016 at 09:34:39 UTC, Timon Gehr wrote: > One major goal of this project is to figure out a good way to handle non-trivial compile-time dependency structures (i.e. solving the "forward reference error" problem). As it does explicit dependency tracking on AST nodes, it can possibly be used for fine-grained (AST-level) incremental compilation in the future. Great initiative! While mapping big C++ librairies with a lot of templates with Calypso I still hit forward referencing errors from time to time. Some of them were easily fixed but I feel that making the whole thing more robust is a complex task that can't be achieved through small fixes/hacks, it needs a drastic solution like yours.

Timon, any update on this? What are the insights you gained with your frontend? I recently reported two cases without a simple fix: https://issues.dlang.org/show_bug.cgi?id=17656 https://issues.dlang.org/show_bug.cgi?id=17194#c1 and have seen a lot more referencing errors with Calypso, especially when this gets enabled: https://github.com/Syniurge/Calypso/commit/1e1ae319e32120bd9ef0009716ddabed92f69ac2 Calypso makes its mapped C++ symbols go through the same importAll -> semantic1,2,3 route that D symbols take. Ultimately this is mostly useless work that should be skipped, the reason it currently works this way being that I wasn't familiar yet with the DMD source code when I started. But what this hard and ungrateful work has also been doing (and many large libraries are blocked by this) is exposing a seemingly infinite number of bogus forward/circular/misc referencing DMD errors. Those errors boil down to semantic calls getting triggered at the wrong time, on symbols that the caller doesn't really depend upon. Because most of the time, the semantic() call on the LHS of DotXXXExp, inside AggregateDeclaration.determineSize, etc. is there in case there are: - mixins to expand - attributes whose members have to be added to the parent symtab - if LHS is a template to instantiate These are (AFAIK) the only cases where the symtab of the LHS or the aggregate may get altered, and if I understand correctly that's what the semantic call is checking before searching for the RHS or determining the aggregate fields and then its size. So would splitting semantic() into determineMembers() preceding the rest of semantic() be worth exploring? The thing is, this would help in most cases but I can imagine scenarios where simply splitting may not be enough. Example: enum E { OOO = S.UUU } import std.conv; string genMember1() { return "enum G8H9 = " ~ (cast(int)E.OOO).to!string; } string genMember2() { return "enum UUU = 1;"; } struct S { mixin(genMember1()); mixin(genMember2()); } We'll have S.determineMembers -> E.OOO.semantic -> S.determineMembers, and although in this case the value of OOO may be interpreted to 1, at this point the compiler can't easily know whether mixins will generate zero, one or more UUU members or not. To attenuate the problem determineMembers() could be made be callable multiple times (recursively), each time starting from where the previous (on-going) call left off, so in this particular case the second S.determineMembers call would expand the second mixin to enum UUU = 1. But then how does the compiler knows and react if genMember1 generate a new UUU member? Ok a second UUU enum will error, but what if UUU was a function and genMember1() generates a more specialized overload of UUU? I.e: enum E { OOO = S.UUU(1) } import std.conv; string genMember1() { return "static int UUU(int n) { return n; }; enum G8H9 = " ~ (cast(int)E.OOO).to!string; } string genMember2() { return "static int UUU(int n, int a = 5) { return n + 5; }"; } struct S { mixin(genMember1()); mixin(genMember2()); } At this point well it's getting a bit contrived, so maybe it's not really worth finding a solution to make this compile (but ideally the compiler should still warn the user). Should I try splitting semantic() and make a PR? It might be a lot of work, so I'd like to know if this makes sense first.

July 21, 2017

Re: Solving the spurious forward/cyclic reference errors in DMD

Posted by Timon Gehr
in reply to Elie Morisse

Permalink

Timon Gehr

Posted in reply to Elie Morisse

Permalink

On 17.07.2017 03:16, Elie Morisse wrote:
> Timon, any update on this?

Not really. I did not have much time to work in this in the past year. (Also, it is not very convenient to work with DMD 2.060.)

> What are the insights you gained with your frontend?
> ...

What I have implemented works well but still has a few limitations, e.g. the following code does not compile:

enum x = "enum xx = q{int y = 0;};";

struct SS{
    mixin(xx);
    mixin(x); // error
}

I believe this should be allowed. An easy but slightly ugly solution would be to sequentialize analysis of string mixins even in scopes that have no ordering. (Then the above would still not compile, but it would compile once we change the order of the two string mixins.)

The idea behind the algorithm used by my frontend is to analyze all AST nodes concurrently, blocking them as soon as they need information that is not yet available. This continues until the analysis has finished or all AST nodes are blocked. In that case, the compiler examines the top strongly connected components of the dependency graph (i.e. the smallest set of nodes that everything else depends on). Some kinds of analysis dependencies can be discharged by assuming a natural default result. (For example, we can assume by default that a certain member is not present, or that a more specialized overload does not exist, etc.) The assumptions are made simultaneously for all nodes in the top strongly connected components and analysis proceeds. The compiler prints an error if any assumptions end up being violated.

> I recently reported two cases without a simple fix:
> 
> https://issues.dlang.org/show_bug.cgi?id=17656
> https://issues.dlang.org/show_bug.cgi?id=17194#c1
> ...

Those are easily handled by my algorithm.

> and have seen a lot more referencing errors with Calypso, especially when this gets enabled: https://github.com/Syniurge/Calypso/commit/1e1ae319e32120bd9ef0009716ddabed92f69ac2 
> 
> 
> Calypso makes its mapped C++ symbols go through the same importAll -> semantic1,2,3 route that D symbols take. Ultimately this is mostly useless work that should be skipped, the reason it currently works this way being that I wasn't familiar yet with the DMD source code when I started. But what this hard and ungrateful work has also been doing (and many large libraries are blocked by this) is exposing a seemingly infinite number of bogus forward/circular/misc referencing DMD errors.
> Those errors boil down to semantic calls getting triggered at the wrong time, on symbols that the caller doesn't really depend upon.
> ...

I don't think that in the end there will be a better solution than explicitly tracking what depends on what.

> Because most of the time, the semantic() call on the LHS of DotXXXExp, inside AggregateDeclaration.determineSize, etc. is there in case there are:
>   - mixins to expand

Also, conditional compilation like static if, I guess.

>   - attributes whose members have to be added to the parent symtab
>   - if LHS is a template to instantiate
> 
> These are (AFAIK) the only cases where the symtab of the LHS or the aggregate may get altered, and if I understand correctly that's what the semantic call is checking before searching for the RHS or determining the aggregate fields and then its size.
> ...

I handle more things, for example inheritance, or adding members to aggregates whose instances have already been used in CTFE etc., but those are probably not a big deal for your use case.

> So would splitting semantic() into determineMembers() preceding the rest of semantic() be worth exploring?

Probably yes, but it will at most be a partial solution.

> The thing is, this would help in most cases but I can imagine scenarios where simply splitting may not be enough. Example:
> 
> enum E { OOO = S.UUU }
> 
> import std.conv;
> string genMember1() { return "enum G8H9 = " ~ (cast(int)E.OOO).to!string; }
> string genMember2() { return "enum UUU = 1;"; }
> 
> struct S {
>      mixin(genMember1());
>      mixin(genMember2());
> }

genMember1() is missing a semicolon in the code it generates, but otherwise this works with what I have.

> 
> We'll have S.determineMembers -> E.OOO.semantic -> S.determineMembers, and although in this case the value of OOO may be interpreted to 1, at this point the compiler can't easily know whether mixins will generate zero, one or more UUU members or not. To attenuate the problem determineMembers() could be made be callable multiple times (recursively), each time starting from where the previous (on-going) call left off,

How do you make sure you don't go on indefinitely and overflow the stack?

> so in this particular case the second S.determineMembers call would expand the second mixin to enum UUU = 1. But then how does the compiler knows and react if genMember1 generate a new UUU member? Ok a second UUU enum will error, but what if UUU was a function and genMember1() generates a more specialized overload of UUU? I.e:
> 
> enum E { OOO = S.UUU(1) }
> 
> import std.conv;
> string genMember1() { return "static int UUU(int n) { return n; }; enum G8H9 = " ~ (cast(int)E.OOO).to!string; }
> string genMember2() { return "static int UUU(int n, int a = 5) { return n + 5; }"; }
> 
> struct S {
>      mixin(genMember1());
>      mixin(genMember2());
> }
>  > At this point well it's getting a bit contrived, so maybe it's not
> really worth finding a solution to make this compile

My opinion is that the following is good enough:

enum E { OOO = S.UUU(1) }

string genMember1() { return "static int UUU(int n) { return n; };"; }
string genMember2() { return "enum G8H9 = " ~ (cast(int)E.OOO).to!string~";"; }
string genMember3() { return "static int UUU(int n, int a = 5) { return n + 5; }"; }

struct S {
    mixin(genMember1());
    mixin(genMember2());
    mixin(genMember3());
}

It is possible to make many cases like yours (that have spurious dependencies through string values) work too, but I don't think it is remotely worth the effort. One would need to e.g. symbolically compute the mixed in string and partially evaluate the parser to extract some declarations before we know the entire mixed in string.

> (but ideally the compiler should still warn the user).
> ...

My frontend says:
<mixin@test.d:7>:1:1: error: introduction of new overload is invalid
static int UUU(int n) { return n; }; enum G8H9 = 6;
^──────────────────────────────────
test.d:1:16: note: overload set was sealed here
enum E { OOO = S.UUU(1) }
               ^────

In general, the frontend is designed such that compiling the original code with all mixins and conditional compilation expanded always gives the same results as compiling the original code. (And if it is not able to achieve this, the frontend prints an error.)

> Should I try splitting semantic() and make a PR? It might be a lot of work, so I'd like to know if this makes sense first.

It's hard to tell for me, as I am not extremely familiar with DMD's internals. If this solves your issues with Calypso, I'd say, go for it. (Also, who knows, you might fix enough forward reference issues to make my frontend compile with the latest DMD again.)

Forums