Thread overview
[dmd-internals] How not to report unused identifiers
Mar 29, 2016
cy
Mar 30, 2016
Jonathan M Davis
Apr 01, 2016
cy
Mar 30, 2016
Daniel Murphy
March 29, 2016
Well, Module in dmodule.d has a list called "members" returned from parsing the code, each item a symbol that's supposedly a member of that module. But, the list also includes template specializations and package names, which are apparantly never searched for?

In dmd, I made a bool[const(char)*] searched array in the Module class, and added to that a symbol's "toChars" representation every time it was searched for in that module. Then, in the third pass (was that a mistake?) I compared every item in Module.members to see if its toChars representation was in that array. If not, it warns about an unused identifier.

So, it searched for like 5 symbols, and there were 2000 symbols give or take in the members array, resulting in a wall of text of unused identifiers, that are actually used I'm sure, but I have no idea how to determine that, if they're never looked up.
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals
March 29, 2016
On Tuesday, March 29, 2016 21:11:09 cy via dmd-internals wrote:
> Well, Module in dmodule.d has a list called "members" returned from parsing the code, each item a symbol that's supposedly a member of that module. But, the list also includes template specializations and package names, which are apparantly never searched for?
>
> In dmd, I made a bool[const(char)*] searched array in the Module class, and added to that a symbol's "toChars" representation every time it was searched for in that module. Then, in the third pass (was that a mistake?) I compared every item in Module.members to see if its toChars representation was in that array. If not, it warns about an unused identifier.
>
> So, it searched for like 5 symbols, and there were 2000 symbols give or take in the members array, resulting in a wall of text of unused identifiers, that are actually used I'm sure, but I have no idea how to determine that, if they're never looked up.

I don't know all that much about the compiler internals, so I don't know quite what you're looking at, but I would point out that any attempt to warn about "unused identifiers" would have to be done very carefully. Aside from the normal issues with RAII and the like where something is declared and then not used but still does something, it's _very_ common in D code to declare stuff and then not actually use it when dealing with template constraints and the traits tested in them, because it's testing for compilation, not actually running the code. And even if all of that is handled correctly, if the compiler is very aggressive about warning about unused identifers, templates and string mixins could run into trouble depending on the exact arguments used with them and how complicated some of their static if-else logic is.

So, any attempt to warn about unused identifiers is going to have to be _very_ careful, or it's going to cause problems for perfectly legitimate code - especially if you consider that if -w is used, then warnings are treated as errors. And the fact that that affects conditional compilation makes it even worse. The compiler can't ever warn about any unused identifiers which should actually stay in the code, or it's going to be a big problem.

- Jonathan M Davis
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals
March 30, 2016
So, a few thoughts:

- bool[const(char)*] will be indexed by the pointer address by
default, not the value of the string.  bool[const(char)[]] should
work.
- toChars is not great for this.  Using the mangled name would be better
- Not every symbol that is used will have an identifier search done
for it.  Sometimes they will be cached as members of some ast class
and referenced directly.  Doing a walk of the post-semantic AST to
build the referenced list might be better.  See what the json emission
does for an example of a post-semantic walk, although that doesn't
walk expressions (which you'd need to do).

On Wed, Mar 30, 2016 at 8:11 AM, cy via dmd-internals <dmd-internals@puremagic.com> wrote:
> Well, Module in dmodule.d has a list called "members" returned from parsing the code, each item a symbol that's supposedly a member of that module. But, the list also includes template specializations and package names, which are apparantly never searched for?
>
> In dmd, I made a bool[const(char)*] searched array in the Module class, and added to that a symbol's "toChars" representation every time it was searched for in that module. Then, in the third pass (was that a mistake?) I compared every item in Module.members to see if its toChars representation was in that array. If not, it warns about an unused identifier.
>
> So, it searched for like 5 symbols, and there were 2000 symbols give or take
> in the members array, resulting in a wall of text of unused identifiers,
> that are actually used I'm sure, but I have no idea how to determine that,
> if they're never looked up.
> _______________________________________________
> dmd-internals mailing list
> dmd-internals@puremagic.com
> http://lists.puremagic.com/mailman/listinfo/dmd-internals
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals
April 01, 2016
On Wednesday, 30 March 2016 at 01:24:42 UTC, Jonathan M Davis wrote:
>  any attempt to warn about "unused identifiers" would have to be done very carefully.

Well, obviously that was a vague, ignorant first attempt. I do realize some care would be needed. Though I don't think it's all that difficult to define unused identifiers, finding them is a lot more complicated than just listing the symbols a module has, and subtracting the symbols it ends up exporting. I thought search() would be called on internal symbols, not just exports.

> Aside from the normal issues with RAII and the like

So obviously any object with a constructor or a destructor would have to be excluded. Something with pure versions of those should probably be included though.

> it's _very_ common in D code to declare stuff and then not actually use it when dealing with template constraints and the traits tested in them,

I was sort of hoping to get in there after template instantiation had been calculated and only ones with true constraints were left.

> templates and string mixins could run into trouble depending on the exact arguments used with them and how complicated some of their static if-else logic is.

Again, it's important not to check for unused identifiers until you've given the programmer a chance to get rid of 'em themselves. While it would be possible to check "if this constraint had been true, THEN this code here with an unused identifier would be used" I don't see any real utility to that. I assume you're talking about like...

void main() {
  int i = 42;
  static if(false) {
    int j = 23;
  }
  import std.stdio: writeln;
  writeln(i);
}

I definitely would not include it as a goal to warn that "j" would be an unused identifier, if you ever set the condition to true. I'd be happy with a warning that only appeared when I did set the condition to true, and angry with one that wouldn't go away even if I used a CTFE to comment out the whole block of code "#if 0" style.

It might be safe not to report any unused identifier that appeared to be a type, not a variable or a module.

Anyway, if I could figure it out, this is what a "careful" algorithm would do. First, it would only check after the module has been fully instantiated, and all the compile time code has been executed. Of all the symbols a module has, it removes the ones that are touched by other modules (not just imported, but actually used, by this algorithm recursively). Then it removes any things that can have side effects even if not used, like structs or classes with non-pure constructors/destructors.

Then, generally the only thing that crops up in heavily modified code is unused imports, so there would be an option to report all unused identifiers and if not, it would eliminate all remaining symbols that weren't declared in a different module than this one.

Finally, it would have a set of unused identifiers. If the aforementioned option were false, it'd build a set of statements where those identifiers were imported, and highlight them giving a warning about unused import. And if true, it'd report each unused identifier, possibly grouped by the modules they came from.

Which is to say, that's what I would do, if I knew when to instrument it post-CTFE but pre-runtime, if I knew how to check if a symbol were an instantiation of an object with a constructor/destructor, if I knew how to look up those functions, if I knew how to check if a function is pure, if I knew how to get the module a symbol was declared in, if I knew how to get the statement a symbol was imported from, if I knew how to untangle complicated templated types into the identifiers they're constituted of, and if I knew how to add options to invoking dmd.

But for now, I'll have to use the strategy of repeatedly deleting import statements, and recompiling to see if it breaks my code.
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals