That's exactly the problem. If these large projects are incorporated into the autotester, who is going to isolate/fix problems arising with them?
The test suite is designed to be a collection of already-isolated issues, so understanding what went wrong shouldn't be too difficult. Note that it is already noticeably harder to debug a Phobos unit test gone awry than the other tests. A full-blown project that nobody understands would fare far worse.