On Sunday, 14 May 2023 at 12:47:59 UTC, Mike Parker wrote:
Dennis started by saying that the CI was randomly failing again. He didn't have a Mac, so he'd been unable to chase down the problem. Random CI failures are a recurring problem. There are so many checks, and he doesn't know who created them or who knows exactly what the checks are doing. He wishes the tests had someone responsible for them who he could turn to when they fail.
Walter asked who had previously been in charge of the tests. Razvan said he didn't recall if one person was ever in charge of them. At some point, someone decided it was a good idea to have a particular test and it got added to the pipeline.
Dennis asked if we should only keep tests that have a maintainer. Martin and Mathias quickly rejected that. Martin said the tests are good. CI failures are usually caused by CI image bumps or a PR. CI image changes are a PITA for LDC's tests, and PR-related failures may not be easy to resolve, but failures are hardly ever the fault of a test. And there's never been any specific person responsible for any of DMD's CI systems. They just grew organically. Then someone who knew the details moved on and no one else knows them... it's a constant maintenance burden, but it's worth the effort.
There was a bit more discussion about the maintenance burden, after which I noted that this is the story of our ecosystem. We're responsible now for things none of us set up, and we need to get a handle on it all. Dennis agreed and added that the CI is in a special position. When one of them is outdated, it doesn't just sit there out of the way, it becomes an annoyance to development.
(NOTE: This is one of the many aspects of our ecosystem that we'll be working to improve under our new workflow.)
Some thoughts on testing:
This (MacOS) failure has been fixed (by me). It apparently also occurred with some other LibCs out there before that. In future, these kinds of failures must be prioritized a little more aggressively: this didn't just mean "Oh well, we'll ignore that pipeline for a while", it meant that Phobos effectively didn't work on MacOS (oops).
At a bigger scale: We probably have too many CI pipelines. The main ones I have in mind that really could go are the OMF pipelines. In OMF we have some ancient baggage which we don't need and shouldn't want to support anymore: Microsoft barely mentions OMF anymore, it's not the default for dmd on 32-bit Windows anymore, and having it in the testsuite ties us to the Digital Mars ecosystem for likely zero benefit. (Would you, reader, use Digital Mars if you were building C code on Windows today?)
The tests could also use some love in terms of exactly how they're set up. Does everything that should/could use the host compiler use that compiler? Although I think it's partly his own doing, in not exerting much control over the compiler codebase other than when others try to organize it, Walter is right that the test suite should ideally be segmented into tests ordered by some measure of the number of features they depend upon.
We should have either digger or something like digger checked on every PR to make sure it's easy to reproduce. It would likely be a shell script: shooting from the hip, I think digger is a very good idea but too complicated, and I and others have all had it not work in mysterious ways.
Automatic bisect? When GitHub issues are done, this could be an interesting use of richer integration with the concept of an issue to make developers productive. When a bug report is filed, finding the commit that caused the issue can and should be done by a bot.
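To make the bisect idea a bit more concrete, here is a rough sketch of the search such a bot would run. It is the same binary search idea that `git bisect` uses, nothing D-specific. The names `first_bad_commit` and `is_bad` are made up for illustration; in a real bot, `is_bad` would check out the commit, build the compiler, and run the test case from the bug report. It assumes the commits are ordered oldest to newest and that once the bug appears it stays in every later commit.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad(commit) is true.

    Assumptions (hypothetical setup, not an existing tool):
    - commits are ordered oldest to newest,
    - the newest commit reproduces the bug,
    - once a commit is bad, every later commit is bad too.
    """
    if not commits:
        raise ValueError("no commits to search")
    if not is_bad(commits[-1]):
        raise ValueError("newest commit does not reproduce the bug")

    lo, hi = 0, len(commits) - 1  # invariant: the first bad commit is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid       # first bad commit is mid or earlier
        else:
            lo = mid + 1   # first bad commit is after mid
    return commits[lo]


if __name__ == "__main__":
    # Toy history: the bug was introduced in "c5".
    history = [f"c{i}" for i in range(8)]
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 5))
```

The point is that the number of builds grows with the logarithm of the history length, so even a long range of commits only costs a handful of builds per bug report.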