Thread overview
[phobos] Time taken for running unit tests
Sep 26, 2011
Don Clugston
Sep 26, 2011
Jonathan M Davis
Sep 27, 2011
Jonathan M Davis
Sep 27, 2011
Jonathan M Davis
Sep 27, 2011
Vladimir Panteleev
Sep 27, 2011
Sean Kelly
Sep 28, 2011
Jonathan M Davis
Sep 29, 2011
Don Clugston
September 26, 2011
I think we need to have a strategy for managing the amount of time
required for running the unittests. Currently, on Windows, the time
for all compiler tests + phobos tests is one hour. (The druntime tests
are only 30 seconds).
By contrast, running all compiler tests + phobos tests on D1, takes
about four minutes.

I used to be able to use test-driven development extensively --
running all tests after every change. But this is no longer viable on
D2. It's quite terrible to have to wait for one hour before finding
that you broke something.
A very large fraction of the time is used in testing only a tiny part
of the Phobos API. So I have some anxiety over what may happen in the
long term -- if current trends continue, we could easily have ten
hours of Phobos tests eventually.

How about defining a version, eg:
version=ExtendedPhobosTests;
which contains the more exhaustive, black-box tests, which take almost
all of the time. So that the standard tests would consist of
(1) regression tests, which should have a corresponding bugzilla
entry, unless they were discovered during development of the library
(The feature of these tests, is that at some point, they failed);
and
(2) code coverage tests.

Both of these execute quickly, and since they are linear with the number of reported bugs + the number of lines of code, they should always remain manageable. The black-box tests, on the other hand, are potentially unlimited.
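As a rough sketch of how such a split might look inside a single module: the `gcd` function below is a hypothetical stand-in rather than Phobos code, and the comments only indicate where a Bugzilla reference would go. The regression and coverage tests stay in plain unittest blocks, while the exhaustive test is wrapped in the proposed version identifier.

```d
// Hypothetical function under test; stands in for any library routine.
uint gcd(uint a, uint b) pure nothrow @safe
{
    while (b != 0)
    {
        immutable t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Standard tests: compiled whenever -unittest is given.
unittest
{
    // Regression-style test (real code would cite its Bugzilla entry here).
    assert(gcd(0, 5) == 5);
    // Coverage-style test: makes sure the loop body is executed.
    assert(gcd(12, 18) == 6);
}

// Exhaustive black-box test: compiled only when the (proposed) version
// identifier ExtendedPhobosTests is set in addition to -unittest.
version (ExtendedPhobosTests) unittest
{
    foreach (a; 1u .. 500u)
        foreach (b; 1u .. 500u)
        {
            immutable g = gcd(a, b);
            assert(a % g == 0 && b % g == 0);
        }
}
```

Assuming the file is saved as `example.d`, `dmd -unittest -main -run example.d` would run only the first block, and adding `-version=ExtendedPhobosTests` would run both.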
September 26, 2011
Seems eerily similar to performance I have with dcollections' unit tests.

What about making the compiler more efficient?

http://d.puremagic.com/issues/show_bug.cgi?id=4900

-Steve


>________________________________
>From: Don Clugston <dclugston at googlemail.com>
>To: Discuss the phobos library for D <phobos at puremagic.com>
>Sent: Monday, September 26, 2011 5:37 AM
>Subject: [phobos] Time taken for running unit tests
>
>I think we need to have a strategy for managing the amount of time
>required for running the unittests. Currently, on Windows, the time
>for all compiler tests + phobos tests is one hour. (The druntime tests
>are only 30 seconds).
>By contrast, running all compiler tests + phobos tests on D1, takes
>about four minutes.
>
>I used to be able to use test-driven development extensively --
>running all tests after every change. But this is no longer viable on
>D2. It's quite terrible to have to wait for one hour before finding
>that you broke something.
>A very large fraction of the time is used in testing only a tiny part
>of the Phobos API. So I have some anxiety over what may happen in the
>long term -- if current trends continue, we could easily have ten
>hours of Phobos tests eventually.
>
>How about defining a version, eg:
>version=ExtendedPhobosTests;
>which contains the more exhaustive, black-box tests, which take almost
>all of the time. So that the standard tests would consist of
>(1) regression tests, which should have a corresponding bugzilla
>entry, unless they were discovered during development of the library
>(The feature of these tests, is that at some point, they failed);
>and
>(2) code coverage tests.
>
>Both of these execute quickly, and since they are linear with the
>number of reported bugs + the number of lines of code, they should
>always remain manageable. The black-box tests, on the other hand, are
>potentially unlimited.
September 26, 2011
On Monday, September 26, 2011 11:37:30 Don Clugston wrote:
> I think we need to have a strategy for managing the amount of time
> required for running the unittests. Currently, on Windows, the time
> for all compiler tests + phobos tests is one hour. (The druntime tests
> are only 30 seconds).
> By contrast, running all compiler tests + phobos tests on D1, takes
> about four minutes.
> 
> I used to be able to use test-driven development extensively --
> running all tests after every change. But this is no longer viable on
> D2. It's quite terrible to have to wait for one hour before finding
> that you broke something.
> A very large fraction of the time is used in testing only a tiny part
> of the Phobos API. So I have some anxiety over what may happen in the
> long term -- if current trends continue, we could easily have ten
> hours of Phobos tests eventually.
> 
> How about defining a version, eg:
> version=ExtendedPhobosTests;
> which contains the more exhaustive, black-box tests, which take almost
> all of the time. So that the standard tests would consist of
> (1) regression tests, which should have a corresponding bugzilla
> entry, unless they were discovered during development of the library
> (The feature of these tests, is that at some point, they failed);
> and
> (2) code coverage tests.
> 
> Both of these execute quickly, and since they are linear with the number of reported bugs + the number of lines of code, they should always remain manageable. The black-box tests, on the other hand, are potentially unlimited.

In principle, I think that it's a solid idea, though I'm not sure that it's at all obvious where the line should be drawn between the normal and extended tests, and that could be a bit tricky. We wouldn't want the normal unit tests to become not extensive enough simply in an effort to reduce the amount of time that it takes to run tests, but on the other hand, we don't want the tests to take forever, so it's a bit of a balancing act.

For whatever it's worth, on the whole, I believe that the main culprit for the length of time that it takes to run the unit tests is the amount of time that it takes to compile them. Actually running the tests is generally quite quick. And Windows does that much worse, then, because it compiles all of the modules together for unit tests instead of separately like Posix does - though that has the advantage of helping to find module interdependencies with module constructors which don't get found otherwise, so I'm not sure that we necessarily want to change how Windows does the unit tests. Maybe one way to deal with that would be to adjust the makefiles so that the modules are normally tested separately on _all_ of the OSes, but there's a make target on all OSes to build them all as one if we want to test for module constructor issues - though then that target risks not ever being called.

The other question is what the autotester should be running. Should it run the normal unit tests or the extended unit tests? Should it run the normal ones normally and the extended ones periodically? It'll be easier to miss platform-specific bugs if the extended tests are only ever run on an individual's machine and never the autotester.

On the whole though, I think that this is a good idea.

- Jonathan M Davis
September 26, 2011
On 9/26/11 2:37 AM, Don Clugston wrote:
> I think we need to have a strategy for managing the amount of time
> required for running the unittests. Currently, on Windows, the time
> for all compiler tests + phobos tests is one hour. (The druntime tests
> are only 30 seconds).
> By contrast, running all compiler tests + phobos tests on D1, takes
> about four minutes.
[snip]

Breaking the tests into two categories would work, but would entail a lot of work and further effort (making a decision for each new unittest).

A better solution (that's been mentioned already) would be to do for Windows what we already do for Unix - each module's unittest defines a separate binary, linked with the regular (non-unittest) library. Then whenever you work on a module you can unittest only that one.


Andrei
September 26, 2011
On Monday, September 26, 2011 17:46:31 Andrei Alexandrescu wrote:
> On 9/26/11 2:37 AM, Don Clugston wrote:
> > I think we need to have a strategy for managing the amount of time
> > required for running the unittests. Currently, on Windows, the time
> > for all compiler tests + phobos tests is one hour. (The druntime tests
> > are only 30 seconds).
> > By contrast, running all compiler tests + phobos tests on D1, takes
> > about four minutes.
> 
> [snip]
> 
> Breaking the tests into two categories would work, but would entail a lot of work and further effort (making a decision for each new unittest).
> 
> A better solution (that's been mentioned already) would be to do for Windows what we already do for Unix - each module's unittest defines a separate binary, linked with the regular (non-unittest) library. Then whenever you work on a module you can unittest only that one.

This would definitely improve the Windows build, but I would suggest that both the Windows and Linux builds have a separate target which builds and runs all of the modules' unit tests at once in order to catch circular dependencies caused by static constructors. Obviously, both the Posix and Windows makefiles would need to be adjusted in order for that to work.

Also, the fact that dmd is having the issues that it's having compiling Phobos' unit tests really should be addressed unless there's something intrinsic to the problem which makes it essentially unsolvable. dmd needs to be able to build large projects, and the fact that it's having the trouble that it's having right now does not bode well for large projects.

- Jonathan M Davis
September 26, 2011
On 9/26/11 6:05 PM, Jonathan M Davis wrote:
> This would definitely improve the Windows build, but I would suggest that both the Windows and Linux builds have a separate target which builds and runs all of the modules' unit tests at once in order to catch circular dependencies caused by static constructors. Obviously, both the Posix and Windows makefiles would need to be adjusted in order for that to work.

Wouldn't the issues manifest themselves without the unittests?

> Also, the fact that dmd is having the issues that it's having compiling Phobos' unit tests really should be addressed unless there's something intrinsic to the problem which makes it essentially unsolvable. dmd needs to be able to build large projects, and the fact that it's having the trouble that it's having right now does not bode well for large projects.

Good point. Time to compile Phobos is an excellent benchmark for dmd.


Andrei
September 26, 2011
On Monday, September 26, 2011 18:27:05 Andrei Alexandrescu wrote:
> On 9/26/11 6:05 PM, Jonathan M Davis wrote:
> > This would definitely improve the Windows build, but I would suggest that both the Windows and Linux builds have a separate target which builds and runs all of the modules' unit tests at once in order to catch circular dependencies caused by static constructors. Obviously, both the Posix and Windows makefiles would need to be adjusted in order for that to work.
> 
> Wouldn't the issues manifest themselves without the unittests?

Unfortunately, they can't be found at compile time. They only get found at runtime. If we compile all of Phobos' unit tests in one executable, then we find them when the unit tests run. If we don't, then we only find them when someone runs into the issue and reports a bug. Every circular dependency that I've seen in Phobos thus far was caught by the Windows unit tests and not the Posix unit tests, because the Windows unit tests compile everything into one executable, and the Posix unit tests split all of the modules up. In general, splitting the modules up is better, but it does fail to catch this one problem, which is why I'm suggesting that we have a separate make target for building and running all of Phobos' unit tests in one executable. It will make it easier to test for circular dependencies, and it will also make it easier to benchmark how well dmd is doing at compiling a lot of code at once.

- Jonathan M Davis
September 27, 2011
On Mon, 26 Sep 2011 12:37:30 +0300, Don Clugston <dclugston at googlemail.com> wrote:

> I think we need to have a strategy for managing the amount of time
> required for running the unittests. Currently, on Windows, the time
> for all compiler tests + phobos tests is one hour. (The druntime tests
> are only 30 seconds).
> By contrast, running all compiler tests + phobos tests on D1, takes
> about four minutes.

For the record, I've already put some new tests in my std.socket cleanup under version(SlowTests). These are tests which take more than a second to run (e.g. testing network timeouts).
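A minimal sketch of that kind of guard, not the actual std.socket code: `waitFor` is a hypothetical polling helper invented for illustration, and only the `SlowTests` version identifier comes from the description above. The quick check always runs, while the one that really has to wait out a timeout is compiled only on request.

```d
import core.thread : Thread;
import core.time : Duration, MonoTime, msecs, seconds;

// Hypothetical helper: polls `ready` until it returns true
// or until `timeout` has elapsed.
bool waitFor(scope bool delegate() ready, Duration timeout)
{
    immutable deadline = MonoTime.currTime + timeout;
    while (!ready())
    {
        if (MonoTime.currTime >= deadline)
            return false;
        Thread.sleep(10.msecs);
    }
    return true;
}

// Fast test: returns immediately, so it is always compiled with -unittest.
unittest
{
    assert(waitFor(() => true, 0.seconds));
}

// Slow test: has to sit through the whole timeout (over a second),
// so it is compiled only with -version=SlowTests.
version (SlowTests) unittest
{
    immutable start = MonoTime.currTime;
    assert(!waitFor(() => false, 1500.msecs));
    assert(MonoTime.currTime - start >= 1500.msecs);
}
```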

-- 
Best regards,
  Vladimir                            mailto:vladimir at thecybershadow.net
September 27, 2011
On Sep 26, 2011, at 2:37 AM, Don Clugston wrote:
> 
> How about defining a version, eg:
> version=ExtendedPhobosTests;
> which contains the more exhaustive, black-box tests, which take almost
> all of the time. So that the standard tests would consist of
> (1) regression tests, which should have a corresponding bugzilla
> entry, unless they were discovered during development of the library
> (The feature of these tests, is that at some point, they failed);
> and
> (2) code coverage tests.
> 
> Both of these execute quickly, and since they are linear with the number of reported bugs + the number of lines of code, they should always remain manageable. The black-box tests, on the other hand, are potentially unlimited.

I'd definitely support something like this.  At a prior job, the product I was working on had extensive tests (a full regression run took over 24 hours) and every commit had to be preceded by a MAT (minimal acceptance test) run.  This caught most product-breaking bugs and completed in only a few minutes.

What if the modules each contained what are essentially a MAT validation suite and the exhaustive tests lived in other files?  This is already basically how Phobos testing works on Posix, and avoids cluttering up module files with thousands of lines of test code.

September 28, 2011
On Monday, September 26, 2011 11:37:30 Don Clugston wrote:
> I think we need to have a strategy for managing the amount of time required for running the unittests. Currently, on Windows, the time for all compiler tests + phobos tests is one hour.

Wow. Your machine must not be particularly new. Mine (a Phenom II X6) manages to compile and run the Phobos unit tests on Windows - in Virtual Box - in about 8 minutes and 14 seconds. In Linux 64, on the other hand, they take about 3 and a half minutes (though that includes both the debug and release builds, and I'm not sure that Windows does, so the disparity is likely that much greater). So, the differences in how long it takes on my machine and your machine are pretty large - larger than I would have expected vs a slower machine, but maybe I shouldn't be surprised.

Interestingly enough though, the dmd build that I had on Windows initially (which was from towards the beginning of this month) before I updated today took just under 6 minutes. So, the changes since then (probably the GC stuff) made it take about a third longer than before.

Regardless, dmd's performance obviously needs to be improved, and there would be a significant speed gain in redoing the Windows unit test build to be split up like the Linux build.

- Jonathan M Davis