Parallel execution of unittests (page 6) - D Programming Language Discussion Forum


May 01, 2014

Re: Parallel execution of unittests

Posted by Xavier Bigand
in reply to Andrei Alexandrescu

On 30/04/2014 17:59, Andrei Alexandrescu wrote:
> On 4/30/14, 8:54 AM, bearophile wrote:
>> Andrei Alexandrescu:
>>
>>> A coworker mentioned the idea that unittests could be run in parallel
>>
>> In D we have strong purity to make more safe to run code in parallel:
>>
>> pure unittest {}
>
> This doesn't follow. All unittests should be executable concurrently. --
> Andrei
>
But sometimes unittests have to use shared data that needs to be initialized before they run. File system operations are generally a critical point: if many unittests are based on the same auto-generated file data, it's a good idea to run this generation once before all tests (they can then possibly do a file copy, which is fast on a copy-on-write file system, or the data can be used read-only by all tests).
So for this kind of situation, some functions must be able to run before the unittests, and I think that's the case for a module's static this() function?

May 01, 2014

Re: Parallel execution of unittests

Posted by Xavier Bigand
in reply to Jonathan M Davis

On 30/04/2014 19:50, Jonathan M Davis via Digitalmars-d wrote:
> On Wed, 30 Apr 2014 08:59:42 -0700
> Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com>
> wrote:
>
>> On 4/30/14, 8:54 AM, bearophile wrote:
>>> Andrei Alexandrescu:
>>>
>>>> A coworker mentioned the idea that unittests could be run in
>>>> parallel
>>>
>>> In D we have strong purity to make more safe to run code in
>>> parallel:
>>>
>>> pure unittest {}
>>
>> This doesn't follow. All unittests should be executable concurrently.
>> -- Andrei
>>
>
> In general, I agree. In reality, there are times when having state
> across unit tests makes sense - especially when there's expensive setup
> required for the tests. While it's not something that I generally
> like to do, I know that we have instances of that where I work. Also, if
> the unit tests have to deal with shared resources, they may very well be
> theoretically independent but would run afoul of each other if run at
> the same time - a prime example of this would be std.file, which has to
> operate on the file system. I fully expect that if std.file's unit
> tests were run in parallel, they would break. Unit tests involving
> sockets would be another type of test which would be at high risk of
> breaking, depending on what sockets they need.
>
> Honestly, the idea of running unit tests in parallel makes me very
> nervous. In general, across modules, I'd expect it to work, but there
> will be occasional cases where it will break. Across the unittest
> blocks in a single module, I'd be _very_ worried about breakage. There
> is nothing whatsoever in the language which guarantees that running
> them in parallel will work or even makes sense. All that protects us is
> the convention that unit tests are usually independent of each other,
> and in my experience, it's common enough that they're not independent
> that I think that blindly enabling parallelization of unit tests across
> a single module is definitely a bad idea.
>
> - Jonathan M Davis
>
I've had this kind of experience too.
pure unittest name {} seems like a good idea, and it's more intuitive to have the same behaviour as other functions with a similar signature.

May 01, 2014

Re: Parallel execution of unittests

Posted by Xavier Bigand
in reply to Atila Neves

On 30/04/2014 19:58, Atila Neves wrote:
> On Wednesday, 30 April 2014 at 17:50:34 UTC, Jonathan M Davis via
> Digitalmars-d wrote:
>> On Wed, 30 Apr 2014 08:59:42 -0700
>> Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com>
>> wrote:
>>
>>> On 4/30/14, 8:54 AM, bearophile wrote:
>>> > Andrei Alexandrescu:
>>> >
>>> >> A coworker mentioned the idea that unittests could be run in
>>> >> parallel
>>> >
>>> > In D we have strong purity to make more safe to run code in
>>> > parallel:
>>> >
>>> > pure unittest {}
>>>
>>> This doesn't follow. All unittests should be executable concurrently.
>>> -- Andrei
>>>
>>
>> In general, I agree. In reality, there are times when having state
>> across unit tests makes sense - especially when there's expensive setup
>> required for the tests. While it's not something that I generally
>> like to do, I know that we have instances of that where I work. Also, if
>> the unit tests have to deal with shared resources, they may very well be
>> theoretically independent but would run afoul of each other if run at
>> the same time - a prime example of this would be std.file, which has to
>> operate on the file system. I fully expect that if std.file's unit
>> tests were run in parallel, they would break. Unit tests involving
>> sockets would be another type of test which would be at high risk of
>> breaking, depending on what sockets they need.
>>
>> Honestly, the idea of running unit tests in parallel makes me very
>> nervous. In general, across modules, I'd expect it to work, but there
>> will be occasional cases where it will break. Across the unittest
>> blocks in a single module, I'd be _very_ worried about breakage. There
>> is nothing whatsoever in the language which guarantees that running
>> them in parallel will work or even makes sense. All that protects us is
>> the convention that unit tests are usually independent of each other,
>> and in my experience, it's common enough that they're not independent
>> that I think that blindly enabling parallelization of unit tests across
>> a single module is definitely a bad idea.
>>
>> - Jonathan M Davis
>
> You're right; blindly enabling parallelisation after the fact is likely
> to cause problems.
>
> Unit tests though, by definition (and I'm aware there are more than one)
> have to be independent. Have to not touch the filesystem, or the
> network. Only CPU and RAM. In my case, and since I had the luxury of
> implementing a framework first and only writing tests after it was done,
> running them in parallel was an extra check that they are in fact
> independent.
>
Why shouldn't a test touch the filesystem? That's really restrictive; you just can't get good code coverage on a lot of libraries with such a restriction. I worked on Source Control Management software, and all the tests had to deal with a DB, which requires file system and network operations.
IMO it's pretty much impossible to avoid testing how functions interact; simple integration tests are often needed to ensure that the application is working correctly. If D integrates features to support automated testing, maybe it shouldn't be too restrictive, especially if everybody expects more of the commonly used features (like named tests, formatted result output, ...).
Some of those common features should be added to Phobos instead of the language.

> Now, it does happen that you're testing code that isn't thread-safe
> itself, and yes, in that case you have to run them in a single thread.
> That's why I added the @SingleThreaded UDA to my library to enable that.
> As soon as I tried calling legacy C code...
>
> We could always make running in threads opt-in.
>
> Atila
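
To make the opt-out idea above concrete, here is a minimal sketch. It is an assumption, not the actual code from Atila's library: `SingleThreaded` is a hypothetical marker struct attached to a unittest block as a UDA, and `countSerialTests` is a made-up helper that finds marked tests with `__traits(getUnitTests, ...)` and `std.traits.hasUDA`. The module name is hypothetical too, and the file has to be compiled with -unittest for `getUnitTests` to return anything.

```d
module single_threaded_sketch; // hypothetical module name

import std.traits : hasUDA;

/// Hypothetical marker UDA; not part of Phobos or druntime.
struct SingleThreaded {}

// Marked: a custom runner should keep this test on a single thread.
@SingleThreaded unittest
{
    // e.g. a call into legacy C code that is not thread-safe
}

// Unmarked: a custom runner would be free to schedule this in parallel.
unittest
{
    assert(1 + 1 == 2);
}

/// Counts the unittest blocks of this module that carry the marker.
/// (Hypothetical helper; a real runner would dispatch instead of count.)
size_t countSerialTests()
{
    size_t n;
    foreach (test; __traits(getUnitTests, single_threaded_sketch))
    {
        static if (hasUDA!(test, SingleThreaded))
            ++n;
    }
    return n;
}
```

A real runner would use the same compile-time check to decide which tests go to a thread pool and which stay on the main thread; the counting helper only shows that the marker is visible through reflection.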

May 01, 2014

Re: Parallel execution of unittests

Posted by Xavier Bigand
in reply to Dicebot

On 30/04/2014 21:23, Dicebot wrote:
> On Wednesday, 30 April 2014 at 18:19:34 UTC, Jonathan M Davis via
> Digitalmars-d wrote:
>> On Wed, 30 Apr 2014 17:58:34 +0000
>> Atila Neves via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>> Unit tests though, by definition (and I'm aware there are more than
>>> one) have to be independent. Have to not touch the filesystem, or the
>>> network. Only CPU and RAM.
>>
>> I disagree with this. A unit test is a test that tests a single piece
>> of functionality - generally a function - and there are functions which
>> have to access the file system or network.
>
> They _use_ access to file system or network, but it is _not_ their
> functionality. Unit testing is all about verifying small perfectly
> separated pieces of functionality which don't depend on correctness /
> stability of any other functions / programs. Doing I/O goes against it
> pretty much by definition and is unfortunately one of most common
> testing antipatterns.

Splitting all features at an absolutely atomic level can be achieved for open-source libraries, but it's pretty much impossible for industrial software. Why be so restrictive when it's possible to support both visions by extending the language a little with something that is already logical?

May 01, 2014

Re: Parallel execution of unittests

Posted by Xavier Bigand
in reply to Russel Winder

On 30/04/2014 22:09, Russel Winder via Digitalmars-d wrote:
> On Wed, 2014-04-30 at 11:19 -0700, Jonathan M Davis via Digitalmars-d
> wrote:
> […]
>> I disagree with this. A unit test is a test that tests a single piece
>> of functionality - generally a function - and there are functions which
>> have to access the file system or network. And those tests are done in
>
> These are integration/system tests not unit tests. For unit tests
> network activity should be mocked out.
>
And what do you do when your mock is buggy? Or you risk keeping the mock up to date when changing the code but not the running application, because before the commit you'll run only your unittests.
IMO every test that can be automated and run in a short time has to be run before each commit, even if some are integration tests.

>> unittest blocks just like any other unit test. I would very much
>> consider std.file's tests to be unit tests. But even if you don't
>> want to call them unit tests, because they access the file system, the
>> reality of the matter is that tests like them are going to be run in
>> unittest blocks, and we have to take that into account when we decide
>> how we want unittest blocks to be run (e.g. whether they're
>> parallelizable or not).
>
> In which case D is wrong to allow them in the unittest blocks and should
> introduce a new way of handling these tests. And even then all tests can
> and should be parallelized. If they cannot be then there is an
> inappropriate dependency.
>

May 01, 2014

Re: Parallel execution of unittests

Posted by Andrei Alexandrescu
in reply to Xavier Bigand

On 4/30/14, 6:20 PM, Xavier Bigand wrote:
> On 30/04/2014 17:59, Andrei Alexandrescu wrote:
>> On 4/30/14, 8:54 AM, bearophile wrote:
>>> Andrei Alexandrescu:
>>>
>>>> A coworker mentioned the idea that unittests could be run in parallel
>>>
>>> In D we have strong purity to make more safe to run code in parallel:
>>>
>>> pure unittest {}
>>
>> This doesn't follow. All unittests should be executable concurrently. --
>> Andrei
>>
> But sometimes unittests have to use shared data that need to be
> initialized before them. File system operations are generally a critical
> point, if many unittest are based on same auto-generated file data it's
> a good idea to run this generation once before all tests (they
> eventually do a file copy that is fast with copy-on-write file system or
> those data can be used as read only by all tests).
> So for those kind of situation some functions have must be able to run
> before unittest, and I think that the case of static this() function of
> modules?

Yah version(unittest) static shared this() { ... } covers that. -- Andrei
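
As a minimal sketch of that approach (the module name and the `sharedFixture` variable are hypothetical, and the conventional spelling `shared static this()` is used here): the shared module constructor runs once per process before any unittest block, and because the fixture is `immutable`, every thread can read it safely.

```d
module fixture_sketch; // hypothetical module name

version (unittest)
{
    // Hypothetical read-only fixture. Being fully immutable, it is a single
    // process-wide value rather than a thread-local one.
    private immutable int[] sharedFixture;

    // Runs once per process, before the unittest blocks are executed.
    shared static this()
    {
        sharedFixture = [1, 2, 3];
    }
}

// Both tests only read the fixture, so their relative order does not matter.
unittest
{
    assert(sharedFixture.length == 3);
}

unittest
{
    assert(sharedFixture[0] == 1);
}
```

A plain mutable module-level variable would be thread-local in D, so a fixture set up this way would not be visible from other test threads; that is why the sketch uses an immutable one.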

May 01, 2014

Re: Parallel execution of unittests

Posted by Xavier Bigand
in reply to Byron

On 30/04/2014 18:19, Byron wrote:
> On Wed, 30 Apr 2014 09:02:54 -0700, Andrei Alexandrescu wrote:
>
>>
>> I think indeed a small number of unittests rely on order of execution.
>> Those will be still runnable with a fork factor of 1. We'd need a way to
>> specify that - either a flag or:
>>
>> static shared this() { Runtime.unittestThreads = 1; }
>>
>>
>> Andrei
>
> Named tests seem like a no-brainer to me.
>
> Maybe nested unittests?
>
> unittest OrderTests {
>    // setup for all child tests?
>
>    unittest a {
>
>    }
>
>    unittest b {
>
>    }
>
> }
>
> I also wonder if it's just better to extend/expose the unittest API for
> more advanced things like order of execution, test reporting, and parallel
> execution. And we can just support an external unittesting library to do
> all the advanced testing options.
>
I don't see the use? I'd find it nice enough if IDEs were able to put unittests in a tree, using the module names for the hierarchy.

May 01, 2014

Re: Parallel execution of unittests

Posted by Xavier Bigand
in reply to Andrei Alexandrescu

On 01/05/2014 03:54, Andrei Alexandrescu wrote:
> On 4/30/14, 6:20 PM, Xavier Bigand wrote:
>> On 30/04/2014 17:59, Andrei Alexandrescu wrote:
>>> On 4/30/14, 8:54 AM, bearophile wrote:
>>>> Andrei Alexandrescu:
>>>>
>>>>> A coworker mentioned the idea that unittests could be run in parallel
>>>>
>>>> In D we have strong purity to make more safe to run code in parallel:
>>>>
>>>> pure unittest {}
>>>
>>> This doesn't follow. All unittests should be executable concurrently. --
>>> Andrei
>>>
>> But sometimes unittests have to use shared data that need to be
>> initialized before them. File system operations are generally a critical
>> point, if many unittest are based on same auto-generated file data it's
>> a good idea to run this generation once before all tests (they
>> eventually do a file copy that is fast with copy-on-write file system or
>> those data can be used as read only by all tests).
>> So for those kind of situation some functions have must be able to run
>> before unittest, and I think that the case of static this() function of
>> modules?
>
> Yah version(unittest) static shared this() { ... } covers that. -- Andrei
>
Then I am pretty much OK with the parallelization of all unittests.

That leaves the question of names; I don't really know if it has to be in the language or in Phobos like other test features (test-logger, benchmark, ...).

May 01, 2014

Re: Parallel execution of unittests

Posted by Jonathan M Davis

On Wed, 30 Apr 2014 15:33:17 -0700
"H. S. Teoh via Digitalmars-d" <digitalmars-d@puremagic.com> wrote:

> On Wed, Apr 30, 2014 at 02:48:38PM -0700, Jonathan M Davis via Digitalmars-d wrote:
> > On Wed, 30 Apr 2014 21:09:14 +0100
> > Russel Winder via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> [...]
> > > In which case D is wrong to allow them in the unittest blocks and should introduce a new way of handling these tests. And even then all tests can and should be parallelized. If they cannot be then there is an inappropriate dependency.
> > 
> > Why? Because Andrei suddenly proposed that we parallelize unittest blocks? If I want to test a function, I'm going to put a unittest block after it to test it. If that means accessing I/O, then it means accessing I/O. If that means messing with mutable, global variables, then that means messing with mutable, global variables. Why should I have to put the tests elsewhere or make it so that they don't run when the -unittest flag is used just because they don't fall under your definition of "unit" test?
> [...]
> 
> What about allowing pure marking on unittests, and those unittests that are marked pure will be parallelized, and those that aren't marked will be run serially?

I think that that would work, and if we added purity inference to unittest blocks as Nordlow suggests, then you wouldn't even have to mark them as pure unless you wanted to enforce that it be runnable in parallel.

- Jonathan M Davis

May 01, 2014

Re: Parallel execution of unittests

Posted by Jonathan M Davis
in reply to Steven Schveighoffer

On Wed, 30 Apr 2014 20:33:06 -0400
Steven Schveighoffer via Digitalmars-d <digitalmars-d@puremagic.com>
wrote:

> On Wed, 30 Apr 2014 13:50:10 -0400, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> 
> > On Wed, 30 Apr 2014 08:59:42 -0700
> > Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com>
> > wrote:
> >
> >> On 4/30/14, 8:54 AM, bearophile wrote:
> >> > Andrei Alexandrescu:
> >> >
> >> >> A coworker mentioned the idea that unittests could be run in parallel
> >> >
> >> > In D we have strong purity to make more safe to run code in parallel:
> >> >
> >> > pure unittest {}
> >>
> >> This doesn't follow. All unittests should be executable concurrently. -- Andrei
> >>
> >
> > In general, I agree. In reality, there are times when having state across unit tests makes sense - especially when there's expensive setup required for the tests.
> 
> int a;
> unittest
> {
>     // set up a;
> }
> 
> unittest
> {
>     // use a;
> }
> 
> ==>
> 
> unittest
> {
>     int a;
>     {
>        // set up a;
>     }
>     {
>        // use a;
>     }
> }
> 
> It makes no sense to do it the first way, you are not gaining anything.

It can make sense to do it the first way when it's more like

LargeDocumentOrDatabase foo;

unittest
{
    // set up foo;
}

unittest
{
   // test something using foo
}

unittest
{
  // do other tests using foo which then take advantage of changes made
  // by the previous test rather than doing all of those changes to
  // foo in order to set up this test
}

In general, I agree that tests shouldn't be done that way, and I don't think that I've ever done it personally, but I've seen it done, and for stuff that requires a fair bit of initialization, it can save time to have each test build on the state of the last. But even if we all agree that that sort of testing is a horrible idea, the language supports it right now, and automatically parallelizing unit tests will break any code that does that.

> > Honestly, the idea of running unit tests in parallel makes me very nervous. In general, across modules, I'd expect it to work, but there will be occasional cases where it will break.
> 
> Then you didn't write your unit-tests correctly. True unit tests-anyway.
> 
> In fact, the very quality that makes unit tests so valuable (that they are independent of other code) is ruined by sharing state across tests. If you are going to share state, it really is one unit test.

All it takes is that tests in two separate modules which have separate functionality access the file system or sockets or some other system resource, and they could end up breaking due to the fact that the other test is messing with the same resource. I'd expect that to be a relatively rare case, but it _can_ happen, so simply parallelizing tests across modules does risk test failures that would not have occurred otherwise.

> > Across the unittest
> > blocks in a single module, I'd be _very_ worried about breakage.
> > There is nothing whatsoever in the language which guarantees that
> > running them in parallel will work or even makes sense. All that
> > protects us is the convention that unit tests are usually
> > independent of each other, and in my experience, it's common enough
> > that they're not independent that I think that blindly enabling
> > parallelization of unit tests across a single module is definitely
> > a bad idea.
> 
> I think that if we add the assumption, the resulting fallout would be easy to fix.
> 
> Note that we can't require unit tests to be pure -- non-pure functions need testing too :)

Sure, they need testing. Just don't test them in parallel, because they're not guaranteed to work in parallel. That guarantee _does_ hold for pure functions, because they don't access global, mutable state. So, we can safely parallelize a unittest block that is pure, but we _can't_ safely parallelize one that isn't - not in a guaranteed way.

> I can imagine that even if you could only parallelize 90% of unit tests, that would be an effective optimization for a large project. In such a case, the rare (and I mean rare to the point of I can't think of a single use-case) need to deny parallelization could be marked.

std.file's unit tests would break immediately. It wouldn't surprise me if std.socket's unit tests broke. std.datetime's unit tests would probably break on Posix systems, because some of them temporarily set the local time zone - which sets it for the whole program, not just the current thread (those tests aren't done on Windows, because Windows only lets you set it for the whole OS, not just the program). Any tests which aren't pure risk breakage due to changes in whatever global, mutable state they're accessing.

I would strongly argue that automatically parallelizing any unittest block which isn't pure is a bad idea, because it's not guaranteed to work, and it _will_ result in bugs in at least some cases. If we make it so that unittest blocks have their purity inferred (and allow you to mark them as pure to enforce that they be pure if you want to require it), then any unittest blocks which can safely be parallelized will be known, and the test runner could then parallelize those unittest functions and then _not_ parallelize the ones that it can't guarantee are going to work in parallel.

So, then we get safe unittest parallelization without having to insist that folks write their unit tests in a particular way or that they do or don't do particular things in a unit test. And maybe we can add some sort of UDA to tell the test runner that an impure test can be safely parallelized, but automatically parallelizing impure unittest functions would be akin to automatically treating @system functions as @safe just because we thought that only @safe code should be used in this particular context.
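
A minimal sketch of the distinction being drawn, with hypothetical names (`counter`, `square`): the `pure` attribute on a unittest block is already checked by the compiler, while scheduling tests in parallel based on it is only the proposal under discussion here, not existing runner behaviour.

```d
int counter; // mutable module-level state (hypothetical)

int square(int x) pure
{
    return x * x;
}

// Compiles as pure: it touches no mutable global state, so under the
// proposal a runner could schedule it alongside other pure tests.
pure unittest
{
    assert(square(3) == 9);
}

// Cannot be marked pure, because it writes to `counter`; under the
// proposal it would stay in the serial set.
unittest
{
    counter++;
    assert(counter == 1);
}
```

Adding `pure` to the second block makes the compiler reject the write to `counter`, which is the kind of enforcement the proposal relies on.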

- Jonathan M Davis
