[phobos] Split std.datetime in two?
February 11, 2011
----- Original Message -----

> From: Andrei Alexandrescu <andrei at erdani.com>
> 
> > ----- Original Message -----
> >> From: Andrei Alexandrescu <andrei at erdani.com>
> >> 
> >> On Feb 11, 2011, at 2:39 PM, Steve Schveighoffer <schveiguy at yahoo.com> wrote:
> > 
> >    assert(foo(x) > 5);
> > 
> > Which x caused the problem?  All I get is the line number for the assert.
> 
> When the unittest fails I go and edit the code as follows:
> 
>     assert(foo(x), text("foo(", x, ") failed"));

As long as the unit test looks like this to begin with, I have no problem.  It should be the responsibility of the unit test writer to make sure the error messages clearly indicate what inputs cause the error.
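
For example (foo and its inputs here are hypothetical, just to sketch the pattern), a loop-based test can identify the failing input in its message from the start:

    import std.conv : text;

    int foo(int x) { return x * 2; }  // hypothetical function under test

    unittest
    {
        foreach (x; [0, 1, 5, 100])
            assert(foo(x) >= x,
                   text("foo(", x, ") == ", foo(x), ", expected >= ", x));
    }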

I also want to stress that this is a simple loop example, and not too difficult to understand.  I don't think complex, heavily factored test code is a good idea.

BTW, where is text defined?  It should be available everywhere.

> >>> I don't sympathize with you; we have tools to easily do this without much burden.
> >> 
> >> A little while ago you didn't care to start a debugger or use writeln - two simple tools. I cry double standard.
> > 
> > Um... how are you reading the code if not using an editor?  Using ctrl-F and typing in the function name is several orders of magnitude simpler than adding writelns, recompiling, or recompiling in debug mode (I'm counting the part where you have to dissect the makefile to figure out how to put the thing in debug mode in the first place) and debugging.  It's like saying the burden of using the claw of a hammer that's already in your hand is the same as going to get a reciprocating saw to remove a nail.  It's not even close to a double standard.
> 
> How are you debugging code if not using stdio or a debugger?

It's an extra cycle to find the error conditions.  Essentially, I have to first instrument the unit test to find out what is failing, then go instrument the function that is failing.  Sometimes I don't even need to do that: I can see the problem and just fix it, requiring no debugging.

> > Then I guess we differ.  I prefer to have unit tests be simple statements that are easily proven mentally without much context.  Complex code is more difficult to reason about, and as spir said, we don't want to have to unit test our unit tests.
> 
> A simple loop is simpler than 100 unrolled instances of it.

Only if you are reading all 100 instances.  When a unit test fails, I only care about 1.  When reading code, I skip all unit tests, be they a loop of 100 or 100 individual ones.

> 
> > I would add to this statement that the unit tests should test *different* things even if they look repetitive.  For example, testing two corner cases might look very repetitive, but each is looking for weaknesses in a different place.
> 
> I am certain that the unittests in std.datetime go well beyond what is needed for 100% code coverage. IMHO they should stop at 100% coverage (which could probably be reached with 20% of their current size).

I cannot give an opinion on the coverage, but I agree that they should stop at 100%.

> 
> Again, I feel like you do, save the numbers. Let me give you an example. Many would agree that sex is good. Yet most would agree that being required to perform it for 12 hours would be terrible. Eating is good too. Eating seven pounds of even the best food is a horrific experience.

I'm sure you meant something different about the sex analogy :)

I think these analogies are not relevant.  One does not read scores of unit tests, and writing a loop does not free you from having to come up with all the test cases.

> There is a limit to everything. That limit has been passed many times over by the unittests in std.datetime.

The limit should be 100% coverage.  If std.datetime passes that, it should be trimmed.

-Steve
February 11, 2011
On 2/11/11 4:13 PM, Steve Schveighoffer wrote:
>> From: Andrei Alexandrescu <andrei at erdani.com>
>> When the unittest fails I go and edit the code as follows:
>>
>>      assert(foo(x), text("foo(", x, ") failed"));
>
> As long as the unit test looks like this to begin with, I have no problem.  It should be the responsibility of the unit test writer to make sure the error messages clearly indicate what inputs cause the error.

I never start with detailed messages. I add them opportunistically as unittests fail. Works /perfectly/.

> I also want to stress that this is a simple loop example, and not too difficult to understand.  I don't think complex, heavily factored test code is a good idea.

Agreed, but I repeat: an overwhelming majority of std.datetime's unittests can be converted into simple loops, as I took the time to exemplify in another message.
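
To sketch the conversion (daysInMonth is a made-up stand-in, not std.datetime's actual code), a run of near-identical asserts collapses into a table and a loop:

    import std.conv : text;

    // Made-up stand-in for a std.datetime-style function (non-leap year).
    int daysInMonth(int month)
    {
        switch (month)
        {
            case 2: return 28;
            case 4, 6, 9, 11: return 30;
            default: return 31;
        }
    }

    unittest
    {
        // One table and one loop instead of twelve unrolled asserts.
        immutable int[12] expected = [31, 28, 31, 30, 31, 30,
                                      31, 31, 30, 31, 30, 31];
        foreach (month; 1 .. 13)
            assert(daysInMonth(month) == expected[month - 1],
                   text("daysInMonth(", month, ") == ", daysInMonth(month)));
    }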

> BTW, where is text defined?  It should be available everywhere.

std.conv.
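
It converts each of its arguments to a string and concatenates the results:

    import std.conv : text;

    unittest
    {
        assert(text("foo(", 42, ") failed") == "foo(42) failed");
    }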


Andrei
February 11, 2011
On Friday, February 11, 2011 14:13:29 Steve Schveighoffer wrote:
> The limit should be 100% coverage.  If std.datetime passes that, it should be trimmed.

The problem with that is that code coverage only indicates whether each line of code is run. In some cases, that may be enough, but as soon as anything like math is involved, it certainly isn't. Just because you covered a particular code path doesn't mean that that code path works correctly. What if a particular value ends up on a particular code path when it should be on another, or if it _does_ belong on that code path, but the result is incorrect for that particular value even though it works with other values?
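
A contrived sketch of the problem (isLeapYear here is deliberately buggy and made up, not std.datetime's actual code):

    // Deliberately buggy illustration.
    bool isLeapYear(int year)
    {
        if (year % 4 == 0)
            return true;   // bug: ignores the 100-year and 400-year rules
        return false;
    }

    unittest
    {
        assert(isLeapYear(2000));   // executes the true branch
        assert(!isLeapYear(2001));  // executes the false branch
        // 100% line coverage, yet isLeapYear(1900) wrongly returns true:
        // same code path, incorrect result for that particular value.
    }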

Also, does code coverage take into account the various specializations of a template? And even if it does, it could be that the template is correctly written for a particular type but not another, and yet both types instantiate it, exercising the same source lines. You'd only catch that if you tested both of those types with the template.
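
For example (a made-up template), every instantiation of average below runs the same source lines, so covering it with double says nothing about int:

    T average(T)(T a, T b)
    {
        return (a + b) / 2;   // fine for floating point; truncates for ints
    }

    unittest
    {
        assert(average(1.0, 2.0) == 1.5);  // the double instantiation passes
        // average(1, 2) == 1, not 1.5: the int instantiation silently
        // truncates, and line coverage cannot tell the two apart.
    }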

Code coverage is a useful metric, but I don't think that it captures the whole picture at all. I think that a module which has 100% code coverage can still be buggy, because its tests didn't cover enough cases. There is a difference between testing every code path and testing that the function works correctly for all inputs. Obviously, you're not going to test for _all_ possible inputs - it's completely impractical to do so in most cases - but often, I don't think that even the number of inputs required to get 100% code coverage is sufficient.

By the way, with how -cov is currently set up, getting 100% test coverage is often impossible anyway, simply because you end up with code paths with assert(0) in them which should _never_ happen, and so that code path _never_ gets hit. A prime example would be a function which is nothrow but calls functions which can technically throw even though you know that they never will with the given input. You're pretty much forced to use a catch block with assert(0) in it. Even perfectly tested code will never hit that code path, because it should never actually happen if the code is logically correct.
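
A minimal sketch of that situation (parseKnownGood is hypothetical):

    import std.conv : to;

    // to!int can throw in general, so a nothrow caller must catch,
    // even though "42" can never fail to parse.
    nothrow int parseKnownGood()
    {
        try
            return to!int("42");
        catch (Exception e)
            assert(0, "unreachable: \"42\" always parses");
    }

    unittest
    {
        assert(parseKnownGood() == 42);  // the catch path is never executed
    }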

- Jonathan M Davis
February 11, 2011
On 2/11/11 4:53 PM, Jonathan M Davis wrote:
> On Friday, February 11, 2011 14:13:29 Steve Schveighoffer wrote:
>> The limit should be 100% coverage.  If std.datetime passes that, it should be trimmed.
>
> The problem with that is that code coverage only indicates whether each line of code is run. In some cases, that may be enough, but as soon as anything like math is involved, it certainly isn't. Just because you covered a particular code path doesn't mean that that code path works correctly. What if a particular value ends up on a particular code path when it should be on another, or if it _does_ belong on that code path, but the result is incorrect for that particular value even though it works with other values?

Agreed.

> Also, does code coverage take into account the various specializations of a template? And even if it does, it could be that the template is correctly written for a particular type but not another, and yet both types instantiate it, exercising the same source lines. You'd only catch that if you tested both of those types with the template.

Agreed.

> Code coverage is a useful metric, but I don't think that it captures the whole picture at all. I think that a module which has 100% code coverage can still be buggy, because its tests didn't cover enough cases. There is a difference between testing every code path and testing that the function works correctly for all inputs. Obviously, you're not going to test for _all_ possible inputs - it's completely impractical to do so in most cases - but often, I don't think that even the number of inputs required to get 100% code coverage is sufficient.
>
> By the way, with how -cov is currently set up, getting 100% test coverage is often impossible anyway, simply because you end up with code paths with assert(0) in them which should _never_ happen, and so that code path _never_ gets hit. A prime example would be a function which is nothrow but calls functions which can technically throw even though you know that they never will with the given input. You're pretty much forced to use a catch block with assert(0) in it. Even perfectly tested code will never hit that code path, because it should never actually happen if the code is logically correct.


Agreed. I was hasty to posit that coverage is enough. That doesn't dilute any of my other arguments and points :o).


Andrei
February 11, 2011
On Friday, February 11, 2011 16:56:02 Andrei Alexandrescu wrote:
> On 2/11/11 4:53 PM, Jonathan M Davis wrote:
> > On Friday, February 11, 2011 14:13:29 Steve Schveighoffer wrote:
> >> The limit should be 100% coverage.  If std.datetime passes that, it should be trimmed.
> > 
> > The problem with that is that code coverage only indicates whether each line of code is run. In some cases, that may be enough, but as soon as anything like math is involved, it certainly isn't. Just because you covered a particular code path doesn't mean that that code path works correctly. What if a particular value ends up on a particular code path when it should be on another, or if it _does_ belong on that code path, but the result is incorrect for that particular value even though it works with other values?
> 
> Agreed.
> 
> > Also, does code coverage take into account the various specializations of a template? And even if it does, it could be that the template is correctly written for a particular type but not another, and yet both types instantiate it, exercising the same source lines. You'd only catch that if you tested both of those types with the template.
> 
> Agreed.
> 
> > Code coverage is a useful metric, but I don't think that it captures the whole picture at all. I think that a module which has 100% code coverage can still be buggy, because its tests didn't cover enough cases. There is a difference between testing every code path and testing that the function works correctly for all inputs. Obviously, you're not going to test for _all_ possible inputs - it's completely impractical to do so in most cases - but often, I don't think that even the number of inputs required to get 100% code coverage is sufficient.
> > 
> > By the way, with how -cov is currently set up, getting 100% test coverage is often impossible anyway, simply because you end up with code paths with assert(0) in them which should _never_ happen, and so that code path _never_ gets hit. A prime example would be a function which is nothrow but calls functions which can technically throw even though you know that they never will with the given input. You're pretty much forced to use a catch block with assert(0) in it. Even perfectly tested code will never hit that code path, because it should never actually happen if the code is logically correct.
> 
> Agreed. I was hasty to posit that coverage is enough. That doesn't dilute any of my other arguments and points :o).

No, it doesn't dilute them. But if someone argues that the number of tests in std.datetime (or anything else for that matter) should be reduced simply because it doesn't need that many to reach 100% code coverage, I'm going to argue strongly against that. There may be other valid arguments for reducing the number of tests, and there are obviously arguments for reducing how much code it takes to run a given set of tests (as you've been arguing in the case of std.datetime), but I won't buy that reaching 100% code coverage is sufficient. Low code coverage is definitely a bad sign, but high code coverage doesn't necessarily mean that the tests are anywhere near sufficient.

- Jonathan M Davis