November 18, 2013
On 11/18/13 2:04 PM, Walter Bright wrote:
> On 11/18/2013 1:02 PM, Jacob Carlborg wrote:
>> I really don't understand the point in having some functions thrown
>> and others
>> assert.
>
> That's a red flag that there is a faulty design somewhere, in this case
> I think it's not having a separation between input data validation and
> program bug detection.

The question is what's "program" and what's "input". Consider:

int a = fun();
std.gun(a);

There are two possible takes on this:

1. The standard library is considered part of the user's program, and the whole thing is a unit. In that case, passing the wrong int to std.gun is an PROGRAM error and 100% blame goes to the programmer who wrote the caller code. In that case, assert/assert(0)/contracts are the appropriate constructs to be used inside std.gun.

This is the approach taken by the C standard library, which is free to do whatever it wants (including crashing the program) upon calls such as strlen(NULL) etc.

2. The standard library is a separate entity from the PROGRAM, and as far as it cares, any data from the user is INPUT. So the standard library with SCRUB the input, in which case enforce() and throwing exceptions are appropriate.

This is the approach taken by the Windows API, Java, C#, and to a good extent the newer parts of C++'s standard library.

To claim that one approach is exactly right and the other is exactly wrong would miss important insights.


Andrei

November 18, 2013
On 11/18/2013 2:37 PM, Andrei Alexandrescu wrote:
> On 11/18/13 1:50 PM, Walter Bright wrote:
>> I believe this is solving the problem in the wrong place. The function
>> being called should not be throwing, and should be defined as not
>> throwing. See my other posts in this thread on the topic.
>
> I think a stance of unceremoniously halting the program upon passing the wrong
> parameters to a standard library function is a bit excessive.

Dang, I gotta argue this with you too? :-)

Forget the philosophical argument for the moment and let's try a pragmatic one - code that doesn't check its arguments and doesn't throw is going to inline better, run faster, and be smaller. To have a high performance language, we have to have this.

This is why we have a -release switch: so we can run with or without checking the arguments for bugs.

Input data validation is where exceptions come in. Those are relatively rare compared with processing validated data, and so separating out the slow data validation path from the validated high speed path makes pragmatic sense.

Rejecting this approach throws (!) out the whole concept of contract programming.

November 18, 2013
On 11/18/13 2:49 PM, Walter Bright wrote:
> On 11/18/2013 2:37 PM, Andrei Alexandrescu wrote:
>> On 11/18/13 1:50 PM, Walter Bright wrote:
>>> I believe this is solving the problem in the wrong place. The function
>>> being called should not be throwing, and should be defined as not
>>> throwing. See my other posts in this thread on the topic.
>>
>> I think a stance of unceremoniously halting the program upon passing
>> the wrong
>> parameters to a standard library function is a bit excessive.
>
> Dang, I gotta argue this with you too? :-)

Yes. I agree with your fundamental point but it is missing important nuances. I destroyed the misunderstanding in http://goo.gl/3xD5Aj.

Andrei

November 18, 2013
On 11/18/2013 2:16 PM, Jonathan M Davis wrote:
> Sometimes that's true, but if you're asserting that it's not going to be the
> case that folks are going to need to calling functions which can throw but
> won't throw from inside nothrow functions, I think that your dead wrong. If
> nothing else, you don't always have control over what is and isn't going to
> throw. Maybe it's uncommon enough that it's not worth trying to optimize out
> the try-catch blocks, but I think that asserting that you won't ever need to
> call a throwing function from a nothrow function when you know it won't throw
> is like asserting that @trusted isn't needed.

What I'm saying is it is bad API design to conflate data processing with input data validation. You may not always have control over the API design and may have to live with bad API design, but in std.datetime's case we do have control and we can do the right thing.

Joel Spolsky wrote a nice column http://www.joelonsoftware.com/articles/Wrong.html about this years ago, about conceptually separating input data validation operations from data crunching operations. This distinction is baked into D's design:

exceptions - input data validation

asserts - crunching on data that has been validated

Code like this:

   try {
        func();
   } catch (Exception e) {
       assert(0, "func() should not have thrown");
   }

clearly shows a design failure, either in func()'s design or how func() is used.
November 18, 2013
On 11/18/2013 2:45 PM, Andrei Alexandrescu wrote:
> There are two possible takes on this:
>
> 1. The standard library is considered part of the user's program, and the whole
> thing is a unit. In that case, passing the wrong int to std.gun is an PROGRAM
> error and 100% blame goes to the programmer who wrote the caller code. In that
> case, assert/assert(0)/contracts are the appropriate constructs to be used
> inside std.gun.
>
> This is the approach taken by the C standard library, which is free to do
> whatever it wants (including crashing the program) upon calls such as
> strlen(NULL) etc.
>
> 2. The standard library is a separate entity from the PROGRAM, and as far as it
> cares, any data from the user is INPUT. So the standard library with SCRUB the
> input, in which case enforce() and throwing exceptions are appropriate.
>
> This is the approach taken by the Windows API, Java, C#, and to a good extent
> the newer parts of C++'s standard library.
>
> To claim that one approach is exactly right and the other is exactly wrong would
> miss important insights.

Or:

3. Input validation and data processing should have separate functions in the library.

(The Windows API is a special case - it must regard all input as untrusted, unvalidated input, and it must protect Windows itself from malicious input. This is not true of Phobos.)
November 18, 2013
On 11/18/2013 2:52 PM, Andrei Alexandrescu wrote:
> Yes. I agree with your fundamental point but it is missing important nuances. I
> destroyed the misunderstanding in http://goo.gl/3xD5Aj.

And I destroyed your destruction!

Anyhow, can you use real links rather than goo ones?

November 18, 2013
On 11/18/2013 11:45 PM, Andrei Alexandrescu wrote:
> On 11/18/13 2:04 PM, Walter Bright wrote:
>> On 11/18/2013 1:02 PM, Jacob Carlborg wrote:
>>> I really don't understand the point in having some functions thrown
>>> and others
>>> assert.
>>
>> That's a red flag that there is a faulty design somewhere, in this case
>> I think it's not having a separation between input data validation and
>> program bug detection.
>
> The question is what's "program" and what's "input". Consider:
>
> int a = fun();
> std.gun(a);
>
> There are two possible takes on this:
>
> 1. The standard library is considered part of the user's program, and
> the whole thing is a unit. In that case, passing the wrong int to
> std.gun is an PROGRAM error and 100% blame goes to the programmer who
> wrote the caller code. In that case, assert/assert(0)/contracts are the
> appropriate constructs to be used inside std.gun.
>
> This is the approach taken by the C standard library, which is free to
> do whatever it wants (including crashing the program) upon calls such as
> strlen(NULL) etc.
>
> 2. The standard library is a separate entity from the PROGRAM, and as
> far as it cares, any data from the user is INPUT. So the standard
> library with SCRUB the input, in which case enforce() and throwing
> exceptions are appropriate.
>
> This is the approach taken by the Windows API, Java, C#, and to a good
> extent the newer parts of C++'s standard library.
>
> To claim that one approach is exactly right and the other is exactly
> wrong would miss important insights.
>...

1. Has the shortcoming that you may actually get assertion failures.

2. is not wrong but aesthetically unpleasing as it essentially widens the interface to allow all possible inputs and extends it with behaviour that is useless in order to avoid the situation that the caller does not satisfy the interface. There is also:

3. Require both caller and callee to construct a compile-time checkable proof that they satisfy their part of the interface.

This is the approach taken by dependently typed programming languages, and program verifiers. This subsumes 1. and 2. and is a solution to the original problem since in such a setting it can be manually proven that a function call will not throw.
November 18, 2013
On Monday, November 18, 2013 14:45:54 Andrei Alexandrescu wrote:
> On 11/18/13 2:04 PM, Walter Bright wrote:
> > On 11/18/2013 1:02 PM, Jacob Carlborg wrote:
> >> I really don't understand the point in having some functions thrown
> >> and others
> >> assert.
> > 
> > That's a red flag that there is a faulty design somewhere, in this case I think it's not having a separation between input data validation and program bug detection.
> 
> The question is what's "program" and what's "input". Consider:
> 
> int a = fun();
> std.gun(a);
> 
> There are two possible takes on this:
> 
> 1. The standard library is considered part of the user's program, and the whole thing is a unit. In that case, passing the wrong int to std.gun is an PROGRAM error and 100% blame goes to the programmer who wrote the caller code. In that case, assert/assert(0)/contracts are the appropriate constructs to be used inside std.gun.
> 
> This is the approach taken by the C standard library, which is free to
> do whatever it wants (including crashing the program) upon calls such as
> strlen(NULL) etc.
> 
> 2. The standard library is a separate entity from the PROGRAM, and as far as it cares, any data from the user is INPUT. So the standard library with SCRUB the input, in which case enforce() and throwing exceptions are appropriate.
> 
> This is the approach taken by the Windows API, Java, C#, and to a good extent the newer parts of C++'s standard library.
> 
> To claim that one approach is exactly right and the other is exactly wrong would miss important insights.

+1

- Jonathan M Davis
November 18, 2013
On Monday, November 18, 2013 14:40:51 Walter Bright wrote:
> I'm glad we're discussing this. There's an important misunderstanding.
> 
> On 11/18/2013 2:16 PM, Jonathan M Davis wrote:
> > But that would just duplicate the validation. You validate by parsing the string, and you extract the necessary data from it by parsing it. Validating the data first would just double the work - on top of the fact that strings are most likely to have come from outside the program rather than having been generated internally and then parsed internally. This is exactly the sort of case where I think that separate validation makes no sense. Separate validation only makes sense when the result is _less_ overhead, not more.
> The point of asserts is that they are supposed to be redundant. (If they were not redundant, then they would trip and you'd have a program bug.) Asserts are there to detect program bugs, not to validate input data. This is also why the optimizer can remove asserts without affecting the meaning of the code.

I understand this. The problem is that in some cases, in order to do the check, you have to do all of the work that the function you're trying to protect bad input from has to do anyway - even without it asserting anything. So, having a separate function do the checking would just be extra overhead. For instance, if you had to parse a string in order to get data out of it, the function checking the string's validity would have parse the string out and get all of the data out of it, validating the string's format and the data's validity in the process, whereas the function that does the actual parsing to give you the result (as opposed to checking the input) has to do all of that same parsing and data extraction. Maybe, if another function had already validated the string, it could avoid a few of the checks, but many of them have to be done just to parse the string (e.g. if its format is wrong, you can't even get at the data properly, regardless of whether the data is valid or not). So, you don't save much in the way of checking if you have a validation function, and you add overhead, because the data has to be processed twice.

In other cases, validation is as simple as asserting something about the input, in which case, it's simple enough to assert within the function (which would then go away in -release) and to have a validation function do the checking and get no extra overhead, but that's not always the case, and when it's not, it makes no sense to me to use DbC. In such cases, defensive programming makes far more sense.

Also, if the data _always_ has to be checked (which isn't always the case in std.datetime), then it makes no sense to separate the validation from the function doing the work.

I think that whether DbC or defensive programming is more appropriate comes down primarily to two things:

1. Does the validation need to be part of the function for it to do its job, or does doing the validation require doing what the function is going to do anyway? If so, the defensive programming makes more sense. If not, then DbC makes more sense.

2. Is this function treating its caller as part of the program or as a user? If the caller is being treated as part of the program, then DbC tends to make sense, as its reasonable to require that the caller knows what the function requires and is effectively part of the same code as the function. If the caller is being treated as a user (as is often going to be the case with libraries), then it's generally better to use defensive programming, because it ensures that the function gets and operates on valid input rather than resulting in undefined behavior when the caller gives bad input (and unless the library is compiled without -release or the function is templated, assertions won't do anything to help in a library).

Efficiency tends toward lean towards using DbC, whereas user-friendliness leans toward defensive programming. In general, I would use DbC internally to a program and defensive programming in a library.

Having validator functions definitely helps with DbC, as it gives the caller a way to validate the input when necessary and avoid the validation when it isn't. But it puts all the onus on the caller and makes it much, much more likely that functions will be misused, and if the function is in a library, then the odds are that if validation is done incorrectly by the caller, it'll never get checked by the callee, and you'll end up with buggy code with undefined behavior.

I think that you bring up good points, but I also don't think that the situation is anywhere near as clearcut as you do.

- Jonathan M Davis
November 19, 2013
On 2013-11-18 23:42:13 +0000, "Jonathan M Davis" <jmdavisProg@gmx.com> said:

> I understand this. The problem is that in some cases, in order to do the
> check, you have to do all of the work that the function you're trying to
> protect bad input from has to do anyway - even without it asserting anything.
> So, having a separate function do the checking would just be extra overhead.

Very true.

I'll just point out now that you could actually have only one function that does one or the other or both depending on template parameters. One template parameter is the output sink, which could be a dummy type that does nothing or an actual type that saves the data somewhere. Another parameter is the error handler, which can be a dummy type that does nothing, or it could assert or throw when an error is found. Let the optimizer remove the do-nothing code paths that will result.

Now, I really have no idea but that could be overkill in this situation.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca