February 23, 2012
On Thu, Feb 23, 2012 at 12:07:40PM -0800, Jonathan M Davis wrote:
> On Thursday, February 23, 2012 07:47:55 H. S. Teoh wrote:
[...]
> > The way I understand it, DbC is used for ensuring *program* correctness (ensure that program logic does not get itself into a bad state); defensive programming is for sanitizing *user input* (ensure that no matter what the user does, the program doesn't get into a bad state).
> > 
> > That's why DbC is compiled out in release mode -- the assumption is that you have thoroughly tested your program logic and verified there are no logic problems. Input sanitizing is never compiled out, because you never know what users will do, so you always have to check.
> > 
> > The two do somewhat overlap, of course. For example, failing to sanitize user input may eventually lead to passing invalid arguments to an internal function.
> 
> Exactly. But where things tend to blur is the concept of "user input." For instance, if you're using a 3rd party library, should it be asserting on the arguments that you pass it?

In my book, a linked library shares equal status with the "main program", therefore the definition of "user input" still sits at the internal-to-program and external boundary.


> Unless you compile it in non-release mode, it obviously won't, which could be an argument for using exceptions, but regardless of that, from the library's perspective, you're a user.

I believe the traditional way is to ship a debug or devel version of the library which is linked when you compile in non-release mode (e.g., libc-dbg), and then in release mode the release mode library is linked (libc proper). That way DbC will be enforced in non-release mode by the library, and suppressed in the release mode binary.

If libraries only ship in release mode, then that sorta defeats the point of DbC, which is to ensure program correctness before release and not get in the way after. Now the library has to be paranoid and always sanitize all inputs.
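
To make the distinction concrete, here is a minimal sketch of the two styles side by side. The function names are hypothetical, and the code uses the `do` keyword of current D compilers (compilers from the time of this thread spelled it `body`). The contract version is only checked in builds that keep contracts, while the defensive version checks in every build.

```d
import std.exception : assertThrown, enforce;
import std.math : sqrt;

// Contract style: a negative argument is a bug in the caller.
// The in-contract is compiled out when building with -release.
double contractSqrt(double x)
in
{
    assert(x >= 0, "x must be non-negative");
}
do
{
    return sqrt(x);
}

// Defensive style: the check stays in every build and throws on failure.
double checkedSqrt(double x)
{
    enforce(x >= 0, "x must be non-negative");
    return sqrt(x);
}

// run with: dmd -unittest -main -run file.d
unittest
{
    assert(contractSqrt(9.0) == 3.0);
    assert(checkedSqrt(16.0) == 4.0);
    assertThrown(checkedSqrt(-1.0)); // rejected in every build
}
```

A library built twice, once with and once without `-release`, would ship `contractSqrt` with and without its check, whereas `checkedSqrt` pays for the check in both.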


> If it used DbC, it would be putting assertions in its own code to test _your_ code.  And since you're a user, it arguably should use exceptions to make sure that the arguments that it gets are correct.

No, the library should ship a development version with all contracts compiled-in, so that contract violations will be enforced during development & testing.

Sadly, this isn't often done in practice, which leads to the sad situation where the program/library boundary has a lot of overhead, because the library must be paranoid and always sanitize all inputs no matter what.


[...]
> Arguably, the best thing would be if there was a way for the caller to indicate whether it wanted the callee to have DbC enabled and possibly even indicate whether it wanted the callee to use DbC or defensive programming. But there's no way to do that in D, and I'm not sure that it could even be done with the C linking model - at least, there's no way to do it without templatizing everything and giving an argument to the template indicating what you want, which obviously isn't a good solution (and won't work at all in the case of virtual functions, since they can't be templatized).
[...]

No need to templatize anything, just ship two versions of the library, one with DbC compiled in, one without. Let the user decide which one to link in.


T

-- 
Never trust an operating system you don't have source for! -- Martin Schulze
February 24, 2012
On Thursday, February 23, 2012 15:18:27 H. S. Teoh wrote:
> On Thu, Feb 23, 2012 at 12:07:40PM -0800, Jonathan M Davis wrote:
> [...]
> In my book, a linked library shares equal status with the "main program", therefore the definition of "user input" still sits at the internal-to-program and external boundary.

Yes, "in your book." Some people will agree with you and some won't. It really depends on what the code is doing though IMHO. In some cases, one is better and in some cases, the other is better. But it _is_ important to remember that there's a big difference between linking against a library over which you have control and a 3rd party library.

And there are times when it just plain makes more sense to have a function which throws an exception on bad input regardless of whether it's an "internal" function or not. For instance, if you want to convert a string to something else (e.g. with SysTime's fromISOExtString or even just with std.conv.to), you need to actually verify that the string has a value which can be correctly converted. It's actually cheaper to have the function doing the conversion do the checking rather than have another function do a check first, and then have the converting function not check (save perhaps for an assertion outside of release mode), because then you'll be processing the string _twice_.
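
As a rough illustration of the single-pass point, the sketch below lets `std.conv.to` validate while it converts and turns the resulting `ConvException` into a boolean result. The helper name `tryParsePort` is hypothetical and only serves the example.

```d
import std.conv : to, ConvException;

// Hypothetical helper: parse a port number taken from untrusted text.
// to!ushort validates while converting, so the string is scanned once
// instead of being checked first and converted afterwards.
bool tryParsePort(string text, out ushort port)
{
    try
    {
        port = text.to!ushort;
        return true;
    }
    catch (ConvException e)
    {
        return false;
    }
}

// run with: dmd -unittest -main -run file.d
unittest
{
    ushort port;
    assert(tryParsePort("8080", port) && port == 8080);
    assert(!tryParsePort("70000", port)); // out of range for ushort
    assert(!tryParsePort("http", port));  // not a number
}
```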

This is _not_ a cut-and-dried issue. Sometimes DbC makes more sense, and sometimes defensive programming does. You pick the one that works best for a given situation.

The whole thing is a gray area, and you're not going to get a consensus that a library should always use DbC on its functions or that it should always use defensive programming.

> > Arguably, the best thing would be if there was a way for the caller to indicate whether it wanted the callee to have DbC enabled and possibly even indicate whether it wanted the callee to use DbC or defensive programming. But there's no way to do that in D, and I'm not sure that it could even be done with the C linking model - at least, there's no way to do it without templatizing everything and giving an argument to the template indicating what you want, which obviously isn't a good solution (and won't work at all in the case of virtual functions, since they can't be templatized).

> No need to templatize anything, just ship two versions of the library, one with DbC compiled in, one without. Let the user decide which one to link in.

There _is_ a need to do that if the caller wants to control whether an assertion or an exception is used. There's also a need if you want to enable it in some places and not in others. However, the reality of the matter is that using a debug version of a library is as close as you're likely to get. And I'm certainly not arguing that templatizing functions in this manner would be a good idea. I'm just pointing out that there are issues with how DbC is currently implemented.

And the primary problem with how DbC is implemented is the fact that its assertions test the caller, not the callee, but the assertions end up in the callee. So, the assertions are separated from the code that they're actually testing. It's the best that we can do at this point, but it does result in a weird situation where you end up using assertions to test _other_ people's code rather than your own.

- Jonathan M Davis
February 24, 2012
On Thu, 23 Feb 2012 15:13:17 -0000, James Miller <james@aatch.net> wrote:
> On 23 February 2012 05:09, Regan Heath <regan@netmail.co.nz> wrote:
>> On Tue, 21 Feb 2012 14:19:17 -0000, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> On 2/21/12 5:55 AM, Regan Heath wrote:
>>>>
>>>> On Sun, 19 Feb 2012 23:04:59 -0000, Andrei Alexandrescu
>>>> <SeeWebsiteForEmail@erdani.org> wrote:
>>>>
>>>>> On 2/19/12 4:00 PM, Nick Sabalausky wrote:
>>>>
>>>>
>>>>>> Seriously, how is this not *already* crystal-clear? I feel as if
>>>>>> every few
>>>>>> weeks you're just coming up with deliberately random shit to argue so
>>>>>> the
>>>>>> rest of us have to waste our time spelling out the obvious in insanely
>>>>>> pedantic detail.
>>>>>
>>>>>
>>>>> It sometimes happened to me to reach the hypothesis that my
>>>>> interlocutor must be some idiot. Most often I was missing something.
>>>>
>>>>
>>>> I get the impression that you find "Devil's advocate" a useful tool for
>>>> generating debate and out of the box thinking.. there is something to be
>>>> said for that, but it's probably less annoying to some if you're clear
>>>> about that from the beginning. :p
>>>
>>>
>>> Where did it seem I was playing devil's advocate? Thanks.
>>
>>
>> "Devil's Advocate" is perhaps not the right term, as you don't seem to ever
>> argue the opposite to what you believe.  But, it occasionally seems to me
>> that you imply ignorance on your part, in order to draw more information
>> from other posters on exactly what they think or are proposing.  So, some
>> get frustrated as they feel they have to explain "everything" to you (and
>> not just you, there have been times where - for whatever reason - it seems
>> that anything less than a description of every single minute detail results
>> in a misunderstanding - no doubt partly due to the medium in which we are
>> communicating).
>>
>>
>> Regan
>>
>> --
>> Using Opera's revolutionary email client: http://www.opera.com/mail/
>
> I think that is technically called being facetious.

Doesn't seem quite right to me:
http://dictionary.reference.com/browse/facetious

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
February 24, 2012
On Thu, Feb 23, 2012 at 07:06:02PM -0500, Jonathan M Davis wrote:
> On Thursday, February 23, 2012 15:18:27 H. S. Teoh wrote:
> > On Thu, Feb 23, 2012 at 12:07:40PM -0800, Jonathan M Davis wrote:
> > [...]
> > In my book, a linked library shares equal status with the "main program", therefore the definition of "user input" still sits at the internal-to-program and external boundary.
> 
> Yes, "in your book." Some people will agree with you and some won't. It really depends on what the code is doing though IMHO. In some cases, one is better and in some cases, the other is better. But it _is_ important to remember that there's a big difference between linking against a library over which you have control and a 3rd party library.
> 
> And there are times when it just plain makes more sense to have a function which throws an exception on bad input regardless of whether it's an "internal" function or not. For instance, if you want to convert a string to something else (e.g. with SysTime's fromISOExtString or even just with std.conv.to), you need to actually verify that the string has a value which can be correctly converted. It's actually cheaper to have the function doing the conversion do the checking rather than have another function do a check first, and then have the converting function not check (save perhaps for an assertion outside of release mode), because then you'll be processing the string _twice_.

I guess this is a judgment call. Personally, I would consider arguments to string conversion functions to be "user input" even though, technically speaking, you could just be passing a byte array literal to it, in which case it's just a case of bad input parameters.


> This is _not_ a cut-and-dried issue. Sometimes DbC makes more sense, and sometimes defensive programming does. You pick the one that works best for a given situation.
> 
> The whole thing is a gray area, and you're not going to get a consensus that a library should always use DbC on its functions or that it should always use defensive programming.

I wasn't trying to say that library code should always use DbC and application code should always use defensive programming. I'm saying that if it makes sense for a function to use DbC (or vice versa) then it should use DbC regardless of whether it's in a library or not.  If I were to write a string conversion function, for example, I wouldn't use contracts to enforce the right encoding, regardless of whether it's in a library or in application code. I would use exceptions, simply because that's what makes sense in this case. Just because something is in the library shouldn't change whether DbC or defensive programming is used. It's the semantics that matter, not whether it's in a library.


[...]
> > No need to templatize anything, just ship two versions of the library, one with DbC compiled in, one without. Let the user decide which one to link in.
> 
> There _is_ a need to do that if the caller wants to control whether an assertion or an exception is used. There's also a need if you want to enable it in some places and not in others. However, the reality of the matter is that using a debug version of a library is as close as you're likely to get.  And I'm certainly not arguing that templatizing functions in this manner would be a good idea. I'm just pointing out that there are issues with how DbC is currently implemented.
[...]

Actually, I wonder if it makes sense for the compiler to insert in-contract code in the *caller* instead of the callee. Conceptually speaking, an in-contract means "you have to fulfill these conditions before calling this function". So why not put the check in the caller?

Similarly, an out-contract means "this function's return value will satisfy these conditions" - so let the caller verify that this is true.

Semantically it amounts to the same thing, but this gives us more flexibility: the library doesn't have to be compiled with contracts on/off, the contracts are in the library API, and the user tells the compiler whether or not to wrap the contract code around each call to the library.

(Yes this bloats the code everywhere a DbC function is called, but this is supposed to be done in non-release builds only anyway, so I don't think that matters so much.)


T

-- 
Тише едешь, дальше будешь.
February 24, 2012
On Fri, Feb 24, 2012 at 07:57:13AM -0800, H. S. Teoh wrote:
> > On Thursday, February 23, 2012 15:18:27 H. S. Teoh wrote:
> > > In my book, a linked library shares equal status with the "main program", therefore the definition of "user input" still sits at the internal-to-program and external boundary.
[...]
> I wasn't trying to say that library code should always use DbC and application code should always use defensive programming. I'm saying that if it makes sense for a function to use DbC (or vice versa) then it should use DbC regardless of whether it's in a library or not.
[...]

Argh, I just realized that my first post was so poorly worded it made no sense at all. My second post was what I meant to say. :)

What I was trying to express in the first post was that "user input" comes from a source external to the program, whether from a user typing at the keyboard or from a file or network resource. This data traverses program code paths until it is eventually converted into the internal form the program uses for further processing.  Input sanitization should be done along this code path until the input has been processed into program-internal form, at which point DbC begins to take effect. The assumption is that after preprocessing by the input sanitization code, all data should be valid; if it is not, that is a failure of the input processing code and represents a logic flaw in the program, so an assertion should fail.
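
A minimal sketch of that split, with hypothetical names throughout: `parsePercent` sits on the boundary and throws on bad input in every build, `Percent` is the program-internal form, and `remaining` is internal logic whose precondition is a contract.

```d
import std.conv : to, ConvException;
import std.exception : assertThrown, enforce;
import std.string : strip;

// Program-internal form: holding a Percent means the value was validated.
struct Percent
{
    int value;
}

// Boundary: raw text from outside the program. Failures here are expected,
// so they are reported with exceptions in every build.
Percent parsePercent(string raw)
{
    immutable n = raw.strip.to!int; // throws ConvException on garbage
    enforce(n >= 0 && n <= 100, "percentage out of range");
    return Percent(n);
}

// Internal logic: a bad Percent here means the boundary code has a bug,
// so the precondition is an assertion rather than an exception.
int remaining(Percent done)
in
{
    assert(done.value >= 0 && done.value <= 100);
}
do
{
    return 100 - done.value;
}

// run with: dmd -unittest -main -run file.d
unittest
{
    assert(remaining(parsePercent(" 40 ")) == 60);
    assertThrown!ConvException(parsePercent("lots"));
    assertThrown(parsePercent("250"));
}
```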


T

-- 
Those who don't understand Unix are condemned to reinvent it, poorly.
February 24, 2012
On Friday, February 24, 2012 07:57:13 H. S. Teoh wrote:
> Actually, I wonder if it makes sense for the compiler to insert in-contract code in the *caller* instead of the callee. Conceptually speaking, an in-contract means "you have to fulfill these conditions before calling this function". So why not put the check in the caller?
> 
> Similarly, an out-contract means "this function's return value will satisfy these conditions" - so let the caller verify that this is true.
> 
> Semantically it amounts to the same thing, but this gives us more flexibility: the library doesn't have to be compiled with contracts on/off, the contracts are in the library API, and the user tells the compiler whether or not to wrap the contract code around each call to the library.
> 
> (Yes this bloats the code everywhere a DbC function is called, but this is supposed to be done in non-release builds only anyway, so I don't think that matters so much.)

It wouldn't work unless the source were available, because otherwise you just have the function signature. So, the result would be inconsistent, and in the case of non-templated functions built against a non-release version of the callee, you'd end up having the checks twice, because the callee would have to have them regardless (since it's compiled separately, and not every function calling it would necessarily have its source). For templated functions, it already depends on the caller whether the assertions are enabled, because it's instantiated when the module with the caller in it is built. It's an interesting idea though.

- Jonathan M Davis
February 24, 2012
On Friday, February 24, 2012 08:27:44 H. S. Teoh wrote:
> On Fri, Feb 24, 2012 at 07:57:13AM -0800, H. S. Teoh wrote:
> > > On Thursday, February 23, 2012 15:18:27 H. S. Teoh wrote:
> > > > In my book, a linked library shares equal status with the "main program", therefore the definition of "user input" still sits at the internal-to-program and external boundary.
> 
> [...]
> 
> > I wasn't trying to say that library code should always use DbC and application code should always use defensive programming. I'm saying that if it makes sense for a function to use DbC (or vice versa) then it should use DbC regardless of whether it's in a library or not.
> 
> [...]
> 
> Argh, I just realized that my first post was so poorly worded it made no sense at all. My second post was what I meant to say. :)
> 
> What I was trying to express in the first post was that "user input" comes from a source external to the program, whether from a user typing at the keyboard or from a file or network resource. This data traverses program code paths until it is eventually converted into the internal form the program uses for further processing. Input sanitization should be done along this code path until the input has been processed into program-internal form, at which point DbC begins to take effect. The assumption is that after preprocessing by the input sanitization code, all data should be valid; if it is not, that is a failure of the input processing code and represents a logic flaw in the program, so an assertion should fail.

Yes. In general, that's the core difference between assertions and exceptions. If an assertion fails, it's a bug in the code, whereas if an exception is thrown, then it may or may not be caused by a program bug (and is frequently caused by interacting with I/O - be it directly or indirectly).

But that does require a judgement call sometimes as to which approach is better in a particular situation, and if you're being utterly paranoid (which some programs probably need to be but most don't), then you could end up using exceptions where you'd normally use assertions simply because you want to _guarantee_ that the check is always done. But hopefully, that sort of thing would be kept to a minimum.

Regardless, at the core, assertions are for verifying program correctness, and exceptions are for reporting error conditions caused by bad stuff happening during the normal operation of the program.

- Jonathan M Davis
February 24, 2012
On Fri, Feb 24, 2012 at 01:46:56PM -0500, Jonathan M Davis wrote:
> On Friday, February 24, 2012 07:57:13 H. S. Teoh wrote:
> > Actually, I wonder if it makes sense for the compiler to insert in-contract code in the *caller* instead of the callee. Conceptually speaking, an in-contract means "you have to fulfill these conditions before calling this function". So why not put the check in the caller?
> > 
> > Similarly, an out-contract means "this function's return value will satisfy these conditions" - so let the caller verify that this is true.
> > 
> > Semantically it amounts to the same thing, but this gives us more flexibility: the library doesn't have to be compiled with contracts on/off, the contracts are in the library API, and the user tells the compiler whether or not to wrap the contract code around each call to the library.
> > 
> > (Yes this bloats the code everywhere a DbC function is called, but this is supposed to be done in non-release builds only anyway, so I don't think that matters so much.)
> 
> It wouldn't work unless the source were available, because otherwise you just have the function signature.
[...]

In my mind, contract code belongs in the function signature, because they document how the function expects to be called, and what it guarantees in return. It doesn't seem to make sense to me that contracts would be hidden from the user of the library. Sorta defeats the purpose, since how is the user supposed to know what the function expects? Rely on documentation, perhaps, but docs aren't as reliable as actual contract code.


T

-- 
Без труда не выловишь и рыбку из пруда.
February 24, 2012
On 2/24/12 1:13 PM, H. S. Teoh wrote:
> In my mind, contract code belongs in the function signature, because
> they document how the function expects to be called, and what it
> guarantees in return. It doesn't seem to make sense to me that contracts
> would be hidden from the user of the library. Sorta defeats the purpose,
> since how is the user supposed to know what the function expects? Rely
> on documentation, perhaps, but docs aren't as reliable as actual
> contract code.

Yah, and that's why we managed, with great implementation effort, to allow contract checks in interfaces. The concept has still to take off though.
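
For readers who have not seen the feature, a small sketch of a contract attached to an interface method follows. The `Stack` and `ArrayStack` names are hypothetical, and the exact rules for how inherited in-contracts combine with overrides are subtle, so treat this as an illustration of the syntax rather than of the enforcement semantics.

```d
interface Stack
{
    bool empty() const;
    void push(int x);

    // The precondition is written on the interface itself, so callers can
    // read it next to the signature. It is only checked in builds that
    // keep contracts.
    int pop()
    in
    {
        assert(!empty, "pop called on an empty stack");
    }
}

class ArrayStack : Stack
{
    private int[] items;

    bool empty() const { return items.length == 0; }

    void push(int x) { items ~= x; }

    int pop()
    {
        immutable top = items[$ - 1];
        items = items[0 .. $ - 1];
        return top;
    }
}

// run with: dmd -unittest -main -run file.d
unittest
{
    Stack s = new ArrayStack;
    s.push(1);
    s.push(2);
    assert(s.pop() == 2);
    assert(s.pop() == 1);
    assert(s.empty);
}
```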

Andrei
February 24, 2012
On Sat, 18 Feb 2012 13:52:05 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> There's a discussion that started in a pull request:
>
> https://github.com/alexrp/phobos/commit/4b87dcf39efeb4ddafe8fe99a0ef9a529c0dcaca
>
> Let's come up with a good doctrine for exception defining and handling in Phobos. From experience I humbly submit that catching by type is most of the time useless.

OK, so after reading about 100 or so of these messages, I stopped.  Sorry if this has been said before, but here is my take:

Many, many times when dealing with exceptions, I hate having to do this:

class ExceptionTypeA : Exception {...}
class ExceptionTypeB : Exception {...}


try
{
}
catch(ExceptionTypeA ex)
{
   // code
}
catch(ExceptionTypeB ex)
{
   // same f'ing code
}

I know, I could do this:

catch(Exception e)
{
   if(cast(ExceptionTypeA)e || cast(ExceptionTypeB)e)
   {
      // code
   }
   else
     throw e;
}

But that sucks, and as others have pointed out, it kills the stack trace.

So I love the idea that others have specified to have a 'template constraint' type piece for catch.  IMO, it should not be a template, but a runtime check.  i.e., I don't think we need to templatize the catch, we just need to add extra code to the 'should we catch' check that currently consists of 'does the type match'.  This would be a huge improvement over existing exception catching techniques.

So the above would become:

catch(Exception e) if(e is ExceptionTypeA or ExceptionTypeB) // not sure of the exact syntax, maybe the if statement I used above?

I agree the constraints should be pure and nothrow.
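
With the catch syntax as it stands, one workaround is to keep the separate catch clauses but move the shared body into a helper, so nothing is caught and rethrown. This is only a sketch: `work`, `run` and `recover` are hypothetical names, and the exception classes mirror the ones above.

```d
import std.exception : assertThrown;

class ExceptionTypeA : Exception { this(string m) { super(m); } }
class ExceptionTypeB : Exception { this(string m) { super(m); } }

// Hypothetical operation that can fail in several ways.
void work(int kind)
{
    if (kind == 1) throw new ExceptionTypeA("A failed");
    if (kind == 2) throw new ExceptionTypeB("B failed");
    if (kind == 3) throw new Exception("something else");
}

string run(int kind)
{
    // The handling code lives in one place and is shared by both clauses.
    static string recover(Exception e) { return "recovered: " ~ e.msg; }

    try
    {
        work(kind);
        return "ok";
    }
    catch (ExceptionTypeA e) { return recover(e); }
    catch (ExceptionTypeB e) { return recover(e); }
}

// run with: dmd -unittest -main -run file.d
unittest
{
    assert(run(0) == "ok");
    assert(run(1) == "recovered: A failed");
    assert(run(2) == "recovered: B failed");
    assertThrown(run(3)); // unrelated exceptions propagate untouched
}
```

The clauses are still repeated, which is exactly what the proposed catch constraint would remove, but the duplicated handler body is gone.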


On to my second point.  One of the issues I have with Java is that exceptions are *overused*.  For example, EOF should not be an exception: most files have ends, so it's not a very exceptional situation.  If there is an intuitive way to use an existing return value to convey an error rather than an exception, I'd prefer the return value.  There is a runtime cost just for setting up a try/catch block, even if no exceptions are thrown.
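
A minimal sketch of the return-value style for end of input, using a hypothetical in-memory `ByteReader` rather than any real file API:

```d
// Hypothetical reader over an in-memory buffer.
struct ByteReader
{
    const(ubyte)[] data;

    // Reaching the end is an ordinary outcome, so it is reported through
    // the return value instead of an exception.
    bool next(out ubyte b)
    {
        if (data.length == 0)
            return false;
        b = data[0];
        data = data[1 .. $];
        return true;
    }
}

// The read loop terminates on the return value; no try/catch is involved.
size_t countBytes(const(ubyte)[] input)
{
    auto reader = ByteReader(input);
    size_t n;
    ubyte b;
    while (reader.next(b))
        ++n;
    return n;
}

// run with: dmd -unittest -main -run file.d
unittest
{
    assert(countBytes([1, 2, 3]) == 3);
    assert(countBytes([]) == 0);
}
```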

-Steve