The extent of trust in errors and error handling
February 01, 2017
tl;dr - Seeking thoughts on trusting a system that allows "handling" errors.

One of my extra-curricular interests is the Mill CPU[1]. A recent discussion in that context reminded me of the Error-Exception distinction in languages like D.

1) There is the well-known issue of whether Error should ever be caught. If Error represents conditions where the application is not in a defined state, and hence should stop operating as soon as possible, should that also carry over to other applications, to the OS, and perhaps even to other systems in the whole cluster?

For example, if a function detected an inconsistency in a DB that is available to all applications (as is the case in the Unix model of user-based access protection), should all processes that use that DB stop operating as well?

2) What if an intermediate layer of code did in fact handle an Error (perhaps raised by a function pre-condition check)? Should the callers of that layer have a say on that? Should a higher level code be able to say that Error should not be handled at all?

For example, an application code may want to say that no library that it uses should handle Errors that are thrown by a security library.

Aside, and more related to D: I think this whole discussion is related to another issue that has been raised in this forum a number of times: Whose responsibility is it to execute function pre-conditions? I think it was agreed that pre-condition checks should be run in the context of the caller. So it should be the application code, not the library, that requires them to be executed. In other words, it should be irrelevant whether the library was built in release mode or not; its pre-condition checks should be available to the caller. (I think we need to fix this anyway.)
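As a minimal D sketch (the function name and messages are mine, not from the thread), the pre-condition in question looks like this; the debate is about whether the `in` block below should be compiled according to the caller's build flags rather than the library's:

```d
import std.math : sqrt;

// Hypothetical library function with a pre-condition.
double safeSqrt(double x)
in
{
    // Whether this check runs currently depends on how the *library*
    // was compiled (-release strips it), not on the caller's settings.
    assert(x >= 0, "safeSqrt requires a non-negative argument");
}
do
{
    return sqrt(x);
}

void main()
{
    assert(safeSqrt(9.0) == 3.0);
    // safeSqrt(-1.0) would fail the pre-condition with an AssertError,
    // but only in a build where contracts are compiled in.
}
```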

And there is the issue of the programmer making the right decision: One person's Exception may be another person's Error.

It's fascinating that there are so many fundamental questions with CPUs, runtimes, loaders, and OSes, and that some of these issues are not even semantically describable. For example, I think there is no way of requiring that e.g. a square root function have no side effects at all: the compiler can allow a piece of code, but the library that is actually linked into the application can do whatever it wants.

Thoughts? Are we doomed? Surprisingly, it seems not, as we use computers everywhere and they seem to work. :o)

Ali

[1] http://millcomputing.com/
February 01, 2017
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
> tl;dr - Seeking thoughts on trusting a system that allows "handling" errors.
>
> [...]

Have you seen this long post from last year, where Joe Duffy laid out what they did with Midori?

http://joeduffyblog.com/2016/02/07/the-error-model/

Some relevant stuff in there.
February 01, 2017
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
> Aside, and more related to D: I think this whole discussion is related to another issue that has been raised in this forum a number of times: Whose responsibility is it to execute function pre-conditions?

Regarding that, I have thought: wouldn't it be better if bounds checking, rather than debug vs. release mode, determined whether in contracts are called? If the contract had asserts, they would still be compiled out in release mode like all asserts are. But if it had enforce()s, their existence would obey the same logic as array bounds checks.

This would let users implement custom bounds-checked types. Fibers, for example, could be made @trusted, with no loss in performance for @system code in release mode.
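If I read the proposal right, a sketch in today's syntax (the type and names are made up) would be:

```d
import std.exception : enforce;

// Hypothetical custom bounds-checked type illustrating the proposal.
struct Checked
{
    int[] data;

    int get(size_t i)
    in
    {
        // assert: compiled out by -release, like all asserts today.
        assert(data.length > 0, "empty Checked");
        // enforce: under the proposal, stripped or kept according to
        // the bounds-checking setting (-boundscheck) instead.
        enforce(i < data.length, "index out of bounds");
    }
    do
    {
        return data[i];
    }
}

void main()
{
    auto c = Checked([10, 20, 30]);
    assert(c.get(1) == 20);
}
```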
February 02, 2017
On Wed, 01 Feb 2017 11:25:07 -0800, Ali Çehreli wrote:
> 1) There is the well-known issue of whether Error should ever be caught. If Error represents conditions where the application is not in a defined state, and hence should stop operating as soon as possible, should that also carry over to other applications, to the OS, and perhaps even to other systems in the whole cluster?

My programs tend to apply operations to a queue of data. It might be a queue over time, like incoming requests, or it might be a queue based on something else, like URLs that I extract from HTML documents.

Anything that does not impact my ability to manipulate the queue can be safely caught and recovered from.

Stack overflow? Be my guest.

Null pointer? It's a bug, but it's probably specific to a small subset of queue items -- log it, put it in the dead letter queue, move on.

RangeError? Again, a bug, but I can successfully process everything else.

Out of memory? This is getting a bit dangerous -- if I dequeue another item after OOM, I might be able to process it, and it might work (for instance, maybe you tried to download a 40GB HTML, but the next document is reasonably small). But it's not necessarily that easy to fix, and it might compromise my ability to manipulate the queue.

Assertions? That obviously isn't a good situation, but it's likely to apply only to a subset of the data.

This requires me to have two flavors of error handling: one regarding queue operations and one regarding the function I'm applying to the queue.
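A minimal sketch of those two flavors (the helper names and the empty-item failure are invented): per-item failures, even Throwables, go to a dead-letter list, while the queue loop itself runs unguarded:

```d
import std.stdio : writeln;

// Simulated per-item work; an empty item stands in for a bug.
void processOne(string item)
{
    if (item.length == 0)
        throw new Exception("cannot process an empty item");
}

void processQueue(string[] items)
{
    string[] deadLetters;
    foreach (item; items)
    {
        try
        {
            processOne(item);
        }
        catch (Throwable t) // catching Throwable is normally discouraged
        {
            deadLetters ~= item; // log it, dead-letter it, move on
        }
    }
    // Failures in the queue machinery itself are deliberately not
    // caught here; they abort the whole run.
    writeln(deadLetters);
}

void main()
{
    processQueue(["a", "", "b"]); // prints: [""]
}
```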

> For example, if a function detected an inconsistency in a DB that is available to all applications (as is the case in the Unix model of user-based access protection), should all processes that use that DB stop operating as well?

As stated, that implies each application tags itself with whether it accesses that database. Then, when the database is known to be inconsistent, we immediately shut down every application that's tagged as using that database -- and presumably prevent other applications with the tag from starting.

It seems much more friendly not to punish applications when they're not trying to use the affected resource. Maybe init read a few configuration flags from the database on startup and it doesn't have to touch it ever again. Maybe a human will resolve the problem before this application makes its once-per-day query.

> 2) What if an intermediate layer of code did in fact handle an Error (perhaps raised by a function pre-condition check)? Should the callers of that layer have a say on that? Should a higher level code be able to say that Error should not be handled at all?
> 
> For example, an application code may want to say that no library that it uses should handle Errors that are thrown by a security library.

There's a bit of a wrinkle there. "Handling" an error might include catching it, adding some extra data, and then rethrowing.
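For instance (the config-loading scenario is invented), in D the wrapper can chain the original exception via Throwable's nextInChain so no information is lost:

```d
// Simulated low-level failure.
void parseConfig(string path)
{
    throw new Exception("syntax error on line 3");
}

void loadApp(string path)
{
    try
        parseConfig(path);
    catch (Exception e)
        // "Handle" by wrapping: add context, rethrow, and chain the
        // original exception so it is still reachable via .next.
        throw new Exception("while loading " ~ path ~ ": " ~ e.msg, e);
}

void main()
{
    try
        loadApp("app.conf");
    catch (Exception e)
    {
        assert(e.msg == "while loading app.conf: syntax error on line 3");
        assert(e.next !is null);
    }
}
```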

> I think there is no way of
> requiring that e.g. a square root function not have side effects at all:
> The compiler can allow a piece of code but then the library that was
> actually linked with the application can do anything else that it wants.

You can write a compiler with its own object format and linker, which lets you verify these promises at link time.

As an aside on this topic, I might recommend looking at Vigil, the eternally morally vigilant programming language: https://github.com/munificent/vigil

It has a rather effective way of dealing with errors that aren't explicitly handled.
February 02, 2017
On Wednesday, 1 February 2017 at 21:55:40 UTC, Dukc wrote:
> On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli
>
> Regarding that, I have thought: wouldn't it be better if bounds checking, rather than debug vs. release mode, determined whether in contracts are called? If the contract had asserts, they would still be compiled out in release mode like all asserts are. But if it had enforce()s, their existence would obey the same logic as array bounds checks.
>
> This would let users implement custom bounds-checked types. Fibers, for example, could be made @trusted, with no loss in performance for @system code in release mode.

The right move is to ship a compiled debug version of the library, if closed source, along with the release one.
I still don't understand why that's not the default also for Phobos and runtime....

/Paolo
February 02, 2017
On Thursday, 2 February 2017 at 09:14:43 UTC, Paolo Invernizzi wrote:
> On Wednesday, 1 February 2017 at 21:55:40 UTC, Dukc wrote:
>> On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli
>>
>> Regarding that, I have thought: wouldn't it be better if bounds checking, rather than debug vs. release mode, determined whether in contracts are called? If the contract had asserts, they would still be compiled out in release mode like all asserts are. But if it had enforce()s, their existence would obey the same logic as array bounds checks.
>>
>> This would let users implement custom bounds-checked types. Fibers, for example, could be made @trusted, with no loss in performance for @system code in release mode.
>
> The right move is to ship a compiled debug version of the library, if closed source, along with the release one.
> I still don't understand why that's not the default also for Phobos and runtime....
>
> /Paolo

It is, for both official dmd downloads and ldc:

https://www.archlinux.org/packages/community/x86_64/liblphobos/

Some packages may leave it out, not sure why.
February 03, 2017
On 02/01/2017 01:27 PM, Joakim wrote:
> On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
>> tl;dr - Seeking thoughts on trusting a system that allows "handling"
>> errors.
>>
>> [...]
>
> Have you seen this long post from last year, where Joe Duffy laid out
> what they did with Midori?
>
> http://joeduffyblog.com/2016/02/07/the-error-model/
>
> Some relevant stuff in there.

Thank you. Yes, very much related and very interesting!

Joe Duffy says Midori is a system that "drew significant inspiration from KeyKOS and its successors EROS and Coyotos." I'm happy to see KeyKOS mentioned there, as Norm Hardy, the main architect of KeyKOS, is involved in the Mill CPU and is someone I have the privilege of knowing personally and seeing weekly. :)

Ali

February 03, 2017
On 02/01/2017 06:29 PM, Chris Wright wrote:
> On Wed, 01 Feb 2017 11:25:07 -0800, Ali Çehreli wrote:
>> 1) There is the well-known issue of whether Error should ever be caught.
>> If Error represents conditions where the application is not in a defined
>> state, and hence should stop operating as soon as possible, should that
>> also carry over to other applications, to the OS, and perhaps even to
>> other systems in the whole cluster?
>
> My programs tend to apply operations to a queue of data. It might be a
> queue over time, like incoming requests, or it might be a queue based on
> something else, like URLs that I extract from HTML documents.
>
> Anything that does not impact my ability to manipulate the queue can be
> safely caught and recovered from.
>
> Stack overflow? Be my guest.
>
> Null pointer? It's a bug, but it's probably specific to a small subset of
> queue items -- log it, put it in the dead letter queue, move on.
>
> RangeError? Again, a bug, but I can successfully process everything else.

In practice, both null pointer and range error can probably be dealt with and the program can move forward.

However, in theory you cannot be sure why that pointer is null or why that index is out of range. It's possible that something horrible happened many clock cycles ago and you're seeing the side effects of that thing now.

What operations can you safely assume that you can still perform? Can you log? Are you sure? Even if you caught RangeError, are you sure that arr.ptr is still sane? etc.

In theory, at least the way I understand it, a program lives on a very narrow path. Once it steps outside that well-known path, all bets are off. Can a caught Error bring it back onto the path, or are we on an alternate path now?

>> 2) What if an intermediate layer of code did in fact handle an Error
>> (perhaps raised by a function pre-condition check)? Should the callers
>> of that layer have a say on that? Should a higher level code be able to
>> say that Error should not be handled at all?
>>
>> For example, an application code may want to say that no library that it
>> uses should handle Errors that are thrown by a security library.
>
> There's a bit of a wrinkle there. "Handling" an error might include
> catching it, adding some extra data, and then rethrowing.

Interestingly, attempting to add extra data can produce the opposite effect: stack trace information that would otherwise be available can be corrupted in the process of adding that extra data.

The interesting part is trust. Once there is an Error, what can you trust?

>> I think there is no way of
>> requiring that e.g. a square root function not have side effects at all:
>> The compiler can allow a piece of code but then the library that was
>> actually linked with the application can do anything else that it wants.
>
> You can write a compiler with its own object format and linker, which lets
> you verify these promises at link time.

Good idea. :) As Joakim reminded, the designers of Midori did that and more.

> As an aside on this topic, I might recommend looking at Vigil, the
> eternally morally vigilant programming language:
> https://github.com/munificent/vigil
>
> It has a rather effective way of dealing with errors that aren't
> explicitly handled.
>

Thank you, I will look at it next.

Ali

February 04, 2017
On Fri, 03 Feb 2017 23:24:12 -0800, Ali Çehreli wrote:
> In practice, both null pointer and range error can probably be dealt with and the program can move forward.
> 
> However, in theory you cannot be sure why that pointer is null or why that index is out of range. It's possible that something horrible happened many clock cycles ago and you're seeing the side effects of that thing now.

Again, this is for a restricted type of application that I happen to write rather often. And it's restricted to a subset of the application that shares very little state with the rest.

> What operations can you safely assume that you can still perform? Can you log? Are you sure? Even if you caught RangeError, are you sure that arr.ptr is still sane? etc.

You seem to be assuming that I'll write:

  try {
    foo = foo[1..$];
  } catch (RangeError e) {
    log(foo);
  }

I'm actually talking about:

  try {
    results = process(documentName, document);
  } catch (Throwable t) {
    logf("error while processing %s: %s", documentName, t);
  }

where somewhere deep in `process` I get a RangeError.

> Even if you caught RangeError, are you sure that
> arr.ptr is still sane?

Well, yes. Bounds checking happens before the slice gets assigned for obvious reasons. But I'm not going to touch the slice that produced the problem, so it's irrelevant anyway.
February 04, 2017
On 02/04/2017 08:17 AM, Chris Wright wrote:
> On Fri, 03 Feb 2017 23:24:12 -0800, Ali Çehreli wrote:

> Again, this is for a restricted type of application that I happen to write
> rather often. And it's restricted to a subset of the application that
> shares very little state with the rest.

I agree that there are different kinds of applications that require different levels of correctness.

>> What operations can you safely assume that you can still perform? Can
>> you log? Are you sure? Even if you caught RangeError, are you sure that
>> arr.ptr is still sane? etc.
>
> You seem to be assuming that I'll write:
>
>   try {
>     foo = foo[1..$];
>   } catch (RangeError e) {
>     log(foo);
>   }
>
> I'm actually talking about:
>
>   try {
>     results = process(documentName, document);
>   } catch (Throwable t) {
>     logf("error while processing %s: %s", documentName, t);
>   }

Doesn't change what I'm saying. :) For example, RangeError may be thrown due to a rogue function writing over memory that it did not intend to. An index of 42 may have become 42000, and the RangeError was thrown as a result. Fine. What if nearby data that logf depends on has also been overwritten? logf will fail as well.

What I and many others who say Errors should not be caught are saying is, once the program is in an unexpected state, attempting to do anything further is wishful thinking.

Again, in practice, it is likely that the program will log correctly but there is no guarantee that it will do so; it's merely "likely" and likely is far from "correct".

> where somewhere deep in `process` I get a RangeError.
>
>> Even if you caught RangeError, are you sure that
>> arr.ptr is still sane?
>
> Well, yes. Bounds checking happens before the slice gets assigned for
> obvious reasons. But I'm not going to touch the slice that produced the
> problem, so it's irrelevant anyway.

Agreed but the slice is just one part of the application's memory. We're not sure what happened to the rest of it.

Ali
