February 06, 2017
On Sun, 05 Feb 2017 23:48:07 -0800, Walter Bright wrote:
> This discussion has come up repeatedly on this forum. Many people strongly disagree with me, and believe that they can recover from Errors and continue executing the program.
> 
> That's fine if the program's output is nothing one cares about, such as a game or a music player. If the program's failure could result in the loss of money, property, health or lives, it is unacceptable.

That assumes there is no intervening process whereby a human will investigate errors by hand after the program completes. It also assumes that crashing results in less loss of money or lives than marching on.

In Google Compute Engine billing, it was *always* worse for us if our billing jobs failed than if they completed with reported errors. If the job failed, it was difficult to investigate. If it completed with errors, we could investigate in a straightforward way, and the errors being reported meant the data was held aside and not automatically sent to the payment processor.
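To put a shape on it, here is a minimal sketch of that kind of job; every name in it (Invoice, process, and so on) is invented for illustration, not taken from any real billing system:

```d
import std.format : format;
import std.stdio : writeln;

struct Invoice { int id; double amount; }

void process(Invoice inv)
{
    // Stand-in for real validation/processing logic.
    if (inv.amount < 0)
        throw new Exception("negative amount");
}

void main()
{
    auto invoices = [Invoice(1, 10.00), Invoice(2, -5.00), Invoice(3, 7.50)];
    Invoice[] ok;
    string[] failures;

    foreach (inv; invoices)
    {
        try
        {
            process(inv);
            ok ~= inv;
        }
        catch (Exception e)
        {
            // Hold the data aside and record the error for a human.
            failures ~= format("invoice %s: %s", inv.id, e.msg);
        }
    }

    writeln("send to payment processor: ", ok);
    if (failures.length)
        writeln("held for human review: ", failures); // never auto-submitted
}
```

The job completes, as many invoices as possible go through, and the failures are queued for investigation rather than for payment.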
February 06, 2017
On Mon, 06 Feb 2017 09:09:31 +0000, Dominikus Dittes Scherkl wrote:

> On Monday, 6 February 2017 at 06:08:22 UTC, Chris Wright wrote:
>> On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
>>> What I and many others who say Errors should not be caught are saying is that, once the program is in an unexpected state, attempting to do anything further is wishful thinking.
>>
>> I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown?
> It has lost its face and shall commit suicide.
> That's the Japanese way, and it has its merits.
> Continuing to work and pretending nothing has happened (the European way)
> just makes it untrustworthy from the beginning.

https://github.com/munificent/vigil is the programming language for you.
February 06, 2017
On Sun, 05 Feb 2017 22:23:19 -0800, Ali Çehreli wrote:

> On 02/05/2017 10:08 PM, Chris Wright wrote:
>  > How do you recommend it leave behind enough data for me to
>  > investigate the next day when I see there was a problem?
> 
> The current approach is to rely on the backtrace produced when aborting.

Which I can't log, according to you, because I don't know for certain that the logger is not corrupted. The backtrace itself is produced by the runtime, which I also can't trust not to be in a corrupted state. Which forces me to have at least two different logging systems.

At past jobs, I've used an SMTP logging appender with log4net. Wrangling that with a stacktrace reported only via stderr would be fun.
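What it forces on you looks roughly like the sketch below: the normal logging path for Exceptions, plus a deliberately primitive fallback for Errors that leans on C stdio and avoids allocation, trusting as little runtime state as possible. (runApplication is a stand-in, and the choice of fallback is mine for illustration.)

```d
import core.stdc.stdio : fprintf, stderr;
import std.stdio : writeln;

void runApplication() { /* real work would go here */ }

void main()
{
    try
    {
        runApplication();
    }
    catch (Exception e)
    {
        writeln("recoverable: ", e.msg); // normal logging path
    }
    catch (Error e)
    {
        // Fallback path: plain C stdio, no GC allocation; %.*s prints
        // the message slice without copying it anywhere.
        fprintf(stderr, "fatal: %.*s\n", cast(int) e.msg.length, e.msg.ptr);
        throw e; // still terminate; we only wanted the evidence recorded
    }
}
```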

>  > Catching an error, logging it, and trying to move on is the obvious thing.
> 
> That part I can't agree with. It is not necessarily true that moving on will work the way we wanted. The invoice prepared for the next customer may have an incorrect amount in it.

I've done billing. We march on, process as many invoices as possible, and detect problems. If there are any problems, we report them to a human for review instead of just submitting to the payment processor.

Besides which, you are trusting every line of code you depend on to appropriately distinguish between something that could impact shared state and something that couldn't, and to check continuously for whether shared state is corrupted. I'm merely trusting it not to share more state than it needs to.

>  > It works for every other programming language I've encountered.
> 
> This issue is language agnostic. It works in D as well, but with the same level of correctness and unknowns.

I haven't heard anyone complaining about this elsewhere. Have you?

What I've heard instead is that it's a bug if state unintentionally leaks between calls and it's undesirable to have implicitly shared state. Not sharing state unnecessarily means you don't have to put forth a ton of effort trying to detect corrupted shared state in order to throw an Error to signal that your library is unsafe to use.
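D can even check that mechanically. As a tiny sketch (nothing here is from a real codebase), marking a function `pure` forbids it from touching mutable module-level state, so nothing leaks between calls through hidden channels:

```d
int hidden; // mutable module-level state

pure int tidy(int x) { return x + 1; } // fine: touches no shared state

// pure int leaky(int x) { return x + hidden; }
// rejected by the compiler: a pure function cannot access
// mutable static data such as 'hidden'

void main() {}
```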

> I heard about the Exception-Error
> distinction first in Java and I think there are other languages that
> recommend not catching Errors.

I've only been using Java professionally for seven years, so maybe that's before my time. The common practice today is to have `catch(Exception)` at a central location and to catch other exceptions as needed to make the compiler shut up. (Which we all hate, but it *has* caused me to be more careful about a number of things, so there's that.)
February 06, 2017
On Monday, 6 February 2017 at 17:40:50 UTC, Chris Wright wrote:
>>> It works for every other programming language I've encountered.
>>
>> This issue is language agnostic. It works in D as well, but with the same level of correctness and unknowns.
>
> I haven't heard anyone complaining about this elsewhere. Have you?
>
> What I've heard instead is that it's a bug if state unintentionally leaks between calls and it's undesirable to have implicitly shared state. Not sharing state unnecessarily means you don't have to put forth a ton of effort trying to detect corrupted shared state in order to throw an Error to signal that your library is unsafe to use.

I absolutely agree with Walter and Ali that there are applications where, on Error, anything but termination of the process is unacceptable. This really is independent of the language used.

My work is in sensors for automation of heavy mining equipment and the software I write is used by the automation systems of our customers.

When our system detects an internal error, I cannot vouch for any of its outputs. Erroneous outputs can easily cost millions of dollars in machine damage or, in the worst case, even human lives. (Usually there are redundant systems to mitigate that risk.)
Termination of our system is automatically detected by the automation systems within the specified latencies and is generally considered annoying but acceptable. Nonsense outputs caused by errors in our system are never acceptable!

We try to find the cause of errors by logging the raw data from our sensors and feeding it to a clone of the system with more debugging and logging enabled. Yes, we usually don't even get a stack trace from the original crash.
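The setup is roughly the sketch below (all names invented; the real system is far more involved): every raw frame is written out verbatim, and the debug clone reads frames from the recording instead of from the hardware.

```d
import std.stdio;

interface Source { double[] read(); }

final class LiveSensor : Source
{
    double[] read() { /* talk to the hardware */ return [0.0]; }
}

// Wraps the live source and writes every frame out verbatim.
final class Recorder : Source
{
    private Source inner;
    private File log;
    this(Source s, string path) { inner = s; log = File(path, "wb"); }
    double[] read()
    {
        auto frame = inner.read();
        log.rawWrite(frame); // raw data, suitable for later replay
        return frame;
    }
}

// Feeds recorded frames to the debug clone instead of the sensor.
final class Replay : Source
{
    private File log;
    this(string path) { log = File(path, "rb"); }
    double[] read()
    {
        auto buf = new double[1];
        return log.rawRead(buf);
    }
}

void main()
{
    auto live = new Recorder(new LiveSensor(), "frames.bin");
    live.read(); // production: read and record at the same time
}
```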

I have definitely seen asserts violated because of buffer overflows in completely unrelated modules. Not sharing state unnecessarily, while certainly good engineering practice, is not enough.
February 06, 2017
On 02/06/2017 09:25 AM, Chris Wright wrote:

> https://github.com/munificent/vigil is the programming language for you.

Brilliant! :)

Ali

February 06, 2017
On 2/6/2017 9:10 AM, Chris Wright wrote:
> Assuming that crashing results
> in less loss of money or lives than marching on.

Any application that must continue or lives are lost is a BADLY designed system and should not be tolerated.

http://www.drdobbs.com/architecture-and-design/assertions-in-production-code/228700788
February 06, 2017
On Mon, 06 Feb 2017 18:12:38 +0000, Caspar Kielwein wrote:
> I absolutely agree with Walter and Ali, that there are applications where on Error anything but termination of the process is unacceptable.

Sure, and it sounds like you spend a ton of effort making things work properly and keeping them debuggable, because your application has these requirements.

The position that D's runtime can make this decision for me is grating. Without the kind of tooling you're describing being available and shipped with dmd, it's absurd.

> I have definitely seen asserts violated because of buffer overflows in completely unrelated modules. Not sharing state unnecessarily, while certainly being good engineering practice is not enough.

Violated asserts catch this kind of problem after the fact. @safe prevents you from writing code with the problem in the first place.
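To make that concrete, a minimal invented example: in @safe code the raw-pointer writes that cause cross-module buffer overflows won't compile, and an out-of-bounds index throws a RangeError instead of silently scribbling on someone else's state.

```d
@safe void example(size_t i)
{
    int[4] buf;

    // buf.ptr[i] = 42; // rejected: pointer indexing isn't allowed in @safe

    buf[i] = 42; // bounds-checked: if i >= 4 this throws a RangeError at
                 // runtime instead of corrupting an unrelated module
}

void main() @safe
{
    example(3); // fine
    // example(5); // would throw core.exception.RangeError
}
```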
February 07, 2017
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
> tl;dr - Seeking thoughts on trusting a system that allows "handling" errors.
>
> One of my extra-curricular interests is the Mill CPU[1]. A recent discussion in that context reminded me of the Error-Exception distinction in languages like D.
>
> 1) There is the well-known issue of whether Error should ever be caught. If Error represents conditions where the application is not in a defined state, hence it should stop operating as soon as possible, should that also carry over to other applications, to the OS, and perhaps even to other systems in the whole cluster?
>
> For example, if a function detected an inconsistency in a DB that is available to all applications (as is the case in the Unix model of user-based access protection), should all processes that use that DB stop operating as well?
>
> 2) What if an intermediate layer of code did in fact handle an Error (perhaps raised by a function pre-condition check)? Should the callers of that layer have a say on that? Should higher-level code be able to say that Error should not be handled at all?
>
> For example, an application code may want to say that no library that it uses should handle Errors that are thrown by a security library.
>
> Aside, and more related to D: I think this whole discussion is related to another issue that has been raised in this forum a number of times: Whose responsibility is it to execute function pre-conditions? I think it was agreed that pre-condition checks should be run in the context of the caller. So, not the library, but the application code, should require that they be executed. In other words, it should be irrelevant whether the library was built in release mode or not, its pre-condition checks should be available to the caller. (I think we need to fix this anyway.)
>
> And there is the issue of the programmer making the right decision: One person's Exception may be another person's Error.
>
> It's fascinating that there are so many fundamental questions with CPUs, runtimes, loaders, and OSes, and that some of these issues are not even semantically describable. For example, I think there is no way of requiring that e.g. a square root function not have side effects at all: The compiler can allow a piece of code but then the library that was actually linked with the application can do anything else that it wants.
>
> Thoughts? Are we doomed? Surprisingly, it seems not, as we use computers everywhere and they seem to work. :o)
>
> Ali
>
> [1] http://millcomputing.com/

Whether you can recover from an error depends on the capabilities of the language and the guarantees it makes about errors.

If the language has no pointers and guarantees that memory cannot be unintentionally overwritten in any other way, then you can recover from an error, because you have the guarantee that no memory corruption can happen.

If it's exactly specified what happens when an error is raised, you can decide whether it's safe to continue. But for that you need to know exactly what the runtime does when the error is raised. If you aren't 100% sure what your state is, you shouldn't continue. (This matters more in life-critical software than in command-line tools, but still...)

Or you have a software stack like Erlang, where you can just restart the failing process. In Erlang it doesn't matter whether it's an exception or an error: if a process fails, restart it and move on. This works because processes are isolated, so an error in one process can't corrupt the others.
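For what it's worth, D's std.concurrency can express a rough version of that pattern. A sketch of just the restart idea (not a full supervisor tree; the worker is invented):

```d
import std.concurrency;
import std.stdio;

void worker()
{
    // Real work would go here. An uncaught throw terminates only this
    // thread; the link delivers a LinkTerminated message to the owner.
    throw new Exception("worker failed");
}

void main()
{
    foreach (attempt; 0 .. 3)
    {
        spawnLinked(&worker);
        receive(
            (LinkTerminated t) { writeln("worker died, restarting"); }
        );
    }
}
```

The isolation is what makes the restart safe: the worker shares no mutable state with its owner, so a fresh spawn really does start clean.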

So there are many approaches to this problem, and all of them are a bit different. The final answer can only be: it depends on the language and the guarantees it makes. (And on how much you trust the compiler to do the right thing [https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf] :D)