February 05, 2017
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
> tl;dr - Seeking thoughts on trusting a system that allows "handling" errors.
>
> One of my extra-curricular interests is the Mill CPU[1]. A recent discussion in that context reminded me of the Error-Exception distinction in languages like D.
>
> 1) There is the well-known issue of whether Error should ever be caught. If Error represents conditions where the application is not in a defined state, hence it should stop operating as soon as possible, should that also carry over to other applications, to the OS, and perhaps even to other systems in the whole cluster?
>

No, because your logic would then extend to all of the human race, to animals, etc. It is not practical and not necessary.

1. The ball must keep rolling. All of this stuff we do is fantasy anyway, so if an error occurs in that Lemmings game, it is just a game. It might take down every computer in the universe (if we went with the logic above), but it can't affect humans because they are distinct from computers (it might kill a few humans, but that has always been acceptable to humans).

That is, it is not practical to take everything down, because an error is not that serious and ultimately has limited effect.

That is, in the practical world, we are OK with some errors. This allows us not to worry too much. The more we would have to worry about such errors, the more things would have to be shut down, exactly because of the logic you have given. So the question is not "should we do x or not do x" but how much of x is acceptable.

(The human race has decided that quite a few errors are OK. We can even have errors such as a medical device malfunctioning because of something like an invalid array access, killing people, and it's OK. It's just money, and the lawyers will be happy.)

2. Not all errors will systematically propagate into all other systems, e.g., two computers not connected in any way. If one has an error, the other won't be affected, so there is no reason to take that computer down too.

So, what matters, as with anything else, is that we try to do the best we can. We don't have to pick an arbitrary point at which to stop, because we actually don't know it. What we do is use reason and experience to decide on the most likely solution and see how much risk it carries. If it carries too much, we back off; if not, we press on.

There is an optimal point, more or less, because risk requires energy to manage (even when there is no risk).

Basically, if you assume, as you seem to be doing, that a single error creates an unstable state in the whole system at every point, then you are screwed from the get-go if you will not accept any unstable state at any cost. The only solution then is to not have any errors at any point (which requires perfection, something humans gave up on trying to achieve a long time ago).


3. Things are not so cut and dried. Intelligence can be used to understand the problem. Not all errors are that simple. Some errors are catastrophic and need everything shut down, and some don't. Knowing those error types is important. Hence, the more descriptive something is the better, as it allows one to create that separation. Also, designing things to be robust is another way to mitigate the problems.
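For example, a rough D sketch of that separation (the type names are invented here, just to show the idea):

    // Illustrative only: BadRequest and CorruptedState are made-up names.
    class BadRequest : Exception       // recoverable: a problem with the input
    {
        this(string msg) { super(msg); }
    }

    class CorruptedState : Error       // catastrophic: a bug, invariants are gone
    {
        this(string msg) { super(msg); }
    }

    void handleRequest(bool inputOk)
    {
        if (!inputOk)
            throw new BadRequest("bad request");
        // ... normal work ...
    }

    void main()
    {
        try
        {
            handleRequest(false);
        }
        catch (BadRequest e)
        {
            // Log it and keep serving; the process is still believed to be sane.
        }
        // CorruptedState is deliberately not caught: let it take the process down.
    }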

Programming is not much different from banking. You have a certain amount of risk in a certain portfolio (program), you hedge your bets (create a good, robust design), and you hope for the best. It's up to the individual to decide how much hedging is required, as it will cost time/money to do it.

Example: Windows. Obviously Windows was a design that didn't care too much about robustness. Just enough to get the job done was their motto. If someone dies because of some BSOD, it's not that big a deal... it will be hard to trace the cause, and if it can be traced, they have enough money to afford it (similar to the Ford Pinto fiasco: https://en.wikibooks.org/wiki/Professionalism/The_Ford_Pinto_Gas_Tank_Controversy).

February 05, 2017
On Saturday, 4 February 2017 at 07:24:12 UTC, Ali Çehreli wrote:
> [...]

A bit OT, but I'm pretty sure you would be very interested in Kevlin Henney's GOTO 2016 conference talk, "The Error of Our Ways", which discusses the fact that most catastrophic consequences of software come from very simple errors: https://www.youtube.com/watch?v=IiGXq3yY70o
February 05, 2017
On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
> Doesn't change what I'm saying. :) For example, RangeError may be thrown due to a rogue function writing over memory that it did not intend to. An index of 42 may have become 42000, and that may be why the RangeError was thrown. Fine. What if nearby data that logf depends on has also been overwritten? logf will fail as well.

I can't count on an error being thrown, so I may as well not run my program in the first place. That's the only defense. It's only wishful thinking that my program's data hasn't already been corrupted by the GC and the runtime but in a way that doesn't cause an Error to be thrown.
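To make the scenario concrete, here is a rough D sketch (the corrupted index value is obviously contrived):

    import core.exception : RangeError;
    import std.stdio : writeln;

    void main()
    {
        auto data = new int[100];
        size_t index = 42;

        index = 42_000;               // pretend a rogue write corrupted the index

        try
        {
            data[index] = 1;          // bounds check fires and throws RangeError
        }
        catch (RangeError e)          // catching an Error, shown here only to illustrate
        {
            writeln("caught: ", e.msg);
        }

        // The worrying case: a corrupted index that still lands in bounds
        // (say 42 -> 43), or damage to the data the logging code itself uses,
        // raises no Error at all.
    }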
February 05, 2017
On 02/05/2017 07:17 AM, Cym13 wrote:
> On Saturday, 4 February 2017 at 07:24:12 UTC, Ali Çehreli wrote:
>> [...]
>
> A bit OT, but I'm pretty sure you would be very interested in Kevlin
> Henney's GOTO 2016 conference talk, "The Error of Our Ways", which
> discusses the fact that most catastrophic consequences of software come
> from very simple errors: https://www.youtube.com/watch?v=IiGXq3yY70o

Thank you for that. I've always admired Kevlin Henney's writings and talks. He used to come to Silicon Valley at least once a year for software conferences (those conferences are no more), and we would adjust our meetup schedule to have him as a speaker.

Ali

February 06, 2017
On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
> What I and many others who say Errors should not be caught are saying is, once the program is in an unexpected state, attempting to do anything further is wishful thinking.

I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown? How do you recommend it leave behind enough data for me to investigate the next day when I see there was a problem? How do you recommend I orchestrate things to minimize disruption to user activities?

Catching an error, logging it, and trying to move on is the obvious thing. It works for every other programming language I've encountered.
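Roughly this, in D terms (just a sketch; the function names are made up):

    import std.experimental.logger : errorf;

    void handleOneRequest()
    {
        // ... application work that may throw ...
    }

    void serveForever()
    {
        while (true)
        {
            try
            {
                handleOneRequest();
            }
            catch (Throwable t)       // Throwable covers both Exception and Error
            {
                errorf("request failed: %s", t.msg);
                // ... and move on to the next request.
            }
        }
    }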

If you're telling me it's not good enough for D, you must have something better in mind. What is it?

Or, alternatively, you know something about D that means that, when something goes wrong, it effectively kills the entire application -- in a way that doesn't happen when an Error isn't thrown, in a way that can't happen in other languages.
February 05, 2017
On 02/05/2017 08:49 AM, Chris Wright wrote:
> On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
>> Doesn't change what I'm saying. :) For example, RangeError may be thrown
>> due to a rogue function writing over memory that it did not intend to.
> An index of 42 may have become 42000, and that may be why the RangeError
> was thrown. Fine. What if nearby data that logf depends on has also been
>> overwritten? logf will fail as well.
>
> I can't count on an error being thrown, so I may as well not run my
> program in the first place.

Interesting. That's an angle I hadn't considered.

> That's the only defense. It's only wishful
> thinking that my program's data hasn't already been corrupted by the GC
> and the runtime but in a way that doesn't cause an Error to be thrown.

Yeah, all bets are off when memory is shared by different actors as is the case for conventional CPUs.

Thanks everyone who contributed to this thread. I learned more. :)

Ali

February 05, 2017
On 02/05/2017 10:08 PM, Chris Wright wrote:
> On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
>> What I and many others who say Errors should not be caught are saying
>> is, once the program is in an unexpected state, attempting to do
>> anything further is wishful thinking.
>
> I've been thinking about this a bit more, and I'm curious: how do you
> recommend that an application behave when an Error is thrown?

I don't have the answers. That's why I opened this thread. However, I think I know what the common approaches are.

The current recommendation is that it aborts immediately before producing (more) incorrect results.

> How do you
> recommend it leave behind enough data for me to investigate the next day
> when I see there was a problem?

The current approach is to rely on the backtrace produced when aborting.

> How do you recommend I orchestrate things
> to minimize disruption to user activities?

That is a hard question. If the program is interacting with the user, it certainly seems appropriate to communicate with them, but perhaps a drastic abort is just as good.

> Catching an error, logging it, and trying to move on is the obvious thing.

That part I can't agree with. It is not necessarily true that moving on will work the way we want. The invoice prepared for the next customer may have an incorrect amount in it.

> It works for every other programming language I've encountered.

This issue is language agnostic. It works in D as well, but with the same level of correctness and the same unknowns. I first heard about the Exception-Error distinction in Java, and I think there are other languages that recommend not catching Errors.

> If you're telling me it's not good enough for D, you must have something
> better in mind. What is it?

This is an interesting issue to think about. As Profile Analysis and you say, this is a practical matter. We have to accept the imperfections and move on.

> Or, alternatively, you know something about D that means that, when
> something goes wrong, it effectively kills the entire application -- in a
> way that doesn't happen when an Error isn't thrown, in a way that can't
> happen in other languages.

I don't think that's possible with conventional CPUs and OSes.

Ali

February 05, 2017
On 2/1/2017 11:25 AM, Ali Çehreli wrote:
> 1) There is the well-known issue of whether Error should ever be caught. If
> Error represents conditions where the application is not in a defined state,
> hence it should stop operating as soon as possible, should that also carry over
> to other applications, to the OS, and perhaps even to other systems in the whole
> cluster?

If it is possible for an application to leave other applications or the OS in a corrupted state, yes, it should stop the OS as soon as possible. MS-DOS fell into this category; it was normal for a crashing program to scramble MS-DOS along with it. Attempting to continue running MS-DOS risked scrambling your hard disk as well (that happened to me many times). I eventually learned to reboot every time an app failed unexpectedly. As soon as I could, I moved all development to protected-mode operating systems and would port to DOS only as the last step.


> For example, if a function detected an inconsistency in a DB that is available
> to all applications (as is the case in the Unix model of user-based access
> protection), should all processes that use that DB stop operating as well?

A DB inconsistency is not a bug in the application, it is a problem with the input to the application. Therefore, it is not an Error, it is an Exception.

Simply put, an Error is a bug in the application. An Exception is a bug in the input to the application. The former is not recoverable, the latter is.
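In code, the distinction looks roughly like this (a sketch, not a prescription):

    import std.conv : to;
    import std.exception : enforce;

    // An Error: the caller has violated an assumption; this is a bug in the program.
    int dividePositive(int a, int b)
    {
        assert(b > 0, "caller must pass a positive divisor");  // AssertError if it fires
        return a / b;
    }

    // An Exception: the input is bad; reject it and carry on.
    int parseAge(string s)
    {
        immutable age = s.to!int;                        // throws ConvException on bad text
        enforce(age >= 0 && age < 150, "age out of range");
        return age;
    }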


> 2) What if an intermediate layer of code did in fact handle an Error (perhaps
> raised by a function pre-condition check)? Should the callers of that layer have
> a say on that? Should a higher level code be able to say that Error should not
> be handled at all?

If the layer has access to the memory space of the caller, an Error in the layer is an Error in the caller as well.


> For example, an application code may want to say that no library that it uses
> should handle Errors that are thrown by a security library.

Depends on what you mean by "handling" an Error. If you mean continuing to run the application, you're running a corrupted program. If you mean logging the Error and then terminating the application, that would be reasonable.
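For example, something along these lines at the top level (just a sketch; runApplication is a placeholder):

    import std.stdio : stderr;

    void runApplication()
    {
        // ... the real program ...
    }

    int main()
    {
        try
        {
            runApplication();
        }
        catch (Error e)
        {
            // Last-gasp logging only; make no attempt to continue.
            stderr.writeln("fatal: ", e);   // message plus backtrace via Throwable.toString
            throw e;                        // or return a nonzero exit code; either way, stop
        }
        return 0;
    }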

----

This discussion has come up repeatedly on this forum. Many people strongly disagree with me, and believe that they can recover from Errors and continue executing the program.

That's fine if the program's output is nothing one cares about, such as a game or a music player. If the program's failure could result in the loss of money, property, health or lives, it is unacceptable.

Much other confusion comes from not carefully distinguishing Errors from Exceptions.

Corollary: bad input that causes a program to crash is an Error because it is a programming bug to fail to vet the input for correctness. For example, if I feed a D source file to a C compiler and the C compiler crashes, the C compiler has a bug in it, which is an Error. If the C compiler instead writes a message "Error: D source code found instead of C source code, please upgrade to a D compiler" then that is an Exception.
February 06, 2017
On Monday, 6 February 2017 at 06:08:22 UTC, Chris Wright wrote:
> On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
>> What I and many others who say Errors should not be caught are saying is, once the program is in an unexpected state, attempting to do anything further is wishful thinking.
>
> I've been thinking about this a bit more, and I'm curious: how do you recommend that an application behave when an Error is thrown?
It has lost face and shall commit suicide.
That's the Japanese way, and it has its merits.
Continuing to work and pretending nothing has happened (the European way) just makes it untrustworthy from the beginning.
Maybe that is better for humans (they are untrustworthy anyway until some validation has been run on them), but for programs I prefer the Japanese way.
February 06, 2017
On 2017-02-06 08:48, Walter Bright wrote:

> For example, if I feed a D source file to a C compiler and the C compiler
> crashes, the C compiler has a bug in it, which is an Error. If the C
> compiler instead writes a message "Error: D source code found instead of
> C source code, please upgrade to a D compiler" then that is an Exception.

Does DMC do that :) ?

-- 
/Jacob Carlborg