June 01, 2012
On 5/31/12 6:16 PM, Walter Bright wrote:
> On 5/31/2012 3:22 AM, Dmitry Olshansky wrote:
>> On 31.05.2012 13:06, deadalnix wrote:
>>> This is called failing gracefully. And this is highly recommended, and you
>>> KNOW that the system will fail at some point.
>>
>> Exactly. + The point I tried to argue but it was apparently lost:
>> doing stack unwinding and cleanup on most Errors (some Errors like stack
>> overflow might not be recoverable) is the best thing to do.
>
> This is all based on the assumption that the program is still in a valid
> state after an assert fail, and so any code executed after that and the
> data it relies on is in a workable state.
>
> This is a completely wrong assumption.
>
> It might be ok if the program is not critical and has no control over
> important things like delivering insulin, executing million dollar
> trades, or adjusting the coolant levels in a nuclear reactor.
>
> If the code controls anything that matters, then it is not the best
> thing to do, not at all.
>
> The right thing to do is to take the shortest path to stopping the
> program. A critical system would be monitoring those programs, and will
> restart them if they so fail, or will engage the backup system.
>
> [When I worked on flight critical airplane systems, the only acceptable
> response for a self-detected fault was to IMMEDIATELY stop the system,
> physically DISENGAGE it from the flight controls, and inform the pilot.]

I wonder how we could channel some of this enthusiasm into fixing the bugs caused by comparing enumerated values of distinct types...

In a program of mine, a UserID did once get confused with a CityID. Had the corrupt CityID subsequently been entered into the navigation system of an airplane, it would have led the airplane on the wrong path.
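A minimal D sketch of one way to turn that kind of mix-up into a compile-time error, assuming the IDs are plain integers: give each ID its own struct type. UserID, CityID and routeTo are hypothetical names used only for illustration.

```d
import std.conv : to;

// Hypothetical ID types: each wraps an int, but as distinct structs they
// cannot be passed or compared in place of one another.
struct UserID { int value; }
struct CityID { int value; }

// Hypothetical navigation call that accepts only a CityID.
string routeTo(CityID city)
{
    return "routing to city " ~ city.value.to!string;
}

void main()
{
    import std.stdio : writeln;

    auto user = UserID(42);
    auto city = CityID(7);

    writeln(routeTo(city)); // prints "routing to city 7"

    // Both of these are rejected by the compiler:
    static assert(!__traits(compiles, routeTo(user)));
    static assert(!__traits(compiles, user == city));
}
```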


Andrei
June 01, 2012
Walter Bright wrote:
> On 5/31/2012 3:22 AM, Dmitry Olshansky wrote:
> >On 31.05.2012 13:06, deadalnix wrote:
> >>This is called failing gracefully. And this is highly recommended, and you KNOW that the system will fail at some point.
> >
> >Exactly. + The point I tried to argue but it was apparently lost:
> >doing stack unwinding and cleanup on most Errors (some Errors like stack
> >overflow might not be recoverable) is the best thing to do.
> 
> This is all based on the assumption that the program is still in a valid state after an assert fail, and so any code executed after that and the data it relies on is in a workable state.
> 
> This is a completely wrong assumption.
> 
> It might be ok if the program is not critical and has no control over important things like delivering insulin, executing million dollar trades, or adjusting the coolant levels in a nuclear reactor.
> 
> If the code controls anything that matters, then it is not the best thing to do, not at all.
> 
> The right thing to do is to take the shortest path to stopping the program. A critical system would be monitoring those programs, and will restart them if they so fail, or will engage the backup system.
> 
> [When I worked on flight critical airplane systems, the only acceptable response for a self-detected fault was to IMMEDIATELY stop the system, physically DISENGAGE it from the flight controls, and inform the pilot.]

This is perfectly valid when developing such critical systems. But limiting D to effectively only allow developing such particular systems cannot be the appropriate response. There are plenty of other systems that do not operate in such a constrained environment.

Jens
June 01, 2012
Walter Bright wrote:
> On 5/31/2012 1:05 PM, Jens Mueller wrote:
> >Okay, let's assume I have separate processes, maybe even processes on
> >different machines. In one process I get an error. Let's say I want to
> >signal the other process so that it restarts the failed process or just
> >logs the event, whatever makes sense.
> >How do I do this if it is not guaranteed that finally/scope blocks are
> >being executed?
> 
> 
> Presumably the operating system provides a means to tell when a process is no longer running as part of its inter-process communication api.

My point is that you may want to access some state of your invalid program, state that is lost otherwise. But maybe just having the core dump is actually enough, i.e. there is no other interesting state. You are probably right that you can always recover from the error when a new process is started. At least I cannot come up with a convincing counterexample.
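Walter's suggestion, letting a separate process notice that a worker has died and restart it, could be sketched with Phobos' std.process as below. This is an illustration under assumptions, not a production design: supervise and maxRestarts are hypothetical names, the worker command line is taken from the supervisor's own arguments, and a real supervisor would also need back-off and some way to hand state to the restarted process.

```d
import std.process : spawnProcess, wait;
import std.stdio : stderr;

/// Hypothetical supervisor: run `command` as a child process and start it
/// again whenever it exits with a non-zero status, at most `maxRestarts`
/// extra times. Returns the last exit status (0 on a clean exit).
int supervise(string[] command, uint maxRestarts)
{
    int status;
    foreach (attempt; 0 .. maxRestarts + 1)
    {
        auto pid = spawnProcess(command);
        status = wait(pid); // blocks until the child terminates
        if (status == 0)
            return 0;
        // On POSIX a negative status means the child was killed by a signal.
        stderr.writefln("worker exited with status %s (run %s)", status, attempt + 1);
    }
    return status;
}

int main(string[] args)
{
    // Usage (hypothetical): supervisor ./worker --some-flag
    if (args.length < 2)
        return 0;
    return supervise(args[1 .. $], 3);
}
```

The supervisor keeps no state of its own about the worker, which is the point of the design: it stays valid no matter how badly the child failed.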

Since the current implementation does not follow the specification regarding scope and finally blocks being executed in case of an Error, will try ... catch (...Error) keep working? I have code that uses assertThrows!AssertError to test some in contracts. Will this code break?
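For context, a minimal sketch of the kind of test in question, using Phobos' std.exception.assertThrown on a hypothetical function with an in contract. assertThrown itself catches the AssertError; the open question is whether scope(exit) and finally blocks between the failing assert and that catch are guaranteed to run.

```d
import core.exception : AssertError;
import std.exception : assertThrown, assertNotThrown;

// Hypothetical function whose in contract rejects a zero divisor.
int safeDivide(int a, int b)
in (b != 0, "divisor must be non-zero")
{
    return a / b;
}

unittest
{
    // Contracts are removed by -release, so run this in a normal
    // (non-release) build, e.g. `dmd -unittest -run file.d`.
    assertThrown!AssertError(safeDivide(1, 0));
    assertNotThrown!AssertError(safeDivide(6, 3));
}

void main() {}
```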

Jens
June 01, 2012
On 01.06.2012 5:16, Walter Bright wrote:
> On 5/31/2012 3:22 AM, Dmitry Olshansky wrote:
>> On 31.05.2012 13:06, deadalnix wrote:
>>> This is called failing gracefully. And this is highly recommended, and you
>>> KNOW that the system will fail at some point.
>>
>> Exactly. + The point I tried to argue but it was apparently lost:
>> doing stack unwinding and cleanup on most Errors (some Errors like stack
>> overflow might not be recoverable) is the best thing to do.
>
> This is all based on the assumption that the program is still in a valid
> state after an assert fail, and so any code executed after that and the
> data it relies on is in a workable state.
>
> This is a completely wrong assumption.

To be frank, "completely wrong assumption" is a flat-out exaggeration. The only problem that can make it "completely wrong" is memory corruption. Other errors just depend on the specifics of the system, e.g. wrong arithmetic in medical software == critical, while an arithmetic bug in the "refracted light color component" of, say, a 3-D game is no problem: just log it and recover. Or better, save the game and then crash gracefully.

Keep in mind that both of the above are likely to be assert(smth), even though the latter arguably shouldn't be. But it is a logic invariant check.

@safe D code should be enough to avoid memory corruption. So in @safe D code an AssertError does not indicate memory corruption. Being able to do some logging and graceful teardown in this case would be awesome. I mean an OPTION to do so.

Wrong values don't always corrupt "the whole program" state. It's too conservative a point of view. It is a reasonable DEFAULT, not a rule.
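A small sketch of what @safe buys here (sumFirst is a hypothetical function): pointer arithmetic is rejected at compile time, and an out-of-bounds slice becomes a RangeError at run time rather than a silent read past the end of the array.

```d
import core.exception : RangeError;

// Hypothetical helper: sums the first n elements of values.
int sumFirst(const(int)[] values, size_t n) @safe
{
    int total;
    // Bounds-checked: throws a RangeError if n > values.length.
    foreach (v; values[0 .. n])
        total += v;
    return total;
}

// Pointer arithmetic is rejected inside @safe code, so this does not compile.
static assert(!__traits(compiles, (int* p) @safe { p += 1; }));

void main()
{
    import std.stdio : writeln;
    writeln(sumFirst([1, 2, 3], 2)); // prints 3
}
```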

(just look at all these PHP websites, I'd love them to crash on critical errors yet they still crawl after cascade failures with their DBs, LOL)

BTW, OutOfMemory is not an Error. To me it's like "can't open file". Yeah, it could be critical if your app depends on this particular file, but not in general.

To summarize:
I agree there are irrecoverable errors:
	--> call abort immediately.
I agree there are some where I don't know whether they are critical:
	--> call a user hook to do some logging/attempt to save data, then abort
	or
	--> provide stack unwinding
	    so that the thing cleans up after itself (more dangerous)

I don't agree that OutOfMemory is critical:
	--> make it an exception?
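A rough sketch of the middle option, log or save through a user hook and then abort, assuming the application supplies the hook. runWithCrashHook and its parameters are hypothetical, not druntime API; termination is passed in as a delegate so it can be replaced in tests, with abort being the intended production choice.

```d
import core.stdc.stdlib : abort;
import std.stdio : stderr, writeln;

/// Hypothetical helper: run `work`; if it throws an Error, give `hook`
/// a chance to log or save data, then call `terminate` (normally abort).
/// Exceptions are not caught here and propagate as usual.
void runWithCrashHook(scope void delegate() work,
                      scope void delegate(Throwable) hook,
                      scope void delegate() terminate)
{
    try
    {
        work();
    }
    catch (Error e)
    {
        // The program state is suspect at this point, so the hook is
        // best effort: if it throws, ignore that and still terminate.
        try
        {
            hook(e);
        }
        catch (Throwable)
        {
        }
        terminate();
    }
}

void main()
{
    runWithCrashHook(
        () { writeln("doing work"); },
        (Throwable t) { stderr.writeln("fatal: ", t.msg); },
        () { abort(); });
}
```

This only works if the Error actually reaches the catch block; whether intermediate scope(exit) and finally blocks run on the way is exactly what is being debated in this thread.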



-- 
Dmitry Olshansky
June 01, 2012
On 6/1/2012 12:45 AM, Jens Mueller wrote:
> This is perfectly valid when developing such critical systems. But
> limiting D to effectively only allow developing such particular systems
> cannot be the appropriate response. There are plenty of other systems
> that do not operate in such a constrained environment.

You can catch thrown asserts if you want, after all, D is a systems programming language. But that isn't a valid way to write robust software.
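A small sketch of what catching a thrown assert looks like, and of the Error/Exception split underneath it (whoCaught is a hypothetical helper): AssertError derives from Error, so a catch (Exception) handler never sees it.

```d
import core.exception : AssertError;

// Hypothetical helper reporting which handler intercepted a failure.
string whoCaught(scope void delegate() action)
{
    try
    {
        try
        {
            action();
        }
        catch (Exception)
        {
            return "Exception handler";
        }
    }
    catch (Error)
    {
        return "Error handler";
    }
    return "nothing thrown";
}

static assert(is(AssertError : Error));
static assert(!is(AssertError : Exception));

void main()
{
    import std.stdio : writeln;
    writeln(whoCaught(() { throw new Exception("bad input"); })); // Exception handler
    // In a non-release build assert(false) throws an AssertError.
    writeln(whoCaught(() { assert(false); }));                    // Error handler
}
```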

June 01, 2012
On 6/1/2012 1:48 AM, Dmitry Olshansky wrote:
> On 01.06.2012 5:16, Walter Bright wrote:
>> On 5/31/2012 3:22 AM, Dmitry Olshansky wrote:
>>> On 31.05.2012 13:06, deadalnix wrote:
>>>> This is called failing gracefully. And this is highly recommended, and you
>>>> KNOW that the system will fail at some point.
>>>
>>> Exactly. + The point I tried to argue but it was apparently lost:
>>> doing stack unwinding and cleanup on most Errors (some Errors like stack
>>> overflow might not be recoverable) is the best thing to do.
>>
>> This is all based on the assumption that the program is still in a valid
>> state after an assert fail, and so any code executed after that and the
>> data it relies on is in a workable state.
>>
>> This is a completely wrong assumption.
>
> To be frank, "completely wrong assumption" is a flat-out exaggeration. The only
> problem that can make it "completely wrong" is memory corruption. Other errors
> just depend on the specifics of the system, e.g. wrong arithmetic in medical
> software == critical, while an arithmetic bug in the "refracted light color
> component" of, say, a 3-D game is no problem: just log it and recover. Or better,
> save the game and then crash gracefully.

Except that you do not know why the arithmetic turned out wrong - it could be the result of memory corruption.


> @safe D code should be enough to avoid memory corruption. So in @safe D code
> an AssertError does not indicate memory corruption. Being able to do some logging
> and graceful teardown in this case would be awesome. I mean an OPTION to do so.

You do have the option of catching assert errors in D, but such cannot be represented as a correct or robust way of doing things.

> Wrong values don't always corrupt "the whole program" state.

Right, but since you cannot know how those values got corrupt, you cannot know that the rest of the program is in a valid state. In fact, you reliably know nothing about the state of a program after an assert fail.

> It's too conservative a point of view. It is a reasonable DEFAULT, not a rule.

It's a rule. Break it at your peril :-) I am not going to pretend that it is a reasonable thing to do to try and keep running the program.


> (just look at all these PHP websites, I'd love them to crash on critical errors
> yet they still crawl after cascade failures with their DBs, LOL)

Other people writing crappy, unreliable software is no excuse for us.


> BTW OutOfMemory is not an Error. To me it's like can't open file. Yeah, it could
> be critical if your app depends on this particular file, but not in general.

OOM is a special case. I agree that that isn't a corruption error. But I've almost never seen a program that could recover from OOM, even if it was designed to. (For one reason, the recovery logic for such is almost never tested, and so when it is tripped, it fails.)


> I don't agree that OutOfMemory is critical:
> --> make it an exception?

The reason it is made non-recoverable is so that pure functions can do something useful.
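One concrete consequence, offered as an illustration rather than as the complete rationale: because OutOfMemoryError derives from Error rather than Exception, a function that allocates can still be marked pure nothrow. squares is a hypothetical example.

```d
import core.exception : OutOfMemoryError;

// Hypothetical example: allocates, yet is pure and nothrow. A failed
// allocation raises OutOfMemoryError, which `nothrow` does not forbid
// because it only rules out Exceptions.
int[] squares(size_t n) pure nothrow @safe
{
    auto result = new int[n];
    foreach (i, ref x; result)
        x = cast(int) (i * i);
    return result;
}

// The class hierarchy that makes this possible:
static assert(is(OutOfMemoryError : Error));
static assert(!is(OutOfMemoryError : Exception));

void main()
{
    import std.stdio : writeln;
    writeln(squares(5)); // prints [0, 1, 4, 9, 16]
}
```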

June 01, 2012
On 6/1/2012 1:15 AM, Jens Mueller wrote:
> Since the current implementation does not follow the specification
> regarding scope and finally blocks being executed in case of an Error, will
> try ... catch (...Error) keep working?

No. The reason for this is the implementation was not updated after the split between Error and Exception happened. It was overlooked.

> I have code that uses
> assertThrows!AssertError to test some in contracts. Will this code
> break?

I don't know exactly what your code is, but if you're relying on scope to unwind in the presence of Errors, that will break.

June 01, 2012
On 01/06/2012 12:26, Walter Bright wrote:
> Except that you do not know why the arithmetic turned out wrong - it
> could be the result of memory corruption.
>

Yes. A wrong calculation often comes from memory corruption. Almost never from the programmer having screwed up the calculation.

It is so perfectly reasonable and completely matches my experience. I'm sure everybody here will agree.

Not to mention that said memory corruption obviously comes from a compiler bug. As always. What programmer makes mistakes in his code? We write programs, not bugs!
June 01, 2012
On 31/05/2012 21:47, Walter Bright wrote:
> On 5/31/2012 12:40 AM, Jens Mueller wrote:
>> How do I do a graceful shutdown if finally and scope are not guaranteed
>> to be executed? Assuming onAssertError, etc. is of no use because I need
>> to perform different shutdowns due to having different cases or if I
>> defined my own Error, let's say for some device.
>
> There's no way to guarantee a graceful shutdown.
>
> No way.
>
> If you must have such, then the way to do it is to divide your
> application into separate processes that communicate via interprocess
> communication, then when one component fails the rest of your app can
> restart it or do what's necessary, as the rest is not in an invalid state.
>

There is no way to ensure that an IP packet will go through the internet. Let's just shut down that silly thing the internet is right now.
June 01, 2012
On 01/06/2012 12:29, Walter Bright wrote:
> On 6/1/2012 1:15 AM, Jens Mueller wrote:
>> Since the current implementation does not follow the specification
>> regarding scope and finally blocks being executed in case of an Error, will
>> try ... catch (...Error) keep working?
>
> No. The reason for this is the implementation was not updated after the
> split between Error and Exception happened. It was overlooked.
>
>> I have code that uses
>> assertThrows!AssertError to test some in contracts. Will this code
>> break?
>
> I don't know exactly what your code is, but if you're relying on scope
> to unwind in the presence of Errors, that will break.
>

If you have an error, it is already broken in some way.

But it is unreasonable to think that the whole program is broken, except in very specific cases (stack corruption, for instance), but in such a case you can't throw an error anyway.