October 03, 2014
On Friday, 3 October 2014 at 15:43:59 UTC, Sean Kelly wrote:

> My point, and I think Kagamin's as well, is that the entire plane is a system and the redundant internals are subsystems.  They may not share memory, but they are wired to the same sensors, servos, displays, etc.  Thus the point about shutting down the entire plane as a result of a small failure is fair.

This "real life" example:

http://en.wikipedia.org/wiki/Air_France_Flight_447

I just picked some interesting statements (there are other factors described as well):

"temporary inconsistency between the measured speeds, likely as a result of the obstruction of the pitot tubes by ice crystals, causing autopilot disconnection and reconfiguration to alternate law;"


And as I see it, all subsystems related to the "small failure" were shut down. But, just as importantly, the information was not clearly provided to the pilots:

"Despite the fact that they were aware that altitude was declining rapidly, the pilots were unable to determine which instruments to trust: it may have appeared to them that all values were incoherent"

"the cockpit lacked a clear display of the inconsistencies in airspeed readings identified by the flight computers;"

Piotrek
October 03, 2014
On Friday, 3 October 2014 at 18:00:58 UTC, Piotrek wrote:
>
> And as I see it, all subsystems related to the "small failure" were shut down. But, just as importantly, the information was not clearly provided to the pilots:
>
> "Despite the fact that they were aware that altitude was declining rapidly, the pilots were unable to determine which instruments to trust: it may have appeared to them that all values were incoherent"
>
> "the cockpit lacked a clear display of the inconsistencies in airspeed readings identified by the flight computers;"

There's a similar issue with nuclear reactors, which is that
there are so many blinky lights and such that it can be
impossible to spot or prioritize problems in a failure scenario.
I know there have been articles written on revisions of user
interface design in reactors specifically to deal with this
issue, and I suspect the ideas are applicable to error handling
in general.
October 03, 2014
On Friday, 3 October 2014 at 17:38:40 UTC, Brad Roberts via
Digitalmars-d wrote:
>
> The part of Walter's point that is either deliberately overlooked or somewhat misunderstood here is the notion of a fault domain.  In a typical Unix or Windows based environment, it's a process.  A fault within the process yields the aborting of the process but not all processes.  Erlang introduces within its execution model a concept of a process within the higher level notion of the OS level process.  Within the Erlang runtime, its individual processes run independently and can each fail independently.  The Erlang runtime guarantees a higher level of separation than a typical threaded Java or C++ application.  An error within the Erlang runtime itself would justifiably cause the entire system to be halted.  Just as within an airplane, to use Walter's favorite analogy, the seat entertainment system is physically and logically separated from the flight control systems, thus a fault within the former has no impact on the latter.

Yep.  And I think it's a fair assertion that the default fault
domain in a D program is at the process level, since D is not
inherently memory safe.  But I don't think the language should
necessarily make that assertion to the degree that no other
definition is possible.
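
As a rough sketch of what a sub-process fault domain could look like in D today (hypothetical worker logic; this assumes std.concurrency's documented link semantics): spawnLinked starts a thread that shares no mutable state with its owner by default, and if the worker dies, the owner merely observes a LinkTerminated notification instead of dying with it.

import std.concurrency;
import std.stdio;

// Hypothetical worker: an uncaught exception terminates only
// this thread, not the owning process.
void worker()
{
    receive((int n) {
        if (n < 0)
            throw new Exception("bad input");
        writeln("processed ", n);
    });
}

void main()
{
    auto tid = spawnLinked(&worker);
    tid.send(-1);
    try
    {
        receiveOnly!int(); // the reply never arrives...
    }
    catch (LinkTerminated e)
    {
        // ...instead the owner learns the worker's fault domain collapsed
        writeln("worker died; owner keeps running");
    }
}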
October 03, 2014
On Friday, 3 October 2014 at 17:33:33 UTC, Piotrek wrote:
> That depends on design (logic). Ever heard of this?
> http://www.reddit.com/r/programming/comments/1ax0oa/how_kdes_1500_git_repositories_almost_were_lost/

How is not having redundant storage, logging or backup related to this?

This is a risk assessment scenario: «What are we willing to lose compared to what it costs to mitigate the risks by investing in additional resources?»

> A logic error would be a case when you think you are running a garage but suddenly you notice your staff are selling meals and wearing chef's uniforms.

But it is a business decision whether it is better to take amazon.com off the network for a week or just let their search engine occasionally serve food instead of books as search results. Not an engineering decision.

It is a business decision whether it is better for a game to corrupt 1% of user accounts and let customer support manually build them back up than to take the game off the network until the problem is fixed. You would probably have a heavier load on customer support and lose more subscriptions by taking the game off the network than by giving those 1% one year of free game play as compensation.

If you have a logic error in a functional routine, it is local. It might not matter; it might be expected.

Logic errors do not imply memory corruption.
Memory corruption does not imply that exceptions are thrown.
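
To make that concrete, here is a minimal D sketch (a hypothetical pricing routine, purely for illustration) of a logic error that stays local: the result is wrong, but nothing throws and no memory is corrupted.

import std.stdio;

// Hypothetical bulk-discount routine with a logic error:
// the spec says "100 or more items", but the check uses >.
pure double totalPrice(int quantity, double unitPrice)
{
    double total = quantity * unitPrice;
    if (quantity > 100)   // logic error: should be >= 100
        total *= 0.9;     // 10% bulk discount
    return total;
}

void main()
{
    // Wrong answer (100.0 instead of 90.0), yet no exception and
    // no corruption: the fault stays inside this pure function.
    writeln(totalPrice(100, 1.0));
}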

Even if memory corruption led to exceptions being thrown in 30% of cases, you'd still have 70% of cases where memory corruption goes undetected. So if that is a concern, you need to focus elsewhere.

You have to think about this in probabilistic terms and relate it to business decisions.

Defining thresholds for acceptable reliability is not an engineering decision. An engineering decision is to use isolates, Erlang, Haskell etc to achieve the thresholds set as acceptable reliability/quality viewed from a business point of view.
October 03, 2014
On 2014-10-03 14:36, David Nadlinger wrote:

> you are saying that specific exceptions were replaced by enforce? I
> can't recall something like this happening.

I have no idea about this, but I know there are a lot of "enforce" calls in Phobos and its use seems to be encouraged. It would be really sad if specific exceptions were deliberately replaced with less specific ones.
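
For illustration only (a hypothetical sketch, not actual Phobos code), the difference matters to callers roughly like this: "enforce" without an exception type throws a plain Exception, while passing a type keeps the exception specific.

import std.exception;

// Hypothetical specific exception type that callers can catch precisely.
class ConfigException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

void loadConfig(string path)
{
    // Plain enforce: callers can only catch the generic Exception.
    enforce(path.length != 0, "empty config path");

    // enforce with an explicit type preserves the specificity.
    enforce!ConfigException(path.length != 0, "empty config path");
}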

-- 
/Jacob Carlborg
October 03, 2014
On Friday, 3 October 2014 at 18:00:58 UTC, Piotrek wrote:
> On Friday, 3 October 2014 at 15:43:59 UTC, Sean Kelly wrote:
>
> This "real life" example:
>
> http://en.wikipedia.org/wiki/Air_France_Flight_447
>
> I just picked some interesting statements (there are other factors described as well):
>
> "temporary inconsistency between the measured speeds, likely as a result of the obstruction of the pitot tubes by ice crystals, causing autopilot disconnection and reconfiguration to alternate law;"
>
>
> And as I see it, all subsystems related to the "small failure" were shut down. But, just as importantly, the information was not clearly provided to the pilots:
>
> "Despite the fact that they were aware that altitude was declining rapidly, the pilots were unable to determine which instruments to trust: it may have appeared to them that all values were incoherent"
>
> "the cockpit lacked a clear display of the inconsistencies in airspeed readings identified by the flight computers;"
>
> Piotrek

As one who has read the original report in full, I think that you have picked a bad example: although the autopilot was disengaged, the stall alarm rang a plethora of times.

There's no real alternative to disengaging the autopilot if such a fundamental parameter is compromised.

It took the captain only a few moments to understand the problem (read the voice-recording transcript), but it was too late...

---
/Paolo
October 03, 2014
On Friday, 3 October 2014 at 20:31:42 UTC, Paolo Invernizzi wrote:

>
> As one who has read the original report in full, I think that you have picked a bad example: although the autopilot was disengaged, the stall alarm rang a plethora of times.

My point was that the broken speed indicators shut down the autopilot systems.

Piotrek
October 04, 2014
On Sun, 28 Sep 2014 17:09:57 -0700
Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> If the program has entered an unknown state, its behavior from then on cannot be predictable.
and the D compiler itself contradicts this principle. why does it try to "recover" from parsing/compiling errors? it should stop at the first error encountered and not try to "recover" itself from an unknown state. hate this. and it's inconsistent with your words.


October 04, 2014
On 10/3/2014 6:52 PM, ketmar via Digitalmars-d wrote:
> On Sun, 28 Sep 2014 17:09:57 -0700
> Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
>> If the program has entered an unknown state, its behavior from then
>> on cannot be predictable.
> and the D compiler itself contradicts this principle. why does it try
> to "recover" from parsing/compiling errors? it should stop at the first
> error encountered and not try to "recover" itself from an unknown state.
> hate this. and it's inconsistent with your words.

Where's the contradiction?  The compiler's state hasn't been corrupted just because it encounters errors in the text file.  In fact, it's explicitly built to detect and handle them.  There's not even a contradiction in making assumptions about what that input could have been and attempting to continue based on those assumptions.  At no time in there is the compiler's internal state corrupted.

And in direct affirmation of the principle, the compiler has numerous asserts scattered around that _do_ abort compilation should an unexpected and invalid state be detected.
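
A toy illustration of that distinction (a hypothetical parser fragment, nothing like dmd's actual code): bad input is anticipated data that gets diagnosed and skipped, while the assert guards the parser's own internal invariants and aborts if they are ever violated.

import std.stdio;

// Hypothetical recursive-descent fragment.
struct Parser
{
    string[] tokens;
    size_t pos;
    int errorCount;

    void expect(string tok)
    {
        // Internal invariant: a violation here would mean the parser
        // itself is broken, so abort via assert.
        assert(pos <= tokens.length, "parser invariant violated");

        if (pos < tokens.length && tokens[pos] == tok)
        {
            ++pos;                    // input matches; consume it
        }
        else
        {
            ++errorCount;             // bad input: diagnose it...
            writeln("error: expected '", tok, "'");
            if (pos < tokens.length)
                ++pos;                // ...skip it, and keep parsing
        }
    }
}

void main()
{
    auto p = Parser(["(", "x", "x", ")"]);
    p.expect("(");
    p.expect("x");
    p.expect(")");   // reports an error; recovery continues
    p.expect(")");   // internal state is still perfectly valid
    writeln(p.errorCount, " error(s)");
}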
October 04, 2014
On Fri, 03 Oct 2014 19:25:53 -0700
Brad Roberts via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> Where's the contradiction?  The compiler's state hasn't been corrupted just because it encounters errors in the text file.
but the compiler is in an unknown state. it can't do telepathy, and its tries are annoying. there is no reason to guess what code the programmer meant to write; it's just a source of mystic/nonsensical error messages ("garbage", in other words).

the original source of "trying to recover and continue analysis" was slow compilation times: it was really painful to restart the compiler after each error. but D compilation times are good enough to stop this "guess-and-miss" nonsense. and we have good IDEs that can analyse code in the background and highlight errors, so there are virtually no reasons for telepathy left.

yet many compilers (including D's) still try to do telepathy (and fail). c'mon, it's better to improve compile times than to attempt guesses. PL/1 failed at this, and all other compilers since then have failed too.

it's strange to me that Walter tells us a program should stop once it enters an unknown state, yet forces the D compiler to make uneducated guesses when it enters an unknown state. something is very wrong with one of these things.