October 04, 2014
On Saturday, 4 October 2014 at 09:04:58 UTC, Walter Bright wrote:
>
> Airplane avionics systems all abort on error, yet the airplanes don't fall out of the sky.

To be fair, that's a function of aerodynamics more than system design.  But I see what you're getting at.
October 04, 2014
On Saturday, 4 October 2014 at 09:18:41 UTC, Walter Bright wrote:
>
> Threads are not isolated from each other. They are not. Not. Not.

Neither are programs that communicate in some fashion.  I'll grant that the possibility of memory corruption doesn't exist in this case (a problem unique to systems languages like D), but system corruption still does.  And I absolutely agree with you that if memory corruption is ever even suspected, the process must immediately halt.  In that case I wouldn't even throw an Error, I'd call exit(1).
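
A minimal sketch of what I mean (the detection point here is hypothetical):

// If memory corruption is suspected, the runtime itself can't be
// trusted, so skip unwinding and cleanup entirely and terminate now.
import core.stdc.stdlib : exit;

void onSuspectedCorruption()
{
    // Don't throw an Error -- the stack and heap may already be garbage.
    exit(1);
}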
October 04, 2014
On 10/4/2014 4:19 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:
> On 04/10/14 11:18, Walter Bright via Digitalmars-d wrote:
> You seem to be convinced that I don't understand the principles you are
> advocating of isolation, backup, and so forth.  What I've been trying (but
> obviously failing) to communicate to you is, "OK, I agree on these principles,
> let's talk about how to achieve them in a practical sense with D."

Ok, I understand. My apologies for misunderstanding you.

I would suggest the best way to achieve that is to use the process isolation abilities provided by the operating system. Separate the system into processes that communicate via some messaging system provided by the operating system (not shared memory).
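
A rough sketch of the shape of it in D, using std.process ("./worker" is just a placeholder for whatever program you split out):

// Parent talks to an isolated worker process over OS pipes,
// never shared memory. If the worker dies, the parent survives
// and can restart it.
import std.process : pipeProcess, wait, Redirect;
import std.stdio;

void main()
{
    auto p = pipeProcess(["./worker"], Redirect.stdin | Redirect.stdout);
    p.stdin.writeln("request 42");   // message out
    p.stdin.flush();
    auto reply = p.stdout.readln();  // message back
    writeln("worker said: ", reply);
    wait(p.pid);
}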

I read that the Chrome browser was done this way, so if one part of Chrome crashed, the failed part could be restarted without restarting the rest of Chrome.

Note that such a solution has little to do with D in particular, or C or C++. It's more to do with what the operating system provides for process isolation and interprocess communication.


> Right.  Which is why I'd like to move the discussion over to "How can we achieve
> this in D?"

D provides a lot of ability to make a single process more robust, such as pure functions, immutable data structures, unit testing, @safe, etc., so bugs are less likely. And my personal experience with developing D programs is they come up faster and are less buggy than my C++ ones. But once a bug is detected, we're back to chucking the process.
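
A toy example showing those features together:

// pure: no hidden state; immutable: data can't be mutated;
// @safe: no memory-unsafe operations; unittest: checked in-source.
@safe pure int total(immutable(int)[] xs)
{
    int sum = 0;
    foreach (x; xs)
        sum += x;
    return sum;
}

unittest
{
    immutable(int)[] data = [1, 2, 3];
    assert(total(data) == 6);
}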
October 04, 2014
On 10/4/2014 9:16 AM, Sean Kelly wrote:
> On Saturday, 4 October 2014 at 09:18:41 UTC, Walter Bright wrote:
>>
>> Threads are not isolated from each other. They are not. Not. Not.
>
> Neither are programs that communicate in some fashion.

Operating systems typically provide methods of interprocess communication that are robust against corruption, such as pipes, message passing, etc. The receiving process should regard such input as "user/environmental input", and must validate it. Corruption in it would not be regarded as a logic bug in the receiving process (unless it failed to check for it).
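
As an illustrative sketch, validating integers arriving on a pipe:

// The receiving process treats everything that arrives as untrusted.
// Malformed input is an environmental error (Exception), not a logic
// bug (Error), so we reject it and carry on.
import std.conv : to, ConvException;
import std.stdio;

void main()
{
    foreach (line; stdin.byLine)
    {
        try
        {
            immutable n = line.to!int;
            writeln("got ", n);   // act on validated input only
        }
        catch (ConvException e)
        {
            stderr.writeln("rejected malformed input: ", line);
        }
    }
}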

Interprocess shared memory, though, is not robust.



> I'll grant that the
> possibility of memory corruption doesn't exist in this case (a problem unique to
> systems languages like D), but system corruption still does.  And I absolutely
> agree with you that if memory corruption is ever even suspected, the process
> must immediately halt.  In that case I wouldn't even throw an Error, I'd call
> exit(1).

System corruption is indeed a problem with this type of setup. We're relying here on the operating system not having such bugs in it, and indeed OS vendors work very hard at preventing an errant program from corrupting the system.

We all know, of course, that this sort of thing happens anyway. An even more robust system design will need a way to deal with that, and failure of the hardware, and failure of the data center, etc.

All components of a reliable system are unreliable, and a robust system needs to be able to recover from the inevitable failure of any component. This kind of thinking needs to pervade the initial system design from the ground up; it's hard to tack it on later.

October 04, 2014
On 10/4/2014 4:39 AM, Joseph Rushton Wakeling wrote:
> The thing is, the privilege to make that kind of business decision is wholly
> dependent on the fact that there are no meaningful safety issues involved.
>
> Compare that to the case of the Ford Pinto.  The allegation made was that Ford
> had preferred to risk paying out lawsuits to injured drivers over fixing a
> design flaw responsible for those (serious) injuries, because a cost-benefit
> analysis had shown the payouts were cheaper than rolling out the fix.  This
> allegation was rightly met with outrage, and severe punitive damages in court.

Unfortunately, such business decisions are always made. Nobody can make a 100% safe system, and if one even tried, such a system would be unusable. A car where safety was the overriding priority could not move an inch, nobody could afford to buy one, etc.

The best one can do in an imperfect world is set a standard of the maximum probability of a fatal accident. In aviation, this standard is set by regulation, and airframe manufacturers are obliged to prove that the system reliability is greater than that standard, in order to get their designs certified.
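
To put rough numbers on it (illustrative ballpark figures, not the regulation text): catastrophic failure conditions are typically required to be "extremely improbable", on the order of 10^-9 per flight hour. No single component gets anywhere near that, but redundancy does: three independent channels that each fail at a rate of 10^-3 per hour fail simultaneously at roughly (10^-3)^3 = 10^-9 per hour, provided the failures really are independent. That independence requirement is why redundant channels must not share failure modes.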

The debate then is how high can that standard be set and still have affordable, useful products.
October 04, 2014
On 10/4/2014 7:13 AM, H. S. Teoh via Digitalmars-d wrote:
> "Beware -- I've only proven that the code is correct, not tested it." --
> Donald Knuth.
>
> :-)

Quotes like that prove (!) what a cool guy Knuth is!

October 04, 2014
On Saturday, 4 October 2014 at 09:40:26 UTC, Walter Bright wrote:
> Sorry, Ola, you've never written bug-free software, and nobody else has, either.

I have, but only simple ones. This is also key to making systems robust. Army equipment tends to consist of few parts, robust construction, and as few fancy features as you can get away with. But it is often rather inconvenient and hard to adapt to non-army settings.

D is on the other side of the spectrum. Nothing wrong with that, but it doesn't really follow the principles that would make it suitable for creating simple, robust systems.

>> by the transaction engine. The world outside of the
>> transaction engine has NO WAY of affecting integrity.
>
> Hardware fails, too.

Sure, but the point is to have integrity ensured by a conceptually simple system that has been hardened at a cost level that exceeds the budget of any single application.

> That doesn't mean there are no backups to the primary flight control computer.

No, but the point is that the operational context matters. Robustness is something you have to reason about on a probabilistic level in relation to the operational context.

> The assumption that "proof" means the code doesn't have bugs is charming, but still false.

A validated correctness proof ensures that the code follows the specification, so no bugs.

>> Failure can still happen if the stabilizing model is inadequate.
>
> It seems we can't escape bugs.

An inadequate specification is not a bug!

> Again, warplanes are not built to airliner safety standards. They have different priorities.

Indeed, so the operational context is what matters; therefore the app should set the priorities, not the language and libraries.

October 04, 2014
On 10/4/2014 9:09 AM, Sean Kelly wrote:
> On Saturday, 4 October 2014 at 08:15:51 UTC, Walter Bright wrote:
>> On 10/3/2014 8:43 AM, Sean Kelly wrote:
>>> My point, and I think Kagamin's as well, is that the entire plane is a system
>>> and the redundant internals are subsystems.  They may not share memory, but they
>>> are wired to the same sensors, servos, displays, etc.
>>
>> No, they do not share sensors, servos, etc.
>
> Gotcha.  I imagine there are redundant displays in the cockpit as well, which
> makes sense.  Thus the unifying factor in an airplane is the pilot.

Even the pilot has a backup!

Next time you go flying, peek in the cockpit. You'll see dual instruments and displays. If you examine the outside, you'll see two (or three) pitot tubes (which measure airspeed).


> Right.  So the system relies on the intelligence and training of the pilot for
> proper operation.  Choosing which systems are in error vs. which are correct,
> etc.

A lot of design revolves around making it obvious which component is the failed one, the classic being a red light on the instrument panel.


> I still think an argument could be made that an entire airplane, pilot
> included, is analogous to a server infrastructure, or even a memory isolated
> program (the Erlang example).

Anyone with a little training can fly an airplane. Heck, you can go to any flight school and they'll take you up on an introductory flight and let you try out the controls in flight. Most of a pilot's training consists of learning how to deal with failure.


> My only point in all this is that while choosing the OS process is a good
> default when considering the potential scope of undefined behavior, it's not the
> only definition.  The pilot misinterpreting sensor data and making a bad
> judgement call is equivalent to the failure of distinct subsystems corrupting
> the state of the entire system to the point where the whole thing fails.  The
> sensors were communicating confusing information to the pilot, and his
> programming, as it were, was not up to the task of separating the good
> information from the bad.

That's true. Many accidents have resulted from the pilot getting confused about the failures being reported to him, and his failure to properly grasp the situation and what to do about it. All of these result in reevaluations of how failures are presented to the pilot, and the pilot's training and procedures.

On the other hand, many failures have not resulted in accidents because of the pilot's ability to "think outside the box" and come up with a creative solution on the spot. It's why we need human pilots. These solutions then become part of standard procedure!


> Do you have any thoughts concerning my proposal in the "on errors" thread?

Looks interesting, but haven't gotten to it yet.
October 04, 2014
On 10/4/2014 4:25 AM, Joseph Rushton Wakeling wrote:
> Would it help to clarify my intentions in this discussion if I said that, on
> this note, I entirely agree -- and nothing I have said in this discussion is
> intended to be an argument about how Phobos should be designed?

Yes. Thank you!
October 04, 2014
On 10/4/2014 6:36 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:
> Suppose that I implement, in D, a framework creating Erlang-style processes
> (i.e. properly isolated, lightweight processes within a defined runtime
> environment, with an appropriate error-handling framework that allows those
> processes to be brought down and restarted without bringing down the entire
> application).
>
> Is there any reasonable scope for accessing Phobos directly from programs
> written to operate within that runtime, or is it going to be necessary to wrap
> all of Phobos in order to ensure that it's accessed in a safe way (e.g. to
> ensure that the conditions required of in contracts are enforced before the call
> gets to phobos, etc.)?

A start to this would be to ensure that the Erlang-style processes only call pure functions. Then I'd add pervasive use of immutable data structures. This should help a lot.
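
A minimal sketch of that starting point, using std.concurrency message passing (illustrative only; a real framework would need supervision and restart logic on top):

// Workers receive only immutable data and call only pure functions,
// so no mutable state is shared between them.
import std.concurrency;
import std.stdio;

pure int compute(immutable(int)[] data)
{
    int sum = 0;
    foreach (x; data)
        sum += x;
    return sum;
}

void worker()
{
    receive((immutable(int)[] data, Tid caller) {
        caller.send(compute(data));
    });
}

void main()
{
    auto tid = spawn(&worker);
    immutable(int)[] data = [1, 2, 3];
    tid.send(data, thisTid);
    writeln("result: ", receiveOnly!int);
}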