October 30, 2013
On 10/29/13 5:15 PM, Joseph Rushton Wakeling wrote:
> On 29/10/13 23:20, Chris wrote:
>> Good man yourself! I still can't get my head around the fact that companies fail
>> to provide safety switches that either hand over the control (to humans) or at
>> least disable the software-based components completely by switching the machine
>> off.
>
> All too often, the reason why management decides to use software to perform tasks is because they
> don't trust their employees to do anything.
>
> It's a mystery to me why they don't start by finding employees they _do_ trust ... :-)

As long as you're relying on trust, you're in trouble.  Trust and verify.  Of course, you have to trust the verification, but that trust can in turn be validated (it's harder to falsify stress-to-failure results than "yeah, it'll work" assertions).  It's part of why testing exists.
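A minimal sketch of the "verify" half in D, whose built-in unittest blocks make the verification part of the code itself (the clamp function and its values are hypothetical, purely for illustration):

// Don't trust that clamp() behaves; verify it.  Run the checks
// with: dmd -unittest -main -run thisfile.d
double clamp(double x, double lo, double hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

unittest
{
    // Boundary and in-range cases; a failed assert here is a failed
    // build, not a "yeah, it'll work" assertion.
    assert(clamp(5.0, 0.0, 1.0) == 1.0);
    assert(clamp(-2.0, 0.0, 1.0) == 0.0);
    assert(clamp(0.5, 0.0, 1.0) == 0.5);
}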
October 30, 2013
On Wednesday, 30 October 2013 at 00:28:28 UTC, Brad Roberts wrote:
> As long as you're relying on trust, you're in trouble.  Trust and verify.  Of course, you have to trust the verification, but that trust can in turn be validated (it's harder to falsify stress-to-failure results than "yeah, it'll work" assertions).  It's part of why testing exists.

Of course -- in fact, verification serves to enhance and sustain
trust; the two are complementary.

But not relying on blind trust doesn't make it any less daft to
employ people you don't trust.
October 30, 2013
On Tue, Oct 29, 2013 at 05:08:57PM -0700, Walter Bright wrote:
> On 10/29/2013 3:16 PM, H. S. Teoh wrote:
> >On Tue, Oct 29, 2013 at 02:39:59PM -0700, Walter Bright wrote:
> >>On 10/29/2013 2:38 PM, Walter Bright wrote:
> >>>I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts". It's probably scrolled off their system.
> >>
> >>
> >>http://www.drdobbs.com/architecture-and-design/safe-systems-from-unreliable-parts/228701716
> >
> >This article refers to a "next instalment", but I couldn't find it. Do you have a link handy?
> 
> 
> http://www.drdobbs.com/architecture-and-design/designing-safe-software-systems-part-2/228701618

Thanks!

Is there a third instalment, or is this it?


T


-- 
That's not a bug; that's a feature!
October 30, 2013
On Tuesday, 29 October 2013 at 22:20:08 UTC, Chris wrote:
> On Tuesday, 29 October 2013 at 21:39:59 UTC, Walter Bright wrote:
>> On 10/29/2013 2:38 PM, Walter Bright wrote:
>>> I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts". It's
>>> probably scrolled off their system.
>>
>>
>> http://www.drdobbs.com/architecture-and-design/safe-systems-from-unreliable-parts/228701716
>
> Good man yourself! I still can't get my head around the fact that companies fail to provide safety switches that either hand over the control (to humans) or at least disable the software-based components completely by switching the machine off.

Heh, this reminded me of my current ultrabook, the Zenbook Prime UX31A, an absolutely fantastic machine and the best I've ever owned.  Unfortunately, its designers decided to make the power button just another key on the keyboard rather than hard-wiring it directly to the battery.  Combine that with a keyboard connector that doesn't hold its place well and is actually secured with masking tape:

http://www.ifixit.com/Guide/Unresponsive+Keyboard+Keys/11932

Cut to me late last year, unable to turn my ultrabook on because the keyboard connector had completely slipped out, a month after I had accidentally dropped the machine.  I had to find the linked instructions after a bunch of googling, pick up a Torx T5 screwdriver, and fix it myself, as Asus support kept insisting to everyone that it was a software issue and that they should reinstall either the drivers or the OS!  I followed those simple instructions instead and had no problems until a week ago, when I had to repeat the procedure. :)
October 30, 2013
On 10/29/2013 5:54 PM, H. S. Teoh wrote:
> Is there a third instalment, or is this it?

That's it.

October 30, 2013
On 10/29/2013 6:55 PM, Walter Bright wrote:
> On 10/29/2013 5:54 PM, H. S. Teoh wrote:
>> Is there a third instalment, or is this it?
>
> That's it.


The ideas are actually pretty simple. The hard parts are:

1. Convincing engineers that this is the right way to do it.

2. Convincing people that improving quality, better testing, hiring better engineers, government licensing for engineers, following MISRA standards, etc., are not the solution. (Note that all of the above were proposed in the HN thread.)

3. Beating out of engineers the hubris that "this part I designed will never fail!" Jeepers, how often I've heard that.

4. Developing a mindset of "what happens when this part fails in the worst way?" (see the sketch after this list).

5. Learning to recognize inadvertent coupling between the primary and backup systems.

6. Being familiar with the case histories of failure of related designs.

7. Developing a system to track failures and their resolutions, and to check that new designs don't suffer from the same problems. (Much like D's bugzilla, the test suite, and the auto-tester.)
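To make point 4 concrete, here is a minimal sketch in D (the sensor, the plausibility limits, and the safe value are all hypothetical, purely for illustration): a monitor that assumes the primary part can fail arbitrarily, and latches into a fail-safe state on the first implausible reading instead of trusting the part or trying to "correct" its output.

import std.stdio;

struct ThrottleMonitor
{
    bool failSafe = false;

    // Assumed plausibility limits for the sensor voltage.
    enum double minVolts = 0.5;
    enum double maxVolts = 4.5;

    // Returns a throttle command; 0.0 is the assumed safe value.
    double check(double sensorVolts)
    {
        // An implausible reading means the sensor, wiring, or ADC has
        // failed somewhere.  Don't guess at the real value -- latch
        // into the fail-safe state and stay there.
        if (sensorVolts < minVolts || sensorVolts > maxVolts)
            failSafe = true;
        return failSafe ? 0.0 : sensorVolts;
    }
}

void main()
{
    auto mon = ThrottleMonitor();
    writeln(mon.check(2.1)); // plausible: passed through
    writeln(mon.check(5.3)); // implausible: latches fail-safe -> 0.0
    writeln(mon.check(2.1)); // still 0.0; the failure stays latched
}

The latch is the important design choice: once a part has been observed misbehaving, no later "good-looking" reading is allowed to re-arm it without an independent reset.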
October 30, 2013
Take a look at the reddit thread on this:

http://www.reddit.com/r/programming/comments/1pgyaa/toyotas_killer_firmware_bad_design_and_its/

Do a search for "failsafe". Sigh.
October 30, 2013
On Wednesday, 30 October 2013 at 03:24:54 UTC, Walter Bright wrote:
> Take a look at the reddit thread on this:
>
> http://www.reddit.com/r/programming/comments/1pgyaa/toyotas_killer_firmware_bad_design_and_its/
>
> Do a search for "failsafe". Sigh.

One of the comments under the original article you posted says

"Poorly designed firmware caused unintended operation, lack of driver training made it fatal."

So it's the fault of the driver, who couldn't possibly have known what was going on in that car gone mad? Putting the blame on the driver is cynicism of the worst kind.

Unfortunately, that's a common (and dangerous) attitude I've come across among programmers and engineers: the user has to adapt to whatever they failed to implement or didn't think of. But machines have to adapt to humans, not the other way around (realizing this was part of Apple's success in UI design; Ubuntu is very good now, too).

I warmly recommend the book "Architect or Bee":

http://www.amazon.com/Architect-Bee-Human-Technology-Relationship/dp/0896081311/ref=sr_1_1?ie=UTF8&qid=1383127030&sr=8-1&keywords=architect+or+bee
October 30, 2013
On Tue, 2013-10-29 at 14:38 -0700, Walter Bright wrote: […]
> I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts". It's probably scrolled off their system.

Update it and republish it somewhere. Remember, the cool hipsters think that if it's over a year old, it doesn't exist. And the rest of us could always do with a good reminder of quality principles.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

October 30, 2013
On Wednesday, 30 October 2013 at 00:16:10 UTC, Joseph Rushton Wakeling wrote:
> On 29/10/13 23:20, Chris wrote:
>> Good man yourself! I still can't get my head around the fact that companies fail
>> to provide safety switches that either hand over the control (to humans) or at
>> least disable the software-based components completely by switching the machine
>> off.
>
> All too often, the reason why management decides to use software to perform tasks is because they don't trust their employees to do anything.
>
> It's a mystery to me why they don't start by finding employees they _do_ trust ... :-)

Those are expensive, and you've got to treat them well!