Developing Mars lander software
February 18, 2014
http://cacm.acm.org/magazines/2014/2/171689-mars-code/fulltext

Some interesting tidbits:

"We later revised it to require that the flight software as a whole, and each module within it, had to reach a minimal assertion density of 2%. There is compelling evidence that higher assertion densities correlate with lower residual defect densities."

This has been my experience with asserts, too.

"A failing assertion is now tied in with the fault-protection system and by default places the spacecraft into a predefined safe state where the cause of the failure can be diagnosed carefully before normal operation is resumed."

Nice to see confirmation of that.

"Running the same landing software on two CPUs in parallel offers little protection against software defects. Two different versions of the entry-descent-and-landing code were therefore developed, with the version running on the backup CPU a simplified version of the primary version running on the main CPU. In the case where the main CPU would have unexpectedly failed during the landing sequence, the backup CPU was programmed to take control and continue the sequence following the simplified procedure."

An example of using dual systems for reliability.
February 18, 2014
On Tuesday, 18 February 2014 at 23:05:21 UTC, Walter Bright wrote:
> http://cacm.acm.org/magazines/2014/2/171689-mars-code/fulltext
>
> Some interesting tidbits:
>
> "We later revised it to require that the flight software as a whole, and each module within it, had to reach a minimal assertion density of 2%. There is compelling evidence that higher assertion densities correlate with lower residual defect densities."
>
> This has been my experience with asserts, too.
>
> "A failing assertion is now tied in with the fault-protection system and by default places the spacecraft into a predefined safe state where the cause of the failure can be diagnosed carefully before normal operation is resumed."
>
> Nice to see confirmation of that.
>
> "Running the same landing software on two CPUs in parallel offers little protection against software defects. Two different versions of the entry-descent-and-landing code were therefore developed, with the version running on the backup CPU a simplified version of the primary version running on the main CPU. In the case where the main CPU would have unexpectedly failed during the landing sequence, the backup CPU was programmed to take control and continue the sequence following the simplified procedure."
>
> An example of using dual systems for reliability.

I thought you were going to tell us it had been developed using D.  It only seems right that a Mars lander would use Digital Mars software. Plus isn't D safer than C99?

Maybe when they send the manned mission to Mars they will do the right thing :o)
February 19, 2014
On Tuesday, 18 February 2014 at 23:05:21 UTC, Walter Bright wrote:
> http://cacm.acm.org/magazines/2014/2/171689-mars-code/fulltext
>
> Some interesting tidbits:
>
> "We later revised it to require that the flight software as a whole, and each module within it, had to reach a minimal assertion density of 2%. There is compelling evidence that higher assertion densities correlate with lower residual defect densities."
>
> This has been my experience with asserts, too.
>
> "A failing assertion is now tied in with the fault-protection system and by default places the spacecraft into a predefined safe state where the cause of the failure can be diagnosed carefully before normal operation is resumed."
>
> Nice to see confirmation of that.
>
> "Running the same landing software on two CPUs in parallel offers little protection against software defects. Two different versions of the entry-descent-and-landing code were therefore developed, with the version running on the backup CPU a simplified version of the primary version running on the main CPU. In the case where the main CPU would have unexpectedly failed during the landing sequence, the backup CPU was programmed to take control and continue the sequence following the simplified procedure."
>
> An example of using dual systems for reliability.

I only skimmed the link, but how do they detect that a CPU has failed? Some information must be passed outside of the CPU to do this. The only solution that comes to my mind is that the main CPU updates a variable in external memory at every step, and the backup CPU checks it continuously to catch a failure immediately. But this would already eat about 50% of the CPU's power.

While thinking about this kind of backup system, it is really great to read that some people are actually building them.
February 19, 2014
On Wednesday, 19 February 2014 at 00:16:03 UTC, Tolga Cakiroglu wrote:
>
> I only skimmed the link, but how do they detect that a CPU has failed? Some information must be passed outside of the CPU to do this. The only solution that comes to my mind is that the main CPU updates a variable in external memory at every step, and the backup CPU checks it continuously to catch a failure immediately. But this would already eat about 50% of the CPU's power.
>
> While thinking about this kind of backup system, it is really great to read that some people are actually building them.
>

I'm assuming this has something to do with it:
https://en.wikipedia.org/wiki/Heartbeat_%28computing%29

In clustered servers, the active node sends a continuous signal indicating it's still alive. This signal is referred to as a heartbeat. There's a standby node waiting to take over should it stop receiving this signal.
February 19, 2014
On Wednesday, 19 February 2014 at 01:09:43 UTC, Xinok wrote:
> On Wednesday, 19 February 2014 at 00:16:03 UTC, Tolga Cakiroglu wrote:
>>
>> I only skimmed the link, but how do they detect that a CPU has failed? Some information must be passed outside of the CPU to do this. The only solution that comes to my mind is that the main CPU updates a variable in external memory at every step, and the backup CPU checks it continuously to catch a failure immediately. But this would already eat about 50% of the CPU's power.
>>
>> While thinking about this kind of backup system, it is really great to read that some people are actually building them.
>>
>
> I'm assuming this has something to do with it:
> https://en.wikipedia.org/wiki/Heartbeat_%28computing%29
>
> In clustered servers, the active node sends a continuous signal indicating it's still alive. This signal is referred to as a heartbeat. There's a standby node waiting to take over should it stop receiving this signal.

I think knowing only that it has failed is not enough. The process is a landing, and the other CPU needs to know where the process left off. With just a heartbeat signal, the only option is that all sensor information is sent to both CPUs continuously, and the sensor values have to be enough to determine the next step to take. Then I think it can continue the process seamlessly.
February 19, 2014
On Tuesday, 18 February 2014 at 23:05:21 UTC, Walter Bright wrote:
> http://cacm.acm.org/magazines/2014/2/171689-mars-code/fulltext
>
> Some interesting tidbits:
>
> "We later revised it to require that the flight software as a whole, and each module within it, had to reach a minimal assertion density of 2%. There is compelling evidence that higher assertion densities correlate with lower residual defect densities."
>
> This has been my experience with asserts, too.
>
> "A failing assertion is now tied in with the fault-protection system and by default places the spacecraft into a predefined safe state where the cause of the failure can be diagnosed carefully before normal operation is resumed."
>
> Nice to see confirmation of that.
>
> "Running the same landing software on two CPUs in parallel offers little protection against software defects. Two different versions of the entry-descent-and-landing code were therefore developed, with the version running on the backup CPU a simplified version of the primary version running on the main CPU. In the case where the main CPU would have unexpectedly failed during the landing sequence, the backup CPU was programmed to take control and continue the sequence following the simplified procedure."
>
> An example of using dual systems for reliability.


Having read Code Complete when I was at university, coupled with Ada and Eiffel experience, allowed me to live better with C by:

- compiling with all warnings enabled and treated as errors
- making judicious use of assert as a poor man's contract system
- running the code regularly through static analyzers

Regarding the last point, I read somewhere that lint was actually supposed to be part of the C toolchain, but when people ported C to home computers it was largely left out, hence the situation we got into with C.

--
Paulo
February 19, 2014
On 2/19/2014 12:25 AM, Paulo Pinto wrote:
> Regarding the last point, I read somewhere that lint was
> actually supposed to be part of the C toolchain, but when
> people ported C to home computers it was largely left out,
> hence the situation we got into with C.

The Unix toolchain did not port well to 16-bit machines. The C compilers for the PC were pretty much all built from scratch.

February 19, 2014
On Wednesday, 19 February 2014 at 08:49:36 UTC, Walter Bright
wrote:
> On 2/19/2014 12:25 AM, Paulo Pinto wrote:
>> Regarding the last point, I read somewhere that lint was
>> actually supposed to be part of the C toolchain, but when
>> people ported C to home computers it was largely left out,
>> hence the situation we got into with C.
>
> The Unix toolchain did not port well to 16-bit machines. The C compilers for the PC were pretty much all built from scratch.

Thanks for the info.
February 19, 2014
On Wed, 2014-02-19 at 00:49 -0800, Walter Bright wrote: […]
> The Unix toolchain did not port well to 16-bit machines. The C compilers for the PC were pretty much all built from scratch.

On the other hand, the UNIX tool chain worked fine for me on PDP-11s, which were very definitely 16-bit. Though we were very happy when we got a VAX-11/750 and 32-bits.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

February 19, 2014
On 2/19/2014 2:21 AM, Russel Winder wrote:
> On Wed, 2014-02-19 at 00:49 -0800, Walter Bright wrote:
> […]
>> The Unix toolchain did not port well to 16-bit machines. The C compilers for the
>> PC were pretty much all built from scratch.
>
> On the other hand, the UNIX tool chain worked fine for me on PDP-11s,
> which were very definitely 16-bit. Though we were very happy when we got
> a VAX-11/750 and 32-bits.
>

PC compilers needed to support multiple pointer types. The 11 did not have segmented addresses, so this was irrelevant for the 11. Trying to retrofit unix compilers with near/far/huge turned out to not be so practical, at least I don't know anyone who tried it.