November 01, 2014
On 10/31/2014 5:38 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> Transactions roll back when there is contention for resources and/or when you
> have any kind of integrity issue. That's why you have retries… so no, it is not
> only something wrong with the input. Something is temporarily wrong with the
> situation overall.

Those are environmental errors, not programming bugs, and asserting for those conditions is the wrong approach.
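A minimal sketch of that distinction in D (the runTransaction/TransactionAborted API here is hypothetical, for illustration only): contention surfaces as an Exception and is retried, while assert stays reserved for conditions that only a bug can produce.

// Hypothetical engine entry point: begins, runs `work`, commits;
// throws TransactionAborted on contention or an integrity conflict.
class TransactionAborted : Exception
{
    this(string msg) { super(msg); }
}

void runTransaction(void delegate() work)
{
    work();
}

void transfer(long amount)
{
    // A violated precondition would be a programming bug: assert it,
    // because no retry can fix broken logic.
    assert(amount > 0, "caller passed a non-positive amount");

    foreach (attempt; 0 .. 3)
    {
        try
        {
            runTransaction(delegate { /* debit one account, credit the other */ });
            return; // committed
        }
        catch (TransactionAborted)
        {
            // Environmental: contention, not a bug. Retrying is sound.
        }
    }
    throw new Exception("transfer failed after 3 attempts");
}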
November 01, 2014
On Saturday, 1 November 2014 at 03:39:02 UTC, Walter Bright wrote:
> Those are environmental errors, not programming bugs, and asserting for those conditions is the wrong approach.

The point is this: what happens in the transaction engine matters; what happens outside of it does not matter much.

Asserts do not belong in release code at all...
November 01, 2014
On Friday, 31 October 2014 at 21:33:22 UTC, H. S. Teoh via Digitalmars-d wrote:
> Again, you're using a different definition of "component".

I see no justified reasoning why a process can be considered such a "component" and anything else cannot.

In practice it depends entirely on the design of the system as a whole, and calling the process a silver bullet only creates problems when it is in fact not one.
November 01, 2014
On Friday, 31 October 2014 at 21:33:22 UTC, H. S. Teoh via Digitalmars-d wrote:
> You're using a different definition of "component".

System granularity is decided by the designer. You either allow people to design their systems or force your design on them; if you do both, you contradict yourself.

> An inconsistency in a transaction is a problem with the input, not a problem with the program logic itself.

The distinction between failures doesn't matter. A reliable system manages any failure, especially unexpected and unforeseen ones, without a diagnostic.

> If something is wrong with the input, the program
> can detect it and recover by aborting the transaction (rollback the
> wrong data). But if something is wrong with the program logic itself
> (e.g., it committed the transaction instead of rolling back when it
> detected a problem) there is no way to recover within the program
> itself.

Not the case for an airplane: it recovers from any failure within itself. Another indication that the airplane example contradicts Walter's proposal. See my post about the big picture.

> A failed component, OTOH, is a problem with program logic. You cannot
> recover from that within the program itself, since its own logic has
> been compromised. You *can* rollback the wrong changes made to data by
> that malfunctioning program, of course, but the rollback must be done by
> a decoupled entity outside of that program. Otherwise you might end up
> causing even more problems (for example, due to the compromised /
> malfunctioning logic, the program commits the data instead of reverting
> it, thus turning an intermittent problem into a permanent one).

No misunderstanding: I think Walter's idea is good, just not always practical, and that real critical systems don't work the way he describes; they make more complicated tradeoffs.
November 01, 2014
On Friday, 31 October 2014 at 21:06:49 UTC, H. S. Teoh via Digitalmars-d wrote:
> This does not mean that process isolation is a "silver bullet" -- I
> never said any such thing.

But made it sound that way:
> The only failsafe solution is to have multiple redundant
> processes, so when one process becomes inconsistent, you fallback to
> another process, *decoupled* process that is known to be good.

If you think a hacker rooted the server, how do you know the other perfectly isolated processes are good? Not to mention you suggested building a system from *communicating* processes, which doesn't sound like perfect isolation at all.

> You don't shut down the *entire* network unless all redundant components have failed.

If you have a hacker in your network, the network is compromised and in an unknown state; why do you want the network to continue operating? You contradict yourself.
November 01, 2014
On Wednesday, 29 October 2014 at 21:23:00 UTC, Walter Bright wrote:
> In any case, if the programmer knows that an assert error is restricted to a particular domain, and is recoverable, and wants to recover from it, use enforce(), not assert().

But all that does is work around assert's behavior of ignoring cleanups.

Maybe, when it's known that a failure is not restricted, some different way of reporting the failure should be used?
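For reference, a minimal sketch of the difference in D: enforce (from std.exception) throws an Exception, which unwinds normally and runs scope(exit) cleanups, while a failed assert throws an AssertError (an Error), for which the language does not guarantee that cleanups run.

import std.exception : enforce;
import std.stdio : writeln;

void withEnforce(int fd)
{
    scope (exit) writeln("cleanup runs");  // guaranteed: Exceptions unwind normally
    enforce(fd >= 0, "bad descriptor");    // throws an Exception: recoverable
}

void withAssert(int fd)
{
    scope (exit) writeln("not guaranteed to run");  // Errors may skip cleanups
    assert(fd >= 0, "bad descriptor");              // throws AssertError: a bug
}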
November 01, 2014
On Sat, Nov 01, 2014 at 10:52:31AM +0000, Kagamin via Digitalmars-d wrote:
> On Friday, 31 October 2014 at 21:06:49 UTC, H. S. Teoh via Digitalmars-d wrote:
> >This does not mean that process isolation is a "silver bullet" -- I never said any such thing.
> 
> But made it sound that way:
>
> >The only failsafe solution is to have multiple redundant processes, so when one process becomes inconsistent, you fallback to another process, *decoupled* process that is known to be good.
> 
> If you think a hacker rooted the server, how do you know other perfectly isolated processes are good? Not to mention you suggested to build a system from *communicating* processes, which doesn't sound like perfect isolation at all.

You're confusing the issue. Process-level isolation is for detecting per-process faults. If you want to handle server-level faults, you need external monitoring per server, so that when it detects a possible exploit on one server, it shuts down the server and fails over to another server known to be OK.

And I said decoupled, not isolated. Decoupled means they can still communicate with each other, but through a known protocol that insulates them from each other's faults. E.g. you don't send binary executable code over the communication lines and have the receiving process blindly run it; you send data in a predefined format that is verified by the receiving party before acting on it. I'm pretty sure this is obvious.
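A minimal sketch of that kind of verification in D (the "SET <key> <value>" wire format is invented for illustration): the receiver parses a fixed, predefined format and rejects anything that does not validate, so a fault in the sending process cannot make it act on garbage.

import std.array : split;
import std.conv : to, ConvException;
import std.typecons : Nullable, nullable;

// Predefined wire format, invented for this sketch: "SET <key> <value>"
struct Command { string key; int value; }

Nullable!Command parse(string line)
{
    auto parts = line.split(' ');
    if (parts.length != 3 || parts[0] != "SET")
        return Nullable!Command.init;   // malformed: reject, don't act
    try
        return nullable(Command(parts[1], parts[2].to!int));
    catch (ConvException)
        return Nullable!Command.init;   // bad value: reject, don't act
}

The receiver acts only on non-null results; malformed input from a faulty peer gets dropped or logged, never executed.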


> >You don't shut down the *entire* network unless all redundant components have failed.
> 
> If you have a hacker in your network, the network is compromised and is in an unknown state, why do you want the network to continue operation? You contradict yourself.

The only contradiction here is introduced by you. If one or two servers on your network have been compromised, does that mean the *entire* network is compromised? No, it doesn't. It just means those one or two servers have been compromised. So you have monitoring tools set up to detect problems within the network and isolate the compromised servers. If you are no longer sure the entire network is in a good state, e.g. if your monitoring tools can't detect certain large-scale problems, then sure, go ahead and shut down the entire network. It depends on what granularity you're operating at. A properly-designed reliable system needs multiple levels of monitoring and failover: process-level decoupling, server-level, network-level, etc. You can't just rely on a single level of granularity and expect it to solve everything.
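At the process level, the simplest form of that failover is a supervisor loop that restarts a crashed worker and escalates when its own redundancy is exhausted (a sketch; the worker command line is a placeholder):

import std.process : spawnProcess, wait;
import std.stdio : stderr;

void supervise(string[] workerCmd, int maxRestarts)
{
    foreach (attempt; 0 .. maxRestarts)
    {
        auto pid = spawnProcess(workerCmd);
        int status = wait(pid);   // block until the worker exits
        if (status == 0)
            return;               // clean exit: nothing to do
        stderr.writefln("worker died with status %s; restart %s of %s",
                        status, attempt + 1, maxRestarts);
    }
    // This level's redundancy is exhausted; escalate to the next level
    // of monitoring (server, then network).
    throw new Exception("worker keeps failing; escalating");
}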


T

-- 
Leather is waterproof.  Ever see a cow with an umbrella?
November 01, 2014
On Sat, Nov 01, 2014 at 09:38:23AM +0000, Dicebot via Digitalmars-d wrote:
> On Friday, 31 October 2014 at 21:33:22 UTC, H. S. Teoh via Digitalmars-d wrote:
> >Again, you're using a different definition of "component".
> 
> I see no justified reasoning why a process can be considered such a "component" and anything else cannot.
> 
> In practice it depends entirely on the design of the system as a whole, and calling the process a silver bullet only creates problems when it is in fact not one.

I never said "component" == "process". All I said was that at the OS level, at least with current OSes, processes are the smallest unit that is decoupled from each other. If you go below that level of granularity, you have the possibility of shared memory being corrupted by one thread (or fibre, or whatever smaller than a process) affecting the other threads. So that means they are not fully decoupled, and the failure of one thread makes all other threads no longer trustworthy.

Obviously, you can go up to larger units than just processes when designing your system, as long as you can be sure they are decoupled from each other.


T

-- 
"No, John.  I want formats that are actually useful, rather than over-featured megaliths that address all questions by piling on ridiculous internal links in forms which are hideously over-complex." -- Simon St. Laurent on xml-dev
November 01, 2014
On Saturday, 1 November 2014 at 15:02:53 UTC, H. S. Teoh via Digitalmars-d wrote:
> I never said "component" == "process". All I said was that at the OS
> level, at least with current OSes, processes are the smallest unit
> that is decoupled from each other. If you go below that level of
> granularity, you have the possibility of shared memory being corrupted
> by one thread (or fibre, or whatever smaller than a process) affecting
> the other threads. So that means they are not fully decoupled, and the
> failure of one thread makes all other threads no longer trustworthy.

This is a question of probability and impact. If my Python program fails unexpectedly, it could in theory be a bug in a C library, but it probably is not. So it is better to trap the failure and continue.

If D provides bounds checks, is a solid language, has a solid compiler, has a solid runtime, and solid libraries… then the same logic applies!

If my C program traps on division by zero, then it probably is an unlucky incident and not a memory corruption issue. So it is probably safe to continue.

If my program cannot find a file, it MIGHT be a kernel issue, but it probably isn't. So it is safe to continue.

If my critical state is recorded by a WAL (write-ahead log) built on transactions or full-blown event logging, then it is safe to continue even if my front end might suffer from memory corruption.

You need to consider:

1. probability (what is the most likely cause of this signal?)

2. impact (do you have insurance?)

3. alternatives (are you in the middle of an air fight?)
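A minimal sketch of that event-logging idea in D (the log path and the one-signed-delta-per-line format are invented for illustration): critical state changes are appended to a durable log before they take effect, so a corrupted front end can simply be killed and the state rebuilt by replay.

import std.conv : to;
import std.file : exists;
import std.stdio : File;

// Append each accepted change to a durable log *before* applying it.
void logEvent(string path, string event)
{
    auto f = File(path, "a");
    f.writeln(event);
    f.flush();          // a real WAL would also fsync here
}

// After a crash, rebuild the critical state purely from the log.
long replayBalance(string path)
{
    long balance = 0;
    if (!exists(path))
        return balance;
    foreach (line; File(path).byLineCopy)
        balance += line.to!long;    // each line is one signed delta
    return balance;
}

The front end calls logEvent before acknowledging each change; after a crash, replayBalance reconstructs the state without trusting anything the dead process had in memory.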

November 01, 2014
On 11/1/2014 3:35 AM, Kagamin wrote:
> No misunderstanding: I think Walter's idea is good, just not always
> practical, and that real critical systems don't work the way he describes;
> they make more complicated tradeoffs.

My ideas are the ones implemented on airplanes. I didn't originate them; they come from the aviation industry. Recall that I was employed as an engineer working on flight-critical systems design for the 757.