September 28, 2014
On 29-Sep-2014 01:21, Sean Kelly wrote:
> On Sunday, 28 September 2014 at 21:16:51 UTC, Dmitry Olshansky wrote:
>>
>> But otherwise agreed, dropping the whole process is not always a good
>> idea or it easily becomes a DoS attack vector in a public service.
>
> What I really want to work towards is the Erlang model where an app is a
> web of communicating processes (though Erlang processes are effectively
> equivalent to D objects).  Then, killing a process on an error is
> absolutely correct.  It doesn't affect the resilience of the system.
> But if these processes are actually threads or fibers with memory
> protection, things get a lot more complicated.

One thing I really appreciated about the JVM is exactly this: memory safety, combined with the ability to handle failures in pretty much the same way Erlang does.

> I really need to spend
> some time investigating how modern Linux systems handle tons of
> processes running on them and try to find a happy medium.

Keep us posted.

-- 
Dmitry Olshansky
September 28, 2014
On 9/28/2014 1:50 PM, Sean Kelly wrote:
> On Sunday, 28 September 2014 at 20:31:03 UTC, Walter Bright wrote:
>> > The scope of a logic bug can be known to be quite limited.
>>
>> If you know about the bug, then you'd have fixed it already instead of
>> inserting recovery code for unknown problems. I can't really accept that one
>> has "unknown bugs of known scope".
>
> Well, say you're using SafeD or some other system where you know that memory
> corruption is not possible (pure functional programming, for example).  In this
> case, if you know what data a particular execution flow touches, you know the
> scope of the potential damage.  And if the data touched is all either shared but
> read-only or generated during the processing of the request, you can be
> reasonably certain that nothing outside the scope of the transaction has been
> adversely affected at all.

You may know the error is not a memory-corrupting one, but that doesn't mean there aren't non-corrupting changes to the shared memory that would result in additional unexpected failures. Also, the logic bug may be the result of an @system part of the code going wrong. You do not know, because YOU DO NOT KNOW the cause of the error. And if you knew the cause, you wouldn't need a stack trace to debug it anyway.

I.e., even though the code is 'safe', that does not imply the program is in a predictable or anticipated state.

I can't get behind the notion of "reasonably certain". I certainly would not use such techniques in any code that needs to be robust, and we should not be using such cowboy techniques in Phobos nor officially advocate their use.
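To make the point concrete, here is a hypothetical @safe snippet (the names and the banking scenario are invented for illustration) in which no memory corruption is possible, yet a logic bug still leaves shared state that later requests depend on in an inconsistent, but perfectly "safe", condition:

```d
@safe:

// State shared between requests handled on this thread/fiber.
int balance = 100;
int reserved = 0;

void reserve(int amount)
{
    // Logic bug: the check ignores what is already reserved, so we can
    // reserve more than the remaining balance. Memory-safe, still wrong.
    if (amount <= balance)        // should be: amount <= balance - reserved
        reserved += amount;
}
```

After two calls to `reserve(80)`, `reserved` is 160 against a balance of 100: no Error was thrown, nothing was corrupted in the memory-safety sense, yet every subsequent request now operates on poisoned shared state.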
September 28, 2014
On 28/09/14 22:13, Walter Bright via Digitalmars-d wrote:
> On 9/28/2014 12:33 PM, Sean Kelly wrote:
>>> Then use assert(). That's just what it's for.
>> What if I don't want to be forced to abort the program in the event of such an
>> error?
>
> Then we are back to the discussion about can a program continue after a logic
> error is uncovered, or not.
>
> In any program, the programmer must decide if an error is a bug or not, before
> shipping it. Trying to avoid making this decision leads to confusion and using
> the wrong techniques to deal with it.
>
> A program bug is, by definition, unknown and unanticipated. The idea that one
> can "recover" from it is fundamentally wrong. Of course, in D one can try and
> recover from them anyway, but you're on your own trying that, just as you're on
> your own when casting integers to pointers.

Allowing for your "you can try ..." remarks, I still feel this doesn't really cover the practical realities of how some applications need to behave.

Put it this way: suppose we're writing the software for a telephone exchange, which is handling thousands of simultaneous calls.  If an Error is thrown inside the part of the code handling one single call, is it correct to bring down everyone else's call too?

I appreciate that you might tell me "You need to find a different means of error handling that can distinguish errors that are recoverable", but the bottom line is, in such a scenario it's not possible to completely rule out an Error being thrown (an obvious cause would be an assert that gets triggered because the programmer forgot to put a corresponding enforce() statement at a higher level in the code).

However, it's clearly very desirable in this use-case for the application to keep going if at all possible and for any problem, even an Error, to be contained in its local context if we can do so.  (By "local context", in practice this probably means a thread or fiber or some other similar programming construct.)

Sean's touched on this in the current thread with his reference to Erlang, and I remember that he and Dicebot brought the issue up in an earlier discussion on the Error vs. Exception question, but I don't recall that discussion having any firm conclusion, and I think it's important to address; we can't simply take "An Error is unrecoverable" as a point of principle for every application.

(Related note: If I recall right, an Error or uncaught Exception thrown within a thread or fiber will not actually bring the application down, only cause that thread/fiber to hang, without printing any indication of anything going wrong. So on a purely practical basis, it can be essential for the top-level code of a thread or fiber to have a catch {} block for both Errors and Exceptions, just in order to be able to report what has happened effectively.)
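To illustrate that related note, a minimal sketch of such a top-level catch block in a worker thread (`workerEntry` and the simulated failure are hypothetical):

```d
import core.thread : Thread;
import std.stdio : stderr;

void workerEntry()
{
    try
    {
        // ... the actual thread/fiber workload goes here ...
        throw new Error("simulated assertion failure");
    }
    catch (Exception e)
    {
        stderr.writefln("worker caught Exception: %s", e.msg);
    }
    catch (Error e)
    {
        // Catching Error is normally discouraged, but at the top level of
        // a worker it may be the only way the failure gets reported at all.
        stderr.writefln("worker caught Error: %s", e.msg);
    }
}

void main()
{
    auto t = new Thread(&workerEntry);
    t.start();
    t.join(false); // don't rethrow; the worker already reported the problem
}
```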
September 28, 2014
On 9/28/2014 1:56 PM, H. S. Teoh via Digitalmars-d wrote:
> It looks even more awful when the person who wrote the library code is
> Russian, and the user speaks English, and when an uncaught exception
> terminates the program, you get a completely incomprehensible message in
> a language you don't know. Not much different from a line number and
> filename that has no meaning for a user.

I cannot buy into the logic that since Russian error messages are incomprehensible to me, that therefore incomprehensible messages are ok.

> That's why I said, an uncaught exception is a BUG.

It's a valid opinion, but is not the way D is designed to work.


> The only place where
> user-readable messages can be output is in a catch block where you
> actually have the chance to localize the error string. But if no catch
> block catches it, then by definition it's a bug, and you might as well
> print some useful info with it that your users can send back to you,
> rather than unhelpful bug reports of the form "the program crashed with
> error message 'internal error'".

If anyone is writing code that throws an Exception with "internal error", then they are MISUSING exceptions to throw on logic bugs. I've been arguing this all along.


> if the program failed to catch an exception, you're already screwed
> anyway

This is simply not true. One can write utilities with no caught exceptions at all, and yet have the program emit user-friendly messages about "disk full" and stuff like that.
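A hypothetical sketch of such a utility (the `check` helper and its messages are invented): nothing in the program catches anything, yet when an enforce() fails, the D runtime's default handler prints the exception's message and exits non-zero, which is exactly the user-friendly diagnostic described.

```d
import std.exception : enforce;

// No try/catch anywhere. A failed enforce() propagates out of main(),
// and the runtime prints its message (e.g. "usage: check FILE").
void check(string[] args)
{
    enforce(args.length > 1, "usage: check FILE");
    // ... open args[1]; report "disk full" etc. via thrown Exceptions ...
}

void main(string[] args)
{
    check(args);
}
```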


> so why not provide more info rather than less?

Because having an internal stack dump presented to the app user for when he, say, puts in invalid command line arguments, is quite inappropriate.


> Unless, of course, you're suggesting that we put this around every
> main() function:
>
> 	void main() {
> 		try {
> 			...
> 		} catch(Exception e) {
> 			assert(0, "Unhandled exception: I screwed up");
> 		}
> 	}

I'm not suggesting that Exceptions are to be thrown on programmer screwups - I suggest the OPPOSITE.



September 28, 2014
On 28/09/14 19:33, Walter Bright via Digitalmars-d wrote:
> On 9/28/2014 9:23 AM, Sean Kelly wrote:
>> Also, I think the idea that a program is created and shipped to an end user is
>> overly simplistic.  In the server/cloud programming world, when an error occurs,
>> the client who submitted the request will get a response appropriate for them
>> and the system will also generate log information intended for people working on
>> the system.  So things like stack traces and assertion failure information are
>> useful even for production software.  Same with any critical system, as I'm sure
>> you're aware.  The systems are designed to handle failures in specific ways, but
>> they also have to leave a breadcrumb trail so the underlying problem can be
>> diagnosed and fixed.  Internal testing is never perfect, and achieving a high
>> coverage percentage is nearly impossible if the system wasn't designed from the
>> ground up to be testable in such a way (mock frameworks and such).
>
> Then use assert(). That's just what it's for.

I don't follow this point.  How can this approach work with programs that are built with the -release switch?

Moreover, Sean's points here are absolutely on the money -- there are cases where the "users" of a program may indeed want to see traces even for anticipated errors.  And even if you design a nice structure of throwing and catching exceptions so that the simple error message _always_ gives good enough context to understand what went wrong, you still have the other issue that Sean raised -- of an exception accidentally escaping its intended scope, because you forgot to handle it -- when a trace may be extremely useful.

Put it another way -- I think you make a good case that stack traces for exceptions should be turned off by default (possibly just in -release mode?), but if that happens I think there's also a good case for a build flag that ensures stack traces _are_ shown for Exceptions as well as Errors.
September 29, 2014
On 9/28/2014 3:51 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
> However, it's clearly very desirable in this use-case for the application to
> keep going if at all possible and for any problem, even an Error, to be
> contained in its local context if we can do so.  (By "local context", in
> practice this probably means a thread or fiber or some other similar programming
> construct.)

If the program has entered an unknown state, its behavior from then on cannot be predictable. There's nothing I or D can do about that. D cannot officially endorse such a practice, though D being a systems programming language it will let you do what you want.

I would not even consider such a practice for a program that is in charge of anything that could result in injury, death, property damage, security breaches, etc.

September 29, 2014
On 9/28/2014 4:18 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
> I don't follow this point.  How can this approach work with programs that are
> built with the -release switch?

All -release does is not generate code for assert()s. To leave the asserts in, do not use -release. If you still want an assert to remain even with -release, write:

    if (condition) assert(0);

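To spell out that pattern (the `process` function is a hypothetical example): in D, a plain `assert(cond)` is removed by -release, but `assert(0)` is special and is kept, compiling down to a halt, so guarding it with the failure condition gives you a check that survives release builds.

```d
void process(int[] data)
{
    assert(data.length > 0);           // removed by -release

    if (data.length == 0) assert(0);   // kept even with -release
    // ... use data ...
}
```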

> Moreover, Sean's points here are absolutely on the money -- there are cases
> where the "users" of a program may indeed want to see traces even for
> anticipated errors.
> And even if you design a nice structure of throwing and
> catching exceptions so that the simple error message _always_ gives good enough
> context to understand what went wrong, you still have the other issue that Sean
> raised -- of an exception accidentally escaping its intended scope, because you
> forgot to handle it -- when a trace may be extremely useful.
>
> Put it another way -- I think you make a good case that stack traces for
> exceptions should be turned off by default (possibly just in -release mode?),
> but if that happens I think there's also a good case for a build flag that
> ensures stack traces _are_ shown for Exceptions as well as Errors.

The -g switch should take care of that. It's what I use when I need a stack trace, as there are many ways a program can fail (not just Errors).

September 29, 2014
On Sunday, 28 September 2014 at 22:00:24 UTC, Walter Bright wrote:
>
> I can't get behind the notion of "reasonably certain". I certainly would not use such techniques in any code that needs to be robust, and we should not be using such cowboy techniques in Phobos nor officially advocate their use.

I think it's a fair stance not to advocate this approach.  But as it is I spend a good portion of my time diagnosing bugs in production systems based entirely on archived log data, and analyzing the potential impact on the system to determine the importance of a hot fix.  The industry seems to be moving towards lowering the barrier between engineering and production code (look at what Netflix has done for example), and some of this comes from an isolation model akin to the Erlang approach, but the typical case is still that hot fixing code is incredibly expensive and so you don't want to do it if it isn't necessary.  For me, the correct approach may simply be to eschew assert() in favor of enforce() in some cases.  But the direction I want to be headed is the one you're encouraging.  I simply don't know if it's practical from a performance perspective.  This is still developing territory.
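A small hypothetical sketch of that assert-versus-enforce distinction (the `findUser` lookup is invented for illustration): enforce() throws a recoverable Exception, remains active under -release, and suits bad input, while assert() documents an internal invariant whose violation is a logic bug.

```d
import std.exception : enforce;

int findUser(int[string] ids, string name)
{
    // Bad input from a request: recoverable, so the service can log it
    // and keep running. enforce() also survives -release.
    enforce(name in ids, "no such user: " ~ name);

    auto id = ids[name];
    // Internal invariant: violating it is a logic bug, reported via an
    // Error (outside -release) rather than handled.
    assert(id >= 0, "corrupted id table");
    return id;
}
```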
September 29, 2014
On Monday, 29 September 2014 at 00:09:59 UTC, Walter Bright wrote:
> On 9/28/2014 3:51 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
>> However, it's clearly very desirable in this use-case for the application to
>> keep going if at all possible and for any problem, even an Error, to be
>> contained in its local context if we can do so.  (By "local context", in
>> practice this probably means a thread or fiber or some other similar programming
>> construct.)
>
> If the program has entered an unknown state, its behavior from then on cannot be predictable. There's nothing I or D can do about that. D cannot officially endorse such a practice, though D being a systems programming language it will let you do what you want.
>
> I would not even consider such a practice for a program that is in charge of anything that could result in injury, death, property damage, security breaches, etc.

Well... suppose you design a system with redundancy such that an error in a specific process isn't enough to bring down the system.  Say it's a quorum method or whatever.  In the instance that a process goes crazy, I would argue that the system is in an undefined state but a state that it's designed specifically to handle, even if that state can't be explicitly defined at design time.  Now if enough things go wrong at once the whole system will still fail, but it's about building systems with the expectation that errors will occur.  They may even be logic errors--I think it's kind of irrelevant at that point.

Even in a network of communicating processes, one process getting into a bad state can theoretically poison the entire system, and you're often not in a position to simply shut down the whole thing and wait for a repairman. And simply rebooting the system, if a bad sensor is causing the problem, just means a pause before another failure cascade. I think any modern program designed to run continuously (increasingly the typical case) must be designed with some degree of resiliency or self-healing in mind. And that means planning for and limiting the scope of undefined behavior.
September 29, 2014
On Mon, 29 Sep 2014 01:18:02 +0200
Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d@puremagic.com>
wrote:

> > Then use assert(). That's just what it's for.
> 
> I don't follow this point.  How can this approach work with programs that are built with the -release switch?

Don't use the "-release" switch. The whole concept of a "release version" is broken by design: ship what you debugged, not what you think you debugged.