July 11, 2018
On Wednesday, 11 July 2018 at 18:27:33 UTC, Brad Roberts wrote:

> ... application exiting asserts in production.  Yes, you kill the app.  You exit as fast and often as the errors occur.  You know what happens?  You find the bugs faster, you fix them even faster, and the result is solid software.
You mean that the serious consequences of errors motivate programmers better? Then I have an idea: wire electric current to the developers' chairs, so that every failed assert gives the programmer responsible for that part an electric shock, and the code will surely become even more reliable. But I want an error found in production not to bring the service down for all the users who are on the site at that moment, and that is a somewhat different concern.

> When you're afraid of your software and afraid to make changes to it, you make bad choices.  Embrace every strategy you can find to help you find problems as quickly as possible.
Sorry, but I'm not sure I understand how this relates to the topic. I still do not think that a failed-assert message in the log lets you find an error any faster than a similar message about an exception would.

July 11, 2018
On Wednesday, 11 July 2018 at 13:19:01 UTC, Joakim wrote:
> ...
> Sounds like you're describing the "Let it crash" philosophy of Erlang:
>
> https://ferd.ca/the-zen-of-erlang.html
  I have never programmed in Erlang, but yes, I think it is something like that. The people who developed Erlang clearly have a lot of experience building services.

> The crucial point is whether you can depend on the error being isolated, as in Erlang's lightweight processes. I guess D assumes it isn't.
 I think that if a task runs only @safe code and communicates via message passing, it is isolated well enough that an error can kill just that task. In any case, I can still bring down the whole application myself if I decide that is the safer way to deal with errors, so the paranoid lose nothing with this approach.
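
As a minimal sketch of this idea (all names invented for illustration): each request runs inside its own try block, so a bug in one request kills only that "task", gets reported, and the service keeps serving the others. Note that catching Error at a task boundary is legal D syntax but outside what the language officially guarantees, which is part of what this thread is arguing about.

```d
import std.stdio;

// Hypothetical request handler: request 1 contains a deliberate
// out-of-bounds index, which trips D's bounds check and throws RangeError.
void handleRequest(int id)
{
    int[] data = [1, 2, 3];
    auto x = data[id == 1 ? 10 : 0];
}

void main()
{
    foreach (id; 0 .. 3)
    {
        try
            handleRequest(id);
        catch (Error e) // catching Error is outside D's supported guarantees
            writeln("request ", id, " failed: ", e.msg);
    }
    writeln("service still running"); // the other requests were unaffected
}
```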
July 11, 2018
On Wednesday, 11 July 2018 at 12:45:40 UTC, crimaniak wrote:
> The error should be maximally localized, and the programmer should be able to respond to any type of error. The very nature of web applications encourages this: as a rule, queries are handled by short-lived tasks that work with thread-local memory, so killing only the task that caused the error, and passing the exception to the calling task, would radically improve the situation.

Hmm. The fun fun fun thing about undefined behaviour in the absence of MMUs is that its effects are maximally _unlocalized_.

ie. It can corrupt _any_ part of the system.

A use after free, for example, or an out-of-bounds index into the heap, can corrupt any and all subsystems sharing the same virtual address space.

That is part of the reason why Walter is pushing so hard for memory safety.

Memory safety is truly a huge step away from the world of pain that is C/C++.... it removes an entire class of defects.

However, it also removes a common terminology. Odds on you know what I mean when I say "use after free" or "index out of bounds".

Now, at the levels above the language and the library, humans are equally capable of screwing up and corrupting their own work.... except there the language can no longer help you.

Above the language and the library, we no longer have a common terminology for describing the myriad ways you can shoot yourself in the foot.

The language can, through encapsulation, "minimize the blast radius", but it can't stop you.

I disagree with Bjarne Stroustrup on many things.... but in this article he is absolutely spot on. https://www.artima.com/intv/goldilocks3.html

Please read it, it's probably the most important article on Object Oriented Design you'll find.

Now the problem with "unexpected" exceptions is that, odds on, you are left with a broken invariant.

ie. Odds on you are left with an object you now cannot reasonably expect to function.

ie. Odds on that object you cannot expect to function, is part of a larger object or subsystem you now cannot reasonably expect to function.

ie. You are left with a system that will become progressively flakier, less responsive, and less reliable.

The only sane response really is to reset to a defined state as quickly as possible. ie. Capture a backtrace, exit process and restart.
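
The broken-invariant chain above can be made concrete with a small sketch (the `Ledger` type and the failing step are invented for illustration): an exception thrown halfway through a multi-step update leaves the object alive but violating its own invariant, and every later use of it is suspect.

```d
import std.algorithm : sum;
import std.exception : enforce;

struct Ledger
{
    int[] entries;
    int total; // intended invariant: total == entries.sum

    void add(int amount, bool failMidway)
    {
        entries ~= amount;
        enforce(!failMidway, "unexpected failure"); // throws mid-update...
        total += amount; // ...so total goes stale: the invariant is now broken
    }

    bool consistent() const { return total == entries.sum; }
}

void main()
{
    Ledger l;
    l.add(5, false);
    assert(l.consistent());
    try l.add(7, true); catch (Exception e) {}
    assert(!l.consistent()); // the object survived, but in a corrupted state
}
```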

Your effort in trying to catch and handle unexpected events to achieve uptime is misplaced; you are much better served by Chaos Monkeys.

ie. Deliberately randomly "hard kill" your running systems at random moments and spend your efforts on designing for no resulting corruption and rapid and reliable reset.

I certainly wouldn't unleash Chaos Monkeys on a production system until I was really comfortable with their behaviour on a test system....
July 11, 2018
On 7/11/2018 3:24 PM, crimaniak via Digitalmars-d wrote:
> On Wednesday, 11 July 2018 at 18:27:33 UTC, Brad Roberts wrote:
> 
>> ... application exiting asserts in production.  Yes, you kill the app.  You exit as fast and often as the errors occur.  You know what happens?  You find the bugs faster, you fix them even faster, and the result is solid software.
> You mean that the serious consequences of errors motivate programmers better? Then I have an idea: wire electric current to the developers' chairs, so that every failed assert gives the programmer responsible for that part an electric shock, and the code will surely become even more reliable. But I want an error found in production not to bring the service down for all the users who are on the site at that moment, and that is a somewhat different concern.

Motivation is part of it, to be sure, but only a tiny part.  Asserts, and the heavy use of them, change how you think about system-state validation.  Yes, you can do that without asserts, but I've found that when you lean towards system-recovery and error-mitigation thinking, you tend to focus on getting out of a bad state, not on never getting into it.

As to applying punishments for errors, that tends to be a bad motivator too.  It encourages hiding problems rather than preventing them.

All in all, I'm mostly presenting anecdotal evidence that embracing the style of programming you're arguing against has produced very good results, repeatably, in my work experience.

There's a big topic / discussion area in here about fault isolation.  If you really want things to be able to fail independently, then they need to be separate enough that faults in one cannot affect the other.  Most languages today don't provide the barriers within a process to support multiple fault domains; none of the C family of languages does.  Erlang is a good example of one that does.  Given the industry and user base that uses the language, it's not at all shocking that it too embraces the concept of fail fast, don't try to recover.

Anyway, this is one of the areas where people clearly have different philosophies and changing minds is unlikely to happen.
July 11, 2018
On 7/11/2018 11:27 AM, Brad Roberts wrote:
> When you're afraid of your software and afraid to make changes to it, you make bad choices.  Embrace every strategy you can find to help you find problems as quickly as possible.

It's good to hear my opinions on the subject backed by major experience! Thanks for posting.
July 11, 2018
On 7/11/2018 6:54 PM, Brad Roberts wrote:
> Anyway, this is one of the areas where people clearly have different philosophies and changing minds is unlikely to happen.

True, but that doesn't mean each philosophy is equally valid. Some ideas are better than others :-)

BTW, the "fail fast with asserts" is one I was pretty much forced into with DOS real mode programming, and it has served me well for a very long time. It is also based on my experience with Boeing engineering philosophy - and that has resulted in incredibly safe airliners.
July 11, 2018
On 7/11/2018 4:56 PM, John Carter wrote:
> I disagree with Bjarne Stroustrup on many things.... but in this article he is absolutely spot on. https://www.artima.com/intv/goldilocks3.html

It's a great article, and a quick read.
July 12, 2018
On Wednesday, 11 July 2018 23:39:49 MDT Walter Bright via Digitalmars-d wrote:
> On 7/11/2018 6:54 PM, Brad Roberts wrote:
> > Anyway, this is one of the areas where people clearly have different philosophies and changing minds is unlikely to happen.
>
> True, but that doesn't mean each philosophy is equally valid. Some ideas are better than others :-)
>
> BTW, the "fail fast with asserts" is one I was pretty much forced into with DOS real mode programming, and it has served me well for a very long time. It is also based on my experience with Boeing engineering philosophy - and that has resulted in incredibly safe airliners.

This discussion reminds me of an interview with Bryan Cantrill from a couple of years back, in which he complained about Linus talking about turning all of the BUG_ONs in the Linux kernel into WARN_ONs because they were getting too many crashes: that just hides bugs rather than shoving them in your face so that you can find them and fix them. Yes, it really sucks when your program crashes, but if you have a check for invalid state in your program, you want it to fail fast, so that it does not continue to execute and do who knows what (which is particularly bad for an OS kernel), and so that you know that it's happening - preferably with a core dump, so that you have the information you need to debug it. Then the problem can be fixed and stops being a problem, reducing the number of bugs in your program and increasing its stability. If you instead try to hide bugs and continue, you never even find out that there is a problem, and it doesn't get fixed.

So, there are definitely programmers out there who agree with you even if there are also plenty out there who don't.

- Jonathan M Davis



July 12, 2018
On 7/10/18 6:59 PM, Jonathan M Davis wrote:
> On Tuesday, 10 July 2018 16:48:41 MDT Steven Schveighoffer via Digitalmars-d
> wrote:
>> On 7/10/18 6:26 PM, Jonathan M Davis wrote:
>>> On Tuesday, 10 July 2018 13:21:28 MDT Timon Gehr via Digitalmars-d
> wrote:
>>>> On 03.07.2018 06:54, Walter Bright wrote:
>>>>> ...
>>>>>
>>>>> (I'm referring to the repeated and endless threads here where people
>>>>> argue that yes, they can recover from programming bugs!)
>>>>
>>>> Which threads are those?
>>>
>>> Pretty much any thread arguing for having clean-up done when an Error is
>>> thrown instead of terminating ASAP. Usually, folks don't try to claim
>>> that trying to fully continue the program in spite of the Error is a
>>> good idea, but even that gets suggested sometimes (e.g. trying to catch
>>> and recover from a RangeError comes up periodically).
>>
>> Or aside from that strawman that RangeError shouldn't be an Error even...
> 
> I suspect that we're going to have to agree to disagree on that one. In the
> vast majority of cases, indices do not come from program input, and in the
> cases where they do, they can be checked by the programmer to ensure that
> they don't violate the contract of indexing dynamic arrays. And when you
> consider that the alternative would be for it to be a RangeException, having
> it be anything other than an error would quickly mean that pretty much no
> code using arrays could be nothrow.

It's all wishful thinking on my part. At this point, there's no way we can make a non-opt-in change to RangeException, because so much code would break.

But to be honest, I don't really think RangeException makes much sense either. It really is a programming error, but one that is eminently recoverable in some cases (it depends completely on the program). It stops memory corruption from happening, and as long as you unwind the stack out to a place where you can report the issue and continue on, then it's not going to affect other parts of the program.

The classic example is a fiber- or thread-based service, where the tasks run are independent of each other. It makes no sense to kill all the tasks just because one has an off-by-one indexing problem that was properly prevented from causing any issues.

> Regardless, there are sometimes cases where the programmer decides what the
> contract of an API is (whether that be the creator of the language for
> something standard like dynamic arrays or for a function in a stray
> programmer's personal library), and any time that that contract is violated,
> it's a bug in the program, at which point, the logic is faulty, and
> continuing to execute the program is risky by definition. Whether a
> particular contract was the right choice can of course be debated, but as
> long as it's the contract for that particular API, anyone using it needs to
> be obey it, or they'll have bugs in their program with potentially fatal
> consequences.

We are not so much in disagreement on this, I don't think it makes any sense to make a RangeError not a programming error. But the problem I have with the choice is that an Error *necessarily* makes the entire program unusable. In other words, the scope of the problem is expanded by the language to include more than it should. And really, it's not so much throwing the Error, it's the choice by the language to make nothrow functions not properly clean up on an Error throw. Without that "feature", this would be a philosophical discussion, and not a real problem.
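
The "feature" being described can be sketched in a few lines (the worker and index are invented for illustration): when an Error escapes a nothrow function, D does not promise to run scope guards or destructors on the way out, so even a caller that catches the Error may be looking at un-cleaned-up state. Whether the guard actually runs is implementation-dependent, which is exactly the problem.

```d
import core.stdc.stdio : printf;
import std.stdio : writeln;

void worker(size_t i) nothrow
{
    scope(exit) printf("cleanup\n"); // not guaranteed to run if an Error escapes
    int[3] a;
    auto x = a[i]; // i >= 3: the bounds check fails, throwing a RangeError
}

void main()
{
    try
        worker(10);
    catch (Error e) // catchable, but worker's cleanup may have been skipped
        writeln("caught: ", e.msg);
}
```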

-Steve
July 12, 2018
On 7/10/18 2:37 AM, Walter Bright wrote:
> On 7/9/2018 6:50 PM, John Carter wrote:
>> Nothing creates flaky and unreliable systems more than allowing them to wobble on past the first point where you already know that things are wrong.
> 
> Things got so bad with real mode DOS development that I rebooted the system every time my program crashed, making for rather painfully slow development.
> 
> Salvation came in the form of OS/2 (!). Although OS/2 was a tiny market, it was a godsend for me. I developed all the 16 bit code on OS/2, which had memory protection. Only the final step was recompiling it for real mode DOS.

All this talk about DOS, I also saw this in the news recently: https://kotaku.com/in-2018-a-pc-game-is-being-made-in-dos-1827463766

-Steve