1 day ago Re: RFC: Change what assert does on error | ||||
Posted in reply to monkyyy | On Wednesday, 2 July 2025 at 16:54:54 UTC, monkyyy wrote:
> On Wednesday, 2 July 2025 at 16:51:40 UTC, kdevel wrote:
>> On Wednesday, 2 July 2025 at 08:11:44 UTC, Walter Bright wrote:
>>> On 6/30/2025 2:18 PM, Sebastiaan Koppe wrote:
>>>> Just know that the idea of exiting directly when something asserts, on the pretense that continuing makes things worse, breaks down in multi-threaded programs.
>>>
>>> An assert tripping means that you've got a bug in the program, and the program has entered an unanticipated, unknown state.
>>
>> This program
>>
>> void main ()
>> {
>>     assert (false);
>> }
>>
>> is a valid D program which is free of bugs and without any
>> "unanticipated, unknown" state.
>>
>> Do you agree?
>
> Nah, clearly this wouldn't pass Boeing's standards
In release mode it does.
|
19 hours ago Re: RFC: Change what assert does on error | ||||
Posted in reply to Paolo Invernizzi | On Thursday, 3 July 2025 at 10:19:18 UTC, Paolo Invernizzi wrote:
> What's preventing you from having debugging information in a remote server environment without physical access to the device?
>
> We are not in the '80s anymore, but even in the '80s ...
>
> /P
I can't hook debuggers up to code running on a remote server that somebody else owns in a DC hundreds of miles away. Therefore the only debugging data available is stack traces. If the language prevents the emission of stack traces then I get ... absolutely nothing.
But why can't I just run my code on a VM with debuggers on it?
Because direct remote access to production machines is strictly forbidden under most security and even regulatory regimes. Ironically, this is because direct remote access to production machines is a FAR larger security threat than a theoretical stack-corruption attack. All I need to get access is to subvert the right human, which is a far less complex attack than subverting the myriad stack protections. That is why most modern attacks focus on humans and not technology.
All of this was covered in my yearly Security Training at Microsoft as far back as 2015. These are well known limitations in corporate IT security. Oh, and I spent about a year of my time at Microsoft doing security and compliance work.
Having direct remote access to production is often a strict legal liability (which means that if the investigation discovers that you allow it, then it is presumed as a matter of law that the breach came from that route and you'll be found guilty right then and there), so you're never going to find a serious business willing to allow it.
At Microsoft, to get access to production I had to fill out a form and sign it. Then I used a specially modified laptop, with no custom software installed on it and all the input ports physically disabled, that was hooked up to a separate network. If I needed a tool on the production machine, I had to specifically request it from IT and wait for them to install it. I was not allowed to install anything on my own (which could be malware, of course).
Needless to say, my manager made us spend an enormous amount of our time making sure that we never needed access to production. The one time I did need production access was, ironically, because the extensive logging infrastructure we built crashed with no information recorded. So if I seem a bit animated about this topic, it's because I've been the guy who's had to resolve a problem under the exact conditions that we're proposing here.
This is exactly the kind of choice that gets your tech banned from corporate usage.
|
12 hours ago Re: RFC: Change what assert does on error | ||||
Posted in reply to Adam Wilson | On 7/3/2025 12:21 AM, Adam Wilson wrote:
> It is an absolute non-negotiable business requirement that I be able to get debugging information out of the server without physical access to the device. If you won't deliver the logging data, corrupted or not, on an assert, then no business can justify using D in production.
I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
|
12 hours ago Re: RFC: Change what assert does on error | ||||
Posted in reply to Richard (Rikki) Andrew Cattermole | Malware continues to be a problem. I wound up with two on my system last week. Ransomware seems to be rather popular. How does it get on a system?
I don't share your confidence. Malware authors seem to be very, very good at finding exploits.
Besides, a bug in a program can still corrupt the data, causing the program to do unpredictable things. Do you really want your trading software suddenly deciding to sell stock for a penny each? Or your pacemaker to suddenly behave erratically? Or your avionics to suddenly do a hard over? Or corrupt your data files?
If you knew what the bug is that caused an assert to trip, why didn't you fix it beforehand?
|
11 hours ago Re: RFC: Change what assert does on error | ||||
Posted in reply to Walter Bright | On Friday, 4 July 2025 at 06:58:30 UTC, Walter Bright wrote:
> Malware continues to be a problem. I wound up with two on my system last week. Ransomware seems to be rather popular. How does it get on a system?
Ahem. You run Windows 7. That is the sum total of information required to answer your own question.
I haven't had a malware attack on my system since Windows 8.1 came out, but I keep my systems running current builds. Yea, I may have to deal with a bit of graphics driver instability, but I don't get my files locked up for ransom. This has been a solved problem for a decade now.
Also, you might want to consider updating your PEBKAC firmware.
> Besides, a bug in a program can still corrupt the data, causing the program to do unpredictable things. Do you really want your trading software suddenly deciding to sell stock for a penny each? Or your pacemaker to suddenly behave erratically? Or your avionics to suddenly do a hard over? Or corrupt your data files?
In about two weeks I'm going to go visit EAA AirVenture and have a lovely conversation with Dynon, an avionics outfit based out of Snohomish, WA, that writes its software on a Linux/C++ tech stack. I watched it reset right in front of me, and nothing bad happened to the airplane. Last year I spent an hour jawing with one of their software engineers about the system. I'd put it in my (theoretical) airplane.
|
11 hours ago Re: RFC: Change what assert does on error | ||||
Posted in reply to Walter Bright | On Friday, 4 July 2025 at 06:29:26 UTC, Walter Bright wrote:
> On 7/3/2025 12:21 AM, Adam Wilson wrote:
>> It is an absolute non-negotiable business requirement that I be able to get debugging information out of the server without physical access to the device. If you won't deliver the logging data, corrupted or not, on an assert, then no business can justify using D in production.
>
> I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
Kinda hard to do that when the process terminates, especially if the logger is a side-thread of the app like it was on my team at MSFT.
But also, not printing a stack trace means there is nothing to log.
|
11 hours ago Re: RFC: Change what assert does on error | ||||
Posted in reply to Walter Bright | On Friday, July 4, 2025 12:29:26 AM Mountain Daylight Time Walter Bright via Digitalmars-d wrote:
> On 7/3/2025 12:21 AM, Adam Wilson wrote:
> > It is an absolute non-negotiable business requirement that I be able to get debugging information out of the server without physical access to the device. If you won't deliver the logging data, corrupted or not, on an assert, then no business can justify using D in production.
>
> I did mention that logging the error before terminating the process was acceptable. My point is that recovering is not acceptable.
Even if recovering is not acceptable, if the proper clean up is done when the stack is unwound, then it's possible to use destructors, scope statements, and catch blocks to get additional information about the state of the program as the stack unwinds. If the proper clean up is not done as the stack unwinds, then those destructors, scope statements, and catch blocks will either not be run at all (meaning that any debugging information which could have been obtained from them is lost), or only some of them will be run. And of course, each piece of clean up code that's skipped leaves the state of the program more invalid, making it that much riskier for any of the code that does run while the stack unwinds to log any information about the state of the program.
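To make the point above concrete, here is a minimal sketch of collecting diagnostics from a destructor, scope statements, and a catch block while an Error unwinds the stack. The names `events`, `Guard`, `work`, and `runAndReport` are hypothetical, and the explicit `throw new Error` is only a stand-in for a failed assert. It also assumes the compiler actually runs the clean up code here: the D specification does not guarantee that when an Error propagates, particularly through `nothrow` code, which is the very behavior under discussion in this thread.

```d
// Hypothetical in-memory log so the unwinding order is easy to inspect.
string[] events;

// A resource wrapper whose destructor records that it ran.
struct Guard
{
    string name;
    ~this() { events ~= "dtor:" ~ name; }
}

void work()
{
    auto g = Guard("connection");
    scope (exit) events ~= "scope(exit): flushing buffers";
    scope (failure) events ~= "scope(failure): recording partial state";

    // Stand-in for a failed assert: an Error propagating up the stack.
    throw new Error("invariant violated");
}

// Returns true if work() ended with an Error.
bool runAndReport()
{
    events = null;
    try
    {
        work();
        return false;
    }
    catch (Error e)
    {
        // A real program would write the report out and then terminate
        // here rather than continue normal operation.
        events ~= "caught: " ~ e.msg;
        return true;
    }
}
```

Calling `runAndReport()` and then printing `events` shows which pieces of clean up code got a chance to record state before the top-level handler ran. If the clean up were skipped, only the final "caught" entry would be present.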
And since in many cases, the fact that an Error was thrown means that memory corruption was about to occur rather than it actually having occurred, the state of the program could actually be perfectly memory safe while the stack unwinds if all of the clean up code is run correctly. It would be buggy, obviously, because the fact that an Error was thrown means that there's a bug, but it could still be very much memory safe. However, if that clean up code is skipped, then the logic of the program is further screwed up (since code that's normally guaranteed to run is not run), and that runs the risk of making it so that the code that does run during shutdown is then no longer memory safe, since the ability of the language to guarantee memory safety at least partially relies on the code actually following the normal rules of the language (which would include running destructors, scope statements, and catch statements).
It's quite possible to simultaneously say that it's bad practice to attempt to recover from an Error and to make it so that all of the normal clean up code runs while the stack unwinds with an Error. Wanting to recover from an Error and to continue to run the program is not the only reason to want the stack to unwind correctly. It can also be critical for getting accurate information while the program is shutting down due to an Error (especially in programs where the programmer is not the one running the program and isn't going to be able to reproduce the problem without additional information). And honestly, if the clean up code isn't going to be run properly, what was even the point of making Error a Throwable instead of just printing something out and terminating the program at the source of the Error?
Having the stack unwind properly in the face of Errors gives us a valuable
debugging tool. It does not mean that we're endorsing folks attempting to
recover from Errors - and some folks do that already simply because Error
is a Throwable, and it's completely possible to attempt it whether it's a
good idea or not. If you hadn't wanted that to be possible, you shouldn't
have ever made Error a Throwable. But the fact that it is a Throwable makes
it possible to get better information out of a program that's being killed
by an Error - especially if the stack unwinds properly in the process.
So, fixing the stack unwinding to work properly with Errors won't change the
fact that some folks will try to recover from Errors, but it will make it
easier to get information about the program's state when an Error occurs and
therefore make it easier to fix such bugs.
At the end of the day, whether the programmer does the right thing with Errors is up to the programmer, and we have the opportunity here to make it work better for folks who _are_ trying to do the right thing and have the program shut down on such failures. They just want to be able to get better information during the shutdown without faulty stack unwinding potentially introducing memory safety issues in the process.
If a programmer is determined to shoot themselves in the foot by trying to recover from an Error, they're going to do that whether we like it or not.
- Jonathan M Davis
|
11 hours ago Re: RFC: Change what assert does on error | ||||
Posted in reply to Richard (Rikki) Andrew Cattermole | On 7/3/2025 1:25 AM, Richard (Rikki) Andrew Cattermole wrote:
> From what I can tell this kind of attack is unlikely in D even without the codegen protection. So once again, the Error class hierarchy offers no protection from this kind of attack.
The paper says that exception unwinding of the stack is still vulnerable to malware attack.
> Need more evidence to suggest that Error shouldn't offer cleanup. Right now I have none.
Because:
1. there is no purpose to the cleanup as the process is to be terminated
2. code that is not executed is not vulnerable to attack
3. the more code that is executed after the program entered unknown and unanticipated territory, the more likely it will corrupt something that matters
Do you really want cleanup code to be updating your data files after the program has corrupted its data structures?
---
This whole discussion seems pointless anyway. If you want to unwind the exception stack every time, use enforce(), not assert(). That's what it's for.
|
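As an illustration of the `enforce()` suggestion, the following is a minimal sketch, not code from the thread. `std.exception.enforce` throws an `Exception` (rather than an `Error`) when its condition is false, so scope statements and catch blocks run as part of ordinary exception handling. The names `journal`, `saveRecord`, and `trySave` are hypothetical.

```d
import std.exception : enforce;

// Hypothetical journal used to observe what ran.
string[] journal;

void saveRecord(int id)
{
    // Runs on normal exit and when an Exception propagates.
    scope (exit) journal ~= "closed file";

    // enforce throws an Exception when the condition is false.
    enforce(id >= 0, "record id must be non-negative");

    journal ~= "wrote record";
}

// Returns true if the record was saved, false if it was rejected.
bool trySave(int id)
{
    try
    {
        saveRecord(id);
        return true;
    }
    catch (Exception e)
    {
        journal ~= "rejected: " ~ e.msg;
        return false;
    }
}
```

Here `trySave(-1)` returns `false` after the `scope (exit)` block has run, whereas a failing `assert` in the same position would raise an `AssertError`, for which that clean up is not guaranteed.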
11 hours ago Re: RFC: Change what assert does on error | ||||
Posted in reply to Sebastiaan Koppe | On 7/2/2025 9:35 AM, Sebastiaan Koppe wrote:
> I absolutely understand your stance. There are programs where I would blindly follow your advice. It's just that there are 99x as many where graceful shutdown is better.
As the quote from me says, "Depending on one's tolerance for risk, it might favor the user with a message about what went wrong before aborting (like a backtrace)." That would make it up to you how graceful a shutdown is desirable. Even so, continuing to operate the program as if the error did not happen remains a mistake.
> Also, most triggered asserts I have seen were because of programmer bugs, as in, they misused some library for example, not because of actual corruption or violation of some basic axiom.
The behavior of assert() in D is completely customizable. But I cannot in good conscience recommend continuing normal operation of a program after it has crashed.
|
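One way this customization can look is sketched below, using the `assertHandler` property from druntime's `core.exception`. The handler name `logThenAbort` and the function `installAssertPolicy` are hypothetical, and the sketch assumes asserts are compiled in (for example, not removed by `-release`). It implements the "report, then terminate" policy described above, not recovery.

```d
import core.exception : assertHandler;
import core.stdc.stdio : fprintf, stderr;
import core.stdc.stdlib : abort;

// Hypothetical handler: report the failure, then terminate the process
// immediately without unwinding the stack.
void logThenAbort(string file, size_t line, string msg) nothrow
{
    // D strings are not NUL-terminated, so pass explicit lengths.
    fprintf(stderr, "assert failed at %.*s(%llu): %.*s\n",
            cast(int) file.length, file.ptr,
            cast(ulong) line,
            cast(int) msg.length, msg.ptr);
    abort();
}

// Install the policy once, early in program startup.
void installAssertPolicy()
{
    assertHandler = &logThenAbort;
}
```

After `installAssertPolicy()` has been called, a failing `assert` is routed to `logThenAbort` instead of throwing an `AssertError`. A different policy, such as writing to a log file first, would be a different handler with the same signature.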
11 hours ago Re: RFC: Change what assert does on error | ||||
Posted in reply to Walter Bright | On 04/07/2025 6:58 PM, Walter Bright wrote:
> Malware continues to be a problem. I wound up with two on my system last week. Ransomware seems to be rather popular. How does it get on a system?
Step 1. Run outdated and insecure software. For instance Windows 7.
I cannot remember the last time I had malware on my computer. The built-in anti-virus is good enough on Windows. Staying fairly up to date is enough to combat any potential attacks in modern operating systems due to the automatic and frequent updates.
Ransomware generally requires people to disable OS protections on Windows, or for that specific virus to have never before been used. When you hear postmortems of them, they typically have "variant of" in their descriptions, as this is how they get around anti-virus.
Anti-virus today is quite sophisticated; it can analyze call stack patterns. I don't know how prevalent that is, however, but it does exist.
> I don't share your confidence. Malware authors seem to be very, very good at finding exploits.
Yes, they are very good at reading security advisories and then applying an attack based upon what is written. Turns out lots of people have outdated software, so even if a bug has been fixed, it's still got a lot of potential benefit for them. So many web apps get taken over specifically because of this. I found one website here in NZ that was exactly this: out-of-date software ~10 years old, with security advisories, and it would have been really easy to get in if I wanted to. And that was pure chance. It was advertised on TV at some point...
> Besides, a bug in a program can still corrupt the data, causing the program to do unpredictable things. Do you really want your trading software suddenly deciding to sell stock for a penny each? Or your pacemaker to suddenly behave erratically? Or your avionics to suddenly do a hard over? Or corrupt your data files?
An Error is thrown in an event where local information alone cannot inform the continuation of the rest of the program. Given this, we know that a given call stack cannot proceed; it must do what it can to roll back transactions to prevent corruption of outside data. Leaving them halfway could cause corruption too. A good example of this is the Windows USB drive support.
To give an example of this relevant to D, consider an application shipped through Microsoft's app store. It must have the ability to keep the GUI open and responsive even after the error has occurred. If it can't handle the error automatically, it must inform the user that a program-ending event has occurred and allow that user to close the program in their own time. You are not allowed to call abort or exit in this situation. It is illegal as per the contract you sign. Do I think they should have added this? No. But it is there, and C++ can handle this while we can't.
> If you knew what the bug is that caused an assert to trip, why didn't you fix it beforehand?
Q: How do you know that the bug was fixed and won't reappear?
A: You write an assert.
Q: How do you know that your assumptions are correct about code that you didn't write or haven't reevaluated and is quite complex?
A: You write an assert.
In a perfect world we'd throw proof assistants at programs and say they have no bugs ever. But the real world is the opposite: changes are made quickly and are rarely tested thoroughly enough to say an assert won't trip.
|
Copyright © 1999-2021 by the D Language Foundation