Posted by Timon Gehr in reply to Walter Bright
On 5/4/25 18:49, Walter Bright wrote:
> On 5/3/2025 9:49 PM, Timon Gehr wrote:
>> For the record, even if my application would not run very sluggishly when compiled with DMD, in this particular case it does not matter how accurate the segfault location is as I am not getting any information in the first place.
>
> You suggested in another reply that I have no experience debugging programs that do not have a tty attached. This is incorrect.
Well, I think the circumstances I described were more specific than only having no tty attached.
> When I've had programs "wink out" leaving no trail nor context behind, I add logging code. In particular, I add a line to the entry of key functions, something like `fprintf(log, "function name\n");` which appends to a log file. Examining the log file gives clues as to where the program was when it failed and a trail of how it got there. I add more logging statements as needed to gradually close in on where the fault is.
> ...
Again, this can work decently well when the crash is reproducible.
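For readers who want to see what the quoted logging approach looks like in practice, here is a minimal C sketch. The names `log_line`, `LOG_ENTRY` and `key_function` are hypothetical and not taken from anyone's code in this thread. The file is opened and closed on every call so that each entry has left the stdio buffer before the next statement runs. That is slow, and it only protects against the process dying, not against a power loss.

```c
#include <stdio.h>

/* Hypothetical helper: append one line to a log file and close it again,
   so the entry is handed to the OS even if the process dies right after. */
int log_line(const char *path, const char *msg)
{
    FILE *log = fopen(path, "a");   /* "a" appends; creates the file if missing */
    if (log == NULL)
        return -1;
    int ok = fprintf(log, "%s\n", msg) >= 0;
    if (fclose(log) != 0)           /* fclose flushes the stdio buffer */
        ok = 0;
    return ok ? 0 : -1;
}

/* Log the name of the enclosing function; meant to be the first statement. */
#define LOG_ENTRY(path) log_line((path), __func__)

/* Example of an instrumented "key function". */
int key_function(const char *path)
{
    LOG_ENTRY(path);
    return 42;
}
```

After a crash, the last line in the file names the last instrumented function that was entered. As noted above, that only helps if the crash can be reproduced often enough to keep narrowing it down.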
>
>> My understanding is that null pointer dereference being UB is a widespread assumption in the LLVM and GCC optimizers. Simply "disabling the behavior" is not practical.
>
> It may be dependent on the optimization level.
>
> There are all kinds of undefined behavior that the LLVM and GCC optimizers just delete because, hey, it's undefined behavior that will never happen, so it can just be deleted. Things like `(x + 1 > x)` being replaced with `1`. There's a way for the compiler to emit a warning when this is done, perhaps try that?
>
> The compiler's ability to check at compile time for a null pointer dereference (and hence delete it) is extremely limited. It relies on data flow analysis where it can prove the pointer is null, not just "it might be". I read that compilers issue a warning when this is the case. Why not try `p = null; *p = 3;` in your setup and see if the compiler you're using gives a warning?
>
> If it does, but the warning does not happen with your code, then you know the compiler is not deleting the reference, and so the CPU will check it and seg fault it at runtime.
> ...
Well, there are multiple discussions here. One of them was about your claim that null dereferences will segfault reliably. Another one was about your claim that segfaults are always useful.
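To make the first of these points concrete, here is a small C sketch of the well-known pattern where an optimizer that treats null dereference as UB may drop a later null check. Both function names are hypothetical. Whether the check is actually removed depends on the compiler, version and flags. GCC documents `-fdelete-null-pointer-checks` and its negation for this; I have not verified what any D backend does.

```c
#include <stddef.h>

/* Dereference first, check afterwards. Because *p on a null pointer is
   undefined behavior, the optimizer is allowed to assume p != NULL after
   the load and may delete the check below entirely. */
int read_then_check(int *p)
{
    int v = *p;
    if (p == NULL)      /* may be removed as "unreachable" */
        return -1;
    return v;
}

/* Check first, dereference afterwards. The check is now observable
   behavior and has to be kept. */
int check_then_read(int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

Only `check_then_read` has defined behavior for a null argument. Calling `read_then_check(NULL)` may segfault at the load, or may do something else once the surrounding code has been inlined and optimized, which is why a segfault alone is not a reliable diagnostic.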
>
>> Perhaps I could add `-fsanitize=null` to add null checks, but that would not really solve the main problem as it is not integrated with D scope guards.
>
> It would at least tell you if it is a null pointer dereference or not. That in itself would be valuable information.
> ...
All of this speculation assumes that the design of D's error checking beyond null pointers is not at fault, but it is very likely that it is. druntime is riddled with code that builds an error message and then does `assert(0, msg);`. x)
I think at this time my best bet at narrowing it down is to use a custom druntime build. This should not be necessary, however.
> I have no idea what your program does, but I suspect your most practical option is to add logging to a file, and ask your customer to email the file to you when it crashes.
>
Well, I have no idea where the crash happens, and getting one log file every couple of months (while writing out gigabytes of useless log files in the meantime) does not seem workable. It would be much better to be able to react to the crash.
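One common way to avoid the gigabytes is to keep only the most recent events in memory and write them out when something goes wrong. The C sketch below is illustrative only: `ring_record`, `ring_dump`, `RING_SLOTS` and `SLOT_LEN` are made-up names, and it assumes there is some point at which the program can still run code after the failure, which is exactly the part that is missing here.

```c
#include <stdio.h>
#include <string.h>

#define RING_SLOTS 4     /* how many recent entries to keep (assumed value) */
#define SLOT_LEN   64    /* maximum entry length including the terminator */

static char ring[RING_SLOTS][SLOT_LEN];
static unsigned long ring_count;   /* total number of entries ever recorded */

/* Store msg in the next slot, overwriting the oldest entry when full. */
void ring_record(const char *msg)
{
    char *slot = ring[ring_count % RING_SLOTS];
    strncpy(slot, msg, SLOT_LEN - 1);
    slot[SLOT_LEN - 1] = '\0';     /* strncpy does not always terminate */
    ring_count++;
}

/* Write the surviving entries, oldest first; returns how many were written. */
int ring_dump(FILE *out)
{
    unsigned long n = ring_count < RING_SLOTS ? ring_count : RING_SLOTS;
    unsigned long first = ring_count - n;
    for (unsigned long i = 0; i < n; i++)
        fprintf(out, "%s\n", ring[(first + i) % RING_SLOTS]);
    return (int)n;
}
```

Note that `fprintf` is not async-signal-safe, so `ring_dump` as written should not be called from a signal handler. It would have to run from a regular error path that still gets executed when the failure occurs.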
> Another thing you can try is build your program with dmd and see if it behaves differently regarding the mysterious crash.
This is not workable. The user is certainly not willing to use a DMD build for two months.