When you accidentally write endless recursion in your code, by default on Linux you get this printed to the console once the stack overflows:
[1] 37856 segmentation fault (core dumped) ./program
Not very informative. After this, you typically open a debugger like gdb and use the backtrace command to find where this happened. Can't we just print a backtrace when it happens, just like with an assertion failure?
Well, there exists an etc.linux.memoryerror module, which provides the registerMemoryErrorHandler function. After that is called, segfaults are caught using the POSIX function sigaction(SIGSEGV, ...), and the handler throws a NullPointerError or InvalidPointerError object with a stack trace.
This doesn't work with stack overflow, however, because the signal handler uses stack memory itself, so you get a stack overflow while handling the stack overflow, and the program just crashes again.
This can be solved by using sigaltstack, which provides alternative stack memory for the signal handler. I tried integrating this into the existing code, but that code redirects RIP (the instruction pointer) to an assembly routine and continues, so custom x86 assembly gets executed in the context of the segfault. In the case of stack overflow, this results in a loop where the signal handler is called endlessly.
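To make the sigaltstack idea concrete, here is a minimal sketch in C (the language of the POSIX API itself, not D and not the code from druntime). The names install_segv_handler, on_segv and ALT_STACK_SIZE are made up for illustration, and the 64 KiB stack size is an arbitrary assumption. The handler does not try to resume the program; it only writes a message and aborts.

```c
#define _XOPEN_SOURCE 700 /* ask for the POSIX/XSI declarations (sigaltstack, SA_ONSTACK) */
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Size of the alternate stack; 64 KiB is an arbitrary choice for this sketch. */
#define ALT_STACK_SIZE (64 * 1024)

/* Hypothetical handler: report the fault and abort, without resuming. */
static void on_segv(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)info;
    (void)context;
    static const char msg[] = "segmentation fault (possibly a stack overflow)\n";
    /* write() and abort() are async-signal-safe; printf() is not. */
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    abort();
}

/* Installs on_segv for SIGSEGV on an alternate stack.
   Returns 0 on success, -1 on failure. */
int install_segv_handler(void)
{
    stack_t alt;
    memset(&alt, 0, sizeof alt);
    alt.ss_sp = malloc(ALT_STACK_SIZE); /* memory the handler will run on */
    if (alt.ss_sp == NULL)
        return -1;
    alt.ss_size = ALT_STACK_SIZE;
    alt.ss_flags = 0;
    if (sigaltstack(&alt, NULL) != 0) {
        free(alt.ss_sp);
        return -1;
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    /* SA_ONSTACK is what makes the kernel switch to the alternate stack. */
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

If install_segv_handler() is called at the start of main, endless recursion should end with the message above followed by an abort, instead of a second fault inside the handler. Note that sigaltstack applies to the calling thread only, so a multithreaded program would need an alternate stack per thread.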
As far as I can see, the assembly tricks are only needed to support catching the Error object, which is bad practice anyway, so I thought it would be simpler to make a new handler using assert(0). I also think it can be enabled by default, at the very least in debug builds. Here's a PR with my work so far (at the time of writing):
https://github.com/dlang/dmd/pull/15331
It seems to work well, but I'm not experienced with signal handling, so review comments are welcome. Is it bad for performance? Is it unsafe? Can this break existing code? Hopefully not!