Better error message for endless recursion (or other segfaults) on linux

Jun 17, 2023

Dennis

When you accidentally put endless recursion in your code, by default on Linux you get this printed to the console once the stack overflows:

[1]    37856 segmentation fault (core dumped)  ./program

Not very informative. After this, you typically open a debugger like gdb and use the backtrace command to find where this happened. Can't we just print a backtrace when it happens, just like an assertion failure?

Well, there exists an etc.linux.memoryerror module, which provides the registerMemoryErrorHandler function. After that is called, segfaults are caught using the POSIX function sigaction(SIGSEGV, ...), and the handler throws a NullPointerError or InvalidPointerError object with a stack trace.
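To illustrate the underlying mechanism, here is a minimal sketch in C of catching SIGSEGV with sigaction. It is not the code from etc.linux.memoryerror: the names on_segv and install_segv_handler are made up for this example, the null check on si_addr is only a rough stand-in for the null/invalid distinction, and the handler prints a message and exits instead of throwing an Error.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: reports the kind of fault and exits.
   Only async-signal-safe calls (write, _exit) are used here. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    static const char null_msg[]  = "segfault: null pointer dereference\n";
    static const char other_msg[] = "segfault: invalid pointer access\n";
    ssize_t n;
    (void)sig;
    (void)ctx;

    /* si_addr holds the faulting address (assumption: treating only an
       exact NULL address as a null dereference). */
    if (info->si_addr == NULL)
        n = write(STDERR_FILENO, null_msg, sizeof null_msg - 1);
    else
        n = write(STDERR_FILENO, other_msg, sizeof other_msg - 1);
    (void)n;
    _exit(1);
}

/* Hypothetical helper: registers on_segv for SIGSEGV.
   Returns 0 on success, -1 on failure (like sigaction itself). */
int install_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;   /* three-argument handler form */
    sa.sa_flags = SA_SIGINFO;    /* needed to receive siginfo_t */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_segv_handler() once at program start is enough; any later invalid memory access then runs on_segv instead of killing the process silently. Note that this handler still runs on the regular stack of the faulting thread.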

This doesn't work for stack overflows, however, because the signal handler itself uses stack memory: you get a stack overflow while handling the stack overflow, and the program just aborts again.

This can be solved by using sigaltstack, which provides alternative stack memory for the signal handler. I tried integrating this into the existing code, but the existing handler redirects RIP (the instruction pointer) to an assembly routine and then returns, so custom x86 assembly gets executed in the context of the segfault. In the case of stack overflow, this results in a loop where the signal handler is called endlessly.
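As a rough illustration of the sigaltstack approach, the following C sketch registers a separate stack and asks for the SIGSEGV handler to run on it via SA_ONSTACK. This is not the code from the PR: the names alt_stack, on_segv_altstack and install_altstack_handler are invented for the example, the 64 KiB size is an arbitrary assumption, and the handler simply prints and exits rather than producing a stack trace.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Memory reserved for the signal handler's own stack
   (assumption: 64 KiB is comfortably above MINSIGSTKSZ). */
static char alt_stack[64 * 1024];

/* Hypothetical handler: runs on alt_stack, so it still works when
   the normal stack is exhausted. */
static void on_segv_altstack(int sig)
{
    static const char msg[] = "segfault (possibly a stack overflow)\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    (void)sig;
    _exit(1);
}

/* Hypothetical helper: installs the alternate stack and the handler.
   Returns 0 on success, -1 on failure. */
int install_altstack_handler(void)
{
    stack_t ss;
    struct sigaction sa;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv_altstack;
    sa.sa_flags = SA_ONSTACK;    /* deliver SIGSEGV on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

After install_altstack_handler() has been called, an endlessly recursing function should end with the message above instead of a bare "segmentation fault". The alternate stack is per thread, so a multithreaded program would need to call sigaltstack in each thread it wants covered.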

As far as I can see, the assembly tricks are only needed to support catching the Error object, which is bad practice anyway, so I thought it would be simpler to make a new handler using assert(0). I also think it can be enabled by default, at the very least in debug builds. Here's a PR with my work so far (at the time of writing):

https://github.com/dlang/dmd/pull/15331

It seems to work well, but I'm not experienced with signal handling, so review comments are welcome. Is it bad for performance? Is it unsafe? Can this break existing code? Hopefully not!
