I have a program where the GC seems to be overwriting memory that is still in use, corrupting data.
Here's the code. It's massively reduced from the original program, and it's hard to reduce it further because minor changes can prevent the problem from triggering. I explain the important parts below.
import std.stdio;

struct S {
    int check;
    S* next;
    int[4] data;
}

int main(string[] args) {
    void*[] allocs;
    enum bad_iter = 268;
    for (int n = 0; n < bad_iter+1; n++) {
        allocs.length = 0;
        auto x = " ";
        x ~= ' ';
        int[10][] ts;
        for(int i = 0; i < 21; i++) {
            ts.length++;
        }
        S head;
        S* s = &head;
        if (n == bad_iter) {
            n = bad_iter; // convenient line to set a breakpoint only for the last iteration
        }
        for(int i = 0; i < 8; i++) {
            auto ns = new S;
            ns.check = 1; // set test value here
            s.next = ns;
            s = ns;
        }
        s = head.next; // get the first S allocated this iteration
        if (s.check != 1) { // check test value here
            writefln("check=%d", s.check);
            return -1;
        }
        new int[10];
        allocs ~= null;
        new size_t[3];
    }
    return 0;
}
The important part is the following. On each iteration we create 8 instances of S, linked into a list, and set the check field of each one to 1. Then we read that field back from the first instance. When the program is compiled with the address sanitizer, the field has been corrupted and is no longer 1.
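To find out whether only the first node is affected, a bounded walk over the list could report the first node whose check value is wrong. This is a minimal sketch, not part of the original program: firstCorrupted is a hypothetical helper, and since the problem is sensitive to small changes, adding it may well change the layout enough to hide the corruption. The walk stops at the first mismatch because the next pointer of a corrupted node can't be trusted either.

```d
struct S {
    int check;
    S* next;
    int[4] data;
}

// Hypothetical helper (not in the original program).
// Returns the index of the first node whose `check` differs from `expected`,
// or -1 if none of the first `maxNodes` nodes differ.
// It returns at the first mismatch, so it never follows `next` out of a
// node that already looks corrupted.
ptrdiff_t firstCorrupted(const(S)* first, size_t maxNodes, int expected = 1) {
    size_t i = 0;
    for (auto p = first; p !is null && i < maxNodes; p = p.next, i++) {
        if (p.check != expected)
            return cast(ptrdiff_t) i;
    }
    return -1;
}
```

In the reproduction it would replace the single check, for example as firstCorrupted(head.next, 8), with any result other than -1 treated as a failure.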
Am I doing something wrong in the code? AFAIK I'm respecting the rules required by the GC. Maybe there's a silly bug I overlooked?
Tested with LDC 1.40.0 on x86_64 Linux:
$ ldc2 app.d -g --frame-pointer=all && ./app # OK
$ ldc2 app.d -fsanitize=address -g --frame-pointer=all && ./app # BUG
check=-337690816
$
By setting a watchpoint on the address of the field, I can see that the code that writes to check is part of the GC implementation. Here's the backtrace:
* thread #1, name = 'app', stop reason = watchpoint 1
* frame #0: 0x00007ffff7f4695c libdruntime-ldc-shared.so.110`_D4core8internal2gc4impl12conservativeQw3Gcx15recoverNextPageMFNbEQCmQCkQCeQCeQCcQCn4BinsZb + 348
frame #1: 0x00007ffff7f46278 libdruntime-ldc-shared.so.110`_D4core8internal2gc4impl12conservativeQw3Gcx10smallAllocMFNbmKmkxC8TypeInfoZPv + 776
frame #2: 0x00007ffff7f417d9 libdruntime-ldc-shared.so.110`_D4core8internal2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCsQCqQCkQCkQCiQCtQBy12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQFaQEyQEsQEsQEqQFb10mallocTimelS_DQGiQGgQGaQGaQFyQGj10numMallocslTmTkTmTxQDlZQFuMFNbKmKkKmKxQEeZQDx + 89
frame #3: 0x00007ffff7f449d3 libdruntime-ldc-shared.so.110`_DThn16_4core8internal2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkMxC8TypeInfoZSQDd6memory8BlkInfo_ + 83
frame #4: 0x00007ffff7f4ddec libdruntime-ldc-shared.so.110`gc_qalloc + 28
frame #5: 0x000055555556be9a app`_D4core8lifetime__T11_d_newitemTTS3app1SZQwFNaNbNeZPQt at lifetime.d:2837:5
frame #6: 0x000055555556b745 app`D main(args=string[] @ 0x00007fffffffe438) at app.d:28:13
frame #7: 0x00007ffff7f68ecd libdruntime-ldc-shared.so.110`_D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv + 77
frame #8: 0x00007ffff7f68ce7 libdruntime-ldc-shared.so.110`_d_run_main2 + 407
frame #9: 0x00007ffff7f68b3d libdruntime-ldc-shared.so.110`_d_run_main + 141
frame #10: 0x000055555556c2b2 app`main(argc=1, argv=0x00007fffffffe728) at entrypoint.d:42:17
frame #11: 0x00007ffff7745e08 libc.so.6`__libc_start_call_main(main=(app`main at entrypoint.d:39), argc=1, argv=0x00007fffffffe728) at libc_start_call_main.h:58:16
frame #12: 0x00007ffff7745ecc libc.so.6`__libc_start_main_impl(main=(app`main at entrypoint.d:39), argc=1, argv=0x00007fffffffe728, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffe718) at libc-start.c:360:3
frame #13: 0x000055555556b3a5 app`_start + 37
There is a subsequent write to that memory location in the leak sanitizer and LSan complains:
==4056526==LeakSanitizer has encountered a fatal error.
(though usually this message isn't flushed)
I assume the original problem is caused by the GC and that ASan/LSan are just subsequent victims, but it's hard to be sure. Apparently, LSan is enabled automatically on Linux when ASan is used. Although the ASan documentation says that LSan "can be enabled using ASAN_OPTIONS=detect_leaks=1 on macOS", setting it to 0 didn't seem to disable it, so I couldn't test with ASan but without LSan.
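One experiment that might help separate the GC from the sanitizers is to run the same loop with automatic collections suppressed via GC.disable from core.memory. The sketch below is only an assumption about how such a test could be wired up: withCollectionsDisabled is a hypothetical helper, and the body of the reproduction loop would have to be moved into the delegate passed to it.

```d
import core.memory : GC;

// Hypothetical helper (not in the original program).
// Runs `work` with automatic GC collections suppressed and returns its result.
// Per the core.memory documentation, the GC may still collect if it would
// otherwise run out of memory, so this is a hint rather than a guarantee.
int withCollectionsDisabled(scope int delegate() work) {
    GC.disable();
    scope (exit) GC.enable(); // always undo the disable, even if `work` throws
    return work();
}
```

If check stays at 1 with collections disabled, that would be consistent with the GC freeing memory it shouldn't, but it wouldn't prove it: as noted above, small changes to the program can also stop the problem from triggering.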
Any ideas of what might be going on?