Problem with GC and address/leak sanitizer (page 2)

On 2/17/2025 8:55 PM, Steven Schveighoffer wrote: > On Monday, 17 February 2025 at 22:14:13 UTC, Walter Bright wrote: >> Is this an issue with the new GC, or the old one? > > Old gc I'm a little surprised to see a problem crop up after 25 years of continuous use? Perhaps try an older release of the compiler?

On 18/02/2025 6:22 PM, Walter Bright wrote: > On 2/17/2025 8:55 PM, Steven Schveighoffer wrote: >> On Monday, 17 February 2025 at 22:14:13 UTC, Walter Bright wrote: >>> Is this an issue with the new GC, or the old one? >> >> Old gc > > I'm a little surprised to see a problem crop up after 25 years of continuous use? > > Perhaps try an older release of the compiler? This has already been solved, see Johan's comments. ASAN with a fake stack isn't operating properly.

==4108825==AddressSanitizer CHECK failed: /home/vsts/work/1/s/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:556 "((*tls_addr + *tls_size)) <= ((*stk_addr + *stk_size))" (0x78927e371080, 0x78927e371000)

On Friday, 7 March 2025 at 14:51:22 UTC, Luís Marques wrote:

LDC 1.40.0 (git commit 9296fd6fcc) reproduces the problem with both LLVM 15.0 and 19.1, so for now it seems that this isn't related to something having changed on the LLVM side.

I finally found some time today to analyze this properly, rather than just trying different LDC/LLVM versions. I haven't yet double-checked everything, but based on my current understanding, I believe I've identified the issue, along with several deficiencies:

The main issue: The distributed libdruntime-ldc-shared.so appears to be compiled without version=SupportSanitizers, so the fake stack scanning isn't enabled. I conclude this by observing that the distributed runtime's disassembly doesn't include scanStackForASanFakeStack, unlike when this option is enabled. The necessary code exists in the runtime's source, but it's simply not being compiled in.
Fake stack support should work when properly enabled: If you build LDC with -DRT_SUPPORT_SANITIZERS=True, the fake stack is scanned as expected. From what I can see (by checking for the proper GC scan calls and addresses), everything is being done correctly. The test tests/sanitizers/asan_fakestack_GC.d still fails, but not because of broken fake stack scanning—instead, the test itself is broken. An assertion fails in test_non_null_does_not_trigger_collection because memory from a previous iteration is being freed, not because it's freeing memory incorrectly for the current iteration.
Redundant function calls: The runtime function scanAllTypeImpl calls scanStackForASanFakeStack multiple times for the same fake stack frame, using identical memory ranges each time. While technically correct, this is unnecessary and inefficient. Based on the comment "This is the pointer we are catching with", it seems there's a misunderstanding regarding how the fake stack should be located.
Misleading comment in the test: The comment // Large enough so it will be on fakestack heap (not inlined in local stack frame) in asan_fakestack_GC.d appears to be incorrect. The size of the allocation shouldn't determine this—if its address escapes, it must go into the heap-allocated fake stack regardless.

March 29

Re: Problem with GC and address/leak sanitizer

Posted by Johan
in reply to Luís Marques

Permalink

Johan

Posted in reply to Luís Marques

Permalink

Hi Luis,
Thanks for spending some time on it.
Here some inline remarks from my side.

On Friday, 28 March 2025 at 00:35:50 UTC, Luís Marques wrote:

The main issue: The distributed libdruntime-ldc-shared.so appears to be compiled without version=SupportSanitizers, so the fake stack scanning isn't enabled.

This is intentional, because there is some overhead associated with it. If this is the main issue then that's "OK" and has always been the case. Perhaps FakeStack is now enabled by default. We should either disable that default, or start shipping including the small overhead in druntime.

Fake stack support should work when properly enabled: If you build LDC with -DRT_SUPPORT_SANITIZERS=True, the fake stack is scanned as expected.

Great.

From what I can see (by checking for the proper GC scan calls and addresses), everything is being done correctly. The test tests/sanitizers/asan_fakestack_GC.d still fails, but not because of broken fake stack scanning—instead, the test itself is broken. An assertion fails in test_non_null_does_not_trigger_collection because memory from a previous iteration is being freed, not because it's freeing memory incorrectly for the current iteration.

The idea of the test is to test that enabling fakestack does not change collection behavior. The test is clunky, I'll admit. If it is indeed failing already on the fakestack-disabled case, then it needs to be adjusted for sure.

Redundant function calls: The runtime function scanAllTypeImpl calls scanStackForASanFakeStack multiple times for the same fake stack frame, using identical memory ranges each time. While technically correct, this is unnecessary and inefficient. Based on the comment "This is the pointer we are catching with", it seems there's a misunderstanding regarding how the fake stack should be located.

I'm not sure how this can be related to specifically fakestack implementation issue. If you mean that scanAllTypeImpl calls scanStackForASanFakeStack multiple times, then that is not a problem of our fakestack support code, but of scanAllTypeImpl itself because it would also call the normal scan method multiple times with same memory range. That would point towards an issue with duplicates in the ThreadBase.sm_cbeg list.

Is that what you meant? (it is literally what you typed, but the sentences following it do not make sense with that interpretation...)

LLVM does not give us many options to find the fake stack. At the time I think I used the only public API method available, which indeed results in multiple times getting the same fakestack. We can poke into ASan internals but I am afraid it will quickly bitrot/break. Perhaps by now ASan's public interface has been extended, otherwise we should ask for it (or submit PRs to LLVM)...
I'm quite sure we are doing the right thing of locating the fakestack, it's done similarly here: https://www.aovivo.ilutas.com.br/node-v17.9.1/deps/v8/src/heap/base/stack.cc

Misleading comment in the test: The comment // Large enough so it will be on fakestack heap (not inlined in local stack frame) in asan_fakestack_GC.d appears to be incorrect. The size of the allocation shouldn't determine this—if its address escapes, it must go into the heap-allocated fake stack regardless.

I vaguely remember there being some optimization with small memory sizes, but I cannot find it and cannot trigger it anymore. I think it may be related to this comment of mine: https://johanengelen.github.io/posts/2017-12-25-ldc-and-addresssanitizer/#fn:4
Hmm... I no longer remember. Regardless, you are right that to detect any stackmemory use after return, that memory cannot sit on the regular processor stack.

cheers,
Johan

Forums