3 days ago

On Tuesday, 13 May 2025 at 10:21:12 UTC, Denis Feklushkin wrote:

I added a simple debug(PRINTF) section right after the druntime allocator call. It throws an error if newly allocated memory intersects with the already allocated internal bucket List structures. I hope I didn't make a mistake in this code?

auto p = runLocked!(mallocNoSync, mallocTime, numMallocs)(size, bits, localAllocSize, ti);

debug(PRINTF)
{
    outer:
    foreach(List* firstList; gcx.bucket)
    {
        List* curr = firstList;
        while(curr !is null)
        {
            void* p_end = cast(ubyte*) p + localAllocSize;
            void* curr_end = cast(ubyte*) curr + List.sizeof;

            const bool notIntersects = ((p < curr && p_end < curr) || (p > curr_end && p_end > curr_end));

            if(!notIntersects)
            {
                printf("%p - allocated into bucket List value, located on %p: firstList.pool=%p curr.pool=%p\n",
                    p, curr, firstList.pool, curr.pool);

                assert(false);
                break outer;
            }

            curr = curr.next;
        }
    }
}

Druntime was built as a debug version with INVARIANT, MEMSTOMP and PRINTF enabled.

Then this snippet was used with the compiled druntime (don't forget to replace the path to the new druntime in ldc2.conf):

/+ dub.sdl:
	name "issue"
+/
// How to run: dub run --single app.d

class C {}

void main()
{
    new C;
}
> dub run --single app.d --compiler=ldc2
    Starting Performing "debug" build using ldc2 for x86_64.
    Building issue ~master: building configuration [application]
     Linking issue
     Running issue
_d_newclass(ci = 0x56496398c350, app.C)
0x5649a1312c90.Gcx::addRange(0x564963985940, 0x564963994718)
GC::malloc(gcx = 0x5649a1312c90, size = 16 bits = 2, ti = app.C)
  => p = 0x7fa30e5cb000
0x7fa30e5cb000 - allocated into bucket List value, located on 0x7fa30e5cb010: firstList.pool=0x5649a1313fa0 curr.pool=0x5649a1313fa0
core.exception.AssertError@core/internal/gc/impl/conservative/gc.d(505): Assertion failure
----------------
core/runtime.d:831 [0x564963942d45]
core/lifetime.d:126 [0x56496394234c]
core/runtime.d:753 [0x564963942d0e]
core/runtime.d:773 [0x564963942640]
rt/dmain2.d:241 [0x564963920f30]
rt/deh.d:47 [0x564963949b9e]
rt/dwarfeh.d:347 [0x564963921ac2]
core/exception.d:569 [0x564963936a05]
core/exception.d:808 [0x564963936444]
core/internal/gc/impl/conservative/gc.d:505 [0x5649639502f3]
core/internal/gc/proxy.d:156 [0x56496393cf70]
core/internal/gc/impl/proto/gc.d:101 [0x5649639604fb]
core/internal/gc/proxy.d:156 [0x56496393cf70]
rt/lifetime.d:130 [0x5649639235fe]
app.d:10 [0x56496391a7af]
rt/dmain2.d:520 [0x56496392169c]
rt/dmain2.d:474 [0x5649639214b2]
rt/dmain2.d:520 [0x5649639215ba]
rt/dmain2.d:474 [0x5649639214b2]
rt/dmain2.d:545 [0x564963921372]
rt/dmain2.d:333 [0x564963921040]
/home/denizzz/ldc2_standalone/bin/../import/core/internal/entrypoint.d:42 [0x56496391a7f1]
??:? [0x7fa30e6f6ca7]
??:? __libc_start_main [0x7fa30e6f6d64]
??:? [0x56496391a6d0]
GC.fullCollect()
processing GC Marks, (nil)
rt_finalize2(p = 0x5649a1312c20)
Error Program exited with code 1

Am I making an obvious mistake somewhere?

3 days ago

On Tuesday, 13 May 2025 at 18:30:34 UTC, Denis Feklushkin wrote:

> I hope I didn't make a mistake in this code?

The intersection logic is wrong: it treats adjacency as an intersection. Try this instead: const bool intersects = (p_end > curr && p < curr_end);

2 days ago
On 5/12/25 8:31 AM, Denis Feklushkin wrote:

> Vulkan API is used and it implicitly creates threads.

Do those threads call back into D code that allocates from the GC? If so, the GC must be aware of the threads so it can suspend them during a collection.

I had to call thread_attachThis() to do that in a past project:

  https://dlang.org/library/core/thread/osthread/thread_attach_this.html

However, it was not clear whether or when to make a corresponding call to thread_detachThis(). If Vulkan threads disappear on their own, your only chance for a call to thread_detachThis() may be right before returning from your D callback function.

Ali

2 days ago

On Tuesday, 13 May 2025 at 19:12:19 UTC, kinke wrote:

> On Tuesday, 13 May 2025 at 18:30:34 UTC, Denis Feklushkin wrote:
> > I hope I didn't make a mistake in this code?
>
> The intersection logic is wrong, treating adjacency as intersection. Try this: const bool intersects = (p_end > curr && p < curr_end).

I changed it to this and everything worked out:

((p < curr && p_end <= curr) || (p >= curr_end && p_end >= curr_end));

It seems to be correct: both borders of p should lie on the same side of the curr range.

2 days ago
On Wednesday, 14 May 2025 at 04:26:08 UTC, Ali Çehreli wrote:
> On 5/12/25 8:31 AM, Denis Feklushkin wrote:
>
> > Vulkan API is used and it implicitly creates threads.
>
> Do those threads call back to D code that allocate from the GC? If so, the GC must be aware of the threads to be able to suspend them during a collection.

There is no such thing in my code (it is possible with Vulkan, but I removed that code from the test build).

But I am almost sure the problem is in the Vulkan lib: when a Vulkan VkDevice object is created, about 30 threads are implicitly created by the Vulkan library, and something goes wrong.

2 days ago

On Wednesday, 14 May 2025 at 06:54:42 UTC, Denis Feklushkin wrote:

> ((p < curr && p_end <= curr) || (p >= curr_end && p_end >= curr_end));

The idea of such a check failed, because List pointers to other lists are sometimes overwritten by garbage, and the issue just moves to the curr.next access.

I also see that there are no TLS sections of any kind in libvulkan.so.

But I don't understand why malloc() can give intersecting allocations in this case. Any ideas?

2 days ago

On Wednesday, 14 May 2025 at 08:00:06 UTC, Denis Feklushkin wrote:

> But I don't understand why malloc() can give intersecting allocations in this case. Any ideas?

A malloc implementation can be either thread-safe or thread-unsafe, but neither is reentrant:

Malloc operates on a global heap, and it's possible that two different invocations of malloc that happen at the same time return the same memory block. (The 2nd malloc call would have to happen after the address of the chunk is fetched, but before the chunk is marked as unavailable.) This violates the postcondition of malloc, so such an implementation would not be re-entrant.

https://stackoverflow.com/a/3941563

Okay, I think the question can be considered closed

12 hours ago

On Wednesday, 14 May 2025 at 09:11:13 UTC, Denis Feklushkin wrote:

> Okay, I think the question can be considered closed

However, I am still on this issue! :(

ERROR: AddressSanitizer: SEGV on unknown address 0x000100000006

I tried all 4 available TLS models: global-dynamic, local-dynamic, initial-exec, local-exec. But I only built the resulting binary with these models, not druntime itself.

Valgrind says that the memory block returned by malloc() has never been allocated dynamically:

$ valgrind --tool=memcheck ./pukan
[...]
==1218062==  Address 0x100000006 is not stack'd, malloc'd or (recently) free'd

$fs_base (the TLS base pointer, as reported by GDB) is 0x00007ffff7a50b40, and all other allocated and used values of my D code lie near this value.

I also found that problems accessing the 0x100000006 pointer are quite common, and they always seem to be thread-related:

https://bbs.archlinux.org/viewtopic.php?id=210363 - here I couldn't track how they solved the problem
https://github.com/gluster/glusterfs/issues/2971 - a crash while tcmalloc init/fini functions are called from two different threads during library load/unload; the process crashes while accessing TLS variables in the heap profiler API

Adding fuel to the fire is the fact that the same Vulkan library works for me without (any known) problems in Danny Arends' project, which uses SDL2 instead of GLFW. The Vulkan library itself is loaded the same way in both projects: linking during the build process.

2 hours ago

On Friday, 16 May 2025 at 10:42:36 UTC, Denis Feklushkin wrote:

>

On Wednesday, 14 May 2025 at 09:11:13 UTC, Denis Feklushkin wrote:

>

Okay, I think the question can be considered closed

However, I am still on this issue! :(

I still think this may be a druntime issue. And it's probably not about TLS.

I discovered the rr tool, which makes it quick to create and replay repeatable recordings in gdb (gdb's built-in record/replay works very slowly), so there's no need to run gdb many times and carefully examine everything. rr is available in Debian, but that version doesn't work with my code (some kind of tick counting error, seemingly because of the video driver used); a self-compiled one works fine.

So, after replaying and rewinding a few times, I can clearly see the following:

I made sure that malloc switches to per-thread "arenas" as soon as threads appear; this mechanism is built into glibc and is enabled automatically when the second pthread is created.

I also tried replacing the free(void*) symbol with my own empty stub, to make sure that definitely nothing was being freed and nobody could get a still-used chunk again. It didn't help.

The Vulkan library quite legitimately allocates some memory for its needs and uses it, and this memory contains the region where the issue occurs. I don't know why Valgrind answered (evasively) that this memory had not been allocated before.

Next, while executing on the D side, the GC's pool of small allocations (of size 32) gets exhausted. Then some magic happens in the gc.d code using recoverPool near SmallObjectPool.allocPage(), which I do not fully understand. (Obviously, this is needed to reuse memory that was previously allocated.)

As a result, a new List is formed without a malloc() call. This List contains a pointer to some pool; apparently, this memory is taken from a previously used pool. But at the same time, the memory that this pointer points to looks as if it has never been touched by any D code. I haven't figured out why yet. Perhaps there is some error in the pointer calculations.

Also, inside allocPage, the execution flow reaches the line:

void* p = baseAddr + pn * PAGESIZE;

but at the same time baseAddr == 0xf0f0f0f0f0f0f0f0 (the MEMSTOMP fill pattern, i.e. this memory was stomped as freed)

2 hours ago

On Friday, 16 May 2025 at 20:26:44 UTC, Denis Feklushkin wrote:

> where the issue occurs. I don't know why Valgrind answered (evasively) that this memory had not been allocated before.

Because it's not an address at all: it's just some internal data that Vulkan accidentally wrote over List.pool.