May 16

On Friday, 16 May 2025 at 20:26:44 UTC, Denis Feklushkin wrote:

>

As a result, a new List is formed without a malloc() call. This list contains a pointer into some pool. Apparently, this memory is taken from a previously used pool. But at the same time, the memory that this pointer points to looks as if it has never been touched by any D code.

...and the Vulkan library writes into this memory as if it were its own allocation

May 17

On Friday, 16 May 2025 at 22:55:33 UTC, Denis Feklushkin wrote:

>

On Friday, 16 May 2025 at 20:26:44 UTC, Denis Feklushkin wrote:

>

As a result, a new List is formed without a malloc() call. This list contains a pointer into some pool. Apparently, this memory is taken from a previously used pool. But at the same time, the memory that this pointer points to looks as if it has never been touched by any D code.

...and the Vulkan library writes into this memory as if it were its own allocation

I'm really tired of researching this issue. Maybe someone else is also interested?

I just made a branch with the latest dirty debug changes:

git clone --branch=manual_reduce git@github.com:denizzzka/pukan.git

commit: 34dff13e76bb6ffbe9053eb8cad8f8f33a850b94

May 18

On Saturday, 17 May 2025 at 20:31:22 UTC, Denis Feklushkin wrote:

>

I'm really tired of researching this issue. Maybe someone else is also interested?

I just made a branch with the latest dirty debug changes:

git clone --branch=manual_reduce git@github.com:denizzzka/pukan.git

commit: 34dff13e76bb6ffbe9053eb8cad8f8f33a850b94

I managed to reduce the reproducer to several thousand (yes!) small GC.malloc()/GC.free() calls and got rid of the third-party libraries (Vulkan, etc.). Actually, I just recorded all the allocations/deallocations that my D code makes and then trimmed them a bit for as long as the error was still reproducible. I hope the approach itself is not the problem.

Sample now looks like one file (ZIP archive link):

/+ dub.sdl:
	name "issue"
+/
// How to run: dub run --single code.d

import core.memory: GC;

auto gc_malloc(T...)(T a)
{
    auto r = GC.malloc(a);
    assert(r !is null);
    return r;
}

auto gc_free(T...)(T a) => GC.free(a);

void main() {

version(linux)
version(DigitalMars)
{
    import etc.linux.memoryerror;
    registerMemoryAssertHandler();
}

void* ptr_0x7f5b360f3008 = gc_malloc(72, 0x1);
void* ptr_0x7f5b360f4008 = gc_malloc(8, 0x0);
void* ptr_0x7f5b360f5008 = gc_malloc(24, 0xa);
[...]
void* ptr_0x7f5b3611b968 = gc_malloc(12, 0x0);
void* ptr_0x7f5b3611b988 = gc_malloc(12, 0x0);

}

After compiling with DMD v2.111.0, running it gives:

> dub run --single code.d --compiler=dmd
    Starting Performing "debug" build using dmd for x86_64.
    Building issue ~master: building configuration [application]
     Linking issue
     Running issue
core.exception.AssertError@/usr/include/dmd/druntime/import/etc/linux/memoryerror.d(415): segmentation fault: null pointer read/write operation
----------------
??:? _d_assert_msg [0x55f779816710]
/usr/include/dmd/druntime/import/etc/linux/memoryerror.d:415 extern (C) nothrow @nogc void etc.linux.memoryerror.registerMemoryAssertHandler!().registerMemoryAssertHandler()._d_handleSignalAssert(int, core.sys.posix.signal.siginfo_t*, void*) [0x55f7798165f3]
??:? [0x7fdfed618def]
??:? rt_finalize2 [0x55f77981d75b]
??:? rt_finalizeFromGC [0x55f7798486ba]
??:? nothrow ulong core.internal.gc.impl.conservative.gc.Gcx.sweep() [0x55f77983e478]
??:? nothrow ulong core.internal.gc.impl.conservative.gc.Gcx.fullcollect(bool, bool) [0x55f77983f5a5]
??:? nothrow ulong core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC.fullCollect().go(core.internal.gc.impl.conservative.gc.Gcx*), core.internal.gc.impl.conservative.gc.Gcx*).runLocked(ref core.internal.gc.impl.conservative.gc.Gcx*) [0x55f7798442e2]
??:? nothrow ulong core.internal.gc.impl.conservative.gc.ConservativeGC.fullCollect() [0x55f77983ba9f]
??:? nothrow void core.internal.gc.impl.conservative.gc.ConservativeGC.collect() [0x55f77983ba7d]
??:? gc_term [0x55f7798280c7]
??:? rt_term [0x55f77981d002]
??:? void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).runAll() [0x55f779816d60]
??:? void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).tryExec(scope void delegate()) [0x55f779816c49]
??:? _d_run_main2 [0x55f779816bb2]
??:? _d_run_main [0x55f77981699b]
/usr/include/dmd/druntime/import/core/internal/entrypoint.d:29 main [0x55f779816485]
??:? [0x7fdfed602ca7]
??:? __libc_start_main [0x7fdfed602d64]
??:? _start [0x55f779801670]
Error Program exited with code 1
May 18

On Sunday, 18 May 2025 at 12:32:23 UTC, Denis Feklushkin wrote:

>

On Saturday, 17 May 2025 at 20:31:22 UTC, Denis Feklushkin wrote:

>

I'm really tired of researching this issue. Maybe someone else is also interested?

I just made a branch with the latest dirty debug changes:

git clone --branch=manual_reduce git@github.com:denizzzka/pukan.git

commit: 34dff13e76bb6ffbe9053eb8cad8f8f33a850b94

I managed to reduce the reproducer to several thousand (yes!) small GC.malloc()/GC.free() calls and got rid of the third-party libraries (Vulkan, etc.).

This was the wrong approach. The SIGSEGV is caused by the FINALIZE attribute bits on some of the GC.malloc() calls, which don't actually specify any class info

That's it, I have no other ideas

May 18

On Monday, 12 May 2025 at 21:29:10 UTC, Steven Schveighoffer wrote:

> >

Yes, of course I understand perfectly well. And it seems to me that I am not doing anything "reprehensible".

The "reprehensible" thing that almost always causes GC issues is use after free because you are interacting with C memory.

I just added GC.collect() before the lines that caused the SIGSEGV and everything was fixed. Is that what you meant?

If so, I don't understand the nature of this error

I feel uncomfortable about all this: if it fixes the problem, then why? If it doesn't, then there must be a bug somewhere that causes this, and collect() just masks it

Commit that fixes (or not?) the issue

May 19

On Sunday, 18 May 2025 at 15:49:21 UTC, Denis Feklushkin wrote:

>

This was the wrong approach. The SIGSEGV is caused by the FINALIZE attribute bits on some of the GC.malloc() calls, which don't actually specify any class info

Oof, yes. The gc_malloc calls with the FINALIZE bit set need either a class object to be filled in, or a struct finalizer supplied via the TypeInfo (you are not supplying any to the calls).
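To make the point about the attribute bits concrete, here is a minimal sketch that decodes the masks seen in the recorded calls (0x1, 0x0, 0xa) and replays an allocation with the finalizer bits removed. `GC.BlkAttr.FINALIZE` and `GC.BlkAttr.STRUCTFINAL` are real `core.memory` flags; the helpers `wantsFinalizer` and `stripFinalizerBits` are hypothetical names introduced only for this illustration, and stripping the bits is an assumption about how a replay could be made safe, not something taken from the original program.

```d
import core.memory : GC;
import std.stdio : writefln;

/// True if the attribute mask asks the GC to run a finalizer for the block.
bool wantsFinalizer(uint attr)
{
    return (attr & (GC.BlkAttr.FINALIZE | GC.BlkAttr.STRUCTFINAL)) != 0;
}

/// Hypothetical helper: drop the finalizer-related bits, keep everything else.
uint stripFinalizerBits(uint attr)
{
    return attr & ~(GC.BlkAttr.FINALIZE | GC.BlkAttr.STRUCTFINAL);
}

void main()
{
    // Attribute values copied from the recorded gc_malloc calls in the reproducer.
    foreach (uint attr; [0x1u, 0x0u, 0xau])
        writefln("0x%x: wantsFinalizer=%s, replay with 0x%x",
                 attr, wantsFinalizer(attr), stripFinalizerBits(attr));

    // A raw block holds no class instance, so it is replayed without FINALIZE.
    void* p = GC.malloc(72, stripFinalizerBits(0x1));
    assert(p !is null);
    GC.free(p);
}
```

With the bits stripped, the replayed blocks are plain memory as far as the collector is concerned, so the sweep at program exit has no finalizer to look up for them.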

>

That's it, I have no other ideas

Memory problems suck. Finding out why something did something after the fact is nearly impossible.

In all my experience with the GC, and I've had a lot over the last year, these problems are extremely difficult to find.

Please send me an email, maybe we can do some kind of session to try and find the problems. I have very good current knowledge of the GC, but I'm not going to be able to understand your program without help.

-Steve

May 19

On Sunday, 18 May 2025 at 19:10:18 UTC, Denis Feklushkin wrote:

>

On Monday, 12 May 2025 at 21:29:10 UTC, Steven Schveighoffer wrote:

> >

Yes, of course I understand perfectly well. And it seems to me that I am not doing anything "reprehensible".

The "reprehensible" thing that almost always causes GC issues is use after free because you are interacting with C memory.

I just added GC.collect() before the lines that caused the SIGSEGV and everything was fixed. Is that what you meant?

No, I mean that almost always a GC problem is caused by using memory that the GC cannot see.

So things are collected before they are unreferenced.

However, GC.disable at the start should fix it, and you've said that doesn't. So that sounds more like a straight buffer overflow or other issue.
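As an illustration of "memory that the GC cannot see", here is a minimal sketch in which the only reference to a GC buffer lives in a C-heap slot. `GC.addRange`, `GC.removeRange` and `GC.collect` are real `core.memory` calls; the function name `payloadSurvivesCollect` and the C-heap slot standing in for memory owned by a C library are assumptions made for the example, not code from the project being discussed.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

/// Returns true if a GC buffer referenced only from C-heap memory
/// is still intact after a forced collection.
bool payloadSurvivesCollect()
{
    // A C-heap slot standing in for memory owned by a C library (assumption).
    void** cSlot = cast(void**) malloc((void*).sizeof);
    assert(cSlot !is null);

    auto data = new ubyte[](64);
    data[] = 0xAB;
    *cSlot = data.ptr; // the reference is stored where the GC does not scan
    data = null;

    // Tell the GC to scan the C-heap slot, so the buffer stays reachable.
    GC.addRange(cSlot, (void*).sizeof);
    scope (exit)
    {
        GC.removeRange(cSlot);
        free(cSlot);
    }

    GC.collect();

    // Read the buffer back through the pointer kept in C memory.
    auto view = (cast(ubyte*) *cSlot)[0 .. 64];
    foreach (b; view)
        if (b != 0xAB)
            return false;
    return true;
}

void main()
{
    assert(payloadSurvivesCollect());
}
```

Without the `GC.addRange` call, the collector is free to reclaim the buffer once no scanned location refers to it, which is the use-after-free pattern described above. Calling `GC.disable()` at startup sidesteps that class of bug, which is why a crash that persists with the GC disabled points elsewhere.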

>

If so, I don't understand the nature of this error

I feel uncomfortable about all this: if it fixes the problem, then why? If it doesn't, then there must be a bug somewhere that causes this, and collect() just masks it

I would guess it is the latter.

-Steve
