Jump to page: 1 27  
Page
Thread overview
Program crash: GC destroys an object unexpectedly
Sep 13
eugene
Sep 13
user1234
Sep 13
user1234
Sep 13
eugene
Sep 13
user1234
Sep 13
eugene
Sep 13
eugene
Sep 14
Tejas
Sep 14
eugene
Sep 14
eugene
Sep 14
eugene
Sep 14
eugene
Sep 14
eugene
Sep 14
eugene
Sep 14
jfondren
Sep 14
eugene
Sep 14
jfondren
Sep 14
eugene
Sep 14
jfondren
Sep 14
eugene
Sep 14
jfondren
Sep 14
eugene
Sep 14
eugene
6 days ago
jfondren
2 days ago
eugene
3 days ago
eugene
3 days ago
jfondren
2 days ago
eugene
2 days ago
eugene
2 days ago
jfondren
2 days ago
eugene
2 days ago
eugene
2 days ago
eugene
Sep 14
eugene
2 days ago
eugene
2 days ago
eugene
1 day ago
eugene
Sep 13
eugene
Sep 13
jfondren
Sep 13
eugene
Sep 14
eugene
Sep 14
eugene
Sep 14
eugene
2 days ago
eugene
6 hours ago
jfondren
5 hours ago
eugene
5 hours ago
jfondren
5 hours ago
eugene
5 hours ago
H. S. Teoh
4 hours ago
eugene
4 hours ago
eugene
5 hours ago
H. S. Teoh
5 hours ago
H. S. Teoh
September 13

Code snippets

class Stopper : StageMachine {

    enum ulong M0_IDLE = 0;
    Signal sg0, sg1;

    this() {
        super("STOPPER");

        Stage init, idle;
        init = addStage("INIT", &stopperInitEnter);
        idle = addStage("IDLE", &stopperIdleEnter);

        init.addReflex("M0", idle);

        idle.addReflex("S0", &stopperIdleS0);
        idle.addReflex("S1", &stopperIdleS1);
    }

    void stopperInitEnter() {
        sg0 = newSignal(Signal.sigInt);
        sg1 = newSignal(Signal.sigTerm);
        msgTo(this, M0_IDLE);
    }

The instance of Stopper is created in the scope of main():

void main(string[] args) {

    auto stopper = new Stopper();
    stopper.run();

stopperInitEnter(), where sg0 and sg1 are created, is invoked inside run() method.

After ~6 seconds from the start (dummy) destructors of sg0 and sg1 are called:

!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 24) this @ 0x7fa5410d4f60
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 25) this @ 0x7fa5410d4f90

Then after pressing ^C (SIGINT) the program gets SIGSEGV, since references to sg0 and sg1 are no longer valid (they are "sitting" in epoll_event structure).

First I thought I am stupid and I do not see some obvious mistake, but...
That crash happens if the program was compiled with dmd (v2.097.2).
When using gdc (as well as ldc, both from Debian 8 official repo), I do not observe no crashes - program may run for hours and after interrupting by ^C it terminates as expected.

And the most strange thing is this - if using gdc with -Os flag, the program behaves
exactly as when compiled with fresh dmd - destructors for sg0 and sg1 are called soon
after program start.

I do not understand at all why GC considers those sg0 and sg1 as unreferenced.
And why old gdc (without -Os) and old ldc do not.

September 13

On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:

>

[...]

At first glance and given the fact that some code necessary to diagnose the problem accurately is missing:

new Stopper allocates using the GC. it's then a "GC range", it's content will be scanned and handled by the GC, including sg0 and sg1.

So far everything is simple.

The problems seems to lies in newSignal() which "would" not allocate using the GC. So when the GC reaches sg0 and sg1 values, indirectly when scanning a Stopper instance, it thinks that these they are unused and, consequently, free them.

If you dont want them to be managed by the GC remove them from the GC, using removeRange().

September 13

On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:

>

On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:

>

[...]

At first glance and given the fact that some code necessary to diagnose the problem accurately is missing:

new Stopper allocates using the GC. it's then a "GC range", it's content will be scanned and handled by the GC, including sg0 and sg1.

So far everything is simple.

The problems seems to lies in newSignal() which "would" not allocate using the GC. So when the GC reaches sg0 and sg1 values, indirectly when scanning a Stopper instance, it thinks that these they are unused and, consequently, free them.

If you dont want them to be managed by the GC remove them from the GC, using removeRange().

of sorry, or maybe the opposite, so addRange...

September 13

On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:

>

The problems seems to lies in newSignal() which "would" not allocate using the GC.

final Signal newSignal(int signum) {
    Signal sg = new Signal(signum);
    sg.owner = this;
    sg.number = sg_number++;
    sg.register();
    return sg;
}

full src is here
http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz

September 13

On Monday, 13 September 2021 at 17:54:43 UTC, eugene wrote:

>

On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:

>

The problems seems to lies in newSignal() which "would" not allocate using the GC.

final Signal newSignal(int signum) {
    Signal sg = new Signal(signum);
    sg.owner = this;
    sg.number = sg_number++;
    sg.register();
    return sg;
}

full src is here
http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz

thx, so the problem is not what I suspected to be (mixed gc-managed and manually managed memory). sorrry...

September 13

On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:

>

And the most strange thing is this

It is echo-server/echo-client pair.
And it is echo-client that crashes upon SIGINT.

echo-server contains very similar code

class Listener : StageMachine {

enum ulong M0_WORK = 0;
enum ulong M1_WORK = 1;
enum ulong M0_GONE = 0;

RestRoom workerPool;
ushort port;
TCPListener reception;
Signal sg0, sg1;

this(RestRoom wPool, ushort port = 1111) {
    super("LISTENER");
    workerPool = wPool;
    this.port = port;

    Stage init, work;
    init = addStage("INIT", &listenerInitEnter);
    work = addStage("WORK", &listenerWorkEnter);

    init.addReflex("M0", work);

    work.addReflex("L0", &listenerWorkL0);
    work.addReflex("M0", &listenerWorkM0);
    work.addReflex("S0", &listenerWorkS0);
    work.addReflex("S1", &listenerWorkS1);
}

void listenerInitEnter() {
    reception = newTCPListener(port);
    sg0 = newSignal(Signal.sigInt);
    sg1 = newSignal(Signal.sigTerm);
    msgTo(this, M0_WORK);
}

but it does not crashes (destruc).

The only significant difference - it has TCPListener instance, besides absolutely the same sg0 and sg1 'channels'.

September 13

On Monday, 13 September 2021 at 17:56:34 UTC, user1234 wrote:

>

thx, so the problem is not what I suspected to be (mixed gc-managed and manually managed memory). sorrry...

I am actually C coder and do not have much experience
with GC languages, so I did not even attempt to
try use D without GC yet, just want to understand
how all that GC magic works.

The programs does not contain manual malloc()/free(),
I am just not ready for such mix.

September 13

On 9/13/21 1:54 PM, eugene wrote:

>

On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:

>

The problems seems to lies in newSignal() which "would" not allocate using the GC.

    final Signal newSignal(int signum) {
        Signal sg = new Signal(signum);
        sg.owner = this;
        sg.number = sg_number++;
        sg.register();
        return sg;
    }

full src is here
http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz

The GC only scans things that it knows about.

Inside your EventQueue you have this code:

    void registerEventSource(EventSource es) {
        auto e = EpollEvent(0, es);
        int r = epoll_ctl(id, EPOLL_CTL_ADD, es.id, &e);
        assert(r == 0, "epoll_ctl(ADD) failed");
    }

    EventQueue opOpAssign(string op)(EventSource es)
    if (("+" == op) || ("~" == op)) {
        registerEventSource(es);
        return this;
    }

    void deregisterEventSource(EventSource es) {
        auto e = EpollEvent(0, es);
        int r = epoll_ctl(id, EPOLL_CTL_DEL, es.id, &e);
        assert(r == 0, "epoll_ctl(DEL) failed");
    }

    EventQueue opOpAssign(string op)(EventSource es)
    if ("-" == op) {
        deregisterEventSource(es);
        return this;
    }

And you are registering your signals using the += operator.

What is happening here, is, epoll_ctl is adding your event source to a C allocated structure (namely the epoll struct, allocated by epoll_create1, and possibly even managed by the OS). The GC does not have access to this struct, so if that's the only reference to them, they will get cleaned up by the GC.

Now, with your stopper code that you showed, it looks like you are storing the reference to stopper right on the main stack frame. This should prevent those from being destroyed, since Stopper has a reference to both signals.

But I would recommend using core.memory.GC.addRoot on your EventSource when registering it with epoll, and using core.memory.GC.removeRoot when unregistering. That will ensure they do not get cleaned up before being unregistered. If this doesn't fix the problem, perhaps there is some other issue happening.

-Steve

September 13

On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:

>

Then after pressing ^C (SIGINT) the program gets SIGSEGV, since references to sg0 and sg1 are no longer valid (they are "sitting" in epoll_event structure).

engine/ecap.d(54): Error: field EpollEvent.es cannot assign to misaligned pointers in @safe code
engine/ecap.d(56): Error: cannot take address of local e in @safe function registerEventSource

from adding @safe to ecap.EventQueue.registerEventSource, and then from using a @trusted block to silence the first complaint.

Instead of using a temporary EpollEvent array in EventQueue.wait, you could make the array an instance variable and have registerEventSource populate it directly, so that the GC can always trace from this array to an EnventSource.

... however, I don't think this fixes your problem, or is your only problem, since the segfault's still observed when this memory is leaked:

    void registerEventSource(EventSource es) {
        import core.memory : pureMalloc, GC;

        auto p = cast(EpollEvent*) pureMalloc(EpollEvent.sizeof);
        p.event_mask = 0;
        p.es = es;
        GC.addRoot(p);
        int r = epoll_ctl(id, EPOLL_CTL_ADD, es.id, p);
        assert(r == 0, "epoll_ctl(ADD) failed");
    }
September 13

On Monday, 13 September 2021 at 18:42:47 UTC, Steven Schveighoffer wrote:

>

And you are registering your signals using the += operator.

That was a sort of exercise with operator overloading.

>

Now, with your stopper code that you showed, it looks like you are storing the reference to stopper right on the main stack frame. This should prevent those from being destroyed, since Stopper has a reference to both signals.

Exactly - this is the main point of my confusion.
On my expectation, GC should not mark those as unreferenced.

Also, notice those dynamic arrays

void main(string[] args) {
    RxSm[] rxMachines;
    auto rxPool = new RestRoom();
    foreach (k; 0 .. nConnections) {
        auto sm = new RxSm(rxPool);
        rxMachines ~= sm;
        sm.run();
    }

rxMachines (and alike) are not needed by the prog itself,
they are just to keep references for GC.

« First   ‹ Prev
1 2 3 4 5 6 7