September 18, 2021

On Saturday, 18 September 2021 at 09:39:24 UTC, eugene wrote:

>

The definition of this struct was taken from
/usr/include/dmd/druntime/import/core/sys/linux/epoll.d
...
If the reason for the crash were the EpollEvent alignment,
the programs would always segfault very soon after start,
right after the very first return from epoll_wait().

The struct's fine as far as libc and the kernel are concerned. epoll_wait is not even using those 64 bits or interpreting them as containing any kind of data, it's just moving them around for the caller to use. It's also not a hardware error to interpret those bits where they are as a pointer. They are however not 64-bit aligned so D's GC is collecting objects that only they point to.
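To make the layout claim above concrete, here is a minimal sketch. `EventData`, `PackedEvent` and `PaddedEvent` are hypothetical stand-ins written for this illustration, not the druntime declarations; the assumption is only that the real struct is packed in the same way. It prints where the 64-bit data field lands with and without packing.

```d
import std.stdio : writefln;

// Hypothetical stand-ins that mirror the packed layout discussed above;
// these are NOT the druntime declarations themselves.
union EventData {
    void* ptr;
    int fd;
    uint u32;
    ulong u64;
}

// Packed variant: no padding between the two fields.
align(1) struct PackedEvent {
align(1):
    uint events;     // 4 bytes
    EventData data;  // starts right after `events`
}

// Naturally aligned variant, for comparison.
struct PaddedEvent {
    uint events;
    EventData data;  // padding is inserted before this field
}

void main() {
    writefln("packed: data at offset %s, size %s",
             PackedEvent.data.offsetof, PackedEvent.sizeof);
    writefln("padded: data at offset %s, size %s",
             PaddedEvent.data.offsetof, PaddedEvent.sizeof);
}
```

In the packed form the pointer-sized field sits at offset 4, so an array of such structs holds pointers at addresses that are not multiples of 8. This only illustrates the layout; whether that alone explains the collection is exactly what the rest of the thread questions.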

September 19, 2021
>

reference-containing struct that vanishes on the return of your corresponding function

I do not think it's a problem, otherwise both programs would not work at all.
However, echo-server works without any surprises;
echo-client also works, except that the EventSources
pointed to by the sg0 and sg1 data members in the Stopper instance
are collected by the GC soon after echo-client starts.
This does not mean that echo-client gets SIGSEGV right after
those objects are destroyed, no - the crash happens later,
upon receiving SIGINT or SIGTERM.

September 19, 2021

On Wednesday, 15 September 2021 at 23:07:45 UTC, jfondren wrote:

>

Yep. This patch is sufficient to prevent the segfault:

Your idea (hold references to all event sources somewhere) is quite clear,
but it confuses me a bit, since

  1. there are references to all event sources already,
    they are data members in StageMachine subclasses.
  2. only two of the many event sources are destroyed,
    namely, those which are referenced by sg1 and sg0 in the Stopper machine of echo-client.
    All other event sources are not destroyed.
September 19, 2021

On Saturday, 18 September 2021 at 09:54:05 UTC, jfondren wrote:

>

On Saturday, 18 September 2021 at 09:39:24 UTC, eugene wrote:

>

The definition of this struct was taken from
/usr/include/dmd/druntime/import/core/sys/linux/epoll.d
...
If the reason for the crash were the EpollEvent alignment,
the programs would always segfault very soon after start,
right after the very first return from epoll_wait().

The struct's fine as far as libc and the kernel are concerned. epoll_wait is not even using those 64 bits or interpreting them as containing any kind of data, it's just moving them around for the caller to use. It's also not a hardware error to interpret those bits where they are as a pointer.

Exactly.

>

They are however not 64-bit aligned so D's GC is collecting objects that only they point to.

Ok...

  1. There are 303 event sources in echo-server,
    200 in RX machines (100 Ios and 100 Timers),
    100 Ios in TX machines and finally 3 in
    Listener (one Io and two signals, sg0 and sg1)

All of these 303 references in the EpollEvent struct are 'misaligned'
in this sense, but none of the corresponding objects are collected.

  2. There are 32 event sources in echo-client,
    20 in RX machines (10 Ios and 10 Timers),
    10 Ios in TX machines and finally 2 in Stopper machines
    (sg0 and sg1, for handling SIGINT and SIGTERM),
    but only the last two are collected; all the others are not.
    Here is the problem.
September 19, 2021

On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:

>

The instance of Stopper is created in the scope of main():

void main(string[] args) {

    auto stopper = new Stopper();
    stopper.run();

Look...
I have added stopper into an array...

    Stopper[] stoppers;
    auto stopper = new Stopper();
    stoppers ~= stopper;
    stopper.run();

and, you won't believe, this has fixed the problem:
the objects referenced by sg0 and sg1 are not destroyed anymore.

This is a much more acceptable 'solution' for me than adding
that whole bunch of event sources to some array.

But I'm still puzzled: what is so special about the stopper?
echo-server has its 'reception' as just a single variable
and it works fine.

September 19, 2021

On Sunday, 19 September 2021 at 08:51:31 UTC, eugene wrote:

> >

reference-containing struct that vanishes on the return of your corresponding function
I do not think it's a problem, otherwise both programs would not work at all.

The GC doesn't reliably punish objects that outlive their last reference, because it isn't always running. In a tight loop where the GC is never invoked, you can do whatever crazy things you want. Your program doesn't crash until you hit ctrl-C, after all.

>

Look...
I have added stopper into an array...

    Stopper[] stoppers;
    auto stopper = new Stopper();
    stoppers ~= stopper;
    stopper.run();

and, you won't believe, this has fixed the problem:
the objects referenced by sg0 and sg1 are not destroyed anymore.

This is a sufficient patch to prevent the segfault:

diff --git a/echo_client.d b/echo_client.d
index 1f8270e..5ec41df 100644
--- a/echo_client.d
+++ b/echo_client.d
@@ -32,7 +32,7 @@ void main(string[] args) {
         sm.run();
     }

-    auto stopper = new Stopper();
+    scope stopper = new Stopper();
     stopper.run();

     writeln(" === Hello, world! === ");

The scope stack-allocates Stopper.
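As a small stand-alone illustration of what `scope` changes (a sketch only; `Tracked` and `runScoped` are hypothetical names, not part of the project), the following shows that a `scope` class instance is destroyed deterministically when its block ends, rather than whenever the GC decides to collect it:

```d
import std.stdio : writeln;

// Hypothetical class used only to observe when the destructor runs.
class Tracked {
    static int destroyed;   // counts destructor calls (thread-local static)
    ~this() { ++destroyed; }
}

void runScoped() {
    // `scope` places the instance on the stack instead of the GC heap,
    // so the GC never owns it and cannot collect it early.
    scope t = new Tracked();
}   // the destructor runs here, at the end of the block

void main() {
    immutable before = Tracked.destroyed;
    runScoped();
    writeln("destructor calls during runScoped: ", Tracked.destroyed - before);
    assert(Tracked.destroyed == before + 1);
}
```

The flip side is that a stack instance must outlive every reference to it. Anything that still points at it after the enclosing function returns is left dangling.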

This is also a sufficient patch to prevent the segfault:

diff --git a/echo_client.d b/echo_client.d
index 1f8270e..0b968a8 100644
--- a/echo_client.d
+++ b/echo_client.d
@@ -39,4 +39,6 @@ void main(string[] args) {
     auto md = new MessageDispatcher();
     md.loop();
     writeln(" === Goodbye, world! === ");
+    writeln(stopper.sg0.number);
+    //writeln(stopper.sg1.number);
 }

Either one of those writelns will do it.
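The effect of the second patch can be sketched in isolation. This is a hypothetical miniature (`Probe` and `survivesCollect` are made up for the illustration), assuming only that a variable read after a collection has to be kept somewhere the GC scans until that read happens:

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical class that records which instances have been finalized.
class Probe {
    static int created;
    static bool[64] gone;   // gone[id] is set when that instance is finalized
    int id;
    this() { id = created++; }
    ~this() { if (id < gone.length) gone[id] = true; }
}

// Forces a collection while `p` is still needed afterwards.
bool survivesCollect() {
    auto p = new Probe();
    GC.collect();
    // Using p after the collection keeps the reference live across it.
    writeln("probe ", p.id, " still in use");
    return !Probe.gone[p.id];
}

void main() {
    assert(survivesCollect());
}
```

This shows only the "used later, therefore kept" half. It does not reproduce the early collection seen in echo-client, which depends on the compiler and optimization level.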

Without either of the above, STOPPER is destroyed a few seconds into a run of echo-client:

$ ./echo-client | grep STOPPER
'STOPPER' registered 24 (esrc.Signal)
'STOPPER' registered 25 (esrc.Signal)
'STOPPER @ INIT' got 'M0' from 'SELF'
'STOPPER' enabled 24 (esrc.Signal)
'STOPPER' enabled 25 (esrc.Signal)
(seconds pass)
stopper.Stopper.~this(): STOPPER destroyed

You can hit ctrl-C prior to Stopper's destruction and there's no segfault. (On my system, it won't show the usual 'segfault' message to the terminal when grep is filtering like that, but if you turn on coredumps you can see one is only generated with a ctrl-C after Stopper's destroyed.)

So this looks at first to me like a bug: dmd is allowing Stopper to be collected before the end of its lexical scope if it isn't used later in it. Except, forcing a collection right after stopper.run() doesn't destroy it.

Here's a patch that destroys Stopper almost immediately, so that a ctrl-C within milliseconds of the program starting will still segfault it. This also no longer requires the server to be active.

diff --git a/engine/edsm.d b/engine/edsm.d
index 513d8a5..ea9ac3a 100644
--- a/engine/edsm.d
+++ b/engine/edsm.d
@@ -176,6 +176,8 @@ class StageMachine {
             "'%s @ %s' got '%s' from '%s'", name, currentStage.name, eventName,
             m.src ? (m.src is this ? "SELF" : m.src.name) : "OS"
         );
 
+        import core.memory : GC;
+        GC.collect;
         if (eventName !in currentStage.reflexes) {

valgrind:

^C==14893== Thread 1:
==14893== Jump to the invalid address stated on the next line
==14893==    at 0x2: ???
==14893==    by 0x187A3C: void disp.MessageDispatcher.loop()
==14893==    by 0x1BED89: _Dmain

With Stopper's collection prevented and some logging around reactTo:

^Csi.sizeof = 128
about to react to Message(null, stopper.Stopper, 0, esrc.Signal)
'STOPPER @ IDLE' got 'S0' from 'OS'
goodbye, world
reacted
 === Goodbye, world! ===
1
ecap.EventQueue.~this
stopper.Stopper.~this(): STOPPER destroyed

So the problem here is that ctrl-C causes that message to come but Stopper's been collected and that address contains garbage. Since the Message in the MessageQueue should keep it alive, I think this is probably a bug in dmd.
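For the general situation where an object's only reference is an address handed to non-GC code, `core.memory` offers `GC.addRoot`. The sketch below is not a patch from this thread; `Handler`, `makePinned` and `releasePinned` are hypothetical names, and it only shows the pin/unpin pattern.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical handler type; stands in for an object whose only
// reference is stored somewhere the GC does not scan.
class Handler {
    int id;
    this(int id) { this.id = id; }
}

// Creates a handler and returns its address as an opaque pointer,
// the way one might store it in a C-side data field.
void* makePinned(int id) {
    auto h = new Handler(id);
    auto p = cast(void*) h;
    GC.addRoot(p);                       // keep the object alive
    GC.setAttr(p, GC.BlkAttr.NO_MOVE);   // and at a fixed address
    return p;
}

// Undoes makePinned once the opaque pointer is no longer in use.
void releasePinned(void* p) {
    GC.removeRoot(p);
    GC.clrAttr(p, GC.BlkAttr.NO_MOVE);
}

void main() {
    void* p = makePinned(7);
    GC.collect();
    auto h = cast(Handler) p;   // recover the object from the address
    assert(h.id == 7);
    writeln("handler ", h.id, " survived the collection");
    releasePinned(p);
}
```

In this small demo the stack also holds `p`, so the object would survive anyway. The root matters when the address lives only in memory the GC never looks at.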

September 19, 2021

On Sunday, 19 September 2021 at 16:27:55 UTC, jfondren wrote:

>

So the problem here is that ctrl-C causes that message to come but Stopper's been collected and that address contains garbage.

This is exactly what I was trying to say...
Thanx a lot for your in-depth investigation of the trouble!
I'll try your patches later.

>

Since the Message in the MessageQueue should keep it alive, I think this is probably a bug in dmd.

In the starting post I noticed that

  • when compiled with gdc, echo-client does not crash
  • when compiled with ldc, no crash
  • but when compiled with gdc -Os, same crash as with dmd.

The last was (and still is) the most confusing observation for me.

September 19, 2021

On Monday, 13 September 2021 at 17:54:43 UTC, eugene wrote:

>

full src is here
http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz

I've also made two simple examples, just in case

Now, let's put some pressure on garbage collector

Every 10 ms do some allocations:

    void mainIdleEnter() {
        tm0.enable();
        tm0.heartBeat(10); // milliseconds
    }

    void mainIdleT0(StageMachine src, Object o) {
        int[] a;
        foreach (k; 0 .. 1000) {
            a ~= k;
        }
    }

About 3 seconds after the start, the destructors are called:

edsm-in-d-simple-example-2 $ ./test | grep owner
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 5) this @ 0x7fa267872150
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 6) this @ 0x7fa267872180

After this happens, pressing ^C results in a segfault.

September 19, 2021

On Sunday, 19 September 2021 at 16:27:55 UTC, jfondren wrote:

>

This is a sufficient patch to prevent the segfault:

diff --git a/echo_client.d b/echo_client.d
index 1f8270e..5ec41df 100644
--- a/echo_client.d
+++ b/echo_client.d
@@ -32,7 +32,7 @@ void main(string[] args) {
         sm.run();
     }

-    auto stopper = new Stopper();
+    scope stopper = new Stopper();
     stopper.run();

I tried a stack-allocated stopper in my second 'simple example' and...
No segfault, but:
http://zed.karelia.ru/0/e/oops.png
As can be seen from the screenshot, destructors of sg0 and sg1 were not called,
but at the very end something went completely wrong.

September 19, 2021

On Sunday, 19 September 2021 at 16:27:55 UTC, jfondren wrote:

>

This is also a sufficient patch to prevent the segfault:

diff --git a/echo_client.d b/echo_client.d
index 1f8270e..0b968a8 100644
--- a/echo_client.d
+++ b/echo_client.d
@@ -39,4 +39,6 @@ void main(string[] args) {
     auto md = new MessageDispatcher();
     md.loop();
     writeln(" === Goodbye, world! === ");
+    writeln(stopper.sg0.number);
+    //writeln(stopper.sg1.number);
 }

This one really helps; the program terminates as expected:

'MAIN @ IDLE' got 'T0' from 'OS'
'MAIN @ IDLE' got 'T0' from 'OS'
^Csi.sizeof = 128
'STOPPER @ IDLE' got 'S0' from 'OS'
0
 === Goodbye, world! ===
___!!!___edsm.StageMachine.~this(): MAIN destroyed...
ecap.EventQueue.~this
   !!! esrc.EventSource.~this() : esrc.Timer (owner MAIN, fd = 4) this @ 0x7f15e6c870c0
___!!!___edsm.StageMachine.~this(): STOPPER destroyed...
   !!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 5) this @ 0x7f15e6c8a150
   !!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 6) this @ 0x7f15e6c8a180