September 14, 2021

On Tuesday, 14 September 2021 at 16:07:00 UTC, jfondren wrote:

> No. And when was the first one?

here:

On Monday, 13 September 2021 at 18:45:22 UTC, jfondren wrote:

> auto p = cast(EpollEvent*) pureMalloc(EpollEvent.sizeof);

What? Allocate struct epoll_event on the heap?
It is a feeble joke ;)

    static int ecap__add(int fd, void *dptr)
    {
        struct epoll_event waitfor = {0};
        int flags, r;

        waitfor.data.ptr = dptr;

        r = epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &waitfor);
        if (-1 == r) {

All fd's (sockets, timers etc) are added the same way
and corresponding EventSources are not destroyed by GC.

September 14, 2021

On Tuesday, 14 September 2021 at 16:15:20 UTC, eugene wrote:

> On Tuesday, 14 September 2021 at 16:07:00 UTC, jfondren wrote:
>
>> No. And when was the first one?
>
> here:
>
> On Monday, 13 September 2021 at 18:45:22 UTC, jfondren wrote:
>
>> auto p = cast(EpollEvent*) pureMalloc(EpollEvent.sizeof);
>
> What? Allocate struct epoll_event on the heap?
> It is a feeble joke ;)

It is an example of deliberately static storage that does not fix your problem, thereby proving that the broken lifetimes of the struct are not your only problem.

I explained that one at the time, and I explained this one. If it comes with an explanation, it's probably not a joke.

>     static int ecap__add(int fd, void *dptr)
>     {
>         struct epoll_event waitfor = {0};
>         int flags, r;
>
>         waitfor.data.ptr = dptr;
>
>         r = epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &waitfor);
>         if (-1 == r) {
>
> All fd's (sockets, timers etc) are added the same way
> and corresponding EventSources are not destroyed by GC.

GC needs to be able to stop your program and find all of the live objects in it. The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.
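
As a hedged illustration of the reachability point: once the only surviving copy of a class reference sits in the kernel's epoll data, the collector has no way to see it. One option, not proposed in this thread, is druntime's `GC.addRoot` / `GC.removeRoot`. The `EventSource` stand-in and the `pin` / `unpin` helpers below are hypothetical names, not the project's code.

```d
import core.memory : GC;

// Stand-in for the project's EventSource (assumption for this sketch).
class EventSource {
    int id;
    this(int id) { this.id = id; }
}

// Hypothetical helper: call before handing the reference to epoll_ctl(ADD).
void* pin(EventSource es) {
    auto raw = cast(void*) es;
    GC.addRoot(raw); // the collector now treats this object as reachable
    return raw;
}

// Hypothetical helper: call after epoll_ctl(DEL), when the kernel no longer
// holds the pointer.
void unpin(void* raw) {
    GC.removeRoot(raw);
}
```

Every `pin` needs a matching `unpin`, otherwise the object is never collected.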

September 14, 2021

On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:

> GC needs to be able to stop your program

nice fantasies...

> and find all of the live objects in it. The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.

where did you find 'misaligned pointer'?...

September 14, 2021

On Tuesday, 14 September 2021 at 16:56:52 UTC, eugene wrote:

> On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
>
>> GC needs to be able to stop your program
>
> nice fantasies...
>
>> and find all of the live objects in it. The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.
>
> where did you find 'misaligned pointer'?...

It doesn't seem like communication between us is possible, in the "a five-pound phone won't sell" way. You can find this answer explained with code in an earlier post.

My suggestion remains: try troubleshooting by making your program @safe.

September 14, 2021

On Tuesday, 14 September 2021 at 17:02:32 UTC, jfondren wrote:

> It doesn't seem like communication between us is possible

and you are wrong, as usual ,)

> in the "a five-pound phone won't sell" way.

I am not a 'selling boy'

> My suggestion remains: try troubleshooting by making your program @safe.

Please, take that clever bot away.

September 14, 2021

On 9/14/21 2:05 PM, eugene wrote:

> On Tuesday, 14 September 2021 at 17:02:32 UTC, jfondren wrote:
>
>> It doesn't seem like communication between us is possible
>
> and you are wrong, as usual ,)
>
>> in the "a five-pound phone won't sell" way.
>
> I am not a 'selling boy'
>
>> My suggestion remains: try troubleshooting by making your program @safe.
>
> Please, take that clever bot away.

People are trying to help you here. With that attitude, you are likely to stop getting help.

-Steve

September 14, 2021

On Tuesday, 14 September 2021 at 18:33:33 UTC, Steven Schveighoffer wrote:

> People are trying to help you here.

Then, answer the questions.

Why those sg0 and sg1 are 'collected'
by this so f... antstic GC?

September 14, 2021

On 9/14/21 9:56 AM, eugene wrote:

> On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:

>> The misaligned pointer and the
>> reference-containing struct that vanishes on the return of your
>> corresponding function are both problems for this.
>
> where did you find 'misaligned pointer'?...

I think it's the align(1) for EpollEvent.

I was able to reproduce the segmentation fault and was seemingly able to fix it by making the EventSource class references alive by adding a constructor:

align (1) struct EpollEvent {
    align(1):
    uint event_mask;
    EventSource es;

    this(uint event_mask, EventSource es) {
        this.event_mask = event_mask;
        this.es = es;
        living ~= es;  // <-- Introduced this constructor for this line
    }
    /* just do not want to use that union, epoll_data_t */
}

// Here is the array that keeps EventSource alive:
EventSource[] living;

If that really is the fix, of course the references must be taken out of that container when possible.

Ali

September 15, 2021

On Tuesday, 14 September 2021 at 20:59:14 UTC, Ali Çehreli wrote:

> On 9/14/21 9:56 AM, eugene wrote:
>
>> On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
>>
>>> The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.
>>
>> where did you find 'misaligned pointer'?...
>
> I think it's the align(1) for EpollEvent.
>
> I was able to reproduce the segmentation fault and was seemingly able to fix it by making the EventSource class references alive by adding a constructor:
>
> align (1) struct EpollEvent {
>     align(1):
>     uint event_mask;
>     EventSource es;
>
>     this(uint event_mask, EventSource es) {
>         this.event_mask = event_mask;
>         this.es = es;
>         living ~= es; // <-- Introduced this constructor for this line
>     }
>     /* just do not want to use that union, epoll_data_t */
> }
>
> // Here is the array that keeps EventSource alive:
> EventSource[] living;
>
> If that really is the fix, of course the references must be taken out of that container when possible.
>
> Ali

Yep. This patch is sufficient to prevent the segfault:

diff --git a/engine/ecap.d b/engine/ecap.d
index 71cb646..d57829c 100644
--- a/engine/ecap.d
+++ b/engine/ecap.d
@@ -32,6 +32,7 @@ final class EventQueue {
     private int id;
     private bool done;
     private MessageQueue mq;
+    private EventSource[] sources;

     private this() {
         id = epoll_create1(0);
@@ -52,6 +53,7 @@ final class EventQueue {

     void registerEventSource(EventSource es) {
         auto e = EpollEvent(0, es);
+        sources ~= es;
         int r = epoll_ctl(id, EPOLL_CTL_ADD, es.id, &e);
         assert(r == 0, "epoll_ctl(ADD) failed");
     }
@@ -63,7 +65,10 @@ final class EventQueue {
     }

     void deregisterEventSource(EventSource es) {
+        import std.algorithm : countUntil, remove;
+
         auto e = EpollEvent(0, es);
+        sources = sources.remove(sources.countUntil(es));
         int r = epoll_ctl(id, EPOLL_CTL_DEL, es.id, &e);
         assert(r == 0, "epoll_ctl(DEL) failed");
     }
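
A minimal sketch of the same bookkeeping in isolation, assuming a stand-in `EventSource` class and a hypothetical `Registry` wrapper (neither is the project's code). Unlike the patch above, it guards against `countUntil` returning -1 when the source was never registered.

```d
import std.algorithm : countUntil, remove;

// Stand-in for the project's EventSource (assumption for this sketch).
class EventSource {
    int id;
    this(int id) { this.id = id; }
}

// Hypothetical wrapper around the array that keeps sources reachable.
final class Registry {
    private EventSource[] sources;

    void add(EventSource es) { sources ~= es; }

    // Returns false when `es` was not registered; removes it otherwise.
    bool drop(EventSource es) {
        auto i = sources.countUntil!(s => s is es);
        if (i < 0)
            return false;
        sources = sources.remove(cast(size_t) i);
        return true;
    }

    size_t length() const { return sources.length; }
}
```

The array is a GC-managed D slice, so every `EventSource` it holds stays reachable for as long as the `Registry` itself does.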

Going through the project and adding @safe: to the top of everything results in these errors: https://gist.github.com/jrfondren/c7f7b47be057273830d6a31372895895
some I/O, some @system functions, some weird C APIs ... and misaligned assignments to EpollEvent.es. So debugging with @safe isn't bad, but I'd still like rustc-style error codes:

engine/ecap.d(89): Error E415: field `EpollEvent.es` cannot assign to misaligned pointers in `@safe` code

$ dmd --explain E415

Yeah see, the garbage collector only looks for pointers at pointer-aligned addresses.
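
To make the alignment point concrete, here is a small sketch comparing field offsets. `Packed` copies the layout discussed in this thread; `Natural` and `pointerAligned` are hypothetical names added for comparison. On a 64-bit target the packed reference lands at offset 4, which is not a multiple of the pointer size.

```d
// Stand-in class (assumption for this sketch).
class EventSource {}

// Packed layout, as in the thread's EpollEvent.
align(1) struct Packed {
align(1):
    uint event_mask;
    EventSource es;
}

// Same fields with default alignment (hypothetical comparison type).
struct Natural {
    uint event_mask;
    EventSource es;
}

// True when a field at this offset sits on a pointer-sized boundary,
// assuming the enclosing struct itself starts on one.
bool pointerAligned(size_t offset) {
    return offset % (void*).sizeof == 0;
}
```

For example, `pointerAligned(Natural.es.offsetof)` holds on any target, while `pointerAligned(Packed.es.offsetof)` is false wherever pointers are 8 bytes.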

September 18, 2021

On Tuesday, 14 September 2021 at 20:59:14 UTC, Ali Çehreli wrote:

> On 9/14/21 9:56 AM, eugene wrote:
>
>> On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
>>
>>> The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.
>>
>> where did you find 'misaligned pointer'?...
>
> I think it's the align(1) for EpollEvent.

The definition of this struct was taken from
/usr/include/dmd/druntime/import/core/sys/linux/epoll.d

version (X86_Any)
{
    align(1) struct epoll_event
    {
    align(1):
        uint events;
        epoll_data_t data;
    }
}

I am using my own definition, because the data field
has no special meaning to the Linux kernel;
epoll_wait() returns it as is.
I always use this field as a pointer to EventSource.

This struct has to be 12 bytes on x86;
in /usr/include/linux/eventpoll.h it looks like this:

struct epoll_event {
        __u32 events;
        __u64 data;
} EPOLL_PACKED;

At some point I had a different definition (align only inside the struct):

struct EpollEvent {
    align(1):
    uint event_mask;
    EventSource es;
    /* just do not want to use that union, epoll_data_t */
}

But it turned out that:

  1. a relatively fresh gdc (from Linux Mint 19) does the right thing: the structure is packed and is 12 bytes in size.
  2. an old gdc (from Debian 8) produces a 16-byte EpollEvent, and both programs
    get SIGSEGV right after the first return from epoll_wait(), hence this check:
static assert(EpollEvent.sizeof == 12);

If the reason for the crash were the EpollEvent alignment,
the programs would always segfault very soon after start,
right after the very first return from epoll_wait().
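
A short sketch of the size check described above, assuming a hypothetical `KernelEpollEvent` that mirrors the kernel header's two fields (`__u32 events; __u64 data;`). The kernel fills the array passed to epoll_wait() with a 12-byte stride and puts `data` at offset 4, so a 16-byte definition would misread the very first event.

```d
// Mirrors the kernel declaration: 32-bit events, 64-bit data, packed.
// The type name is hypothetical; druntime's own type is epoll_event.
align(1) struct KernelEpollEvent {
align(1):
    uint events;
    ulong data;
}

// Compile-time layout checks, in the spirit of the static assert above.
static assert(KernelEpollEvent.sizeof == 12);
static assert(KernelEpollEvent.data.offsetof == 4);

// Bytes the kernel may write for `n` returned events.
size_t eventArrayBytes(size_t n) {
    return n * KernelEpollEvent.sizeof;
}
```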