1 day ago
On 5/4/25 19:34, Walter Bright wrote:
> On 5/4/2025 8:43 AM, Timon Gehr wrote:
>> The point is that Walter seems to be moving in the direction of creating even more cases where I would not be able to get any information back from normie Windows users,
> 
> I replied to you already: "I had no idea anyone was using it for that purpose. I guess I can't change that :-)"
> 
> Besides, you could still use `try ... catch (Error e)` instead of `scope(failure)`. After all, the `scope` construct literally is just rewritten as try...catch or try...finally.
> ...

My understanding is that all of those features are under constant threat of being broken for `Error`, and are not guaranteed to work in the first place.

> 
>> while still not acknowledging that segfaults are useless and unworkable in practically relevant cases.
> 
> Worst case, you can write a signal handler for any of the signals that can be generated (not just null pointer signals), and have the handler write all it can to a file that can be emailed to you.
> 
> You can also hook atexit() as maybe your program is exiting that way.

Well, yes. As I said, I can probably figure it out at some point. It's just a lot more work than necessary in some other languages for comparable situations.
1 day ago
On 5/4/25 19:37, Walter Bright wrote:
> Have you considered it might be a stack overflow?

Yes, that is also an option, though I think it is not very likely for this particular application.
1 day ago
On 5/4/25 19:34, Walter Bright wrote:
> 
> 
>> while still not acknowledging that segfaults are useless and unworkable in practically relevant cases.
> 
> Worst case, you can write a signal handler for any of the signals that can be generated (not just null pointer signals), and have the handler write all it can to a file that can be emailed to you.
> 
> You can also hook atexit() as maybe your program is exiting that way.

I just remembered like half a year ago I have also seen a crash once in


```
private void _enforceNoOverlap(const char[] action,
    uintptr_t ptr1, uintptr_t ptr2, const size_t bytes)
{
    const d = ptr1 > ptr2 ? ptr1 - ptr2 : ptr2 - ptr1;
    if (d >= bytes)
        return;
    const overlappedBytes = bytes - d;

    UnsignedStringBuf tmpBuff = void;
    string msg = "Overlapping arrays in ";
    msg ~= action;
    msg ~= ": ";
    msg ~= overlappedBytes.unsignedToTempString(tmpBuff);
    msg ~= " byte(s) overlap of ";
    msg ~= bytes.unsignedToTempString(tmpBuff);
    assert(0, msg);
}
```

Unfortunately I had accidentally detached my debugger instead of getting the backtrace at the time. It is possible that this is what is happening on the user's machine, but I don't know. And I also have no idea where this overlap might occur.

This just gives an invalid instruction, even in dub's "release-debug" builds, by default unless you use a custom druntime build. I just don't think this should ever happen to anyone. Even if there is some workaround, by default people will run into this at least once, and they may not be as lucky as I was to see it with a debugger attached.

1 day ago
On 5/4/25 18:49, Walter Bright wrote:
> On 5/3/2025 9:49 PM, Timon Gehr wrote:
>> For the record, even if my application would not run very sluggishly when compiled with DMD, in this particular case it does not matter how accurate the segfault location is as I am not getting any information in the first place.
> 
> You suggested in another reply that I have no experience debugging programs that do not have a tty attached. This is incorrect.

Well, I think the circumstances I described were more specific than only having no tty attached.


> When I've had programs "wink out" leaving no trail nor context behind, I add logging code. In particular, I add a line to the entry of key functions, something like `fprintf(log, "function name\n");` which appends to a log file. Examining the log file gives clues as to where the program was when it failed and a trail of how it got there. I add more logging statements as needed to gradually close in on where the fault is.
> ...

Again, this can work decently well when the crash is reproducible.
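A minimal C sketch of the entry-logging approach quoted above (`fprintf(log, "function name\n")` appending to a file). `trace_open`, `trace` and `trace_close` are hypothetical helper names. The one detail worth adding is the `fflush` after each line: without it, the last lines can still sit in the stdio buffer and be lost when the process dies.

```C
#include <stdio.h>

/* Hypothetical log handle, opened once at program start. */
static FILE *trace_log = NULL;

/* Open the log in append mode, as in the quoted approach. Returns 1 on success. */
int trace_open(const char *path) {
    trace_log = fopen(path, "a");
    return trace_log != NULL;
}

/* Record entry into a function. fflush hands the line to the OS right away,
   so it survives even if the process crashes immediately afterwards. */
void trace(const char *function_name) {
    if (trace_log == NULL)
        return;
    fprintf(trace_log, "%s\n", function_name);
    fflush(trace_log);
}

void trace_close(void) {
    if (trace_log != NULL) {
        fclose(trace_log);
        trace_log = NULL;
    }
}
```

Each instrumented function would start with something like `trace("parseConfig");`, and the last line in the file then points at the most recently entered function.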


> 
>> My understanding is that null pointer dereference being UB is a widespread assumption in the LLVM and GCC optimizers. Simply "disabling the behavior" is not practical.
> 
> It may be dependent on the optimization level.
> 
> There are all kinds of UB behavior that the LLVM and GCC optimizers just delete because, hey, it's undefined behavior that will never happen so it can be just deleted. Things like `(x + 1 > x)` being replaced with `1`. There's a way for the compiler to emit a warning when this is done, perhaps try that?
> 
> The compiler's ability to check at compile time for a null pointer dereference (and hence delete it) is extremely limited. It relies on data flow analysis where it can prove the pointer is null, not just "it might be". I read that compilers issue a warning when this is the case. Why not try `p = null; *p = 3;` in your setup and see if the compiler you're using gives a warning?
> 
> If it does, but the warning does not happen with your code, then you know the compiler is not deleting the reference, and so the CPU will check it and seg fault it at runtime.
> ...

Well, there are multiple discussions here. One of them was about your claim that null dereferences will segfault reliably. Another one was about your claim that segfaults are always useful.
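A small C illustration of the kind of UB-based deletion discussed in the quoted exchange, assuming GCC or Clang with optimization enabled. The function names are hypothetical, and nothing here shows what LDC or GDC do with D code. Because signed overflow is UB, `x + 1 > x` may be folded to `1`. Because `*p` has already been evaluated, the compiler may assume `p` is non-null and drop the later check (GCC does this under `-fdelete-null-pointer-checks`, which its documentation describes as enabled by default on most targets). GCC's `-Wnull-dereference` is the warning aimed at provable null dereferences; as far as I know it is not part of `-Wall`.

```C
#include <stddef.h>

/* Signed overflow is undefined behavior, so an optimizer may
   compile this as "return 1" for every x. */
int always_greater(int x) {
    return x + 1 > x;
}

/* The dereference comes first, so the optimizer may assume p != NULL
   and remove the NULL check below as unreachable. */
int read_then_check(const int *p) {
    int v = *p;
    if (p == NULL)
        return -1;   /* may be deleted by the optimizer */
    return v;
}
```

Comparing the assembly at `-O0` and `-O2` (for example with `-S`) is a direct way to see whether the check survives in a given toolchain.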

> 
>> Perhaps I could add `-fsanitize=null` to add null checks, but that would not really solve the main problem as it is not integrated with D scope guards.
> 
> It would at least tell you if it is a null pointer dereference or not. That in itself would be valuable information.
> ...

All of this speculation assumes that the design of D's error checking beyond null pointers is not at fault, but it is very likely that it is. druntime is riddled with code that builds an error message and then does `assert(0, msg);`. x)

I think at this time my best bet at narrowing it down is to use a custom druntime build. This should not be necessary, however.

> I have no idea what your program does, but I suspect your most practical option is to add logging to a file, and ask your customer to email the file to you when it crashes.
> 

Well, I have no idea where the crash happens and getting one log file every couple months (and writing out gigabytes of useless log files in the meantime) seems not workable. It would be much better to be able to react to the crash.

> Another thing you can try is build your program with dmd and see if it behaves differently regarding the mysterious crash.

This is not workable. The user is certainly not willing to use a DMD build for two months.
1 day ago

On Sunday, 4 May 2025 at 20:28:45 UTC, Timon Gehr wrote:
> I just remembered like half a year ago I have also seen a crash once in
>
> ```
> private void _enforceNoOverlap(const char[] action,
>     uintptr_t ptr1, uintptr_t ptr2, const size_t bytes)
> {
>     const d = ptr1 > ptr2 ? ptr1 - ptr2 : ptr2 - ptr1;
>     if (d >= bytes)
>         return;
>     const overlappedBytes = bytes - d;
>
>     UnsignedStringBuf tmpBuff = void;
>     string msg = "Overlapping arrays in ";
>     msg ~= action;
>     msg ~= ": ";
>     msg ~= overlappedBytes.unsignedToTempString(tmpBuff);
>     msg ~= " byte(s) overlap of ";
>     msg ~= bytes.unsignedToTempString(tmpBuff);
>     assert(0, msg);
> }
> ```

That's a regression, also works with dmd: https://forum.dlang.org/post/krzyhjmbzkhqgolpemxz@forum.dlang.org

in 2.082 the code was
```
private void _enforceNoOverlap(const char[] action,
    uintptr_t ptr1, uintptr_t ptr2, in size_t bytes)
{
    const d = ptr1 > ptr2 ? ptr1 - ptr2 : ptr2 - ptr1;
    if(d >= bytes)
        return;
    const overlappedBytes = bytes - d;

    UnsignedStringBuf tmpBuff = void;
    string msg = "Overlapping arrays in ";
    msg ~= action;
    msg ~= ": ";
    msg ~= overlappedBytes.unsignedToTempString(tmpBuff, 10);
    msg ~= " byte(s) overlap of ";
    msg ~= bytes.unsignedToTempString(tmpBuff, 10);
    throw new Error(msg);
}
```
20 hours ago
On 5/5/25 10:19, Kagamin wrote:
> On Sunday, 4 May 2025 at 20:28:45 UTC, Timon Gehr wrote:
>> I just remembered like half a year ago I have also seen a crash once in
>>
>>
>> ```
>> private void _enforceNoOverlap(const char[] action,
>>     uintptr_t ptr1, uintptr_t ptr2, const size_t bytes)
>> {
>>     const d = ptr1 > ptr2 ? ptr1 - ptr2 : ptr2 - ptr1;
>>     if (d >= bytes)
>>         return;
>>     const overlappedBytes = bytes - d;
>>
>>     UnsignedStringBuf tmpBuff = void;
>>     string msg = "Overlapping arrays in ";
>>     msg ~= action;
>>     msg ~= ": ";
>>     msg ~= overlappedBytes.unsignedToTempString(tmpBuff);
>>     msg ~= " byte(s) overlap of ";
>>     msg ~= bytes.unsignedToTempString(tmpBuff);
>>     assert(0, msg);
>> }
>> ```
> 
> That's a regression, also works with dmd: https://forum.dlang.org/post/krzyhjmbzkhqgolpemxz@forum.dlang.org
> 
> in 2.082 the code was
> ```
> private void _enforceNoOverlap(const char[] action,
>      uintptr_t ptr1, uintptr_t ptr2, in size_t bytes)
> {
>      const d = ptr1 > ptr2 ? ptr1 - ptr2 : ptr2 - ptr1;
>      if(d >= bytes)
>          return;
>      const overlappedBytes = bytes - d;
> 
>      UnsignedStringBuf tmpBuff = void;
>      string msg = "Overlapping arrays in ";
>      msg ~= action;
>      msg ~= ": ";
>      msg ~= overlappedBytes.unsignedToTempString(tmpBuff, 10);
>      msg ~= " byte(s) overlap of ";
>      msg ~= bytes.unsignedToTempString(tmpBuff, 10);
>      throw new Error(msg);
> }
> ```

https://github.com/dlang/dmd/commit/dda88738b2b391239c87cfdbd60b76d6025306e5

20 hours ago
On Monday, 5 May 2025 at 13:45:49 UTC, Timon Gehr wrote:
> https://github.com/dlang/dmd/commit/dda88738b2b391239c87cfdbd60b76d6025306e5

just revert
10 hours ago
> The user is certainly not willing to use a DMD build for two months.

If DMD gives you a compile error because it can statically determine (using data flow analysis) that there's a null dereference, you've found the problem, and you won't need to send a build to your customer. I'd turn on function inlining, too, which extends the reach of the static DFA.

If DMD does not detect any, then, unless LDC is doing a lot of interprocedural DFA, LDC is not deleting UB null dereferences either. If you do get a null dereference when executing the program, it will manifest as a seg fault.

I read that the latest gcc/llvm compilers give a warning when they delete UB. If no such warning is given in your case, then the compiler is not deleting UB, and that is not the cause of the bug your user is experiencing.

> druntime is riddled with code that builds an error message and then does assert(0, msg);. x)

Here are the compiler options for responding to assert failures:

```
Behavior on assert/boundscheck/finalswitch failure:
  =[h|help|?]    List information on all available choices
  =D             Usual D behavior of throwing an AssertError
  =C             Call the C runtime library assert failure function
  =halt          Halt the program execution (very lightweight)
  =context       Use D assert with context information (when available)
```

What you can do is use the `-checkaction=C` option and then write your own version of the C runtime library's assert failure function. Your own definition keeps the linker from pulling in the one from the C runtime library (i.e. it overrides it). You can then have it do whatever you need done.
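A sketch of such an override, assuming a glibc-based Linux target, where (as far as I know) `-checkaction=C` ends up calling glibc's `__assert_fail(const char *assertion, const char *file, unsigned int line, const char *function)`. The Microsoft CRT on Windows uses a different function name and signature, so this exact code does not apply there. `format_assert_report` and the file name `crash.log` are hypothetical. The override relies on the program's own definition taking precedence over the C library's, which is toolchain behavior, not something the C standard guarantees.

```C
#include <assert.h>   /* declares __assert_fail on glibc */
#include <stdio.h>
#include <stdlib.h>

/* Build the report text; kept separate so it can be tested without aborting. */
int format_assert_report(char *buf, size_t size, const char *assertion,
                         const char *file, unsigned int line, const char *function) {
    return snprintf(buf, size, "Assertion failed: %s at %s:%u in %s\n",
                    assertion, file, line, function ? function : "?");
}

#if defined(__GLIBC__)
/* Replacement for glibc's assert failure function. */
void __assert_fail(const char *assertion, const char *file,
                   unsigned int line, const char *function) {
    char report[1024];
    format_assert_report(report, sizeof report, assertion, file, line, function);

    FILE *log = fopen("crash.log", "a");   /* hypothetical log location */
    if (log != NULL) {
        fputs(report, log);
        fclose(log);
    }
    fputs(report, stderr);
    abort();   /* the original is declared noreturn, so never return */
}
#endif
```

The C file is compiled and linked alongside the D objects, and the D side is built with `-checkaction=C`.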

Druntime also uses a function pointer for assert failures where you can set it to point to your own handler.

> Well, I have no idea where the crash happens and getting one log file every couple months (and writing out gigabytes of useless log files in the meantime) seems not workable. It would be much better to be able to react to the crash.

Is the program running continuously for 2 months? If not, every start of the program can delete the log file and start a new one. Will it be gigabytes long? That depends on your strategy for where to put the logging statements, which is a bit of an art. The size of the logs can also be reduced by compressing or hashing the function names.
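A sketch of the two size-reduction ideas above, assuming plain C stdio: truncate the log on every start (mode `"w"`), and write a 32-bit FNV-1a hash (8 hex digits) instead of each function name. FNV-1a is just a stand-in choice here. Any stable hash works, as long as you keep a name-to-hash table (for example, generated from the list of instrumented functions) to decode the log afterwards. `fnv1a32`, `open_fresh_log` and `log_hashed` are hypothetical names.

```C
#include <inttypes.h>
#include <stdio.h>

/* 32-bit FNV-1a: a small, well-known string hash. */
uint32_t fnv1a32(const char *s) {
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                /* FNV prime; wraps modulo 2^32 */
    }
    return h;
}

/* Mode "w" truncates, so each program start begins with an empty log. */
FILE *open_fresh_log(const char *path) {
    return fopen(path, "w");
}

/* Log 8 hex digits per entry instead of the full function name. */
void log_hashed(FILE *log, const char *function_name) {
    fprintf(log, "%08" PRIx32 "\n", fnv1a32(function_name));
    fflush(log);   /* keep the entry even if the process crashes next */
}
```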

> Another one was about your claim that segfaults are always useful.

And they are, even if you have to resort to writing your own signal handler. They're always better than what I had to deal with when debugging code on machines without them.

When you do find the problem, I'd love to know what it was.
10 hours ago
On 5/4/2025 1:06 PM, Timon Gehr wrote:
> My understanding is all of those features are under constant threat of being broken for "Error" and not guaranteed to work in the first place.

We learned our lesson about breaking existing code. For example, we deprecated complex numbers maybe 15 years ago, and there's still active code using them. This annoys me because it makes writing the AArch64 code generator significantly harder.


>> Worst case, you can write a signal handler for any of the signals that can be generated (not just null pointer signals), and have the handler write all it can to a file that can be emailed to you.

I asked Grok to "write a signal handler for linux that catches a null pointer seg fault":

```C
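#define _GNU_SOURCE     // needed for REG_RIP in <ucontext.h> (glibc)
#include <execinfo.h>   // backtrace, backtrace_symbols_fd
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>     // _exit, STDERR_FILENO

void segfault_handler(int sig, siginfo_t *info, void *context) {
    (void)sig;

    // fprintf is not async-signal-safe; it is tolerated here only because
    // the process terminates immediately afterwards.
    // A null pointer access usually faults somewhere in the first page
    // (e.g. p->field), not only at address 0; 4096 assumes 4 KiB pages.
    if ((uintptr_t)info->si_addr < 4096) {
        // REG_RIP is specific to x86-64 Linux.
        fprintf(stderr, "Caught segmentation fault: Null pointer access at RIP=%p\n",
                (void*)((ucontext_t*)context)->uc_mcontext.gregs[REG_RIP]);
    } else {
        fprintf(stderr, "Caught segmentation fault at address %p\n", info->si_addr);
    }

    // Print a basic stack trace. backtrace_symbols_fd writes straight to the
    // file descriptor and, unlike backtrace_symbols, does not call malloc.
    void *buffer[32];
    int nptrs = backtrace(buffer, 32);
    fprintf(stderr, "Stack trace:\n");
    backtrace_symbols_fd(buffer, nptrs, STDERR_FILENO);

    // Use _exit, not exit: exit() runs atexit handlers and is not
    // async-signal-safe.
    _exit(EXIT_FAILURE);
}

int setup_segfault_handler(void) {
    struct sigaction sa;

    // Call backtrace once up front so any lazy loading it does
    // (which may allocate) happens outside the signal handler.
    void *warmup[1];
    backtrace(warmup, 1);

    // Clear the structure
    memset(&sa, 0, sizeof(struct sigaction));

    // Set handler
    sa.sa_sigaction = segfault_handler;
    sa.sa_flags = SA_SIGINFO;  // Use sa_sigaction instead of sa_handler

    // Register handler for SIGSEGV
    if (sigaction(SIGSEGV, &sa, NULL) == -1) {
        perror("Failed to set up SIGSEGV handler");
        return -1;
    }

    return 0;
}

// Example usage
int main(void) {
    if (setup_segfault_handler() != 0) {
        return EXIT_FAILURE;
    }

    // Trigger a null pointer dereference. volatile stops the optimizer from
    // proving ptr is null and replacing the store with a trap or deleting it.
    int *volatile ptr = NULL;
    *ptr = 42;  // This will cause a segfault

    return 0;
}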
```

>> You can also hook atexit() as maybe your program is exiting that way.
> 
> Well, yes. As I said, I can probably figure it out at some point. It's just a lot more work than necessary in some other languages for comparable situations.

https://man7.org/linux/man-pages/man3/atexit.3.html

```
#include <stdlib.h>

int atexit(typeof(void (void)) *function);

DESCRIPTION

The atexit() function registers the given function to be called at
normal process termination, either via exit(3) or via return from
the program's main().  Functions so registered are called in the
reverse order of their registration; no arguments are passed.
```
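A minimal sketch of the `atexit` hook, assuming the goal is only to learn whether the process left through `exit(3)` or a return from `main` at all. If the marker never shows up after a user-reported crash, the process ended some other way: a fatal signal, `abort()`, or `_exit()`, none of which run `atexit` handlers. `record_normal_exit`, `install_exit_hook` and `exit.log` are hypothetical names.

```C
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical marker file; a real program would use a per-user path. */
#define EXIT_LOG_PATH "exit.log"

/* Runs on exit(3) or on return from main(). It does NOT run when the
   process is killed by a signal such as SIGSEGV, or on abort()/_exit(). */
void record_normal_exit(void) {
    FILE *log = fopen(EXIT_LOG_PATH, "a");
    if (log != NULL) {
        fputs("process exited via exit() or return from main\n", log);
        fclose(log);
    }
}

/* Returns 0 on success, like atexit itself. */
int install_exit_hook(void) {
    return atexit(record_normal_exit);
}
```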
10 hours ago
On 5/4/2025 1:07 PM, Timon Gehr wrote:
> On 5/4/25 19:37, Walter Bright wrote:
>> Have you considered it might be a stack overflow?
> 
> Yes, that is also an option, though I think it is not very likely for this particular application.

I've had many crashes due to stack overflows. The most recent was when I was trying to get move constructors to work: the semantic routines would go into infinite recursion that was hard to untangle.

The generated code relies on seg faults to detect them: a guard page is placed past the end of the stack allocation, and touching it generates a seg fault.

Integer divide by zero also traps in hardware, though on Linux it arrives as SIGFPE rather than as a seg fault.
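Because an overflow is reported by hitting the guard page, a SIGSEGV handler like the one posted earlier in the thread can only run if it has a stack to run on. Here is a minimal POSIX sketch, assuming Linux. It gives the handler its own stack via `sigaltstack` plus `SA_ONSTACK`, and also installs the handler for SIGFPE, which is what integer division by zero raises there. `install_fault_handlers`, `fatal_signal_handler` and the 64 KiB stack size are illustrative choices, not anything from druntime.

```C
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Only async-signal-safe calls (write, _exit) inside the handler. */
static void fatal_signal_handler(int sig, siginfo_t *info, void *context) {
    (void)info;
    (void)context;
    static const char msg[] = "fatal signal (SIGSEGV may be a stack overflow)\n";
    ssize_t written = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)written;
    _exit(128 + sig);
}

/* After a stack overflow the faulting thread has no usable stack left, so
   the handler runs on a separate one. sigaltstack only affects the calling
   thread; other threads would need their own alternate stacks. */
int install_fault_handlers(void) {
    stack_t ss;
    ss.ss_size = 64 * 1024;            /* comfortably above typical SIGSTKSZ */
    ss.ss_sp = malloc(ss.ss_size);
    ss.ss_flags = 0;
    if (ss.ss_sp == NULL)
        return -1;
    if (sigaltstack(&ss, NULL) == -1) {
        free(ss.ss_sp);
        return -1;
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = fatal_signal_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;   /* run on the alternate stack */
    if (sigaction(SIGSEGV, &sa, NULL) == -1 || sigaction(SIGFPE, &sa, NULL) == -1)
        return -1;
    return 0;
}
```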