10 hours ago
On 5/4/2025 1:28 PM, Timon Gehr wrote:
> ```
> private void _enforceNoOverlap(const char[] action,
>      uintptr_t ptr1, uintptr_t ptr2, const size_t bytes)
> {
>      const d = ptr1 > ptr2 ? ptr1 - ptr2 : ptr2 - ptr1;
>      if (d >= bytes)
>          return;
>      const overlappedBytes = bytes - d;
> 
>      UnsignedStringBuf tmpBuff = void;
>      string msg = "Overlapping arrays in ";
>      msg ~= action;
>      msg ~= ": ";
>      msg ~= overlappedBytes.unsignedToTempString(tmpBuff);
>      msg ~= " byte(s) overlap of ";
>      msg ~= bytes.unsignedToTempString(tmpBuff);
>      assert(0, msg);
> }
> ```

I would not write logging code like that, which relies on the GC being in a working state.

> Unfortunately I had accidentally detached my debugger instead of getting the backtrace at the time. It is possible that this is what is happening on the user's machine, but I don't know. And I also have no idea where this overlap might occur.
> 
> This just gives an invalid instruction even in dub's "release-debug" builds by default unless using a custom druntime build. I just don't think this should ever happen to anyone. Even if there is some workaround, by default people will run into this at least once, and they may not be as lucky as me and see it with a debugger attached.
> 

The -release switch causes `assert(0, "message")` to be replaced with "ud2" which generates a breakpoint for the debugger. It triggers an invalid opcode exception.

You could write a signal handler to intercept that. See my other post for how to write one.
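A minimal POSIX-only sketch of such a handler, assuming Linux or another POSIX system (Windows would need SEH instead, which is not shown). `crashHandler` and `installCrashHandler` are hypothetical names; the bindings come from druntime's `core.sys.posix.signal` and `core.sys.posix.unistd`. The handler sticks to async-signal-safe calls (`write`, `_exit`), so it can report which signal fired, but it does not unwind the stack or run destructors and scope guards.

```d
import core.sys.posix.signal;                // sigaction_t, sigaction, sigemptyset, SIGILL, SIGSEGV
import core.sys.posix.unistd : _exit, write;

// Hypothetical handler. Only async-signal-safe functions may be called here,
// so no GC allocation, no writeln, no exceptions.
extern (C) void crashHandler(int sig)
{
    enum ill = "fatal: SIGILL (invalid instruction, e.g. assert(0) compiled with -release)\n";
    enum segv = "fatal: SIGSEGV (invalid memory access, e.g. null pointer dereference)\n";
    if (sig == SIGILL)
        write(2, ill.ptr, ill.length);   // fd 2 is stderr
    else
        write(2, segv.ptr, segv.length);
    _exit(128 + sig); // never return: the faulting instruction would just run again
}

/// Installs crashHandler for SIGILL and SIGSEGV; returns true if both succeed.
bool installCrashHandler()
{
    sigaction_t sa;                 // default-initialized, so sa_flags == 0
    sa.sa_handler = &crashHandler;
    sigemptyset(&sa.sa_mask);       // block no additional signals while handling
    return sigaction(SIGILL, &sa, null) == 0
        && sigaction(SIGSEGV, &sa, null) == 0;
}

void main()
{
    installCrashHandler();
    // ... rest of the program ...
}
```

Calling `_exit` rather than returning is deliberate: for a real fault, returning from the handler would re-execute the same instruction.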
10 hours ago
On 5/5/2025 7:10 AM, Kagamin wrote:
> On Monday, 5 May 2025 at 13:45:49 UTC, Timon Gehr wrote:
>> https://github.com/dlang/dmd/commit/dda88738b2b391239c87cfdbd60b76d6025306e5
> 
> just revert

Check the PR for it first for the reason why.
9 hours ago
On 5/6/25 02:04, Walter Bright wrote:
> On 5/4/2025 1:06 PM, Timon Gehr wrote:
>> My understanding is all of those features are under constant threat of being broken for "Error" and not guaranteed to work in the first place.
> 
> We learned our lesson to not break existing code. For example, we deprecated use of complex numbers maybe 15 years ago, and yet there's still active code using them. This is annoying to me because it makes doing the AArch64 code generator significantly harder.
> ...

Well, personally I am mostly upset about this deprecation because `creal` was my favorite D keyword. 😋

Also, I had actually bought the justification for why they were built in.

> 
>>> Worst case, you can write a signal handler for any of the signals that can be generated (not just null pointer signals), and have the handler write all it can to a file that can be emailed to you.
> 
> I asked grok "write a signal handler for linux that catches a null pointer seg fault":
> ...

Sure, it's not rocket science. Anyway, if you recall, the mysterious crash was on _Windows_, on a machine I do not control. I am sure I will be able to get stack traces out on segfault there too, using SEH and perhaps by shipping debug info.

I can solve or work around problems. However, I think it is better to eliminate said problems at the source, for everyone.

The point is:

- by default you just get nothing. Even if druntime prints a stack trace (which it might do on Windows; I am not a Windows user myself, so I have not seen that), I suspect the console window just closes immediately anyway.

- you have to do some manual steps, which you most likely only do once you actually face an irreproducible issue

- you have to wait for the next report, which may be months later (and next report is not the same as the next time it happens, as I don't like putting telemetry in my software)

- running the scope guards as advertised fixes the issue, e.g. on Windows I wrapped the main function in:

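```d
import core.stdc.stdlib : system;
import std.stdio : writeln;
int main()
{
    try {
        // ... the actual program ...
    } catch (Throwable e) {
        writeln(e.toString()); // message plus stack trace, when one is available
        system("pause");       // Windows: keep the console open until a key is pressed
        return 1;
    }
    return 0;
}
```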
This way my users can actually see the stack trace in cases where it is actually printed. And even before that, _other scope guards will have gathered useful information related to the crash beyond just a call stack trace_.

- cases where destructors/scope guards used to run are actively being broken so that scope guards no longer run. Branches are actively being coded into druntime to result in invalid instruction errors even if the programmer does not compile with a flag that means they want this. This has to stop, it's terrible UX.


>>> You can also hook atexit() as maybe your program is exiting that way.
>>
>> Well, yes. As I said, I can probably figure it out at some point. It's just a lot more work than necessary in some other languages for comparable situations.
> 
> https://man7.org/linux/man-pages/man3/atexit.3.html
> 
> ```
> #include <stdlib.h>
> 
> int atexit(typeof(void (void)) *function);
> 
> DESCRIPTION
> 
> The atexit() function registers the given function to be called at
> normal process termination, either via exit(3) or via return from
> the program's main().  Functions so registered are called in the
> reverse order of their registration; no arguments are passed.
> ```

Sure. I guess I could try to put the `system("pause")` call into this instead, then at least stderr will be visible to users somewhat more often, even in cases where scope guards are skipped entirely.
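A minimal sketch of that idea, assuming the D bindings in `core.stdc.stdlib`; `pauseBeforeExit` and `installPauseOnExit` are hypothetical names. As the man page excerpt says, `atexit` handlers only run on normal termination. That typically includes druntime printing an uncaught `Throwable` and returning a failure status from `main`, but not a crash via `ud2` or a segfault, so this would only help in some of the cases discussed here.

```d
import core.stdc.stdlib : atexit, system;

// Hypothetical handler. atexit() fires on normal termination only (return
// from main, exit()); it does not fire when the process dies on SIGILL/SIGSEGV.
extern (C) void pauseBeforeExit() nothrow @nogc
{
    version (Windows)
        system("pause"); // keep the console open so stderr output stays visible
}

/// Registers pauseBeforeExit; returns true if atexit() accepted it.
bool installPauseOnExit()
{
    return atexit(&pauseBeforeExit) == 0;
}

void main()
{
    installPauseOnExit();
    // ... rest of the program ...
}
```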

The entire point though is that I do not want destructors and scope guards to be skipped. x) None of your proposed workarounds so far fix this.

The additional work I am referring to is mostly _fixing the issue with only bare-bones information_ and _waiting for the next report_. I care less about _getting the program to spit out more than zero information on crash_. It seems you misunderstood me. Adding signal handlers is a nuisance, but this is not what causes the bulk of additional work. I do consider it a waste of time in its own right, though.


Of course, there will always be situations where destructors/scope guards don't run, e.g., if there is a hardware or power failure, but there can at least be a best effort to run them instead of actively working to suppress them. It will be sufficient in almost all cases.

Unless you are saying that actually running the destructors and scope guards is similarly easy? I have not dug this deep into druntime so far. If it is, then it should maybe be officially supported, at least in an opt-in fashion. Also, I suspect grok would be of limited utility for making this work, if doing so in a signal handler is even workable.

I understand that in cases where there is memory corruption you may not actually want to run them, but I feel D in standard usage is memory safe enough for me to make the call that always running them, whenever at all possible, will overall be vastly more helpful than running them unreliably.
9 hours ago
On 5/6/25 03:43, Walter Bright wrote:
> On 5/4/2025 1:28 PM, Timon Gehr wrote:
>> ```
>> private void _enforceNoOverlap(const char[] action,
>>      uintptr_t ptr1, uintptr_t ptr2, const size_t bytes)
>> {
>>      const d = ptr1 > ptr2 ? ptr1 - ptr2 : ptr2 - ptr1;
>>      if (d >= bytes)
>>          return;
>>      const overlappedBytes = bytes - d;
>>
>>      UnsignedStringBuf tmpBuff = void;
>>      string msg = "Overlapping arrays in ";
>>      msg ~= action;
>>      msg ~= ": ";
>>      msg ~= overlappedBytes.unsignedToTempString(tmpBuff);
>>      msg ~= " byte(s) overlap of ";
>>      msg ~= bytes.unsignedToTempString(tmpBuff);
>>      assert(0, msg);
>> }
>> ```
> 
> I would not write logging code like that, which relies on the GC being in a working state.
> ...

Well, this is copied from druntime.

>> Unfortunately I had accidentally detached my debugger instead of getting the backtrace at the time. It is possible that this is what is happening on the user's machine, but I don't know. And I also have no idea where this overlap might occur.
>>
>> This just gives an invalid instruction even in dub's "release-debug" builds by default unless using a custom druntime build. I just don't think this should ever happen to anyone. Even if there is some workaround, by default people will run into this at least once, and they may not be as lucky as me and see it with a debugger attached.
>>
> 
> The -release switch causes `assert(0, "message")` to be replaced with "ud2" which generates a breakpoint for the debugger. It triggers an invalid opcode exception.
> 
> You could write a signal handler to intercept that. See my other post for how to write one.

Well, I can. I should not have to do that though to get basic functionality that is a given in some other languages. I replied to the other post.
9 hours ago
On 5/6/25 05:11, Timon Gehr wrote:
>>
> 
> The -release switch causes `assert(0, "message")` to be replaced with "ud2" which generates a breakpoint for the debugger. It triggers an invalid opcode exception.

druntime is built with that switch, so even in debug builds you will get invalid instruction errors. We have to fix this.
9 hours ago
On 5/6/25 01:49, Walter Bright wrote:
>  ...
>  > druntime is riddled with code that builds an error message and then does assert(0, msg);. x)
> 
> Here are the compiler options for responding to assert failures:
> 
> Behavior on assert/boundscheck/finalswitch failure:
>    =[h|help|?]    List information on all available choices
>    =D             Usual D behavior of throwing an AssertError
>    =C             Call the C runtime library assert failure function
>    =halt          Halt the program execution (very lightweight)
>    =context       Use D assert with context information (when available)
> 
> What you can do is use the -checkaction=C option, and then write your own version of the C runtime library assert failure function. Writing your own will prevent the linker from pulling it in from the C runtime library (i.e. it overrides it). You can then have it do whatever you need done.
> ...

I am perfectly content with it throwing an assert error and unwinding the stack. This is just not what druntime will do in the default build that ships with the compiler.

> Druntime also uses a function pointer for assert failures where you can set it to point to your own handler.
> ...

I am sure that does not work with the default build of druntime, when the assert(0) is also in druntime.

>  > Well, I have no idea where the crash happens and getting one log file every couple months (and writing out gigabytes of useless log files in the meantime) seems not workable. It would be much better to be able to react to the crash.
> 
> Is the program running continuously for 2 months? If not, every start of the program can delete the log file, and start a new one. Will it be gigabytes long? That depends on your strategy of where to put the logging statements, which is a bit of an art. The size of the logs can also be reduced by using a compression or hash of the function names.
> ...

Sure. You are making my point for me though, this amount of effort is vastly disproportionate to address an issue that is a) likely to originate in druntime and b) extremely uncommon and c) most likely trivial to deal with in some competing languages.
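For comparison, the per-run log Walter describes can be sketched with `std.stdio.File`; `BreadcrumbLog` and `mark` are hypothetical names, and the file name is arbitrary. Opening with `"w"` truncates the previous run's log, and flushing after every entry means the last marker has reached the OS even if the process then dies on an invalid instruction.

```d
import std.stdio : File;

// Hypothetical per-run breadcrumb log: "w" truncates the file on open,
// so each program start discards the previous run's log.
struct BreadcrumbLog
{
    private File file;

    this(string path)
    {
        file = File(path, "w");
    }

    // Record a short marker, e.g. a function or phase name.
    void mark(string where)
    {
        file.writeln(where);
        file.flush(); // hand the line to the OS now, so a crash of this
                      // process does not lose it in the stdio buffer
    }
}

void main()
{
    auto log = BreadcrumbLog("breadcrumbs.log");
    log.mark("startup");
    // ... log.mark("...") at points of interest ...
}
```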

>  > Another one was about your claim that segfaults are always useful.
> 
> And they are. Even if you have to resort to writing your own signal handler. They're always better than what I had to deal with debugging code on a machine without them.
> ...

Well, I think I already granted that they are better than just letting the program run with memory corruption.

I just wish everyone would refrain from actively putting invalid instructions and segfaults into druntime in the future. They are not _that_ useful and there are vastly more useful alternatives. x)

> When you do find the problem, I'd love to know what it was.

I'll let you know. Not likely to happen very soon though.

I also still have to reduce a similar heisenbug I had where Phobos `sort` sometimes corrupted the data it was sorting.
9 hours ago
On 5/6/25 05:31, Timon Gehr wrote:
>  I just wish everyone would refrain from actively putting invalid instructions and segfaults into druntime in the future. They are not _that_ useful and there are vastly more useful alternatives. x)

And the same is true for segfault-on-null. I don't want this. If a standard null check can be implemented taking advantage of CPU features, fine. But semantics should be the same as if the compiler inserted a branch that throws an error every time a nullable pointer is dereferenced.
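The semantics described here can be spelled out by hand. In this minimal sketch, `NullDereferenceError` and `deref` are hypothetical names (druntime provides neither); a compiler-inserted check would perform the same branch implicitly at each dereference of a nullable pointer, rather than requiring a helper call.

```d
// Hypothetical error type; not part of druntime.
class NullDereferenceError : Error
{
    this(string file = __FILE__, size_t line = __LINE__) @safe pure nothrow
    {
        super("null pointer dereference", file, line);
    }
}

// The check a compiler could conceptually insert before dereferencing a
// nullable pointer: a branch that throws instead of faulting.
ref T deref(T)(T* p, string file = __FILE__, size_t line = __LINE__)
{
    if (p is null)
        throw new NullDereferenceError(file, line); // reports the caller's location
    return *p;
}

void main()
{
    int x = 41;
    deref(&x) += 1;   // normal access through a non-null pointer
    int* q = null;
    deref(q) = 0;     // throws NullDereferenceError instead of segfaulting
}
```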

Even better would be the type system just ensuring nullable pointers are never dereferenced, the OP's experience report notwithstanding. But I guess this part is a pipe dream for now.
8 hours ago
On 5/6/25 03:46, Walter Bright wrote:
> On 5/5/2025 7:10 AM, Kagamin wrote:
>> On Monday, 5 May 2025 at 13:45:49 UTC, Timon Gehr wrote:
>>> https://github.com/dlang/dmd/commit/dda88738b2b391239c87cfdbd60b76d6025306e5
>>
>> just revert
> 
> Check the PR for it first for the reason why.

https://github.com/dlang/druntime/pull/2794

From that PR discussion:

> > @WalterBright: Error and assert() do the same thing, we should just use one method. I'd like to eventually deprecate Error.
> 
> @Geod24: No they don't. First, we have checkaction to change the behavior. Second, in release mode (and we distribute druntime / phobos in release mode), the assert you are putting in leads to a HLT, without error message, while throw new Error will lead to sensible error messages.

It seems there were multiple PRs like this one at the time, consistent with my perception that this is a pervasive issue.
6 hours ago
D is set up so if you throw an `Exception`, then destructors will run as the stack unwinds. But if you throw an `Error`, you can catch it but the destructors don't run.

The reason is that `Error` means the program has entered an invalid state. Nothing in the program can be trusted any more. The program should do as little as possible to close down gracefully.
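A minimal sketch of the guaranteed half of this: when an `Exception` propagates, both struct destructors and `scope (exit)` guards run. `Resource`, `work`, and the `cleanups` counter are hypothetical names used only to observe that. The `Error` case is deliberately not shown, since per the above its cleanup is not something to rely on.

```d
import std.stdio : writeln;

// Hypothetical counter used to observe when cleanup code runs.
__gshared int cleanups;

struct Resource
{
    ~this() { ++cleanups; } // runs when the Resource goes out of scope
}

void work(bool fail)
{
    Resource r;              // destructor runs during Exception unwinding
    scope (exit) ++cleanups; // scope guard runs during Exception unwinding too
    if (fail)
        throw new Exception("recoverable failure");
}

void main()
{
    try
    {
        work(true);
    }
    catch (Exception e)
    {
        writeln("caught: ", e.msg, ", cleanups run: ", cleanups);
    }
}
```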

The template `std.exception.enforce` works like `assert`, but throws an `Exception` instead.

The documentation for `enforce()` says:

Note: `enforce` is used to throw exceptions and is therefore intended to aid in error handling. It is _not_ intended for verifying the logic of your program - that is what `assert` is for.
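A minimal sketch of that division of labor, with hypothetical `parsePort` and `midpoint` functions: `enforce` guards against bad external input and throws a catchable `Exception`, while `assert` checks an internal invariant whose failure would mean a bug.

```d
import std.conv : to;
import std.exception : enforce;

// Hypothetical example. Checking external input is error handling, so it
// uses enforce, which throws an Exception on failure.
ushort parsePort(string text)
{
    immutable value = text.to!int; // throws std.conv.ConvException on non-numbers
    enforce(value > 0 && value <= ushort.max, "port out of range: " ~ text);
    return cast(ushort) value;
}

// Checking the program's own logic is what assert is for: a failure here
// means a bug, and the check is compiled out under -release.
size_t midpoint(size_t lo, size_t hi)
{
    assert(lo <= hi, "midpoint: lo > hi");
    return lo + (hi - lo) / 2;
}

void main()
{
    import std.stdio : writeln;
    writeln(midpoint(0, parsePort("8080")));
}
```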

What you can do is override the default assert behavior by installing your own assert handler through `core.exception.assertHandler()`. Be sure to set `-checkaction=D`.

Then you can have assert() behave however you want.
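A minimal sketch, assuming the `AssertHandler` signature declared in `core.exception` (`void function(string file, size_t line, string msg) nothrow`). `logAndThrow` and `lastAssertMsg` are hypothetical names. The handler logs and then throws an `AssertError` itself, so the usual unwinding to a `catch (Throwable)` in `main` still happens. It only sees asserts that are actually compiled in: as pointed out earlier in the thread, the druntime that ships with the compiler is built with `-release`, so an `assert(0)` inside druntime still becomes an invalid instruction and never reaches the handler.

```d
import core.exception : AssertError, assertHandler;
import std.stdio : stderr;

// Hypothetical diagnostics: remember the last failure for later reporting.
__gshared string lastAssertMsg;

// Must match core.exception.AssertHandler:
//   void function(string file, size_t line, string msg) nothrow
void logAndThrow(string file, size_t line, string msg) nothrow
{
    lastAssertMsg = msg;
    try
    {
        stderr.writefln("assertion failed at %s(%s): %s", file, line, msg);
    }
    catch (Exception)
    {
        // writefln may throw; there is nothing better to do on this path
    }
    // Keep D's usual behavior afterwards: throw an AssertError so the stack
    // unwinds to any catch (Throwable) in main.
    throw new AssertError(msg, file, line);
}

void main()
{
    assertHandler = &logAndThrow; // property setter in core.exception
    // ... rest of the program ...
}
```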
6 hours ago
On Tuesday, 6 May 2025 at 01:43:07 UTC, Walter Bright wrote:

>
> I would not write logging code like that, which relies on the GC being in a working state.
>

Why would you assume that the code that doesn't rely on the GC is in a working state?