RFC: Change what killing a thread does on error instead (page 11)

1 day ago

RFC: Change what killing a thread does on error instead

Posted by FeepingCreature
in reply to Richard (Rikki) Andrew Cattermole

On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew Cattermole wrote:

> Hello!
>
> I've managed to have a chat with Walter to discuss what assert does on error.
>
> In recent months, it has become more apparent that our current error-handling behaviours have some serious issues. Recently, we had a case where an assert threw, killed a thread, but the process kept going on. This isn't what should happen when an assert fails.
>
> An assert specifies that the condition must be true for program continuation. It is not for logic level issues, it is solely for program continuation conditions that must hold.
>
> Should an assert fail, the most desirable behaviour for it to have is to print a backtrace if possible and then immediately kill the process.

I disagree. A thread dying should simply kill the program, whatever the reason. The problem here is that a dying thread does not kill the program by default. If it were an Exception rather than an AssertError, it would be just as bad. We have an internal thread implementation that does nothing but guarantee that (1) the thread's error is logged and (2) the program goes down immediately afterwards.
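A minimal sketch of what such a wrapper could look like, assuming nothing about the internal implementation mentioned above. `spawnFatal` is a hypothetical helper name; only `core.thread.Thread`, `core.stdc.stdlib.abort`, and `std.stdio.stderr` come from the standard distribution.

```d
import core.stdc.stdlib : abort;
import core.thread : Thread;
import std.stdio : stderr;

/// Hypothetical helper: runs `work` in a new thread. If any Throwable
/// escapes `work`, it is logged and the whole process is aborted.
Thread spawnFatal(void delegate() work)
{
    auto t = new Thread({
        try
        {
            work();
        }
        catch (Throwable e) // Exception and Error are treated the same
        {
            stderr.writeln("thread died: ", e);
            stderr.flush();
            abort(); // take the process down immediately
        }
    });
    t.start();
    return t;
}
```

Catching `Throwable` rather than `Exception` is the point of the sketch: the reason the thread died does not change the outcome.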

1 day ago

Re: RFC: Change what assert does on error

Posted by Richard (Rikki) Andrew Cattermole
in reply to Dennis

On 09/07/2025 2:34 AM, Dennis wrote:
> On Tuesday, 8 July 2025 at 10:47:52 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> I've found where the compiler is implementing this, verified it.
>>
>> It's not nothrow specific.
> 
> Whether a function is nothrow affects whether a call expression 'can throw'
> 
> https://github.com/dlang/dmd/blob/9610da2443ec4ed3aeed060783e07f76287ae397/compiler/src/dmd/canthrow.d#L131-L139
> 
> Which affects whether a statement 'can throw'
> 
> https://github.com/dlang/dmd/blob/9610da2443ec4ed3aeed060783e07f76287ae397/compiler/src/dmd/blockexit.d#L101C23-L101C31
> 
> And when a 'try' statement can only fall through or halt, then a (try A; finally B) gets transformed into (A; B). When the try statement 'can throw' this doesn't happen.
> 
> https://github.com/dlang/dmd/blob/9610da2443ec4ed3aeed060783e07f76287ae397/compiler/src/dmd/statementsem.d#L3421-L3432
> 
> Through that path, nothrow produces better generated code, which you can easily verify by looking at the assembler output of:
> 
> ```D
> void f();
> void testA() {try {f();} finally {f();}}
> 
> void g() nothrow;
> void testB() {try {g();} finally {g();}}
> ```

Yeah I've debugged all of this, and you're talking about what I found.

That simplification rewrite is an optimization that can be removed from the frontend.

Right now it is contributing to the belief that Error will not run cleanup. Which isn't true. It does.

>> It's Exception specific, not nothrow. It's a subtle, but very distinct difference.
> 
> I have no idea what this distinction is supposed to say, but "there is no nothrow specific optimizations taking place" is either false or pedantic about words.

Right, a nothrow specific optimization to me would mean that a function is marked as nothrow and therefore an optimization takes place because of it. The attribute comes before the optimization.

That isn't what is happening here. The compiler is going statement by statement, looking to see whether executing that statement could exit via an Exception, and simplifying the AST when it cannot. The attribute is coming after the optimization.

1 day ago

Re: RFC: Change what assert does on error

Posted by Dukc
in reply to Richard (Rikki) Andrew Cattermole

On Tuesday, 8 July 2025 at 19:18:39 UTC, Richard (Rikki) Andrew Cattermole wrote:

> Right now it is contributing to the belief that Error will not run cleanup. Which isn't true. It does.

Either interpretation is wrong. It is currently unspecified whether Error will run cleanups, unless it is explicitly caught.

Or maybe I should write "undetermined", as I don't think the spec actually covers this. But I believe that is the intent behind what it currently does.

> Right, a nothrow specific optimization to me would mean that a function is marked as nothrow and therefore an optimization takes place because of it. The attribute comes before the optimization.
>
> That isn't what is happening here. The compiler is going statement by statement, looking to see whether executing that statement could exit via an Exception, and simplifying the AST when it cannot. The attribute is coming after the optimization.

Nonetheless, the presence of the nothrow attribute on a called function affects what happens. I believe this is what everyone else here means by "nothrow optimisation", no more, no less.

1 day ago

Re: RFC: Change what killing a thread does on error instead

Posted by Dukc
in reply to FeepingCreature

On Tuesday, 8 July 2025 at 18:37:03 UTC, FeepingCreature wrote:

> On Sunday, 29 June 2025 at 18:04:51 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> Hello!
>>
>> I've managed to have a chat with Walter to discuss what assert does on error.
>>
>> An assert specifies that the condition must be true for program continuation. It is not for logic level issues, it is solely for program continuation conditions that must hold.
>>
>> Should an assert fail, the most desirable behaviour for it to have is to print a backtrace if possible and then immediately kill the process.

That's an interesting idea actually. I think we should still have some mechanism for another thread to handle a thread's death, but maybe catching the error in another thread isn't the way.

Instead, maybe some thread could register a death handler delegate (thread gravedigger?) that is called if another thread dies. If there is no gravedigger, or if the only gravedigger thread itself dies, then all others would immediately receive an unrecoverable error, and the error from the dead thread would be what is reported.
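To make the idea concrete, here is a rough sketch under stated assumptions. Nothing like this exists in druntime: `Gravedigger`, `registerGravedigger`, `reportDeath`, and `spawnWatched` are all hypothetical names, and the sketch simplifies the proposal by aborting the process when no gravedigger is registered instead of delivering an error to every other thread. It also does not handle the case where the gravedigger itself dies.

```d
import core.stdc.stdlib : abort;
import core.sync.mutex : Mutex;
import core.thread : Thread;
import std.stdio : stderr;

// Hypothetical API: none of these names exist in druntime.
alias Gravedigger = void delegate(Throwable cause);

__gshared Gravedigger gravedigger; // single global handler, may be null
__gshared Mutex registryLock;      // guards `gravedigger`

shared static this()
{
    registryLock = new Mutex;
}

/// Installs the global death handler, replacing any previous one.
void registerGravedigger(Gravedigger dg)
{
    registryLock.lock();
    scope (exit) registryLock.unlock();
    gravedigger = dg;
}

/// Called when a watched thread dies from an uncaught Throwable.
void reportDeath(Throwable cause)
{
    Gravedigger dg;
    {
        registryLock.lock();
        scope (exit) registryLock.unlock();
        dg = gravedigger;
    }
    if (dg is null)
    {
        // No gravedigger: report the dead thread's error and go down.
        stderr.writeln("thread died with no gravedigger: ", cause);
        stderr.flush();
        abort();
    }
    dg(cause);
}

/// Starts `work` in a new thread whose death is reported to the gravedigger.
Thread spawnWatched(void delegate() work)
{
    auto t = new Thread({
        try
        {
            work();
        }
        catch (Throwable cause)
        {
            reportDeath(cause);
        }
    });
    t.start();
    return t;
}
```

In this sketch the handler runs on the dying thread; a real design would have to decide which thread runs the handler and how threads not created through `spawnWatched` are covered.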

1 day ago

Re: RFC: Change what assert does on error

Posted by Dennis
in reply to Richard (Rikki) Andrew Cattermole

On Tuesday, 8 July 2025 at 19:18:39 UTC, Richard (Rikki) Andrew Cattermole wrote:

> That simplification rewrite is an optimization that can be removed from the frontend.

"it can be removed" is irrelevant if we're talking about whether the optimization exists now.

> Right now it is contributing to the belief that Error will not run cleanup. Which isn't true. It does.

Not when it bubbles through functions that have this nothrow optimization.

import std.stdio;

// This program doesn't print "cleanup", unless you remove `nothrow`
void nothrowError() nothrow => throw new Error("error");
void main()
{
    try {nothrowError();}
    finally {writeln("cleanup");}
}

> Right, a nothrow specific optimization to me would mean that a function is marked as nothrow and therefore an optimization takes place because of it. The attribute comes before the optimization.

That's exactly what I'm demonstrating: two functions, both with hidden bodies, only difference is the nothrow annotation, different code gen.

Whether a function is nothrow determines the outcome of the statement control flow analysis that leads to the optimization. The tf.nothrow check is executed before the code path that does the AST rewrite, so you'd have to clarify what you mean by "comes after". There's an indirection there, but that's completely irrelevant for this discussion. I really don't get what point you're trying to make. These are the facts:

- nothrow currently affects code generation
- nothrow currently affects whether throw Error skips finally blocks in try-finally blocks
- nothrow can be written down in source code or inferred, which is treated the same
- scope(exit) and destructor calls are lowered to try-finally, making them behave equivalently
- All this logic currently exists in the frontend
- It is possible to remove the nothrow optimization by changing frontend logic
- There's a discussion going on whether that's desirable.

Do you disagree with any of these, or is there a different point you're trying to make?
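The point about scope(exit) being lowered to try-finally can be illustrated with a small sketch. It deliberately uses an ordinary Exception, where cleanup is guaranteed, so it shows only the equivalence of the two forms and not the disputed Error-through-nothrow case. The function names are made up for the illustration.

```d
/// Cleanup written with scope(exit).
int[] viaScopeExit()
{
    int[] log;
    void run()
    {
        scope (exit) log ~= 2; // runs when `run` is left, here by Exception
        log ~= 1;
        throw new Exception("fail");
    }
    try
    {
        run();
    }
    catch (Exception)
    {
        // swallowed: only the order of side effects matters here
    }
    return log;
}

/// The same cleanup written by hand as try-finally.
int[] viaTryFinally()
{
    int[] log;
    void run()
    {
        try
        {
            log ~= 1;
            throw new Exception("fail");
        }
        finally
        {
            log ~= 2;
        }
    }
    try
    {
        run();
    }
    catch (Exception)
    {
    }
    return log;
}
```

Both functions record the body first and the cleanup second. Because the two forms share one lowering, whatever the frontend decides for try-finally applies to scope(exit) and destructors as well.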

1 day ago

Re: RFC: Change what killing a thread does on error instead

Posted by Sebastiaan Koppe
in reply to Dukc

On Tuesday, 8 July 2025 at 19:55:13 UTC, Dukc wrote:

> That's an interesting idea actually. I think we should still have some mechanism for another thread to handle a thread's death, but maybe catching the error in another thread isn't the way.

That is similar to what happens with structured concurrency. For every execution context there is always an owner to which any Error gets forwarded, all the way up to the main thread.

It would be straightforward to change that so that it terminates the process on the spot, but I prefer graceful shutdown instead.

1 day ago

Re: RFC: Change what killing a thread does on error instead

Posted by Dukc
in reply to Sebastiaan Koppe

On Tuesday, 8 July 2025 at 20:24:06 UTC, Sebastiaan Koppe wrote:

> That is similar to what happens with structured concurrency. For every execution context there is always an owner to which any Error gets forwarded, all the way up to the main thread.

I think you misunderstood. There would be no thread-specific owner, only a global handler for all others and maybe a backup handler in case the gravedigger itself dies.

But, guaranteeing that each thread has an owner is certainly an excellent concept too. I would maybe not go for that in this case though. Not because I'd consider structured concurrency inferior (rather the opposite in fact), but because the solution should preferably work with existing client code.

1 day ago

Re: RFC: Change what assert does on error

Posted by Adam D. Ruppe
in reply to Dennis

On Tuesday, 8 July 2025 at 14:05:25 UTC, Dennis wrote:

> So I take it opend changed that, being okay with the breaking change?

opend reverted dmd's change of behavior introduced around 2018. Prior to that, dmd ran the finally blocks in all cases; then they changed it to "optimize" nothrow functions.

Now, I can't call this a regression per se, since the documentation already said, even before that change, that you can't expect finally blocks to run on Errors. But this was a breaking change in practice, and not a simple compile error if you happened to combine certain features: it is a silent change to runtime behavior, not running code you wrote in only certain circumstances. Quite spooky.

Only if you dogmatically stick to the ideology that catching errors is unacceptable - despite the potential real world benefits of catching it, and the fact it does work just fine most of the time even in upstream today (and historically, did in all cases) - can you justify this skipping of code as an optimization rather than a silent wrong-code compiler bug.

> Because @safe constructors of structs containing fields with @system destructors will now raise a safety error even with nothrow.

I've never encountered this, perhaps because upstream also worked this same way for many years, including through most of the active development period of druntime and phobos, and because arsd doesn't really concern itself with @safe nothrow attribute spam.

But if this did cause a compile error.... I'd prefer that to a silent runtime change, at least we'd be alerted to the change in behavior instead of being left debugging a puzzling situation with very little available information.

mutex.lock();
arr[i]++;
mutex.unlock();

Instead of this:

mutex.lock();
scope(exit) mutex.unlock();
arr[i]++;

Like here, if the RangeError is thrown and the mutex remains locked with the first code sample, ok, you can understand the exception was thrown on line 2, so line 3 didn't run. Not pleasant when it happens to you, but you'll at least understand what happened.

But with the second sample, it'd take a bit of language lawyering to understand why the mutex is still locked in upstream D. Not much, since it being an Error instead of an Exception would jump out pretty quickly, but some: normally, scope(exit) is good practice for writing exception-safe code.

> While I can't say I have the numbers to prove that its performance is important to me, I currently like the idea that scope(exit)/destructors are a zero-cost abstraction when Exceptions are absent.

For what it's worth, I kinda like the idea too, it did pain me a little to see the codegen bloat back up a lil when reverting that change. But....

>> Correctness trumps minor performance improvements.

yup.

1 day ago

Re: RFC: Change what killing a thread does on error instead

Posted by Derek Fawcus
in reply to Sebastiaan Koppe

On Tuesday, 8 July 2025 at 20:24:06 UTC, Sebastiaan Koppe wrote:

> That is similar to what happens with structured concurrency. For every execution context there is always an owner to which any Error gets forwarded, all the way up to the main thread.
>
> It would be straightforward to change that so that it terminates the process on the spot, but I prefer graceful shutdown instead.

It was mentioned upthread that this could be an exception. Was that supposed to be a language exception, or does it also include CPU exceptions, which result in signals under Unix?

For the latter, I want the process to crash and core dump by default, not have something try and catch SIGSEGV, SIGBUS, SIGFPE etc.

1 day ago

Re: RFC: Change what killing a thread does on error instead

Posted by Sebastiaan Koppe
in reply to Derek Fawcus

On Tuesday, 8 July 2025 at 20:40:46 UTC, Derek Fawcus wrote:

> It was mentioned upthread that this could be an exception. Was that supposed to be a language exception, or does it also include CPU exceptions, which result in signals under Unix?
>
> For the latter, I want the process to crash and core dump by default, not have something try and catch SIGSEGV, SIGBUS, SIGFPE etc.

In most cases you wouldn't want to catch those, so the default should be to coredump indeed.
