On Saturday, 19 April 2025 at 22:49:19 UTC, Jonathan M Davis wrote:
> On Thursday, April 17, 2025 8:39:27 PM MDT Walter Bright via Digitalmars-d wrote:
>> I'd like to know what those gdc and ldc transformations are, and whether they are controllable with a switch to their optimizers.
>> I know there's a problem with WASM not faulting on a null dereference, but in another post I suggested a way to deal with it.
> Unfortunately, my understanding isn't good enough to explain those details. I discussed it with Johan in the past, but I've never worked on ldc or with llvm (or on gdc/gcc), so I really don't know what is or isn't possible. However, from what I recall of what Johan said, we were kind of stuck, and llvm considered dereferencing null to be undefined behavior.
There is a way now to tell LLVM that dereferencing null is defined (nota bene) behavior.
> It may be the case that there's some sort of way to control that (and llvm may have more capabilities in that regard since I last discussed it with Johan), but someone who actually knows llvm is going to have to answer those questions. And I don't know how gdc's situation differs either.
I have not responded in this thread so far because I feel it is an old discussion, with old misunderstandings.
There is confusion between dereferencing in the language, versus dereferencing by the CPU. What I think that C and C++ do very well is separate language behavior from implementation/CPU behavior, and only prescribe language behavior, no (or very little) implementation behavior. I feel D should do the same.
Non-virtual method example, where (in my opinion) the dereference happens at call site, not inside the function:
    class A {
        int a;
        final void foo() { // non-virtual
            a = 1; // no dereference here
        }
    }

    A a;
    a.foo(); // <-- DEREFERENCE
During program execution, with the current D implementation of classes and non-virtual methods, the CPU will only "dereference" the `this` pointer to do the assignment to `a`. But that is only the case for our current implementation. For the D language behavior, it does not matter what the implementation does: same behavior should happen on any architecture/platform/execution model.
If you want to fault on null-dereference, I believe you have to add a null-check at every dereference at language level (regardless of implementation details). Perhaps it does not impact performance very much (with optimizer enabled); I vaguely remember a paper from Microsoft where they tried this and did not see a big perf impact (if any).
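To make that concrete, here is a rough hand-written sketch of what such a language-level check could look like at the call site of the example above. The helper `checkedDeref` and the wrapper `callFooChecked` are hypothetical names for illustration only (they are not druntime or compiler features), and throwing an `Exception` is just one assumed way to "fault"; a real implementation would insert the check in the compiler and could trap instead.

```d
class A {
    int a;
    final void foo() { // non-virtual
        a = 1;
    }
}

// Hypothetical helper: models the null check a compiler could insert at
// every language-level dereference, independent of what the CPU does.
T checkedDeref(T)(T obj, string file = __FILE__, size_t line = __LINE__)
if (is(T == class))
{
    if (obj is null)
        throw new Exception("null dereference", file, line);
    return obj;
}

// Hypothetical wrapper: the check happens at the call site, before foo runs.
// Returns true if the call went through, false if the reference was null.
bool callFooChecked(A obj)
{
    try
    {
        checkedDeref(obj).foo(); // the language-level dereference is here
        return true;
    }
    catch (Exception e)
    {
        return false;
    }
}
```

With this shape, the body of `foo` never sees a null `this`, regardless of whether the target platform faults on a null access.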
Some notes to trigger you to think about distinguishing language behavior from CPU/implementation details:
- You don't have to implement classes and virtual functions using a vptr/vtable, there are other options!
- There does not need to be a "stack" (implementation detail vocabulary). Some "CPUs" don't have a "stack", and instead do "local storage" (language vocabulary) in an alternative way. In fact, even on CPUs with stack, it can help to not use it! (read about Address Sanitizer detection of stack-use-after-scope and ASan's "fake stack")
- Pointers don't have to be memory addresses (you probably already know that they are not physical addresses on common CPUs), but could probably be implemented as hashes/keys into a database as well. C only defines ordered comparison (e.g. > and <) for pointers that point into the same object (e.g. an array or struct); for unrelated pointers it is undefined behavior in C (and unspecified in C++). Why? Because what does it mean on segmented memory architectures (e.g. segmented x86)?
- Distinguishing language from implementation behavior means that correct programs work the same on all kinds of different implementations (e.g. you can run your C++ program in a REPL, or run it in your browser through WASM).
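
As a small illustration of the first note above (dynamic dispatch without a vptr/vtable), here is a minimal sketch that selects behavior through a type tag and a `final switch`. The names `Kind`, `Shape` and `area` are made up for this example; it only shows that the language-level notion of "call the right function for this object" does not require a vtable, not how any D compiler actually implements classes.

```d
import std.math : PI;

// Hypothetical type tag that replaces the vptr.
enum Kind { circle, square }

struct Shape {
    Kind kind;
    double size; // radius for a circle, side length for a square
}

// Dispatch on the tag instead of going through a vtable.
double area(const Shape s)
{
    final switch (s.kind)
    {
        case Kind.circle: return PI * s.size * s.size;
        case Kind.square: return s.size * s.size;
    }
}
```

A caller sees the same behavior either way: `area(Shape(Kind.square, 2.0))` yields the square's area without any vtable lookup.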
cheers,
Johan