September 05, 2017
On Monday, 4 September 2017 at 21:23:50 UTC, Moritz Maxeiner wrote:
> On Monday, 4 September 2017 at 17:58:41 UTC, Johan Engelen wrote:
>>
> (The spec requires crashing on null dereference, but this bit of the spec is ignored by DMD and LDC, and I assume by GDC too.
> Crashing on `null` dereference requires a null check on every dereference through an unchecked pointer, because 0 might be a valid memory access, and also because ptr->someDataField is not going to look up address 0, but 0+offsetof(someDataField) instead, potentially addressing a valid low address (at 1_000_000, say).)
>
> It's not implemented as compiler checks because the "actual" requirement is "the platform has to crash on null dereference" (see the discussion in/around [1]). Essentially: "if your platform doesn't crash on null dereference, don't use D on it (at the very least not @safe D)".

My point was that this is not workable. A "null dereference" is a D language construct, not something the machine does. It's ridiculous to specify that accessing address 1_000_000 should crash the program, yet that is exactly what D specifies for this code (and thus null checks would need to be injected in many places to be spec compliant):

```
struct S {
  ubyte[1_000_000] a;
  int b;
}
void main() {
   S* s = null;
   s.b = 1;
}
```
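
To be concrete, "injecting a null check" here means the compiler would have to lower the store to something like this. This is an illustrative sketch, not what any of the compilers currently emit; note that the check has to be on the base pointer, because the effective address (0 + S.b.offsetof, i.e. 1_000_000) is already well past the unmapped page at 0:

```
void main() {
   S* s = null;
   // Sketch of a compiler-injected check; it must test the base
   // pointer, since the store itself never touches address 0.
   if (s is null)
       assert(0, "null dereference"); // guaranteed crash, as the spec wants
   s.b = 1;                           // effective address: 0 + S.b.offsetof
}
```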

-Johan
September 05, 2017
On Tuesday, 5 September 2017 at 18:32:34 UTC, Johan Engelen wrote:
> My point was that this is not workable. A "null dereference" is a D language construct, not something the machine does. It's ridiculous to specify that accessing address 1_000_000 should crash the program, yet that is exactly what D specifies for this code (and thus null checks would need to be injected in many places to be spec compliant):
>
> ```
> struct S {
>   ubyte[1_000_000] a;
>   int b;
> }
> void main() {
>    S* s = null;
>    s.b = 1;
> }
> ```
>
> -Johan

Perhaps it should null-check exceptionally large types, whose field offsets may reach past the protected memory area, but not others?


September 05, 2017
On Tuesday, 5 September 2017 at 18:32:34 UTC, Johan Engelen wrote:
> On Monday, 4 September 2017 at 21:23:50 UTC, Moritz Maxeiner wrote:
>> On Monday, 4 September 2017 at 17:58:41 UTC, Johan Engelen wrote:
>>>
>>> (The spec requires crashing on null dereference, but this bit of the spec is ignored by DMD and LDC, and I assume by GDC too.
>>> Crashing on `null` dereference requires a null check on every dereference through an unchecked pointer, because 0 might be a valid memory access, and also because ptr->someDataField is not going to look up address 0, but 0+offsetof(someDataField) instead, potentially addressing a valid low address (at 1_000_000, say).)
>>
>> It's not implemented as compiler checks because the "actual" requirement is "the platform has to crash on null dereference" (see the discussion in/around [1]). Essentially: "if your platform doesn't crash on null dereference, don't use D on it (at the very least not @safe D)".
>
> My point was that this is not workable. A "null dereference" is a D language construct, not something the machine does.

While "null dereference" is a language construct "null" is defined as actual address zero (like it's defined in C/C++ by implementation) and dereference means r/w from/to that virtual memory address, it is something the machine does: Namely, memory protection, because the page for address 0 is (usually) not mapped (and D requires it to not be mapped for @safe to work), accessing it will lead to a page fault, which in turn leads to a segmentation fault and then program crash.

> It's ridiculous to specify that accessing address 1_000_000 should crash the program, yet that is exactly what D specifies for this code (and thus null checks would need to be injected in many places to be spec compliant):
>
> ```
> struct S {
>   ubyte[1_000_000] a;
>   int b;
> }
> void main() {
>    S* s = null;
>    s.b = 1;
> }
> ```

In order to be spec compliant and correct, a compiler would only need to inject null checks on dereferences where the size of the object being pointed to (in your example, S.sizeof) is larger than the bottom virtual memory segment of the target OS (the one which no C-compatible OS maps automatically and which you also shouldn't map manually).
The size of that bottom segment, however, is usually _deliberately_ large, precisely so that buggy (C) programs crash on NULL dereference (even with structures like the above). So in practice, unless you invalidate the OS's assumptions about expected maximum structure sizes, null dereferences can be assumed to crash.
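
As a sketch of that rule (`bottomSegmentSize` is a placeholder I made up; the real threshold is platform specific, e.g. Linux's vm.mmap_min_addr sysctl, often 64 KiB):

```
enum size_t bottomSegmentSize = 65536; // assumed; platform dependent

ref T checkedDeref(T)(T* p)
{
    static if (T.sizeof >= bottomSegmentSize)
    {
        // The pointee is large enough that a field access could land
        // past the unmapped bottom segment: explicit check required.
        if (p is null) assert(0, "null dereference");
    }
    // Otherwise the page fault on access _is_ the null check.
    return *p;
}
```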
September 05, 2017
On Tuesday, September 05, 2017 18:32:34 Johan Engelen via Digitalmars-d wrote:
> On Monday, 4 September 2017 at 21:23:50 UTC, Moritz Maxeiner
>
> wrote:
> > On Monday, 4 September 2017 at 17:58:41 UTC, Johan Engelen
> >
> > wrote:
> >> (The spec requires crashing on null dereference, but this bit of the spec is ignored by DMD and LDC, and I assume by GDC too. Crashing on `null` dereference requires a null check on every dereference through an unchecked pointer, because 0 might be a valid memory access, and also because ptr->someDataField is not going to look up address 0, but 0+offsetof(someDataField) instead, potentially addressing a valid low address (at 1_000_000, say).)
> >
> > It's not implemented as compiler checks because the "actual" requirement is "the platform has to crash on null dereference" (see the discussion in/around [1]). Essentially: "if your platform doesn't crash on null dereference, don't use D on it (at the very least not @safe D)".
>
> My point was that this is not workable. A "null dereference" is a D language construct, not something the machine does. It's ridiculous to specify that accessing address 1_000_000 should crash the program, yet that is exactly what D specifies for this code (and thus null checks would need to be injected in many places to be spec compliant):
>
> ```
> struct S {
>    ubyte[1_000_000] a;
>    int b;
> }
> void main() {
>     S* s = null;
>     s.b = 1;
> }
> ```

dmd and the spec were written with the assumption that the CPU is going to segfault your program when you dereference a null pointer. In the vast majority of cases, that assumption holds. The problem of course is the case that you bring up where you're dealing with objects that are large enough that the CPU can't do that anymore. And as Moritz points out, all that's required to fix that is to insert null checks for those types. It shouldn't be necessary at all for the vast majority of types. The CPU already handles them correctly - at least on any x86-based system. I would expect any other modern CPU to do the same, but I'm not familiar enough with other such systems to know for sure. Regardless, there definitely should be no need to insert null checks all over the place in any x86-based code. At most, it's needed in a few places to deal with abnormally large objects.

Regardless, for @safe to do its job, the program does need to crash when dereferencing null. So, if the CPU can't do the checks like the spec currently assumes, then the compiler is going to need to insert the checks, and while that may hurt performance, I don't think that there's really any way around that while still ensuring that @safe code does not corrupt memory or access memory that it's not supposed to. @system code could skip it to get the full performance, but @safe is stuck.
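
To illustrate the split (a library-level sketch on my part, not how the compilers actually implement any of this):

```
struct Checked(T)
{
    T* ptr;

    // @safe callers pay for an explicit check; the check is what
    // justifies the @trusted here.
    ref T get() @trusted
    {
        if (ptr is null) assert(0, "null dereference");
        return *ptr;
    }

    // @system callers can skip it for full performance.
    ref T getUnchecked() @system { return *ptr; }
}
```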

- Jonathan M Davis

September 06, 2017
On Tuesday, 5 September 2017 at 15:46:13 UTC, Dukc wrote:
> [..]
>
> Of course, if we want to support this we should construct a high-level library template that chooses the correct vector size for the platform, eliminates that outer for loop, and handles uneven array lengths.

You mean like this: https://github.com/dlang/druntime/pull/1891?

September 06, 2017
On Monday, 4 September 2017 at 09:15:30 UTC, ag0aep6g wrote:
> On 09/04/2017 06:10 AM, Moritz Maxeiner wrote:
> That doesn't crash at the call site, but only when the callee accesses the parameter:

That's just an observation based on a detail of a particular compiler implementation. It's simply not true in general, even if it appears that way in a particular case. Did you inspect the generated code? If the entire thing has been _inlined_ and properly optimised, as decent modern compilers most definitely do _when the correct switches are used_, then in the generated code there is no such thing as caller and callee - it's all just one stream of code.


September 06, 2017
On Wednesday, 6 September 2017 at 09:21:59 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 5 September 2017 at 15:46:13 UTC, Dukc wrote:
>> [..]
>>
>> Of course, if we want to support this we should construct a high-level library template that chooses the correct vector size for the platform, eliminates that outer for loop, and handles uneven array lengths.
>
> You mean like this: https://github.com/dlang/druntime/pull/1891?

No. I meant a function which, given an array, returns a range over that array that internally reads many elements at once by copying them into a static array for processing. The compiler then knows it can take advantage of optimizations like the one in that pull request, because static arrays can't overlap, even if the original arguments do.

Of course the user should not call that function if the arrays do overlap, or if the loop body mutates other elements.

See David Simcha's talk from DConf 2013, at 37:30; that's the basic idea of how I'm thinking the range would iterate internally.

https://www.youtube.com/watch?v=yMNMV9JlkcQ&list=PLpISZoFBH1xtyA6uBsNyQH8P3lx92U64V&index=16
September 06, 2017
On Tuesday, 5 September 2017 at 22:59:12 UTC, Jonathan M Davis wrote:
>
> dmd and the spec were written with the assumption that the CPU is going to segfault your program when you dereference a null pointer. In the vast majority of cases, that assumption holds.

In my terminology, "dereference" is a language spec term. It is not directly related to what the CPU is doing.
```
struct S {
  void nothing() {}
}

void foo (S* s) {
  (*s).nothing(); //dereference must crash on null?
}
```
If you call the `(*s)` a dereference, then you are agreeing with "my" dereference terminology. (I used the * for extra clarity; "s.nothing()" is the same.)

In LDC, dereferencing a null ptr is UB.
DMD is assuming the same, or something similar. Otherwise DMD wouldn't be able to optimize foo in this example to an empty body as it does currently.
(go go null-checks everywhere)

-Johan
September 06, 2017
On Wednesday, September 06, 2017 19:40:16 Johan Engelen via Digitalmars-d wrote:
> On Tuesday, 5 September 2017 at 22:59:12 UTC, Jonathan M Davis
>
> wrote:
> > dmd and the spec were written with the assumption that the CPU is going to segfault your program when you dereference a null pointer. In the vast majority of cases, that assumption holds.
>
> In my terminology, "dereference" is a language spec term. It is
> not directly related to what the CPU is doing.
> ```
> struct S {
>    void nothing() {}
> }
>
> void foo (S* s) {
>    (*s).nothing(); //dereference must crash on null?
> }
> ```
> If you call the `(*s)` a dereference, then you are agreeing with
> "my" dereference terminology. ( used the * for extra clarity;
> "s.nothing()" is the same.)
>
> In LDC, dereferencing a null ptr is UB.
> DMD is assuming the same, or something similar. Otherwise DMD
> wouldn't be able to optimize foo in this example to an empty body
> as it does currently.
> (go go null-checks everywhere)

I would argue that if the dereferencing of the pointer is optimized out, then it is never dereferenced, and therefore it doesn't need to crash. It's only if the pointer is actually dereferenced that the crash needs to occur, because that's what's needed for @safe to be @safe. I can totally believe that the spec needs to be clearer about this, but I would definitely interpret it to mean that if the pointer is actually dereferenced, the program must crash, not that your code example must crash even if it's optimized. And I would be surprised if Walter meant anything else. He just isn't always good about writing the spec in a way that others agree means what he meant, and to be fair, it can be very hard to write things in an unambiguous way.

Regardless, I don't see a problem here - or a need to insert a bunch of null checks. The spec should probably be clarified, but the only hole I'm aware of in dereferencing null right now is that it relies on the CPU to segfault the program, which fails in cases where the object is too large for that to occur - and in those cases, null checks really should be inserted. For objects that are small enough to trigger segfaults with null, null checks should not be necessary.

- Jonathan M Davis

September 08, 2017
On Wednesday, 6 September 2017 at 17:30:44 UTC, Dukc wrote:
> See David Simcha's talk at DConf 13 at 37:30, that's the basic idea how I'm thinking the range would internally iterate.

Correction: the outer loop would iterate in steps like that, but the body would be different. Each time it would copy elements into a static array of length unroll.length (which in this case would be the width of the vector operations), let the user iterate over that, and then assign it back to the original array.
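
A rough sketch of what I mean (all names and the explicit scalar fallback for the tail are mine, just for illustration; the real template would pick the width per platform):

```
void eachChunk(size_t width, T)(
    T[] arr,
    scope void delegate(ref T[width]) chunkBody,
    scope void delegate(ref T) scalarBody)
{
    size_t i = 0;
    for (; i + width <= arr.length; i += width)
    {
        T[width] chunk;
        chunk[] = arr[i .. i + width]; // copy in: chunk can't alias anything
        chunkBody(chunk);              // the body only ever sees the static array
        arr[i .. i + width] = chunk[]; // copy the results back
    }
    foreach (ref e; arr[i .. $])       // uneven tail, element by element
        scalarBody(e);
}

void main()
{
    auto a = new int[](10);
    eachChunk!4(a,
        (ref int[4] c) { c[] += 1; },  // vectorizable: no overlap possible
        (ref int e) { e += 1; });
}
```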