March 13
On Wednesday, 13 March 2024 at 06:36:14 UTC, Ogi wrote:
> Division by zero also crashes the program but nobody makes a big deal out of it.

I guess division by zero is not as common as null pointer issues. Also, it usually leads just to some kind of arithmetic exception.
March 13
On 13/03/2024 7:05 PM, Walter Bright wrote:
> BTW, doing data flow analysis is very expensive in terms of compiler run time. The optimizer does it, but running the optimizer is optional for that reason.

I want to see D become temporally safe, and that means DFA for @safe code.

At this point the question is not if, but when. We have to solve it, and in doing so define the literature, otherwise we'll be left behind.

I'm certainly not ready for my type state analysis DIP to go into development just yet, but ideas shouldn't be too far behind.
March 13
On Wednesday, 13 March 2024 at 06:05:35 UTC, Walter Bright wrote:
> [..]
>
> Consider the following:
> ```
> class A { void bar(); }
>
> void foo(int i) {
>     A a;
>     if (i) a = new A();
>     ...
>     if (i) a.bar();
> }
> ```
> What happens if we apply data flow analysis to determine the state of `a` when it calls `bar()`? It will determine that `a` has the possible values (`null`, `new A()`). Hence, it will give an error that `a` is possibly null at that point.
>
> Yet the code is correct, not buggy.
>
> Yes, the compiler could figure out that `i` is the same, but the conditions can be more complex such that the compiler cannot figure it out (the halting problem).
>
> So that doesn't work.
>
> We could lower `a.bar()` to `NullCheck(a).bar()` which throws an exception if `a` is null. But what have we gained there? Nothing. The program still aborts with an exception, just like if the hardware checked. Except we've got this manual check that costs extra code and CPU time.
>
> BTW, doing data flow analysis is very expensive in terms of compiler run time. The optimizer does it, but running the optimizer is optional for that reason.

Here's how TypeScript deals with this problem:

```ts
class A { bar() {} }

function foo(i: number) {
    let a: A;
    if (i) a = new A();

    if (i) a.bar(); // Error: Variable 'a' is used before being assigned.
}

function foo2(i: number) {
    let a: A | null = null;
    if (i) a = new A();

    if (i) a.bar(); // Error: 'a' is possibly 'null'
}

```


https://www.typescriptlang.org/docs/handbook/2/narrowing.html

March 13

On Wednesday, 13 March 2024 at 06:05:35 UTC, Walter Bright wrote:

> [..]
>
> Consider the following:
> ```
> class A { void bar(); }
>
> void foo(int i) {
>     A a;
>     if (i) a = new A();
>     ...
>     if (i) a.bar();
> }
> ```
> What happens if we apply data flow analysis to determine the state of `a` when it calls `bar()`? It will determine that `a` has the possible values (`null`, `new A()`). Hence, it will give an error that `a` is possibly null at that point.

Here's how this situation is handled in TypeScript:

```ts
class A { bar() {} }

function foo(i: number) {
    let a: A;
    if (i) a = new A();

    if (i) a.bar(); // Error: Variable 'a' is used before being assigned.
}

function foo2(i: number) {
    let a: A | null = null;
    if (i) a = new A();

    if (i) a.bar(); // Error: 'a' is possibly 'null'
}

function bar(i: number) {
    let a: A;
    if (i) {
      a = new A();
      a.bar(); // No errors.
    }
}

function bar2(i: number) {
    let a: A | null = null;
    if (i) {
      a = new A(); // The type of `a` is `A | null`
      a.bar();     // The type of `a` is now `A`
    }
}
```
> Yet the code is correct, not buggy.
>
> Yes, the compiler could figure out that `i` is the same, but the conditions can be more complex such that the compiler cannot figure it out (the halting problem).
>
> So that doesn't work.

I agree, however in my experience (I've been using TypeScript professionally since ~2019) it's not a problem for the developer to rewrite the code in a way that the compiler can understand, in this case rewriting `foo` to `bar`. While your example was intentionally simple, in practice restructuring the code so the compiler can understand it often makes it clearer for the humans behind the screen as well.

> We could lower `a.bar()` to `NullCheck(a).bar()` which throws an exception if `a` is null. But what have we gained there? Nothing. The program still aborts with an exception, just like if the hardware checked. Except we've got this manual check that costs extra code and CPU time.

I agree that simply letting the OS handle the segfault is sufficient for 98% of the use cases. For the other 2% (say, writing kernel-mode code or code for microcontrollers without an MMU), having a compiler flag to enable rewriting `a.bar()` to `assert(a), a.bar()` would be nice.

> BTW, doing data flow analysis is very expensive in terms of compiler run time. The optimizer does it, but running the optimizer is optional for that reason.

C# has used control-flow analysis for definite assignment since its early days (I'm not sure if it was part of the first release, or if it was added later). In my experience, C# has always been one of the faster languages in terms of compile time.

I'd be very interested to hear what you have to say about their language specification on definite assignment:

That said, TypeScript takes this (colloquially known as flow typing) much further: https://www.typescriptlang.org/docs/handbook/2/narrowing.html.
It plays extremely well with their union types.

P.S. please disregard my previous message. I clicked "Send" by mistake.

March 13

Just responding in general to the discussion here:

**Knowing 100% whether something is not null**

This is a very difficult problem to solve. It can be automated to a degree, but I believe it is equivalent to the halting problem, so it's not solvable in general.

However, many languages can prove cases of this using developer help (i.e. "unwrapping" a maybe-null value). This means that you have the drawback that your code has to be instrumented with null checks (written by you). And even if you do this, if it turns out the thing is null, you still have to handle it!
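As a concrete illustration of that instrumentation, in TypeScript the "unwrap" is an explicit null check that narrows the type; and even after the compiler forces the check, you still have to decide what to do in the null branch. A minimal sketch (the `findUser` function and its data are made up for illustration):

```typescript
type User = { name: string };

// A lookup that may fail: the return type makes the "maybe null" explicit.
function findUser(id: number): User | null {
    return id === 1 ? { name: "Alice" } : null;
}

function greet(id: number): string {
    const u = findUser(id);
    // The "unwrap": without this check, `u.name` is a compile error
    // because `u` might be null.
    if (u === null) {
        return "unknown user"; // ...and the null case still has to be handled
    }
    return `hello, ${u.name}`; // here `u` has been narrowed to `User`
}
```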

**Null happens**

When you have an object that is null, and it shouldn't be, the means by which that null happened are no longer important. You need to handle it somehow.

  • The path D takes is, let the hardware solve it.
  • The path Java takes is to instrument dereferences on possibly-null variables and throw an exception if it's null.
  • The path many other languages take is to force you to validate it's not null before using it.

These mechanisms all have their advantages and disadvantages.

D and Java have the advantage that you don't have to jump through hoops to prove things to the compiler. If a null pointer fault happens, it's because you have a bug in your code, but the language is your safety net here. It is still memory-safe in D since the hardware fault stops you from continuing execution, and any invalid pointers other than null should not be possible. In Java, it's obviously still memory safe.

For the other languages, you are forced to validate something is null or not null. This has the advantage that certain classes of bugs can be avoided at compile time, and in many cases, the code can be clearer where null pointers might exist. But the cost is that you may be forced to validate something that is obvious to the developer (but not the compiler). It adds to the tediousness of the language.

Relying on segfaults also has a further drawback: unless you happen to be running under a debugger, or have enabled core dumps, you get no indication of where in your program the issue happened. There is also no distinction between memory corruption and null pointer dereferencing. I think we should have a better outcome than this; we can definitely be more informative about what is happening.

The point I'm making is that it's not possible to ensure you never have to deal with null pointers. They happen. They even might happen in memory safe languages such as Rust. The difference is how you have to handle them. Handling them one way or another has benefits or drawbacks, but those are the tradeoffs you must make. It's important to note that in all these situations, the code is still memory safe.

-Steve

March 13

On Wednesday, 13 March 2024 at 18:20:12 UTC, Steven Schveighoffer wrote:

>   • The path D takes is, let the hardware solve it.
>   • The path Java takes is to instrument dereferences on possibly-null variables and throw an exception if it's null.
>   • The path many other languages take is to force you to validate it's not null before using it.

Rust doesn't allow null references at all (excluding unsafe code). It is one more alternative path.

> For the other languages, you are forced to validate something is null or not null. This has the advantage that certain classes of bugs can be avoided at compile time, and in many cases, the code can be clearer where null pointers might exist. But the cost is that you may be forced to validate something that is obvious to the developer (but not the compiler). It adds to the tediousness of the language.

On the other hand, with this approach the developer can choose between nullable and non-nullable types. If they choose a nullable type, they really do have to do the checks. But with a non-nullable type they can work in complete safety without any checks. As far as I know, Kotlin encourages using non-nullable types wherever possible.
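The same choice exists in TypeScript under `strictNullChecks`: a non-nullable parameter needs no checks at all, while a nullable one must be checked before use. A minimal sketch (the `Config` type is made up for illustration):

```typescript
type Config = { retries: number };

// Non-nullable: callers can use it freely, no checks needed.
function useConfig(c: Config): number {
    return c.retries;
}

// Nullable: the compiler forces a check before any member access;
// `c.retries` alone would not compile.
function useMaybeConfig(c: Config | null): number {
    if (c === null) return 0;
    return c.retries;
}
```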

March 13

On Wednesday, 13 March 2024 at 19:36:01 UTC, Alex wrote:

> On Wednesday, 13 March 2024 at 18:20:12 UTC, Steven Schveighoffer wrote:
>
> >   • The path D takes is, let the hardware solve it.
> >   • The path Java takes is to instrument dereferences on possibly-null variables and throw an exception if it's null.
> >   • The path many other languages take is to force you to validate it's not null before using it.
>
> Rust doesn't allow null references at all (excluding unsafe code). It is one more alternative path.

Rust has "optional" types, which are basically the same thing. Null is a memory-safe "invalid thing".

The thing I'm getting at is -- if you have something that doesn't exist, but is supposed to exist, then you have to deal with it. How you deal with it is where the tradeoffs come in.

The basic building block that all memory-safe language tools are built on is -- you should not be able to use invalid memory. null pointers which halt the program are a flavor of that.

> > For the other languages, you are forced to validate something is null or not null. This has the advantage that certain classes of bugs can be avoided at compile time, and in many cases, the code can be clearer where null pointers might exist. But the cost is that you may be forced to validate something that is obvious to the developer (but not the compiler). It adds to the tediousness of the language.
>
> On the other hand, with this approach the developer can choose between nullable and non-nullable types. If they choose a nullable type, they really do have to do the checks. But with a non-nullable type they can work in complete safety without any checks. As far as I know, Kotlin encourages using non-nullable types wherever possible.

The checks have to come somewhere.

If you have a value that is of unknown validity, and you want to ensure it's valid, you need a check. We have different flavors of:

  • automated checks
  • how the checks are handled if failure occurs
  • checks that you are forced to perform
  • how long such checks are enforced by the compiler (e.g., flow analysis, or building the validity into the type).

Building non-null into the type indeed means as long as you have that type, you don't have to check. But to get it into that type, if you started with a possibly-invalid value, somewhere you had to do a check.

Consider an array/vector of items in Rust. And an index. When you index that vector, the compiler has no idea what that index is. It must validate the index before dereferencing the element. This is a check, and the handling of it is defined by the language.
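TypeScript models the same idea in its type system when `noUncheckedIndexedAccess` is enabled: an index read is typed `T | undefined`, so the validity check is forced before the element can be used. A minimal sketch (the `elementAt` helper is made up for illustration):

```typescript
// Under `noUncheckedIndexedAccess`, an index read is typed `T | undefined`;
// here that type is written out explicitly so the sketch compiles under any
// tsconfig. Either way, the check below is required before `x` can be used
// as a plain `string`.
function elementAt(items: string[], i: number): string {
    const x: string | undefined = items[i];
    if (x === undefined) {
        throw new RangeError(`index ${i} out of bounds`);
    }
    return x;
}
```

Here the handling of a failed check (throwing) is chosen by the code, not the language, which is the tradeoff being discussed.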

Having a possibly-null pointer is no different. D defines that in safe code, a pointer will be valid or null. The "check" occurs on use, and is performed by the hardware.

This is in stark contrast to having a possibly-invalid pointer to non-null memory (e.g. dangling or buffer overflow). Those should never occur.

-Steve

March 13

On Wednesday, 13 March 2024 at 19:58:24 UTC, Steven Schveighoffer wrote:

> Rust has "optional" types, which are basically the same thing. Null is a memory-safe "invalid thing".
>
> The thing I'm getting at is -- if you have something that doesn't exist, but is supposed to exist, then you have to deal with it. How you deal with it is where the tradeoffs come in.
>
> The basic building block that all memory-safe language tools are built on is -- you should not be able to use invalid memory. null pointers which halt the program are a flavor of that.

I think the fundamental difference between an optional type and a nullable reference is that the optional type is under the control of the compiler via the language's type system, while a nullable reference will only "shoot" at runtime, where the compiler can't help.

> Building non-null into the type indeed means as long as you have that type, you don't have to check. But to get it into that type, if you started with a possibly-invalid value, somewhere you had to do a check.

Or you have initialized the variable with a new instance (good practice, in contrast to variables without explicit initialization).

> Consider an array/vector of items in Rust. And an index. When you index that vector, the compiler has no idea what that index is. It must validate the index before dereferencing the element. This is a check, and the handling of it is defined by the language.

It looks like array bounds checks are not possible at compile time at all, but null references can be handled by the compiler thanks to the type system.

> Having a possibly-null pointer is no different. D defines that in safe code, a pointer will be valid or null. The "check" occurs on use, and is performed by the hardware.

But the fundamental difference is that null pointer checks are performed in D at runtime, while in Kotlin (and other languages which support null safety) they happen at compile time.

March 13

On Wednesday, 13 March 2024 at 20:34:42 UTC, Alex wrote:

> But the fundamental difference is that null pointer checks are performed in D at runtime, while in Kotlin (and other languages which support null safety) they happen at compile time.

I mean that Kotlin can validate the code and point out all the places where a null reference check must be performed (for a nullable type). But the null reference check itself is still performed at runtime.
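TypeScript behaves the same way: the compiler only points out where a check is missing, and the check it demands is ordinary code that executes at runtime. A minimal sketch (the `Logger` type and `emit` function are hypothetical):

```typescript
type Logger = { log: (msg: string) => void };

function emit(logger: Logger | null, msg: string): boolean {
    // The compiler rejects a bare `logger.log(msg)` here at compile time;
    // the check it demands is this plain runtime branch.
    if (logger === null) return false;
    logger.log(msg);
    return true;
}
```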

March 13
On Wednesday, March 13, 2024 1:43:23 AM MDT Alex via Digitalmars-d wrote:
> On Wednesday, 13 March 2024 at 06:05:35 UTC, Walter Bright wrote:
> > Memory safety is not something we've uniquely defined. It's generally accepted that it specifically means no memory corruption. "Safe" programs can offer additional guarantees, like no race conditions.
>
> Yeah, race conditions are the second headache after memory safety
> (including null pointer issues) :)
> I know only one language which gives a guarantee at compilation time
> about the absence of race conditions (but not deadlocks): Rust.
> But D has the `shared` keyword, and as I understand it, it can provide
> such a guarantee at compilation time for SafeD, right?

What shared is supposed to do is give an error if you attempt to do anything with a shared variable that isn't guaranteed to be thread-safe - which basically means that it should give an error when you actually try to do much of anything with a shared variable. However, it's not fully implemented by default right now (it will currently give an error in some cases but not all). The -preview=nosharedaccess switch can be used to make accessing shared variables an error in general (like it's supposed to be), but it hasn't been enabled by default yet.

Ideally, the compiler would know when it was thread-safe to access a shared variable and implicitly remove shared within that code so that you could safely access the variable, but in practice, the compiler has no way of knowing that your code has done what's necessary to protect access to that variable, since that involves doing stuff like locking a mutex whenever that variable is accessed, and the language has no understanding of any of that (and it's not at all easy to give the language such an understanding except for in very simple cases).

So, what happens in practice is that the programmer has to lock the appropriate mutex, then temporarily cast away shared to operate on the variable, then make sure that no thread-local references to the data exist any longer prior to releasing the mutex. So, you get code like

```
synchronized(mutex)
{
    int* local = cast()&sharedVar;
    *local = 42;
}
```

The result is that the code is @system, not @safe, and the programmer has to verify its correctness and mark it with @trusted for it to be useable by @safe code.

However, higher level objects can be written such that they have shared, @safe/@trusted member functions, and those member functions then take care of all of the locking and casting internally so that you can just use the type without directly dealing with the locking or casting.

But ultimately what shared is doing is segregating the code that deals with concurrency and making it so that you have to cast to actually do much of anything with it so that you can't shoot yourself in the foot with shared elsewhere. You have to outright tell the compiler that you want to take the risk. You then only have to examine certain sections of the code to make sure that the code that's actually interacting with shared data does so correctly (whereas in a language like C++, the type system doesn't help you with any of that).

So the way that you write thread-safe code in D is pretty similar to what you'd do in a language like C++ or Java, but shared makes it so that you know which data is shared and so that you can't accidentally access shared data in a manner which isn't thread-safe, whereas the type system really doesn't help you with any of that in C++ or Java.

- Jonathan M Davis