[Issue 18016] using uninitialized value is considered @safe but has undefined behavior
November 27, 2017
https://issues.dlang.org/show_bug.cgi?id=18016

RazvanN <razvan.nitu1305@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |razvan.nitu1305@gmail.com

--- Comment #1 from RazvanN <razvan.nitu1305@gmail.com> ---
How about letting void initialization be acceptable in @safe code only if the value is initialized before being used? In the example from the bug report, the code would error since x is returned before being initialized. But this should be acceptable @safe code:

int f() @safe
{
    int x = void;
    do_some_work();
    x = 7;
    return x;
}

That would imply an AST walker for the current scope to see if x is initialized anywhere.

--
November 27, 2017
https://issues.dlang.org/show_bug.cgi?id=18016

Steven Schveighoffer <schveiguy@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy@yahoo.com

--- Comment #2 from Steven Schveighoffer <schveiguy@yahoo.com> ---
I personally thought this was not required for memory safety -- since @safe is not allowed to break the type system, having data that is garbage isn't going to corrupt memory, as long as it's not a reference.

Note that =void is not allowed in @safe already for reference types (even though the spec doesn't outline that rule).

I'd want Walter's opinion on this. I thought @safe was specifically for memory safety, and not preventing all undefined behavior. But the way the spec is currently written, void initialization should be disallowed. My vote would be to relax the undefined behavior of =void for value types (for reference types or types that contain references, keep it UB).

(In reply to RazvanN from comment #1)
> How about letting void initialization be acceptable in @safe code only if the value is initialized before being used?

I'm not sure any of the rules for @safe functions require such checking. Not only that, but I'm not sure it's completely solvable. That could result in cases where it's clear from reading the code that the function initializes, but it's something the compiler can't tell.

It's much more straightforward to disallow it. I think the cost of initialization is so low, that we aren't going to affect code that much. Disallowing =void for value types in safe code would be my second choice.
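
A minimal sketch of the kind of case meant above, where a reader can see that the variable is always assigned before it is read, but a simple walk over the scope cannot prove it. The function name is hypothetical and the function is marked @system so the sketch does not depend on how this issue is resolved:

```d
// Hypothetical example, not taken from the bug report.
// Every possible n takes exactly one of the two branches, so x is always
// written before the return. Proving that requires relating the two
// conditions, not just finding an assignment somewhere in the scope.
int initializedOnEveryPath(int n) @system
{
    int x = void;
    if (n >= 0)
        x = 1;
    if (n < 0)
        x = -1;
    return x;
}
```

A check that only asks "is x assigned anywhere in this scope?" would accept this function, but it would equally accept the same function with the second `if` removed, where x can be returned uninitialized.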

--
November 27, 2017
https://issues.dlang.org/show_bug.cgi?id=18016

uplink.coder@googlemail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uplink.coder@googlemail.com

--- Comment #3 from uplink.coder@googlemail.com ---
Checking whether a void-initialized value escapes a function is as difficult as the scope stuff.

Escapes of void-initialized values should certainly be disallowed in @safe code.

--
March 04, 2018
https://issues.dlang.org/show_bug.cgi?id=18016

Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla@digitalmars.com

--- Comment #4 from Walter Bright <bugzilla@digitalmars.com> ---
(In reply to Steven Schveighoffer from comment #2)
> I'd want Walter's opinion on this.

I agree with you. The spec should be fixed to continue to disallow void initialization for reference types, and say that void initialization for value types is implementation defined, not undefined.

--
March 04, 2018
https://issues.dlang.org/show_bug.cgi?id=18016

--- Comment #5 from Walter Bright <bugzilla@digitalmars.com> ---
https://github.com/dlang/dlang.org/pull/2260

--
June 05, 2019
https://issues.dlang.org/show_bug.cgi?id=18016

Manu <turkeyman@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |turkeyman@gmail.com

--- Comment #6 from Manu <turkeyman@gmail.com> ---
I think that's the wrong solution.
Allowing interaction with invalid memory is antithetical to @safe. Why would
you want @safe to allow this?

--
June 05, 2019
https://issues.dlang.org/show_bug.cgi?id=18016

--- Comment #7 from Steven Schveighoffer <schveiguy@yahoo.com> ---
It's garbage data, but it's not garbage pointers. As long as the memory is not used to reference anything, it's not going to cause a memory corruption to use it.

Why would you want to use this? Because it's more efficient to not initialize stack data before overwriting it with the real value.

Can you explain a way that f() is unsafe in the example above? That is, it results in corrupted memory? Or alternatively, show how you can write code that is exploitable or could cause memory corruption?

Would you consider this function @safe?

int[] allocate(int size)
{
   auto result = cast(int *)malloc(size * int.sizeof);
   return result[0 .. size];
}

It doesn't corrupt any memory, the data is not left dangling, as it's not freed, but it's also not initialized. Is that a big problem?

--
June 05, 2019
https://issues.dlang.org/show_bug.cgi?id=18016

--- Comment #8 from Manu <turkeyman@gmail.com> ---
(In reply to Steven Schveighoffer from comment #7)
> It's garbage data, but it's not garbage pointers. As long as the memory is not used to reference anything, it's not going to cause a memory corruption to use it.

You can't know what the memory is going to be used for. You would need
astonishingly competent flow-analysis to make judgements of that kind.
It could be given as an argument to any operation that references something,
perhaps as an offset, or any conceivable thing could be done with that data,
and it's 100% guaranteed to be a rubbish operation.

It's a verifiably rubbish value, how could that inject valid program flow into any usage context?

> Why would you want to use this? Because it's more efficient to not initialize stack data before overwriting it with the real value.

Right, but it requires very special-case handling, and it's error-prone; for
instance, you might think you can simply:
  T x = void;
  x = T();

For some subset of possible T's that might be fine, but then some T arrives
with elaborate assignment semantics and it's a spectacular crash. It would be
easy for that to slip through the cracks, or not demonstrate issue on the lib
author's projects, but then a customer exposes the issue.
@trusted should be a locator for dangerous code, exactly like such an
assignment above. That code above is absolutely not @safe, it's making
assumptions way outside the language semantics, anything could happen if you're
not careful.

D has many semantics when operating on objects that make the basic assumption that objects are *valid*. `init` exists for this reason. Every semantic that assumes a valid object is violated by `= void`, which makes every such operation `risky` at best.

> Can you explain a way that f() is unsafe in the example above?

f() is potentially @safe, assuming that `x` is a type without elaborate
assignment (it is `int` above), but it depends on the compiler having powerful
flow analysis to determine those facts.
So it *could* be @safe, but I don't think DMD has the technology required to
prove that at this time?

> That is, it
> results in corrupted memory? Or alternatively, show how you can write code
> that is exploitable or could cause memory corruption?

Exposing uninitialised memory is a data leak at best. Many forms of exploit take advantage of leaking private or inaccessible data, but typically it can be used to source or craft values that lead to unexpected or otherwise invalid program flow or improper array offsets.

> Would you consider this function @safe?
> 
> int[] allocate(int size)
> {
>    auto result = cast(int *)malloc(size * int.sizeof);
>    return result[0 .. size];
> }
> 
> It doesn't corrupt any memory, the data is not left dangling, as it's not freed, but it's also not initialized. Is that a big problem?

malloc's not @safe (it's not even D), neither is dynamically slicing a pointer,
and the memory is uninitialised. This function is certainly not @safe.
@system functions are perfectly fine. There's nothing wrong with @system code,
it's just up to the caller to confirm a valid interaction.

--
June 06, 2019
https://issues.dlang.org/show_bug.cgi?id=18016

--- Comment #9 from Steven Schveighoffer <schveiguy@yahoo.com> ---
I'll start by saying, I think the operation is @safe, but I'm not sure it's necessary for @safe code. You can indeed escape @safe with @trusted, so there are likely ways around this. The easiest to prove route here is that we just disable void initialization, and tough, you just have to deal with that.

(In reply to Manu from comment #8)
> (In reply to Steven Schveighoffer from comment #7)
> > It's garbage data, but it's not garbage pointers. As long as the memory is not used to reference anything, it's not going to cause a memory corruption to use it.
> 
> You can't know what the memory is going to be used for. You would need
> astonishingly competent flow-analysis to make judgements of that kind.
> It could be given as an argument to any operation that references something,
> perhaps as an offset, or any conceivable thing could be done with that data,
> and it's 100% guaranteed to be a rubbish operation.

The garbage offset argument is already handled, @safe code will throw an error if you try to escape the bounds of an array.
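
As a rough illustration of that point (the helper names are hypothetical, and the large constant merely stands in for a garbage value), indexing a slice in @safe code is bounds-checked, so an out-of-range index raises an Error instead of touching memory outside the array:

```d
import core.exception : RangeError;

// Hypothetical helper: the index plays the role of garbage data.
int readAt(const int[] data, size_t garbageIndex) @safe
{
    return data[garbageIndex]; // bounds-checked in @safe code
}

// Catching an Error is done here only to make the check observable;
// it is not allowed in @safe code, hence @trusted.
bool outOfBoundsIsCaught() @trusted
{
    int[4] buf = [1, 2, 3, 4];
    try
    {
        readAt(buf[], 1_000_000);
    }
    catch (RangeError e)
    {
        return true;
    }
    return false;
}
```

This assumes bounds checking has not been switched off globally (for example with -boundscheck=off).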

> It's a verifiably rubbish value, how could that inject valid program flow into any usage context?

Only if written incorrectly. The above certainly is useless as is. It's clearly not something you would want to have in your code. But the problem @safe is trying to prevent is corrupting memory. Tailoring @safe to be as narrow as possible allows more leeway in programs that do not corrupt memory. You should be able to say, if you see the @safe tag, this will NOT corrupt memory.

> > Why would you want to use this? Because it's more efficient to not initialize stack data before overwriting it with the real value.
> 
> Right, but it requires very special-case handling, and it's error-prone; for
> instance, you might think you can simply:
>   T x = void;
>   x = T();
> 
> For some subset of possible T's that might be fine, but then some T arrives with elaborate assignment semantics and it's a spectacular crash.

Spectacular crashes can happen in @safe code. This is @safe:

int* foo;
*foo = 1; // crash

However, this raises a good point that =void overrides the expectations of the type itself. If it's expected the type is at least default initialized, setting it to garbage originally can possibly have safety problems, if there is any @trusted code inside the type itself.

We could potentially limit =void to POD types that contain no references.

> > Can you explain a way that f() is unsafe in the example above?
> 
> f() is potentially @safe, assuming that `x` is a type without elaborate
> assignment (it is `int` above), but it depends on the compiler having
> powerful flow analysis to determine those facts.
> So it *could* be @safe, but I don't think DMD has the technology required to
> prove that at this time?

The compiler knows the type of an item, so it can determine whether it has elaborate assignment without powerful flow analysis. We just need to ask "is it OK to =void this type?". We already have that check for pointers; maybe we also need it for types that have member functions, or elaborate assignment, or anything else that makes it possible to exploit this for memory corruption.
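
A library-level sketch of such a type-based question, using the existing std.traits templates hasIndirections and hasElaborateAssign. The predicate canVoidInit and the example structs are hypothetical, and this is not how the compiler implements its own check:

```d
import std.traits : hasElaborateAssign, hasIndirections;

// Hypothetical predicate: true only for types with no references
// and no user-defined assignment.
enum bool canVoidInit(T) = !hasIndirections!T && !hasElaborateAssign!T;

struct Plain { int a; double b; }
struct WithPointer { int* p; }
struct WithOpAssign
{
    int a;
    void opAssign(WithOpAssign rhs) { a = rhs.a; }
}

// All of these are decided from the type alone, at compile time.
static assert( canVoidInit!int);
static assert( canVoidInit!Plain);
static assert(!canVoidInit!WithPointer);
static assert(!canVoidInit!WithOpAssign);
static assert(!canVoidInit!(int[]));
```

Whether the list of disqualifying properties should also include member functions or other invariants is exactly the open question in this comment; the sketch only covers the two properties that std.traits can already report.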

> Exposing uninitialised memory is a data leak at best. Many forms of exploit take advantage of leaking private or inaccessible data, but typically it can be used to source or craft values that lead to unexpected or otherwise invalid program flow or improper array offsets.

So it comes down to the questions: is it @safe's charter to prevent such things? and can @safe actually guarantee such things don't happen?

> > Would you consider this function @safe?
> > 
> > int[] allocate(int size)
> > {
> >    auto result = cast(int *)malloc(size * int.sizeof);
> >    return result[0 .. size];
> > }
> > 
> > It doesn't corrupt any memory, the data is not left dangling, as it's not freed, but it's also not initialized. Is that a big problem?
> 
> malloc's not @safe (it's not even D), neither is dynamically slicing a pointer, and the memory is uninitialised. This function is certainly not @safe.

I didn't express what I wanted clearly enough. What I meant was, would you consider calling this function to be a safe call? Personally, I would have no problem marking this @trusted, even though none of the integers are initialized.

To give you an idea, this is allowed currently in D: https://github.com/dlang/phobos/blob/c5664d4436235cba2606103f8729341ac79a4487/std/array.d#L811-L825

--
June 07, 2019
https://issues.dlang.org/show_bug.cgi?id=18016

--- Comment #10 from anonymous4 <dfj1esp02@sneakemail.com> ---
AFAIK, Walter's suggestion is not supported by LLVM. Currently LLVM removes code that uses an uninitialized value. To work around it, LDC will need to initialize variables that are initialized with void and provide a different way to declare uninitialized variables. Likely not a problem, but it results in minor fragmentation of the language. I believe the LDC way will have priority, because DMD is not really about performance anyway, so default-initialized variables are good enough for it.

--