June 01, 2017 Re: Bad array indexing is considered deadly
Posted in reply to Moritz Maxeiner

On 01.06.2017 01:13, Moritz Maxeiner wrote:
> On Wednesday, 31 May 2017 at 22:47:38 UTC, Steven Schveighoffer wrote:
>>
>> Again, there has not been memory corruption.
>
> Again, the runtime *cannot* know that and hence you *cannot* claim that. It sees an index out of bounds and it *cannot* reason about whether a memory corruption has already occurred or not, which means it *must assume* the worst case (it must *assume* there was).
>
>> There is a confusion rampant in this thread that preventing *attempted* memory corruption must mean there *is* memory corruption.
>
> No, please no. Nobody has written that in the entire thread even once!
> - An index being out of bounds is an error (lowercase!).
> - The runtime sees that error when the array is accessed (what you describe as *attempted* memory corruption).
> - The runtime does not know *why* the index is out of bounds.
> It does *not* mean that there *was* memory corruption (and again, nobody claimed that), but the runtime cannot assume that there was not, because that is *unsafe*.
> ...

No, it is perfectly safe, because the language does not guarantee any specific behavior in case memory is corrupted. Therefore the language can /always/ assume that there is no memory corruption.

>> One does not require the other.
>
> Correct, but the runtime has to be safe in the *general* case, so it *must* assume the worst in case of a bug.

Software has bugs. The runtime has no business claiming that the scope of any particular bug is the entire service.

The practical outcomes of this design are just silly. Data is lost, services go down, etc. When in doubt, the software should just do what the programmer has written. It is not always correct, but it is the best available proxy of the desirable behavior.
May 31, 2017 Re: Bad array indexing is considered deadly
Posted in reply to Moritz Maxeiner

On Wednesday, May 31, 2017 23:13:35 Moritz Maxeiner via Digitalmars-d wrote:
> On Wednesday, 31 May 2017 at 22:47:38 UTC, Steven Schveighoffer wrote:
> > Again, there has not been memory corruption.
>
> Again, the runtime *cannot* know that and hence you *cannot* claim that. It sees an index out of bounds and it *cannot* reason about whether a memory corruption has already occurred or not, which means it *must assume* the worst case (it must *assume* there was).
Honestly, once a memory corruption has occurred, all bets are off anyway. The core thing here is that the contract of indexing arrays was violated, which is a bug. If we're going to argue about whether it makes sense to change that contract, then we have to discuss the consequences of doing so, and I really don't see why whether a memory corruption has occurred previously is relevant.

We could easily treat indexing arrays the same as any other function which chooses to throw an Exception when it's given bad input. The core difference is whether it's considered okay to give bad values or whether it's considered a programming bug to pass bad values. In either case, the runtime has no way of determining the reason for the failure, and I don't see why passing a bad value to index an array is any more indicative of a memory corruption than passing an invalid day of the month to std.datetime's Date when constructing it is indicative of a memory corruption. In both cases, the input is bad, and the runtime doesn't know why. It's just that in the array case, the API of arrays requires that the input be valid, whereas for Date, it's acceptable for bad input to be passed.

So, while I can appreciate that you're trying to argue for us keeping RangeError (which I agree with), I think that this whole argument about possible, previous memory corruptions prior to the invalid index being passed is derailing things.
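For illustration, a minimal sketch of the two contracts being contrasted here. Date really does throw DateTimeException on a bad day of the month; the printed messages are invented, and the final line assumes bounds checks are enabled (the default):

import std.datetime : Date, DateTimeException;
import std.stdio : writeln;

void main()
{
    // Bad input to a library construct: catchable, the program continues.
    try
    {
        auto d = Date(2017, 2, 31); // there is no February 31st
    }
    catch (DateTimeException e)
    {
        writeln("bad date, recovered: ", e.msg);
    }

    // Bad input to built-in indexing: RangeError (an Error), which the
    // language contract treats as a bug rather than as recoverable input.
    auto a = new int[](2);
    size_t i = 2;
    a[i] = 3; // terminates with core.exception.RangeError
}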
The issue ultimately is what the consequences are of using an Error vs an Exception, and _that_ is what we need to discuss.
- Jonathan M Davis
May 31, 2017 Re: Bad array indexing is considered deadly
Posted in reply to Moritz Maxeiner

On 5/31/17 7:13 PM, Moritz Maxeiner wrote:
> On Wednesday, 31 May 2017 at 22:47:38 UTC, Steven Schveighoffer wrote:
>>
>> Again, there has not been memory corruption.
>
> Again, the runtime *cannot* know that and hence you *cannot* claim that. It sees an index out of bounds and it *cannot* reason about whether a memory corruption has already occurred or not, which means it *must assume* the worst case (it must *assume* there was).

Yes, it cannot know at any point whether or not a memory corruption has occurred. However, it has a lever to pull to say "your program cannot continue, and you have no choice." It chooses to pull this lever on any attempt of out of bounds access of an array, regardless of the reason why that is happening. The chances that a memory corruption is the cause is so low, and it doesn't matter even if it is. The program may already have messed up everything by that point.

In fact, the current behavior of printing the Error message and doing an orderly shutdown is pretty risky anyway if we think this is a memory corruption. There are almost no other environmentally caused errors that cause this lever to be pulled. It doesn't make a whole lot of sense that it is.

>> There is a confusion rampant in this thread that preventing *attempted* memory corruption must mean there *is* memory corruption.
>
> No, please no. Nobody has written that in the entire thread even once!

"you have to assume that the index *being* out of bounds is itself the *result* of *already occurred* data corruption;"

> - An index being out of bounds is an error (lowercase!).
> - The runtime sees that error when the array is accessed (what you describe as *attempted* memory corruption).
> - The runtime does not know *why* the index is out of bounds.
> It does *not* mean that there *was* memory corruption (and again, nobody claimed that), but the runtime cannot assume that there was not, because that is *unsafe*.

It's not the runtime's job to determine that the cause of an out-of-bounds access could be memory corruption. Its job is to prevent the current attempt. Throwing an Error accomplishes this, yes, but it also means you must shut down the program. I have no problem at all with it preventing the corruption, nor do I have a problem with it throwing an Error, per se. The problem I have is that throwing an Error itself corrupts the program, and makes it unusable. Therefore, it's the wrong tool for that job.

And I absolutely do not think that throwing an Error in this case was the result of a careful choice deciding that memory corruption must be or even might be the cause. I think it's this way because of the desire to write nothrow code without having to pepper your code with try/catch blocks.

>> One does not require the other.
>
> Correct, but the runtime has to be safe in the *general* case, so it *must* assume the worst in case of a bug.

It's easy to prove as well that throwing an Exception instead of an Error is perfectly safe. My array wrapper is perfectly safe and does not throw an Error on bad indexing.

-Steve
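For readers following along, a minimal sketch of such a wrapper. This is not Steve's actual code; the SafeArray name, API, and messages are invented for illustration:

import std.exception : enforce;
import std.stdio : writeln;

// Hypothetical wrapper: bounds violations throw a catchable Exception
// instead of a fatal RangeError, so e.g. a vibe.d fiber can recover.
struct SafeArray(T)
{
    T[] data;

    ref T opIndex(size_t i)
    {
        // enforce throws an Exception (not an Error) when the condition fails
        enforce(i < data.length, "index out of bounds");
        return data[i];
    }

    @property size_t length() const { return data.length; }
}

void main()
{
    auto a = SafeArray!int([1, 2, 3]);
    try
    {
        a[3] = 5; // out of bounds: throws, but is recoverable
    }
    catch (Exception e)
    {
        writeln("request failed, server keeps running: ", e.msg);
    }
}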
May 31, 2017 Re: Bad array indexing is considered deadly
Posted in reply to Timon Gehr

On Wednesday, 31 May 2017 at 23:40:00 UTC, Timon Gehr wrote:
>>
>> In the context of the conversation, an error has already occurred, and the "all cases" was referring to all the cases that lead to the error.
>
> Bounds checks have /no business at all/ trying to handle preexisting memory corruption,

Sure, because the program is in an undefined state by that point. There is only termination.

> and in that sense they are comparable to program startup.

I disagree.
June 01, 2017 Re: Bad array indexing is considered deadly
Posted in reply to Timon Gehr

On Wednesday, 31 May 2017 at 23:50:07 UTC, Timon Gehr wrote:
>
> No, it is perfectly safe, because the language does not guarantee any specific behavior in case memory is corrupted.

The language not guaranteeing a specific behaviour on memory corruption does not imply that assuming a bug was not caused by memory corruption is safe.

> Therefore the language can /always/ assume that there is no memory corruption.

That is also not implied.

>>> One does not require the other.
>>
>> Correct, but the runtime has to be safe in the *general* case, so it *must* assume the worst in case of a bug.
>
> Software has bugs. The runtime has no business claiming that the scope of any particular bug is the entire service.

It absolutely has the business of doing exactly that as long as you, the programmer, do not tell it otherwise; which you can do and is your job.

> The practical outcomes of this design are just silly. Data is lost, services go down, etc. When in doubt, the software should just do what the programmer has written. It is not always correct, but it is the best available proxy of the desirable behavior.

When in doubt about memory corruption, the closest enclosing scope that will get rid of the memory corruption must die. The current behaviour achieves that in many cases.
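One concrete way for the programmer to "tell it otherwise" is to make the dying scope a worker process rather than the whole service. A minimal POSIX sketch under stated assumptions: handleRequest is a made-up stand-in for real request handling, and a real server would check fork for failure, log the exit status, and respawn:

import core.sys.posix.sys.wait : waitpid;
import core.sys.posix.unistd : fork, _exit;
import std.stdio : writeln;

// Hypothetical request handler containing the indexing bug.
void handleRequest()
{
    int[3] arr;
    size_t i = 3; // imagine this came from bad input
    arr[i] = 5;   // RangeError: kills only this worker process
}

void main()
{
    auto pid = fork();
    if (pid == 0) // child: the "closest enclosing scope" that dies
    {
        handleRequest();
        _exit(0);
    }
    int status;
    waitpid(pid, &status, 0); // parent observes the crash
    writeln("worker died; parent is still alive and can respawn");
}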
June 01, 2017 Re: Bad array indexing is considered deadly
Posted in reply to Moritz Maxeiner

On 01.06.2017 01:55, Moritz Maxeiner wrote:
> On Wednesday, 31 May 2017 at 23:40:00 UTC, Timon Gehr wrote:
>>>
>>> In the context of the conversation, an error has already occurred, and the "all cases" was referring to all the cases that lead to the error.
>>
>> Bounds checks have /no business at all/ trying to handle preexisting memory corruption,
>
> Sure, because the program is in an undefined state by that point.

What does that even mean? Everything is perfectly well-defined here:

void main(){
    auto a = new int[](2);
    a[2] = 3;
}

> There is only termination.
> ...

Termination of what? How on earth do you determine that the scope of this "undefined state" is the program, not the machine, or the world? I.e., why terminate the program, but not shut down the machine or nuke the planet?

Scoping really ought to be up to the programmer as it greatly depends on the actual circumstances. Program termination is the only reasonable default behaviour, but it is not the only reasonable behaviour.
June 01, 2017 Re: Bad array indexing is considered deadly
Posted in reply to Jonathan M Davis

On Wednesday, 31 May 2017 at 23:51:30 UTC, Jonathan M Davis wrote:
> On Wednesday, May 31, 2017 23:13:35 Moritz Maxeiner via Digitalmars-d wrote:
>> On Wednesday, 31 May 2017 at 22:47:38 UTC, Steven Schveighoffer wrote:
>>> Again, there has not been memory corruption.
>>
>> Again, the runtime *cannot* know that and hence you *cannot* claim that. It sees an index out of bounds and it *cannot* reason about whether a memory corruption has already occurred or not, which means it *must assume* the worst case (it must *assume* there was).
>
> Honestly, once a memory corruption has occurred, all bets are off anyway.

Right, and that is why termination when in doubt (and the programmer has not done anything to clear that doubt up) is the sane choice.

> The core thing here is that the contract of indexing arrays was violated, which is a bug.

I disagree about it being the core issue, because that was already established in the OP.

> If we're going to argue about whether it makes sense to change that contract, then we have to discuss the consequences of doing so, and I really don't see why whether a memory corruption has occurred previously is relevant.

Because if such a memory corruption occurred, termination of the closest enclosing scope to get rid of it must follow (or your entire system can end up corrupted).

> We could easily treat indexing arrays the same as any other function which chooses to throw an Exception when it's given bad input. The core difference is whether it's considered okay to give bad values or whether it's considered a programming bug to pass bad values. In either case, the runtime has no way of determining the reason for the failure, and I don't see why passing a bad value to index an array is any more indicative of a memory corruption than passing an invalid day of the month to std.datetime's Date when constructing it is indicative of a memory corruption. In both cases, the input is bad, and the runtime doesn't know why.

One of those is a library construct, the other is baked into the language; it is perfectly fine for the former to use exceptions, because it can be easily avoided by anyone; the latter is a required component of pretty much everything you can build with D and must thus use the stricter contract.

> The issue ultimately is what the consequences are of using an Error vs an Exception, and _that_ is what we need to discuss.

An Exception leads to unwinding & cleanup, an Error to termination (with unwinding & cleanup in debug mode for debugging purposes). What would you like to discuss here?
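A compact sketch of that difference in behaviour. The function names are invented, it assumes bounds checks are enabled (the default), and the point of the second comment is exactly that the language does not guarantee cleanup while an Error propagates:

import core.exception : RangeError;
import std.exception : enforce;
import std.stdio : writeln;

void throwsException(int[] a, size_t i)
{
    scope (exit) writeln("cleanup ran"); // guaranteed for Exceptions
    enforce(i < a.length, "bad index");  // throws a plain Exception
}

void throwsError(int[] a, size_t i)
{
    scope (exit) writeln("cleanup not guaranteed"); // not guaranteed for Errors
    a[i] = 1;                                       // out of bounds: RangeError
}

void main(string[] args)
{
    auto a = new int[](2);
    size_t bad = args.length + 1; // at least 2 at run time, past the end

    try throwsException(a, bad);
    catch (Exception e) writeln("recovered: ", e.msg);

    try throwsError(a, bad);
    catch (RangeError e) writeln("caught an Error (discouraged): ", e.msg);
}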
June 01, 2017 Re: Bad array indexing is considered deadly
Posted in reply to Steven Schveighoffer

On Wednesday, 31 May 2017 at 13:04:52 UTC, Steven Schveighoffer wrote:
> I have discovered an annoyance in using vibe.d instead of another web framework. Simple errors in indexing crash the entire application.
>
> For example:
>
> int[3] arr;
> arr[3] = 5;
>
> Compare this to, let's say, a malformed unicode string (exception), malformed JSON data (exception), file not found (exception), etc.
>
> Technically this is a programming error, and a bug. But memory hasn't actually been corrupted. The system properly stopped me from corrupting memory. But my reward is that even though this fiber threw an Error, and I get an error message in the log showing me the bug, the web server itself is now out of commission. No other pages can be served. This is like the equivalent of having a guard rail on a road not only stop you from going off the cliff but proactively disable your car afterwards to prevent you from more harm.
>
> This seems like a large penalty for "almost" corrupting memory. No other web framework I've used crashes the entire web server for such a simple programming error. And vibe.d has no choice. There is no guarantee the stack is properly unwound, so it has to accept the characterization of this is a program-ending error by the D runtime.
>
> I am considering writing a set of array wrappers that throw exceptions when trying to access out of bounds elements. This comes with its own set of problems, but at least the web server should continue to run.
>
> What are your thoughts? Have you run into this? If so, how did you solve it?
>
> -Steve
What things are considered unrecoverable errors or not is debatable, but in the end I think the whole thing can be seen from the perspective of a fundamental problem of systems where multiple operations must be able to progress successfully* independently of each other. All operations (a.k.a. processes, fibers, or function calls within fibers, or whatever granularity you choose) that modify shared state (could be external to the fiber, the thread, the process, the machine, could be "real-world") must somehow maintain some consistency with other operations that come before, are interleaved, simultaneous or after.
The way I see it is that you have two choices: reason more explicitly about the relationship between different operations and carefully catch only the mishaps that you know (or are prepared to risk) don't ruin the consistent picture between operations OR remove the need for consistency. A lot of the latter makes the former easier.
IIRC this is what deadalnix has talked about as one of the big wins of PHP in practice: the separation of state between requests means that things can mess up locally without having to worry about wider consequences, except in the specific cases where things are shared; i.e. the set of things that must be maintained consistent is opt-in, as opposed to opt-out in care-free use of the vibe.d model.
* "progress successfully" is itself a tricky idea.
P.S. Sometimes I do feel D is a bit eager on the self-destruct switch, but I think the solution is to rise to the challenge of making better software, not to be more blasé about pretending to know how to recover from unknown logic errors (exposed by unexpected input).
June 01, 2017 Re: Bad array indexing is considered deadly
Posted in reply to Steven Schveighoffer

On Wednesday, 31 May 2017 at 23:53:11 UTC, Steven Schveighoffer wrote:
> On 5/31/17 7:13 PM, Moritz Maxeiner wrote:
>> On Wednesday, 31 May 2017 at 22:47:38 UTC, Steven Schveighoffer wrote:
>>>
>>> Again, there has not been memory corruption.
>>
>> Again, the runtime *cannot* know that and hence you *cannot* claim that. It sees an index out of bounds and it *cannot* reason about whether a memory corruption has already occurred or not, which means it *must assume* the worst case (it must *assume* there was).
>
> Yes, it cannot know at any point whether or not a memory corruption has occurred. However, it has a lever to pull to say "your program cannot continue, and you have no choice." It chooses to pull this lever on any attempt of out of bounds access of an array, regardless of the reason why that is happening.

Because assuming the worst is a sane default.

> The chances that a memory corruption is the cause is so low, and it doesn't matter even if it is. The program may already have messed up everything by that point.

True, it might have already corrupted other things; but that is no argument for allowing it to continue to potentially corrupt even more.

> In fact, the current behavior of printing the Error message and doing an orderly shutdown is pretty risky anyway if we think this is a memory corruption.

AFAIK the orderly shutdown is not guaranteed to be done in release mode, and I would welcome for thrown errors in release mode to simply kill the process immediately.

>>> There is a confusion rampant in this thread that preventing *attempted* memory corruption must mean there *is* memory corruption.
>>
>> No, please no. Nobody has written that in the entire thread even once!
>
> "you have to assume that the index *being* out of bounds is itself the *result* of *already occurred* data corruption;"

Yes, precisely.

I state: "you have to assume that the index *being* out of bounds is itself the *result* of *already occurred* data corruption;"

You state: "that preventing *attempted* memory corruption must mean there *is* memory corruption"

You state that I claim the memory corruption must definitely have occurred, while in contrast I state that one has to *assume* that it has occurred. *Not* the same.

> It's not the runtime's job to determine that the cause of an out-of-bounds access could be memory corruption.

That was the job of whoever wrote the runtime, yes.

> Its job is to prevent the current attempt.

That is one of its jobs. The other is to terminate when it detects potential memory corruption that the programmer has not ruled out.

> The problem I have is that throwing an Error itself corrupts the program, and makes it unusable.

Because the programmer has not taken the steps to assure the runtime that memory has not been corrupted, that is the only sane choice I see.

> It's easy to prove as well that throwing an Exception instead of an Error is perfectly safe. My array wrapper is perfectly safe and does not throw an Error on bad indexing.

And anyone using such a wrapper implicitly promises that a wrong index cannot be the result of memory corruption, which can definitely be a sane choice for a lot of use cases, but not as the default for the basic building block in the language.
June 01, 2017 Re: Bad array indexing is considered deadly
Posted in reply to Timon Gehr

On Thursday, 1 June 2017 at 00:11:10 UTC, Timon Gehr wrote:
> On 01.06.2017 01:55, Moritz Maxeiner wrote:
>> On Wednesday, 31 May 2017 at 23:40:00 UTC, Timon Gehr wrote:
>>>>
>>>> In the context of the conversation, an error has already occurred, and the "all cases" was referring to all the cases that lead to the error.
>>>
>>> Bounds checks have /no business at all/ trying to handle preexisting memory corruption,
>>
>> Sure, because the program is in an undefined state by that point.
>
> What does that even mean?

That once memory corruption has occurred, the state of the program is not well defined anymore.

> Everything is perfectly well-defined here:
>
> void main(){
>     auto a = new int[](2);
>     a[2] = 3;
> }

Sure, because there has been no memory corruption prior to the index out of bounds. That is not something the runtime should just assume for every out of index error.

>> There is only termination.
>> ...
>
> Termination of what? How on earth do you determine that the scope of this "undefined state" is the program, not the machine, or the world?

As that is the closest scope current operating systems give us to work with, this is a sane default for the runtime. Nobody stops you from using a different scope if you need it.

> I.e., why terminate the program, but not shut down the machine or nuke the planet?
>
> Scoping really ought to be up to the programmer as it greatly depends on the actual circumstances.

Of course, and if you need something else you can do so.

> Program termination is the only reasonable default behaviour, but it is not the only reasonable behaviour.

Absolutely; rereading through our subthread I realized that I had not made that explicit here (only in other subthreads). I apologize for being imprecise.