May 31, 2017
On 05/31/2017 02:41 PM, Ola Fosheim Grøstad wrote:
> On Wednesday, 31 May 2017 at 21:30:05 UTC, Ali Çehreli wrote:
>> How could an Exception work in this case? Catch it and repeat the same
>> bug over and over again? What would the program be achieving? (I
>> assume the exception handler will not arbitrarily decrease index values.)
>
> How is this different from a file system exception?
> The file system is memory too...

When you say "memory" I think you refer to the thought of bounds checking being for prevention of memory corruption. True, memory corruption can happen when the program writes out of bounds but it's one special case. The actual reason for bounds checking is maintaining an invariant.

Regarding the file system: because it's part of the program's environment, which the program cannot control, it's correct to throw an Exception, in which case the response can be "Cannot open that file; how about another one?"

In the case of array indexes, they are under the complete control of the program, hence an out-of-bounds index is a bug. It's not possible to say "Bad index; let me try 42 less."
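
To make the distinction concrete, here is a small sketch (the file name is just an example): a failure in the environment is worth catching, while a bad index has no analogous recovery.

import std.file : readText;

void loadConfig()
{
    try
    {
        auto text = readText("settings.conf"); // the environment may fail
    }
    catch (Exception e)
    {
        // Recoverable: ask for another file, fall back to defaults, etc.
    }
}

// By contrast, arr[i] with a bad i is a bug in this program itself;
// there is no "try another index" recovery.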

Ali

May 31, 2017
On Wednesday, 31 May 2017 at 21:29:53 UTC, Timon Gehr wrote:
> On 31.05.2017 22:45, Moritz Maxeiner wrote:
>> On Wednesday, 31 May 2017 at 20:09:16 UTC, Nick Sabalausky (Abscissa) wrote:
>>> [...]
>>>> program is in an undefined state and should terminate asap.
>>>
>>> Then out-of-bounds and assert failures should be Exception not Error. Frankly, even out-of-memory, arguably. And then there's null dereference... In other words, basically everything.
>> 
>> No, because as I stated in my other post, the runtime *cannot* assume that it is safe *in all cases*. If there is even one single case in which it is unsafe, it must abort.
>
> Hence all programs must abort on startup.

In the context of the conversation, an error has already occurred, and "all cases" was referring to all the cases that lead to the error.
May 31, 2017
On Wednesday, 31 May 2017 at 21:30:47 UTC, H. S. Teoh wrote:
> On Wed, May 31, 2017 at 11:29:53PM +0200, Timon Gehr via Digitalmars-d wrote:
>> On 31.05.2017 22:45, Moritz Maxeiner wrote:
> [...]
>> > No, because as I stated in my other post, the runtime *cannot* assume that it is safe *in all cases*. If there is even one single case in which it is unsafe, it must abort.
>> 
>> Hence all programs must abort on startup.
>
> If D had *true* garbage collection, it would have done this upon starting up any buggy program. :-D
>

I think Vigil will be a perfect fit for you[1] ;p

[1] https://github.com/munificent/vigil
May 31, 2017
On Wednesday, 31 May 2017 at 21:45:51 UTC, H. S. Teoh wrote:
> This is an interesting use case, because conceptually speaking, each vibe.d fibre actually represents an independent computation, so any fatal errors like out-of-bounds bugs should cause the termination of the *fibre*, rather than *everything* that just happens to be running in the same process.

While I agree on a theoretical level that in principle only the fibre should terminate (and the same argument goes for threads), the problem is that fibres, like threads, share the virtual memory of their process: memory corruption in one fibre (or thread) cannot in general be safely contained and kept from spreading to the other fibres (or threads). In the thread case one might argue that if you know the corruption happened only in TLS you can kill just that thread, but I don't know how you would prove that.
If you cannot be sure that the memory corruption is contained within a given scope (i.e. a fibre or thread), you must terminate at the closest enclosing scope that you know will keep the error from escaping further outward into the rest of your system; AFAIK in modern operating systems the closest such scope is the process.
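
As a concrete sketch of that process-as-error-scope idea (the ./worker binary and its arguments are hypothetical): push the risky work into a child process, so that an abort on a detected bug kills only that process.

import std.process : spawnProcess, wait;

void runJob(string job)
{
    // If the worker hits a bug and aborts, only the child process
    // dies; this supervisor can log the failure and move on.
    auto pid = spawnProcess(["./worker", job]);
    const status = wait(pid);
    if (status != 0)
    {
        // restart, reschedule, or report the failed job
    }
}
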
May 31, 2017
On Wednesday, May 31, 2017 19:17:16 Moritz Maxeiner via Digitalmars-d wrote:
> On Wednesday, 31 May 2017 at 13:04:52 UTC, Steven Schveighoffer
>
> wrote:
> > [...]
> >
> > What are your thoughts? Have you run into this? If so, how did you solve it?
>
> The issue here is not that accessing the array out of bounds *leads* to data corruption, but that in general you have to assume that the index *being* out of bounds is itself the *result* of *already occurred* data corruption; and if data corruption occurred for the index, you *cannot* assume that *only* the index has been affected. The runtime cannot simply assume the index being out of bounds is not the result of already occurred data corruption, because that is inherently unsafe, so it *must* terminate asap as the default.
>
> If you get the index as the input to your process - and thus *know* that it being out of bounds is not the result of previous data corruption - then you should check this yourself before accessing the array and handle it appropriately (e.g. via Exception).

I don't think that you even need to worry about whether memory corruption occurred prior to indexing the array with an invalid index. The fact that the array was indexed with an invalid index is a bug. What caused the bug depends entirely on the code. Whether it's a memory corruption or something else is irrelevant. The contract of indexing arrays is that only valid indices be passed. If an invalid index has been passed, then the contract has been violated, and by definition, there's a bug in the program, so the runtime has no choice but to throw an Error or otherwise kill the program. Given the contract, the only alternative would be to use assertions and only check when not compiling with -release, but that would be a serious problem for @safe code, and it really wouldn't help Steven's situation. Either way, the contract of indexing arrays is such that passing an invalid index is a bug, and no program should be doing it. The reason that the index is invalid is pretty much irrelevant to the discussion. It's a bug regardless.

We _could_ make it so that the contract of indexing arrays is such that you're allowed to pass invalid values, but then the runtime would _always_ have to check the indices (even in @system code), and arrays in general could never be used in code that was nothrow without a bunch of extra try-catch blocks. It would be like how auto-decoding and UTFException screws over our ability to have nothrow code with strings, only it would be for _all_ arrays. So, the result would be annoying for a lot of code as well as less efficient.
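
As a small illustration of that point (an illustrative function, not library code): array indexing is allowed in nothrow code precisely because a failed bounds check throws RangeError, which is an Error.

// Compiles today: the bounds check can only throw RangeError, which
// is an Error, so nothrow does not have to account for it.
int firstElement(int[] a) nothrow
{
    return a[0]; // could not be nothrow if this threw an Exception
}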

The vast majority of array code is written in a way that invalid indices are simply never used, and having it so that indexing an array could throw an Exception would cause serious problems for a lot of code - especially when the code is already written in a way that such an exception will never be thrown (similar to how format can't be nothrow even when you know you've passed the correct arguments, and it will never throw).

As such, it really doesn't make sense to force all programs to deal with arrays throwing Exceptions due to bad indices. If a program can't guarantee that it's going to be passing a valid index to an array, then it needs to validate the index first. And if that needs to be done frequently, it makes a lot of sense to either create a wrapper function for indexing arrays which does the check or to outright wrap arrays such that opIndex on that type does the check and throws an Exception before the invalid index is passed to the array. And if the wrapper function is @trusted, it _should_ make it so that druntime doesn't check the index, avoiding having redundant checks.
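
A minimal sketch of such a wrapper (the names are illustrative, not druntime or Phobos API):

import std.exception : enforce;

struct CheckedArray(T)
{
    T[] data;

    // Validate explicitly and throw a catchable Exception, then index
    // through .ptr, which skips druntime's now-redundant bounds check.
    // @trusted is justified because the check was just performed.
    ref T opIndex(size_t i) @trusted
    {
        enforce(i < data.length, "array index out of bounds");
        return data.ptr[i];
    }
}

unittest
{
    auto a = CheckedArray!int([1, 2, 3]);
    try
    {
        a[10] = 0;
    }
    catch (Exception e)
    {
        // Recoverable: report the bad index and carry on.
    }
}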

I can understand Steven's frustration, but I really think that we're better off the way it is now, even if it's not ideal for his current use case.

- Jonathan M Davis

May 31, 2017
On 5/31/17 5:30 PM, Ali Çehreli wrote:
> On 05/31/2017 02:00 PM, Steven Schveighoffer wrote:
>> On 5/31/17 3:17 PM, Moritz Maxeiner wrote:
>
>>> The issue here is not that accessing the array out of bounds *leads*
>>> to data corruption, but that in general you have to assume that the
>>> index *being* out of bounds is itself the *result* of *already
>>> occurred* data corruption;
>>
>> To be blunt, no this is completely wrong.
>
> Blunter: Moritz is right. :)

I'll ignore this section of the debate :)

>
>> Memory corruption *already having happened* can cause any
>> number of errors.
>
> True.
>
>> The point of bounds checking is to prevent memory corruption in
>> the first place.
>
> That's just one goal. It also maintains an invariant of arrays: The
> index value must be within bounds.

But the program cannot possibly know which variable is an index, so it cannot maintain the invariant until the value is actually used as one.

At that point, it can throw an Error to say that something isn't right, or it can throw an Exception. D chose Error, and the consequence of that choice is that you have to check before D checks, or else your entire program is killed.
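
In other words (a minimal sketch with illustrative names), the pattern is to check before D checks:

// Turn a would-be Error into an Exception the program can handle.
int lookup(int[] arr, size_t idx)
{
    if (idx >= arr.length)
        throw new Exception("bad index from input");
    return arr[idx]; // druntime's own check can no longer fail
}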

>
>> I could memory corrupt the length of the array also (assuming a
>> dynamic array), and bounds checking merrily does nothing to
>> stop further memory corruption.
>
> That's true but the language provides no tool to check for that. The
> fact that program correctness is not achievable in general should not
> have any bearing on bounds checking.

My point simply is that assuming corruption is not a good answer. It's a good *excuse* for the current behavior, but doesn't really satisfy any meaningful requirement.

To borrow from another subthread here, imagine if when you attempted to open a non-existent file, the OS assumed that your program must have been memory corrupted and killed it instead of returning ENOENT? It could be a "reasonable" assumption -- memory corruption could have caused that filename to be corrupt, hence you have sniffed out a memory corruption and stopped it in its tracks! Well, actually not really, but you saw the tracks. Or else, maybe someone made a typo?

>>> and if data corruption occurred for
>>> the index, you *cannot* assume that *only* the index has been affected.
>>> The runtime cannot simply assume the index being out of bounds is not
>>> the result of already occurred data corruption, because that is
>>> inherently unsafe, so it *must* terminate asap as the default.
>>
>> The runtime should not assume that crashing the whole program is
>> necessary when an integer is out of range. Preventing actual corruption,
>> yes that is good. But an Exception would have done the job just fine.
>
> How could an Exception work in this case? Catch it and repeat the same
> bug over and over again? What would the program be achieving? (I assume
> the exception handler will not arbitrarily decrease index values.)

Just like it works for all other exceptions -- you print a reasonable message to the offending party (in this case, it would be a 500 error I think), and continue executing other things. No memory corruption has occurred because bounds checking stopped it, therefore the program is still sane.
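
Sketching the kind of server loop I mean (the Request type and the responses are illustrative):

import std.exception : enforce;
import std.stdio : writeln;

struct Request { size_t index; }

void serve(Request[] requests, int[] data)
{
    foreach (req; requests)
    {
        try
        {
            // enforce throws a catchable Exception on a bad index
            enforce(req.index < data.length, "bad index in request");
            writeln("200: ", data[req.index]);
        }
        catch (Exception e)
        {
            // No corruption occurred -- the bounds were checked first --
            // so answer this request with a 500 and keep serving.
            writeln("500: ", e.msg);
        }
    }
}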

-Steve
May 31, 2017
On 5/31/17 6:36 PM, Moritz Maxeiner wrote:
> On Wednesday, 31 May 2017 at 21:45:51 UTC, H. S. Teoh wrote:
>> This is an interesting use case, because conceptually speaking, each
>> vibe.d fibre actually represents an independent computation, so any
>> fatal errors like out-of-bounds bugs should cause the termination of
>> the *fibre*, rather than *everything* that just happens to be running
>> in the same process.
>
> While I agree on a theoretical level that in principle only the fibre
> should terminate (and the same argument goes for threads), the problem
> is that fibres, like threads, share the virtual memory of their
> process: memory corruption in one fibre (or thread) cannot in general
> be safely contained and kept from spreading to the other fibres (or
> threads). In the thread case one might argue that if you know the
> corruption happened only in TLS you can kill just that thread, but I
> don't know how you would prove that.

Again, there has not been memory corruption. There is a confusion rampant in this thread that preventing *attempted* memory corruption must mean there *is* memory corruption. One does not require the other.

-Steve
May 31, 2017
On Wednesday, 31 May 2017 at 22:42:30 UTC, Jonathan M Davis wrote:
>
> I don't think that you even need to worry about whether memory corruption occurred prior to indexing the array with an invalid index. The fact that the array was indexed with an invalid index is a bug. What caused the bug depends entirely on the code. Whether it's a memory corruption or something else is irrelevant. The contract of indexing arrays is that only valid indices be passed. [...]

That is correct (and it was even mentioned in the OP), but from my PoV the argument was about whether that contract is sensible the way it is, so I was arguing for why I think the contract is good as it is.
*The contract says so* is not an argument for *why* the contract is the way it is.


>
> We _could_ make it so that the contract of indexing arrays is such that you're allowed to pass invalid values, but then [...]

Another reason as to why I support the current contract.

>
> As such, it really doesn't make sense to force all programs to deal with arrays throwing Exceptions due to bad indices. If a program can't guarantee that it's going to be passing a valid index to an array, then it needs to validate the index first.
> And if that needs to be done frequently, it makes a lot of sense to either create a wrapper function for indexing arrays which does the check or to outright wrap arrays such that opIndex on that type does the check and throws an Exception before the invalid index is passed to the array. And if the wrapper function is @trusted, it _should_ make it so that druntime doesn't check the index, avoiding having redundant checks.

Precisely, and that is why I stated that I think he should use a wrapper.

>
> I can understand Steven's frustration, but I really think that we're better off the way it is now, even if it's not ideal for his current use case.

I agree.


May 31, 2017
On Wednesday, 31 May 2017 at 22:47:38 UTC, Steven Schveighoffer wrote:
>
> Again, there has not been memory corruption.

Again, the runtime *cannot* know that, and hence you *cannot* claim that. It sees an index out of bounds and it *cannot* reason about whether memory corruption has already occurred or not, which means it *must assume* the worst case (it must *assume* there was).

> There is a confusion rampant in this thread that preventing *attempted* memory corruption must mean there *is* memory corruption.

No, please no. Nobody has written that in the entire thread even once!
- An index being out of bounds is an error (lowercase!).
- The runtime sees that error when the array is accessed (what you describe as *attempted* memory corruption).
- The runtime does not know *why* the index is out of bounds.
It does *not* mean that there *was* memory corruption (and again, nobody claimed that), but the runtime cannot assume that there was not, because that is *unsafe*.

> One  does not require the other.

Correct, but the runtime has to be safe in the *general* case, so it *must* assume the worst in case of a bug.
June 01, 2017
On 01.06.2017 00:22, Moritz Maxeiner wrote:
> On Wednesday, 31 May 2017 at 21:29:53 UTC, Timon Gehr wrote:
>> On 31.05.2017 22:45, Moritz Maxeiner wrote:
>>> On Wednesday, 31 May 2017 at 20:09:16 UTC, Nick Sabalausky (Abscissa) wrote:
>>>> [...]
>>>>> program is in an undefined state and should terminate asap.
>>>>
>>>> Then out-of-bounds and assert failures should be Exception not Error. Frankly, even out-of-memory, arguably. And then there's null dereference... In other words, basically everything.
>>>
>>> No, because as I stated in my other post, the runtime *cannot* assume that it is safe *in all cases*. If there is even one single case in which it is unsafe, it must abort.
>>
>> Hence all programs must abort on startup.
> 
> In the context of the conversation, an error has already occurred, and "all cases" was referring to all the cases that lead to the error.

Bounds checks have /no business at all/ trying to handle preexisting memory corruption, and in that sense they are comparable to program startup.