June 01, 2017
On Wednesday, 31 May 2017 at 21:00:43 UTC, Steven Schveighoffer wrote:
> To be blunt, no this is completely wrong. Memory corruption *already having happened* can cause any number of errors. The point of bounds checking is to prevent memory corruption in the first place.

The sad reality is that D programmers are still comfortable writing code in a 70s style, playing with void* pointers, and don't enable bounds checks early enough; see https://issues.dlang.org/show_bug.cgi?id=13367
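(For reference, dmd's switch controlling this; treat the exact spelling as a pointer to the docs rather than gospel for your compiler version:)

```sh
# keep array bounds checks everywhere, even in @system code and
# even when building with -release:
dmd -release -boundscheck=on app.d
```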
June 01, 2017
On 5/31/17 6:42 PM, Jonathan M Davis via Digitalmars-d wrote:
> On Wednesday, May 31, 2017 19:17:16 Moritz Maxeiner via Digitalmars-d wrote:
>> On Wednesday, 31 May 2017 at 13:04:52 UTC, Steven Schveighoffer
>>
>> wrote:
>>> [...]
>>>
>>> What are your thoughts? Have you run into this? If so, how did
>>> you solve it?
>>
>> It is not the out-of-bounds array access *leading* to data
>> corruption that is the issue here, but that, in general, you
>> have to assume that the index *being* out of bounds is itself
>> the *result* of *already occurred* data corruption; and if data
>> corruption occurred for the index, you *cannot* assume that
>> *only* the index has been affected. The runtime cannot simply
>> assume the index being out of bounds is not the result of
>> already occurred data corruption, because that would be
>> inherently unsafe, so it *must* terminate asap as the default.
>>
>> If you get the index as the input to your process - and thus
>> *know* that it being out of bounds is not the result of previous
>> data corruption - then you should check this yourself before
>> accessing the array and handle it appropriately (e.g. via
>> Exception).
>
> I don't think that you even need to worry about whether memory corruption
> occurred prior to indexing the array with an invalid index. The fact that
> the array was indexed with an invalid index is a bug. What caused the bug
> depends entirely on the code. Whether it's a memory corruption or something
> else is irrelevant. The contract of indexing arrays is that only valid
> indices be passed. If an invalid index has been passed, then the contract
> has been violated, and by definition, there's a bug in the program, so the
> runtime has no choice but to throw an Error or otherwise kill the program.
> Given the contract, the only alternative would be to use assertions and only
> check when not compiling with -release, but that would be a serious problem
> for @safe code, and it really wouldn't help Steven's situation. Either way,
> the contract of indexing arrays is such that passing an invalid index is a
> bug, and no program should be doing it. The reason that the index is invalid
> is pretty much irrelevant to the discussion. It's a bug regardless.

Yes, it's definitely a bug, and that is not something I'm arguing against. The correct handling is to throw something, and prevent the corruption in doing so.

The problem is that the act of throwing itself makes the program unusable after that. I'm not on Nick's side saying that everything should be Exception, especially not out of memory.

But the result of throwing an Error means your entire program has now been *made* invalid, even if it wasn't before. Therefore you must close it. I feel this is a mistake. A bad index can come from anywhere, and to assume it's from memory corruption is a huge leap.

What would have been nice is a level between Error and Exception, thrown when a bug such as this occurs: something a framework can catch, but @safe code can't. I feel that when these decisions were made, the concept of a single-process fiber-based server wasn't planned for.
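That intermediate level could be sketched as a hypothetical Error subclass. Nothing like this exists in druntime; the name and semantics here are assumptions, but the @safe rule is real: only Exception and its descendants may be caught in @safe code, so a framework would need @system/@trusted code at its catch point.

```d
// Hypothetical intermediate throwable: derived from Error, so plain
// @safe application code cannot catch it, but @system framework code
// can catch it and keep serving other requests.
class RecoverableError : Error
{
    this(string msg, string file = __FILE__, size_t line = __LINE__) @safe pure nothrow
    {
        super(msg, file, line);
    }
}

// Framework-level (@system) catch point: one fiber's failure
// does not take down the whole process.
void runHandler(void delegate() handler)
{
    try
        handler();
    catch (RecoverableError e)
    {
        // log the bug and return a 500 for this request only
    }
}
```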

> We _could_ make it so that the contract of indexing arrays is such that
> you're allowed to pass invalid values, but then the runtime would _always_
> have to check the indices (even in @system code), and arrays in general
> could never be used in code that was nothrow without a bunch of extra
> try-catch blocks. It would be like how auto-decoding and UTFException screws
> over our ability to have nothrow code with strings, only it would be for
> _all_ arrays. So, the result would be annoying for a lot of code as well as
> less efficient.

Right, we can't pick that path now anyway. Too much code would break.

>
> The vast majority of array code is written in a way that invalid indices are
> simply never used, and having it so that indexing an array could throw an
> Exception would cause serious problems for a lot of code - especially when
> the code is already written in a way that such an exception will never be
> thrown (similar to how format can't be nothrow even when you know you've
> passed the correct arguments, and it will never throw).
>
> As such, it really doesn't make sense to force all programs to deal with
> arrays throwing Exceptions due to bad indices. If a program can't guarantee
> that it's going to be passing a valid index to an array, then it needs to
> validate the index first. And if that needs to be done frequently, it makes
> a lot of sense to either create a wrapper function for indexing arrays which
> does the check or to outright wrap arrays such that opIndex on that type
> does the check and throws an Exception before the invalid index is passed to
> the array. And if the wrapper function is @trusted, it _should_ make it so
> that druntime doesn't check the index, avoiding having redundant checks.
>
> I can understand Steven's frustration, but I really think that we're better
> off the way it is now, even if it's not ideal for his current use case.

It just means that D is an inferior platform for a web framework, unless you use the process-per-request model so the entire thing doesn't go down for one page request. But that obviously is going to cause performance problems.

Which is unfortunate, because vibe.d is a great platform for web development, other than this. You could go Adam's route and just put the blinders on, but I think that's not a sustainable practice.

-Steve
June 01, 2017
I'm just sitting here waiting for shared libraries to be properly implemented cross platform.
Then I can start thinking about a proper web server written in D.

Until then, we are not really suited to be a generic web server, and should only run as multiple restartable instances.
June 01, 2017
On 5/31/17 9:05 PM, Walter Bright wrote:
> On 5/31/2017 6:04 AM, Steven Schveighoffer wrote:
>> Technically this is a programming error, and a bug. But memory hasn't
>> actually been corrupted.
>
> Since you don't know where the bad index came from, such a conclusion
> cannot be drawn.

You could say that about any error. You could say that about malformed unicode strings, malformed JSON data, file not found. In this mindset, everything should be an Error, and nothing should be recoverable.

>
>
>> This seems like a large penalty for "almost" corrupting memory. No
>> other web framework I've used crashes the entire web server for such a
>> simple programming error.
>
> Hence the endless vectors for malware insertion in those other frameworks.

No, those are due to the implementation of the interpreter. If the interpreter is implemented in @safe D, then you don't have those problems.

>> Compare this to, let's say, a malformed unicode string (exception),
> malformed JSON data (exception), file not found (exception), etc.
>
> That's because those are input and environmental errors, not programming
> bugs.

Not necessarily. A file name could be sourced from the program, but have a typo. An index could come from the environment. The library can't know, but makes assumptions one way or the other. Just like we assume you want to use the GC, these assumptions are harmful for those who need it to be the other way.

> There can be grey areas in classifying problems as input errors or
> programming bugs, and those will need some careful thought by the
> programmer as to which bin they fall into, and then code accordingly.
>
> Array overflows are not a grey area, however. They are always
> programming bugs.

Of course, programming bugs cause all kinds of Errors and Exceptions alike. And environmental input can cause array overflows.

I can detail exactly what happened in my code -- I am accepting dates within a given week from a web request. One of the dates fell outside the week, and so tried to access a 7-element array with index 9. Nothing corrupted memory, but the runtime corrupted my entire process, forcing a shutdown.

With an exception thrown, I still see the programming error, I still can fix it, and other web pages can still continue to be served.
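For what it's worth, the application-side fix is a validation step before the array is ever touched, so the failure becomes a catchable Exception. A minimal sketch (function name and message are mine, not from the original code):

```d
import std.conv : to;
import std.exception : enforce;

// Validate a day-of-week index taken from a web request before it
// reaches the 7-element array. enforce throws an Exception, which a
// request handler can turn into a 4xx/5xx response for that request.
int dayIndexFromRequest(string raw) @safe
{
    immutable idx = raw.to!int;                 // ConvException on garbage
    enforce(idx >= 0 && idx < 7,
            "day index out of range: " ~ raw);  // Exception, not RangeError
    return idx;
}
```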

> This topic comes up regularly in this forum - the idea that a program
> that entered an unknown, undefined state is actually ok and can continue
> executing. Maybe that's fine on a system (such as a gaming console)
> where nobody cares if it goes off the deep end and it is not connected
> to the internet so it cannot propagate malware infections.

In fact, it did not enter such a state. The runtime successfully *prevented* such a state. And then it instantaneously ruined the state by unwinding the stack without giving the program any chance to recover.

> Otherwise, while it's hard to write invulnerable programs, it is another
> thing entirely to endorse vulnerabilities. I cannot endorse such
> practices, nor can I endorse vibe.d if it is coded to continue running
> after entering an undefined state.

It's not. And it can't be. What I have to do is re-engineer the contract between myself and arrays. The only way to do that is to not use builtin arrays. That's the part that sucks. My code will be perfectly safe, and not ever experience corruption. It's just a bit ugly.
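Re-engineering that contract without builtin arrays could look like the following hypothetical wrapper type (a sketch of the idea, not my actual code):

```d
import std.exception : enforce;

// Wrapper that changes the indexing contract: opIndex throws a
// catchable Exception instead of letting druntime throw RangeError.
struct CheckedArray(T)
{
    T[] data;

    ref T opIndex(size_t i) @safe
    {
        enforce(i < data.length, "invalid index");
        return data[i];  // druntime's own check never fires after enforce
    }

    alias data this;  // still usable where a T[] is expected
}
```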

> A corollary is the idea that one creates reliable systems by writing
> programs that can continue executing after corruption. This is another
> fallacious concept. Reliable systems are ones that have independent
> components that can take over if some part of them fails. Shared memory
> is not independence.

That is not what is happening here. I'm avoiding corruption so I don't have to crash.

-Steve
June 01, 2017
On Thursday, June 01, 2017 06:13:25 Steven Schveighoffer via Digitalmars-d wrote:
> It just means that D is an inferior platform for a web framework, unless you use the process-per-request model so the entire thing doesn't go down for one page request. But that obviously is going to cause performance problems.
>
> Which is unfortunate, because vibe.d is a great platform for web development, other than this. You could go Adam's route and just put the blinders on, but I think that's not a sustainable practice.

Honestly, unless something about vibe.d prevents fixing bugs like bad array indices, I'd just use vibe.d and program normally, and if a problem like the one you hit occurs, I'd fix it, and then that wouldn't crash the program anymore. Depending on how many such logic errors got past the testing process, it could take a while before the server was stable enough, or it could be very little time at all, but in the long run there wouldn't be any invalid array indices, because those bugs would have been fixed, and there wouldn't be a problem anymore.

Now, if there's something about vibe.d that outright prevents fixing these bugs or makes them impossible to avoid, then that calls for a different approach, but from what I understand of the situation, I don't see anything here preventing using vibe.d's approach with fibers. It's just that missed bugs will be very annoying until they're fixed, but that's true of most programs.

- Jonathan M Davis

June 01, 2017
On Wednesday, 31 May 2017 at 21:30:05 UTC, Ali Çehreli wrote:
> How could an Exception work in this case? Catch it and repeat the same bug over and over again? What would the program be achieving? (I assume the exception handler will not arbitrarily decrease index values.)

Other systems work like this: an internal server error is reported to the client, the client reports an unexpected error to the user, and the action is repeated at the user's discretion.
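That pattern maps onto a try/catch at the top of each request handler. A hedged sketch against vibe.d's HTTPServerRequest/HTTPServerResponse API (renderPage is a hypothetical stand-in for the application code):

```d
import vibe.http.server : HTTPServerRequest, HTTPServerResponse;

// Hypothetical application code that may throw on bad input.
string renderPage(HTTPServerRequest req);

// "Report and retry at the user's discretion": catch Exceptions at
// the top of the handler, answer 500, and keep the process alive.
void onRequest(HTTPServerRequest req, HTTPServerResponse res)
{
    try
    {
        res.writeBody(renderPage(req));
    }
    catch (Exception e)
    {
        res.statusCode = 500;
        res.writeBody("internal server error; please retry");
    }
}
```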
June 01, 2017
On Thursday, June 01, 2017 06:26:24 Steven Schveighoffer via Digitalmars-d wrote:
> On 5/31/17 9:05 PM, Walter Bright wrote:
> > On 5/31/2017 6:04 AM, Steven Schveighoffer wrote:
> >> Technically this is a programming error, and a bug. But memory hasn't actually been corrupted.
> >
> > Since you don't know where the bad index came from, such a conclusion cannot be drawn.
>
> You could say that about any error. You could say that about malformed unicode strings, malformed JSON data, file not found. In this mindset, everything should be an Error, and nothing should be recoverable.

I think that it really comes down to what the contract is and how it makes sense to treat bad values. At the one extreme, you can treat all bad input as programmer error, requiring that callers validate all arguments to all functions (in which case, assertions or some other type of Error would be used on failure), and at the other extreme, you can be completely defensive about it and can have every function validate its input and throw an Exception on failure so that the checks never get compiled out, and the caller can choose whether they want to recover or not. Both approaches are of course rather extreme, and what we should do is somewhere in the middle.

So, for any given function, we need to decide whether we want to take the DbC approach and require that the caller validates the input or take the defensive programming approach and have the function itself validate the input. Which makes more sense depends on what the function does and how it's used and is a bit of an art. But ultimately, whether something is a programming error depends on what the API and its contracts are, and that definitely does not mean that one-size-fits-all.

As a default, I think that treating invalid indices as an Error makes a lot of sense, but it is true that if the index comes from user input or is otherwise inferred from user input, having the checks result in Errors is annoying. But you can certainly do additional checks yourself, and if you wrap the actual call to index the array in an @trusted function, it should be possible to avoid there being two checks in the case that the index is valid.

I get the impression that Walter tends to prefer treating stuff as programmatic error due to the types of programs that he usually writes. You get a lot fewer things that come from user input when you're simply processing a file (like you do with a compiler) than you get with stuff like a server application or a GUI. So, I think that he's more inclined to come to the conclusion that something should be treated as programmatic error than some other folks are. That being said, I also think that many folks are too willing to try and make their program continue like nothing was wrong after something fairly catastrophic happened.

> > Otherwise, while it's hard to write invulnerable programs, it is another thing entirely to endorse vulnerabilities. I cannot endorse such practices, nor can I endorse vibe.d if it is coded to continue running after entering an undefined state.
>
> It's not. And it can't be. What I have to do is re-engineer the contract between myself and arrays. The only way to do that is to not use builtin arrays. That's the part that sucks. My code will be perfectly safe, and not ever experience corruption. It's just a bit ugly.

Well, you _can_ use the built-in arrays and just use a helper function for indexing arrays so that the arrays are passed around normally, but you get an Exception for an invalid index rather than an Error. You would have to be careful to remember to index the array through the helper function, but it wouldn't force you to use different data structures. e.g.

auto result = arr[i];

becomes something like

auto result = arr.at(i);
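A hedged sketch of such a helper; using .ptr inside a @trusted function skips druntime's own bounds check, so a valid index is only tested once:

```d
import std.exception : enforce;

// Validate the index ourselves and throw an Exception (not an
// Error) on failure; pointer indexing avoids a redundant check.
ref T at(T)(T[] arr, size_t i) @trusted
{
    enforce(i < arr.length, "invalid index");
    return arr.ptr[i];  // no second bounds check on the valid path
}

// usage:
// auto result = arr.at(i);  // throws a catchable Exception on a bad index
```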

As an aside, I think that there has been way too much talk of memory corruption in this thread, and much of it has derailed the discussion from the actual issue. The array bounds checking prevented the memory corruption problem. The question here is how to deal with invalid indices and whether it should be treated as programmer error or bad input, and that's really a question of whether arrays should use DbC or defensive programming and whether there should be a way to choose based on your application's needs.

- Jonathan M Davis

June 01, 2017
On 6/1/17 6:05 AM, Daniel Kozak via Digitalmars-d wrote:
> [Service]
> ....
>
> Restart=on-failure

Thanks!
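For anyone finding this later: Daniel's lines are from a systemd unit file; a fuller hedged sketch (service description and binary path are placeholders):

```ini
[Unit]
Description=vibe.d web application

[Service]
ExecStart=/usr/local/bin/mywebapp
Restart=on-failure
RestartSec=1

[Install]
WantedBy=multi-user.target
```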

-Steve

June 01, 2017
On Thursday, 1 June 2017 at 10:26:24 UTC, Steven Schveighoffer wrote:
> On 5/31/17 9:05 PM, Walter Bright wrote:
>> On 5/31/2017 6:04 AM, Steven Schveighoffer wrote:
>>> Technically this is a programming error, and a bug. But memory hasn't
>>> actually been corrupted.
>>
>> Since you don't know where the bad index came from, such a conclusion
>> cannot be drawn.
>
> You could say that about any error. You could say that about malformed unicode strings, malformed JSON data, file not found. In this mindset, everything should be an Error, and nothing should be recoverable.

Everything coming as input to the _process_ should be validated... and once it has been validated, if you still find malformed JSON data, malformed Unicode strings, etc. during execution, there's a bug, and the process should terminate.

>>> This seems like a large penalty for "almost" corrupting memory. No
>>> other web framework I've used crashes the entire web server for such a
>>> simple programming error.
>>
>> Hence the endless vectors for malware insertion in those other frameworks.
>
> No, those are due to the implementation of the interpreter. If the interpreter is implemented in @safe D, then you don't have those problems.

It seems to me that reducing the danger to corrupted memory alone underestimates the damage that can be done, for example by a simple SQL injection, which can be carried out without corrupting memory at all.


>>> Compare this to, let's say, a malformed unicode string (exception),
>> malformed JSON data (exception), file not found (exception), etc.
>>
>> That's because those are input and environmental errors, not programming
>> bugs.
>
> Not necessarily. A file name could be sourced from the program, but have a typo. An index could come from the environment. The library can't know, but makes assumptions one way or the other. Just like we assume you want to use the GC, these assumptions are harmful for those who need it to be the other way.

The library should not assume anything about what comes from the environment, the filesystem, etc.: there must be validation at the boundaries.

> I can detail exactly what happened in my code -- I am accepting dates from a given week from a web request. One of the dates fell outside the week, and so tried to access a 7 element array with index 9. Nothing corrupted memory, but the runtime corrupted my entire process, forcing a shutdown.

And that's a good thing! The input should be validated, especially because we are talking about a web request.

See it like being kind with the other side of the connection, informing it with a clear "rejected as the date is invalid".

:-)

/Paolo

June 01, 2017
On Thursday, 1 June 2017 at 10:13:25 UTC, Steven Schveighoffer wrote:
> Which is unfortunate, because vibe.d is a great platform for web development, other than this. You could go Adam's route and just put the blinders on, but I think that's not a sustainable practice.

If you control the deployment, it works perfectly well. You aren't being blind to it, you are just taking control.

I prefer to use processes anyway (they are easier to use, compatible with more libraries, considerably more reliable, and perform quite well - we don't have to spin up a new perl interpreter, 1999 was a long time ago), but fibers can handle RangeError too as long as you never use -release and such.