June 01, 2017
On 6/1/2017 11:29 AM, Walter Bright wrote:
> Joel Spolsky wrote about this issue long ago, arguing that data in a program should be compartmentalized into untrusted and trusted data.

Found it:

https://www.joelonsoftware.com/2005/05/11/making-wrong-code-look-wrong/

It's one of those programming essays that everyone should read.
June 01, 2017
On 6/1/2017 12:16 PM, Timon Gehr wrote:
> On 01.06.2017 20:37, Walter Bright wrote:
>> It is a programming bug to not validate the input. It's not that bad to abort programs if you neglected to validate the input.
>> ...
> 
> It really depends on the specific circumstances.

The stages of programming expertise:

1. newbie - follows the rules because he is told to
2. master - follows the rules because he understands them
3. guru - breaks the rules because he understands the rules don't apply

Let's not skip stages :-)


>> It is always bad to treat programming bugs as input errors.
> They should be treated as bugs, but isn't it plausible that there are circumstances where one does not want to authorize every @safe library function one calls to bring down the entire process?

You, as the programmer, need to decide what is validated data and what is not. Being unclear about this is technical debt that is going to cause problems.

Validated data that is not valid is a programming bug and the program should be aborted.
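
For illustration, a minimal sketch of that division in D (the date handling and all names here are made up): untrusted input is checked with enforce and a recoverable Exception at the boundary, while data that is supposed to be already validated is guarded with assert, so a violation there is treated as a bug and aborts.

```d
import std.conv : to;          // to!int throws ConvException on malformed input
import std.exception : enforce;

struct Date { int year, month, day; }

// Boundary: the raw request field is untrusted, so a bad value is an
// input error and is reported with a recoverable Exception.
Date parseDay(string field)
{
    immutable day = field.to!int;
    enforce(day >= 1 && day <= 31, "day out of range: " ~ field);
    return Date(2017, 6, day);
}

// Interior: a Date reaching this point is supposed to be validated, so a
// bad value here is a programming bug and the assert aborts the program.
int dayOfWeekIndex(Date d) pure @safe nothrow
{
    assert(d.day >= 1 && d.day <= 31, "unvalidated Date leaked into the interior");
    return (d.day - 1) % 7;    // placeholder computation
}
```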
June 01, 2017
On 01.06.2017 02:57, Moritz Maxeiner wrote:
>>
>> Termination of what? How on earth do you determine that the scope of this "undefined state" is the program, not the machine, or the world?
> 
> As that is the closest scope current operating systems give us to work with, this is a sane default for the runtime. Nobody stops you from using a different scope if you need it.
> 

Yes, they would stop me from using a smaller scope. 'nothrow' functions are not guaranteed to be unwindable and the compiler infers 'nothrow' automatically. Also, null pointer dereferences do not even throw. (On Linux.)
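
To make the unwindability point concrete, a small sketch (the function is hypothetical; a default, non-release build is assumed so the assert is active). `nothrow` only rules out Exceptions, so an Error can still leave the function, and the frames above a nothrow call are not required to be unwindable, meaning cleanup between the throw point and the catch may simply be skipped:

```d
import std.stdio;

// `nothrow` only excludes Exceptions: an Error (failed assert, failed
// bounds check) may still propagate out of this function. For templates,
// auto-return and nested functions the compiler infers `nothrow` on its own.
int pick(const int[] a, size_t i) pure @safe nothrow
{
    assert(i < a.length, "index out of range");
    return a[i];
}

void main()
{
    try
    {
        writeln(pick([1, 2, 3], 9));
    }
    catch (Error e)
    {
        // Not a reliable recovery point: the stack between the failed
        // assert and this handler need not be unwound at all.
        writeln("caught: ", e.msg);
    }
}
```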
June 01, 2017
On 2017-06-01 21:20, Timon Gehr wrote:

> There is no such tool.

In this case, Erlang is a pretty good candidate. It uses green processes that are even more lightweight than fibers; you can have millions of them. All data is process-local, so corruption in one process cannot affect the others (unless there's a bug in the virtual machine). The major downside is that it's not D, and as a language it's pretty crappy.

-- 
/Jacob Carlborg
June 01, 2017
On 01.06.2017 21:48, Walter Bright wrote:
> On 6/1/2017 12:16 PM, Timon Gehr wrote:
>> On 01.06.2017 20:37, Walter Bright wrote:
>>> It is a programming bug to not validate the input. It's not that bad to abort programs if you neglected to validate the input.
>>> ...
>>
>> It really depends on the specific circumstances.
> 
> The stages of programming expertise:
> 
> 1. newbie - follows the rules because he is told to
> 2. master - follows the rules because he understands them
> 3. guru - breaks the rules because he understands the rules don't apply
> 
> Let's not skip stages :-)
> ...

This does not really say anything about programming expertise; it says that "the rules" (whatever those are) are incomplete (unless there are no gurus, but then the list is nothing but silly).

I guess "terminate the program upon detection of a bug" is one of your rules. It's incomplete, but the language specification enforces it (for a subset of bugs).

> 
>>> It is always bad to treat programming bugs as input errors.
>> They should be treated as bugs, but isn't it plausible that there are circumstances where one does not want to authorize every @safe library function one calls to bring down the entire process?
> 
> You, as the programmer, need to decide what is validated data and what is not.

There is not only one programmer and not all programmers are me.

> Being unclear about this is technical debt that is going to cause problems.
> ...

This is both obvious and not answering my question.

> Validated data that is not valid is a programming bug

Again, obvious.

> and the program should be aborted.

The buggy subprogram should be. Let's say I want to use library functionality written over the course of years by Random C. Monkey, a domain expert who is not a computer scientist. The library is an ugly jungle of special cases, but it is mostly correct, and it makes it trivial to add feature X to my product. It's also pure and @safe, without any @trusted functions.

I can still serve customers if this library occasionally misbehaves, just at a lower quality. (Let's say it is trivial to check whether the code returned a correct result, even though building the result in the first place was hard.) I cannot trust Mr. Monkey to have written only correct code respecting array bounds and null pointers, but if my product does not (seem to) have feature X by tomorrow, I'm most likely going out of business.

Now, why exactly should any of Mr. Monkey's bugs terminate my entire service, necessitating a costly restart and causing unnecessary frustration to my customers?
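
For the record, a minimal sketch of the kind of containment I mean, with the scenario-specific pieces made up (a hypothetical ./featurex-worker helper binary that wraps the library call, and a cheap looksCorrect check): run the untrusted-but-@safe code in a child process, so an abort only kills that process, and validate its output before using it.

```d
import std.process : execute;

// Hypothetical cheap validation: checking a result is assumed to be easy
// even though computing it was hard.
bool looksCorrect(string result) @safe
{
    return result.length != 0;
}

// Run the library out of process. If it aborts on one of its bugs, only
// the worker dies and we degrade gracefully instead of losing the service.
string tryFeatureX(string request)
{
    auto r = execute(["./featurex-worker", request]);
    if (r.status != 0 || !looksCorrect(r.output))
        return null;    // no feature X for this request
    return r.output;
}
```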

I'm pretty sure D should not outright prevent this use case, even though in an ideal world this situation would never arise.
June 01, 2017
On 6/1/2017 1:47 PM, Timon Gehr wrote:
> I'm pretty sure D should not outright prevent this use case, even though in an ideal world this situation would never arise.

C-quality code is straightforward in D. Just mark it @system.
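
For what it's worth, a tiny sketch of what that looks like (the function here is hypothetical):

```d
// "C quality" code is simply left @system, which excludes it from the
// checks that @safe code has to pass.
@system void asciiToLowerInPlace(char* buf, size_t len)
{
    foreach (i; 0 .. len)
        if (buf[i] >= 'A' && buf[i] <= 'Z')
            buf[i] = cast(char)(buf[i] + 32);   // raw pointer indexing: rejected under @safe
}
```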
June 01, 2017
On Thursday, 1 June 2017 at 18:29:53 UTC, Walter Bright wrote:
> 
>
> What's missing here is looking carefully at a program and deciding what are input (and environmental) errors and what are program bugs. The former are recoverable, the latter are not.
>
> [...]

I think he understood all that already. An out-of-bounds array access is a sign of a bug, and it must not be allowed to slip past.

But I think the point was that it causes such a large unit of work (the whole program) to abort. Potentially thousands of customers could lose their connection to the server because of that. He wishes that only the connection in question would crash, so that users of other, likely bug-free, parts of the program would not be disturbed.

Personally, I have no opinion on this right now, save that it definitely sounds like a tough question.
June 01, 2017
On 01.06.2017 23:12, Walter Bright wrote:
> On 6/1/2017 1:47 PM, Timon Gehr wrote:
>> I'm pretty sure D should not outright prevent this use case, even though in an ideal world this situation would never arise.
> 
> C-quality code is straightforward in D. Just mark it @system.

I don't know what this is, but it is not an answer to my post.
June 01, 2017
On Thursday, 1 June 2017 at 18:40:28 UTC, Walter Bright wrote:
> On 6/1/2017 7:21 AM, Stanislav Blinov wrote:
>> Oh yes, there is a way: http://forum.dlang.org/post/psdamamjecdwfeiuvqsz@forum.dlang.org
>
>
> Please post bug reports to bugzilla. Posting them only on the newsgroup pretty much ensures they will never get addressed.

Please look at the very first post of that thread :\
June 01, 2017
On Thursday, 1 June 2017 at 18:54:51 UTC, Timon Gehr wrote:
> On 01.06.2017 14:25, Paolo Invernizzi wrote:
>> 
>>> I can detail exactly what happened in my code -- I am accepting dates from a given week from a web request. One of the dates fell outside the week, and so tried to access a 7-element array with index 9. Nothing corrupted memory, but the runtime corrupted my entire process, forcing a shutdown.
>> 
>> And that's a good thing! The input should be validated, especially because we are talking about a web request.
>> 
>> Think of it as being kind to the other side of the connection, informing it with a clear "rejected because the date is invalid".
>> 
>> :-)
>
> You seem to not understand what happened. There was a single server serving multiple different web pages. There was an out-of-bounds error due to a single user inserting invalid data into a single form with missing data validation. The web server went down, killing all pages for all users.
>
> There is no question that input data should be validated, but if it isn't, the response should be proportional. It's enough to kill the request, log the exception, notify the developer, and maybe even disable the specific web page.

I really do understand what is happening: I have a vibe.d server serving a top-5 US FMCG company worldwide, and sometimes it goes down with a crash.

It's dockerized, in a Docker swarm, and every time it crashes (or becomes "unhealthy") it is restarted, and we have a log that is helping us squash bugs.

Guess what: it's not a problem for the customer (at least right now!), since we have taken a clear approach: we are squashing bugs, and if the process state signals that a bug has occurred, we simply pull the plug.

A proportional response can be achieved by having multiple processes handle the requests... it's the only sane way I can think of to kill not "all" the sessions, but only a portion of them.
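
A rough sketch of that shape, with everything named here hypothetical (a ./request-worker binary that serves a slice of the sessions): a small supervisor keeps N workers alive and restarts any worker that dies, so a bug-triggered abort only takes down the sessions that one worker was handling.

```d
import core.thread : Thread;
import std.conv : to;
import std.process : spawnProcess, wait;
import std.stdio : writefln;

// Keep one worker process alive, restarting it whenever it dies.
void supervise(int id)
{
    for (;;)
    {
        auto pid = spawnProcess(["./request-worker", id.to!string]);
        const status = wait(pid);   // blocks until this worker exits
        if (status == 0)
            break;                  // clean shutdown requested
        writefln("worker %s exited with status %s; restarting it", id, status);
    }
}

Thread startSupervisor(int id)
{
    // `id` is a function parameter, so each closure captures its own copy.
    auto t = new Thread({ supervise(id); });
    t.start();
    return t;
}

void main()
{
    Thread[] supervisors;
    foreach (id; 0 .. 4)
        supervisors ~= startSupervisor(id);
    foreach (t; supervisors)
        t.join();                   // in a real service: wait for a shutdown signal instead
}
```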

/Paolo