June 01, 2017
On 6/1/2017 2:53 AM, Vladimir Panteleev wrote:
> 3. Design your program so that it can be terminated at any point without resulting in data corruption. I don't know if Vibe.d can satisfy this constraint, but e.g. the ae.net.http.server workflow is to build/send the entire response atomically, meaning that the Content-Length will always be populated. Wrap your database updates in transactions. Use the "write to temporary file then rename over the original file" pattern when updating files. Etc.

This is the best advice.

I.e. design with the assumption that failure will occur, rather than fruitlessly trying to prevent all failure.
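
A minimal sketch of the "write to temporary file then rename" pattern quoted above, assuming a POSIX-like filesystem where rename replaces the target atomically. The helper name `atomicWrite` and the `.tmp` suffix are made up for illustration, and a real program would also want to flush the data to disk before renaming.

```d
import std.file : exists, readText, remove, rename, write;

/// Replace the contents of `path` so that a crash at any point leaves
/// either the complete old file or the complete new file on disk.
/// (`atomicWrite` is a hypothetical helper, not a Phobos function.)
void atomicWrite(string path, const(void)[] data)
{
    string tmp = path ~ ".tmp"; // assumed naming scheme for the scratch file
    write(tmp, data);           // a crash here leaves `path` untouched
    rename(tmp, path);          // swap the finished file into place
}

void main()
{
    atomicWrite("settings.conf", "volume=7\n");
    atomicWrite("settings.conf", "volume=9\n");
    assert(readText("settings.conf") == "volume=9\n");
    assert(!exists("settings.conf.tmp"));
    remove("settings.conf");
}
```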

June 01, 2017
On 6/1/2017 7:48 AM, H. S. Teoh via Digitalmars-d wrote:
> Yes.  Saving *after* a crash was detected is stupid, because you no
> longer can guarantee the user data you're saving hasn't already been
> corrupted.  I've experienced over-zealous "crash recovery" code in
> applications overwrite the last known good copy of my data with the
> latest, most up-to-date, and also most-corrupted data after it detected
> a problem. Not nice at all.

An even better idea is to use rolling backups, with the crash recovery backup only being the most recent, not the only, version.
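
A rough sketch of what rolling backups could look like, assuming numbered suffixes (`.1` is the newest backup) and a caller-chosen number of versions to keep. The name `saveWithBackups` and the suffix scheme are illustrative assumptions, not an existing API.

```d
import std.conv : to;
import std.file : copy, exists, readText, remove, rename, write;

/// Save `data` to `path`, keeping up to `keep` older versions as
/// path.1 (newest) .. path.N (oldest). Hypothetical helper.
void saveWithBackups(string path, string data, size_t keep)
{
    // Shift existing backups up by one; the oldest is overwritten.
    for (size_t i = keep; i > 1; --i)
    {
        string newer = path ~ "." ~ to!string(i - 1);
        if (exists(newer))
            rename(newer, path ~ "." ~ to!string(i));
    }
    // Copy (rather than move) the current file so `path` never disappears.
    if (keep > 0 && exists(path))
        copy(path, path ~ ".1");

    // Write the new version next to the original, then swap it in.
    string tmp = path ~ ".tmp";
    write(tmp, data);
    rename(tmp, path);
}

void main()
{
    enum path = "notes.txt";
    saveWithBackups(path, "v1", 2);
    saveWithBackups(path, "v2", 2);
    saveWithBackups(path, "v3", 2);
    assert(readText(path) == "v3");
    assert(readText(path ~ ".1") == "v2");
    assert(readText(path ~ ".2") == "v1");
    foreach (f; [path, path ~ ".1", path ~ ".2"])
        remove(f);
}
```

With this layout a crash-recovery save only ever replaces the newest version, so an older known-good copy survives even if the recovered data turns out to be corrupted.
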
June 01, 2017
On 6/1/2017 3:26 AM, Steven Schveighoffer wrote:
> On 5/31/17 9:05 PM, Walter Bright wrote:
>> On 5/31/2017 6:04 AM, Steven Schveighoffer wrote:
>>> Technically this is a programming error, and a bug. But memory hasn't
>>> actually been corrupted.
>>
>> Since you don't know where the bad index came from, such a conclusion
>> cannot be drawn.
> 
> You could say that about any error. You could say that about malformed unicode strings, malformed JSON data, file not found. In this mindset, everything should be an Error, and nothing should be recoverable.

What's missing here is looking carefully at a program and deciding what are input (and environmental) errors and what are program bugs. The former are recoverable, the latter are not.

For example, malformed unicode strings. Joel Spolsky wrote about this issue long ago: data in a program should be compartmentalized into untrusted and trusted data.

Untrusted data comes from the input, and stays untrusted until it is validated. Malformed untrusted data are recoverable. Once it is validated, it becomes trusted data. Any malformations in trusted data are programming bugs. It should be clear in a well designed program what data is trusted and what data is untrusted. Spolsky suggests using different types for them so they are distinct.

For your date case, the date was not validated, and was fed into an array, where the invalid date overflowed the array bounds. The program was relying on the array bounds checking to validate the data.

I'd argue this is a problematic program design because:

1. It's inefficient. Data should be validated once in a clear location in the program. Arrays appear all over the place, and tend to be in hot locations. Validating the same data over and over is highly inefficient.

2. Array bounds checking can be turned off by a compiler switch. Program data validation should not be silently disabled in such an unexpected manner.

3. Arrays are a ubiquitous data structure. They are used all over the place. There is no way to distinguish "this is a data validation use" and "this must be valid data".

4. It would be surprising to anyone familiar with D looking at your code to realize that an array access is data validation rather than bug checking.

5. Arrays are sometimes optimized by removing the bounds checking. This should not turn off data validation.

6. @safe code is intended to find programming bugs, not validate input data.

7. Just because code is marked @safe doesn't mean memory corruption is impossible. Even if @safe is perfect, programs have @trusted and @system code too, and those may have memory corrupting bugs.

8. It does not distinguish array overflow caused by programming bugs / corruption from array overflow caused by invalid program input.
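
As a sketch of the alternative for the date case: validate the untrusted value once at the boundary, throw a recoverable Exception if it is bad, and hand the rest of the program a distinct type. The names `DayOfWeekIndex` and `validateDayIndex` are invented for illustration, and the 7-element week is taken from the example under discussion.

```d
import std.exception : assertThrown, enforce;

/// An offset into a 7-day week that has already been validated.
/// (`private` is module-level in D, so a real program would put this
/// type in its own module to keep other code from constructing it.)
struct DayOfWeekIndex
{
    private size_t value;
    size_t get() const { return value; }
}

/// Validate untrusted input once. Bad input is an input error, so it
/// is reported with an Exception that the caller can recover from.
DayOfWeekIndex validateDayIndex(long untrusted)
{
    enforce(untrusted >= 0 && untrusted < 7,
            "rejected: date falls outside the requested week");
    return DayOfWeekIndex(cast(size_t) untrusted);
}

void main()
{
    int[7] hitsPerDay;

    auto day = validateDayIndex(3);
    hitsPerDay[day.get] += 1; // bounds check here is now purely a bug check
    assert(hitsPerDay[3] == 1);

    // Index 9 is rejected at the boundary, before it reaches any array.
    assertThrown!Exception(validateDayIndex(9));
}
```
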
June 01, 2017
On 6/1/2017 3:56 AM, Jonathan M Davis via Digitalmars-d wrote:
> I get the impression that Walter tends to prefer treating stuff as
> programmatic error due to the types of programs that he usually writes. You
> get a lot fewer things that come from user input when you're simply
> processing a file (like you do with a compiler) than you get with stuff like
> a server application or a GUI. So, I think that he's more inclined to come
> to the conclusion that something should be treated as programmatic error
> than some other folks are.

It is a programming bug to not validate the input. It's not that bad to abort programs if you neglected to validate the input.

It is always bad to treat programming bugs as input errors.
June 01, 2017
On 6/1/2017 7:21 AM, Stanislav Blinov wrote:
> Oh yes, there is a way: http://forum.dlang.org/post/psdamamjecdwfeiuvqsz@forum.dlang.org


Please post bug reports to bugzilla. Posting them only on the n.g. pretty much ensures they will never get addressed.
June 01, 2017
On 01.06.2017 14:25, Paolo Invernizzi wrote:
> 
>> I can detail exactly what happened in my code -- I am accepting dates from a given week from a web request. One of the dates fell outside the week, and so tried to access a 7 element array with index 9. Nothing corrupted memory, but the runtime corrupted my entire process, forcing a shutdown.
> 
> And that's a good thing! The input should be validated, especially because we are talking about a web request.
> 
> See it like being kind with the other side of the connection, informing it with a clear "rejected as the date is invalid".
> 
> :-)

You seem to not understand what happened. There was a single server serving multiple different web pages. There was an out-of-bounds error due to a single user inserting invalid data into a single form with missing data validation. The web server went down, killing all pages for all users.

There is no question that input data should be validated, but if it isn't, the response should be proportional. It's enough to kill the request, log the exception, notify the developer, and maybe even disable the specific web page.
June 01, 2017
On 2017-06-01 12:13, Steven Schveighoffer wrote:

> It just means that D is an inferior platform for a web framework, unless
> you use the process-per-request model so the entire thing doesn't go
> down for one page request. But that obviously is going to cause
> performance problems.

You can combine both: one request per fiber, and as many instances of your program as there are cores. That will utilize the hardware better. I've noticed that the multi-threading in vibe.d doesn't seem to work. If one process goes down, all of its in-flight requests are lost, but you can still handle new requests. Combine that with automatically restarting the processes if they crash.

-- 
/Jacob Carlborg
June 01, 2017
On Thu, Jun 01, 2017 at 11:29:53AM -0700, Walter Bright via Digitalmars-d wrote: [...]
> Untrusted data comes from the input, and stays untrusted until it is validated. Malformed untrusted data are recoverable. Once it is validated, it becomes trusted data. Any malformations in trusted data are programming bugs. It should be clear in a well designed program what data is trusted and what data is untrusted. Spolsky suggests using different types for them so they are distinct.
> 
> For your date case, the date was not validated, and was fed into an array, where the invalid date overflowed the array bounds. The program was relying on the array bounds checking to validate the data.

+1.  I think this is the root of the problem.  Data that comes from outside sources must never, ever be trusted until it is validated. Any errors that occur during validation are recoverable, because you *know* they are caused by wrong data from outside.

Once the data is validated, any further errors involving that data are program bugs: either your validation code was incorrect / incomplete, or there is a program logic error that led to an inconsistent state. In this case, aborting the program is the only sane response, especially in an online services setting, because your broken validation code may have let through maliciously-crafted data that can lead to an exploit (better nip it in the bud before the exploit proceeds any further), or the internal program logic is inconsistent, so proceeding further is UB.

Feeding unvalidated, tainted data directly into inner program logic like indexing an array is a bad idea.  The data ought to be validated first.

I like Spolsky's idea of using separate types for tainted / verified input. Let the compiler statically verify that you at least made an attempt at validating your program's inputs (though obviously it can only go so far -- the compiler can't guarantee that your validation code is actually correct).  The problem, though, is that D currently doesn't have tainted types, so, for example, you can't tell at a glance whether a given string is untrusted user input or validated data; it's all just `string`.  I wonder if tainted types could be something worth adding either to the language or to Phobos.
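
A very rough sketch of what a library-level version might look like, assuming two wrapper structs and a validation function as the only bridge between them. `Tainted`, `Validated`, `validateUtf8` and `byteLength` are hypothetical names (nothing like this exists in Phobos today); only `std.utf.validate` is a real library call.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

/// Wraps data straight from the outside world; deliberately has no accessor.
struct Tainted(T)
{
    private T payload;
}

/// Wraps data that has passed validation.
struct Validated(T)
{
    private T payload;
    T get() { return payload; }
}

/// The only way from Tainted to Validated. Malformed input throws a
/// recoverable UTFException. (`private` is module-level in D, so the
/// wrappers would need their own module to enforce this.)
Validated!string validateUtf8(Tainted!string input)
{
    validate(input.payload);
    return Validated!string(input.payload);
}

/// Inner program logic only accepts validated data.
size_t byteLength(Validated!string s)
{
    return s.get.length;
}

void main()
{
    auto good = Tainted!string("hello");
    assert(byteLength(validateUtf8(good)) == 5);

    // Passing tainted data to inner logic is a compile-time error.
    static assert(!__traits(compiles, byteLength(Tainted!string("x"))));

    // 0xFF can never appear in well-formed UTF-8.
    immutable(ubyte)[] raw = [0xFF, 0xFE];
    auto bad = Tainted!string(cast(string) raw);
    assertThrown!UTFException(validateUtf8(bad));
}
```

The compiler still can't tell whether `validateUtf8` is correct, only that something with that signature was called before the data reached `byteLength`.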


[...]
> 8. It does not distinguish array overflow caused by programming bugs / corruption from array overflow caused by invalid program input.

Yes, I think this conflation is the root cause of this problem. Validation should be explicit, and separate from inner program logic. Mixing the two together only serves to confuse the issue.


T

-- 
If you think you are too small to make a difference, try sleeping in a closed room with a mosquito. -- Jan van Steenbergen
June 01, 2017
On 01.06.2017 20:37, Walter Bright wrote:
> On 6/1/2017 3:56 AM, Jonathan M Davis via Digitalmars-d wrote:
>> I get the impression that Walter tends to prefer treating stuff as
>> programmatic error due to the types of programs that he usually writes. You
>> get a lot fewer things that come from user input when you're simply
>> processing a file (like you do with a compiler) than you get with stuff like
>> a server application or a GUI. So, I think that he's more inclined to come
>> to the conclusion that something should be treated as programmatic error
>> than some other folks are.
> 
> It is a programming bug to not validate the input. It's not that bad to abort programs if you neglected to validate the input.
> ...

It really depends on the specific circumstances.

> It is always bad to treat programming bugs as input errors.

They should be treated as bugs, but isn't it plausible that there are circumstances where one does not want to authorize every @safe library function one calls to bring down the entire process?
June 01, 2017
On 01.06.2017 10:47, Paolo Invernizzi wrote:
> On Thursday, 1 June 2017 at 06:11:43 UTC, H. S. Teoh wrote:
>> On Thu, Jun 01, 2017 at 03:24:02AM +0000, John Carter via Digitalmars-d wrote: [...]
>>> [...]
>> [...]
>>
>> Again, from an engineering standpoint, this is a tradeoff.
>>
>> [...]
> 
> That's exactly the point: to use the right tool for the requirement of the job to be done.
> 
> /P

There is no such tool.