Bad array indexing is considered deadly (page 12) - D Programming Language Discussion Forum

June 01, 2017

Re: Bad array indexing is considered deadly

Posted by Paolo Invernizzi
in reply to Timon Gehr

On Thursday, 1 June 2017 at 19:20:01 UTC, Timon Gehr wrote:
> On 01.06.2017 10:47, Paolo Invernizzi wrote:
>> On Thursday, 1 June 2017 at 06:11:43 UTC, H. S. Teoh wrote:
>>> On Thu, Jun 01, 2017 at 03:24:02AM +0000, John Carter via Digitalmars-d wrote: [...]
>>>> [...]
>>> [...]
>>>
>>> Again, from an engineering standpoint, this is a tradeoff.
>>>
>>> [...]
>> 
>> That's exactly the point: to use the right tool for the requirement of the job to be done.
>> 
>> /P
>
> There is no such tool.

Process isolation was crafted exactly for that.

/Paolo

June 01, 2017

Re: Bad array indexing is considered deadly

Posted by cym13
in reply to H. S. Teoh

On Thursday, 1 June 2017 at 19:04:19 UTC, H. S. Teoh wrote:
> I like Spolsky's idea of using separate types for tainted / verified input. Let the compiler statically verify that you at least made an attempt at validating your program's inputs (though obviously it can only go so far -- the compiler can't guarantee that your validation code is actually correct).  The problem, though, is that D currently doesn't have tainted types, so for example you can't tell at a glance whether a given string is untrusted user input or validated data, it's all just `string`.  I wonder if tainted types could be something worth adding either to the language or to Phobos.

I'm not familiar with the idea, do we need more than the following?

import std.stdio;

struct Tainted(T) {
    T _basetype;
    alias _basetype this;
}

void main(string[] args) {
    auto ts = Tainted!string("Hello");
    writeln(ts);
}

It's a PoC, OK, but it lets you use ts like any variable of the base type and convert easily from one to the other, while the conversion has to be explicit. So, real question: what more do we need?
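
For reference, here is a minimal sketch of how the PoC behaves in each direction. The names `trustedLength` and `needsTainted` are hypothetical and exist only for this illustration. With `alias this`, wrapping has to be spelled out, while unwrapping happens implicitly:

```d
import std.stdio : writeln;

struct Tainted(T)
{
    T _basetype;
    alias _basetype this;
}

// Hypothetical sink that expects ordinary (trusted) strings.
size_t trustedLength(string s)
{
    return s.length;
}

// Hypothetical function that only accepts wrapped values.
void needsTainted(Tainted!string t)
{
}

void main()
{
    auto ts = Tainted!string("Hello");

    // Unwrapping is implicit: alias this lets ts stand in for a string.
    writeln(trustedLength(ts));

    // Wrapping is explicit: a plain string is not accepted as Tainted!string.
    static assert(!__traits(compiles, needsTainted("Hello")));
    needsTainted(Tainted!string("Hello"));
}
```

So the explicit step is on the way in, not on the way out.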

June 01, 2017

Re: Bad array indexing is considered deadly

Posted by Steven Schveighoffer
in reply to Walter Bright

On 6/1/17 2:00 PM, Walter Bright wrote:
> On 6/1/2017 2:53 AM, Vladimir Panteleev wrote:
>> 3. Design your program so that it can be terminated at any point
>> without resulting in data corruption. I don't know if Vibe.d can
>> satisfy this constraint, but e.g. the ae.net.http.server workflow is
>> to build/send the entire response atomically, meaning that the
>> Content-Length will always be populated. Wrap your database updates in
>> transactions. Use the "write to temporary file then rename over the
>> original file" pattern when updating files. Etc.
>
> This is the best advice.
>
> I.e. design with the assumption that failure will occur, rather than
> fruitlessly trying to prevent all failure.
>
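
The "write to temporary file then rename over the original file" pattern from the quoted advice can be sketched roughly as follows. The helper name `atomicWrite` and the `.tmp` suffix are assumptions made for this example, and the sketch does not flush to disk, so it covers a crashing process but not a power failure:

```d
import std.file : readText, remove, rename, write;

/// Hypothetical helper: replace `path` with `contents` so that a reader sees
/// either the complete old file or the complete new one.
void atomicWrite(string path, const(void)[] contents)
{
    // Assumption: the temporary file sits next to the target so both are on
    // the same filesystem, which the rename relies on.
    immutable tmp = path ~ ".tmp";

    // A crash during this write leaves `path` untouched.
    write(tmp, contents);

    // The rename replaces the original file in a single step.
    rename(tmp, path);
}

void main()
{
    atomicWrite("settings.json", `{"retries": 3}`);
    assert(readText("settings.json") == `{"retries": 3}`);
    remove("settings.json");
}
```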

Indeed it is good advice. I'm thinking actually a good setup is to have 2 levels of processes: one which delivers requests to some set of child processes that handle the requests with fibers, and one which handles the i/o to the client. Then if the subprocess dies, the master process can both inform the client of the failure, and retry other fibers that were in process but never had a chance to finish.
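
A minimal sketch of the restart half of such a master process, assuming the worker is a separate executable that signals failure through a non-zero exit status. The function name `runWithRestarts` and the restart limit are made up for this example; request routing, client notification and retrying in-flight requests are left out:

```d
import std.process : spawnProcess, wait;
import std.stdio : stderr;

/// Hypothetical supervisor loop: launches workerCmd and relaunches it after
/// a non-zero exit, at most maxRestarts times. Returns the number of launches.
size_t runWithRestarts(string[] workerCmd, size_t maxRestarts)
{
    size_t launches = 0;
    while (true)
    {
        auto pid = spawnProcess(workerCmd);
        ++launches;

        // Blocks until the worker terminates; a crash shows up as non-zero.
        immutable status = wait(pid);
        if (status == 0)
            return launches;

        stderr.writefln("worker exited with status %s (launch %s)",
                        status, launches);
        if (launches > maxRestarts)
            return launches;
    }
}

void main(string[] args)
{
    // Usage (assumed): ./supervisor ./worker --some-flag
    if (args.length > 1)
        runWithRestarts(args[1 .. $], 5);
}
```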

Not sure if I'll get to that point. At this time, I'm writing an array wrapping struct that will turn all range errors into range exceptions. Then at least I can inform the client of the error and continue to handle requests.
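
One way such a wrapper could look, as a rough sketch only. `RangeException` and `CheckedArray` are hypothetical names, not an existing Phobos or vibe.d API, and slicing and the other array operations are omitted:

```d
import std.conv : text;
import std.stdio : writeln;

/// Hypothetical recoverable counterpart to core.exception.RangeError.
class RangeException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

/// Hypothetical wrapper: a bad index throws an Exception instead of an Error.
struct CheckedArray(T)
{
    private T[] data;

    this(T[] data)
    {
        this.data = data;
    }

    size_t length() const
    {
        return data.length;
    }

    // Returning by ref lets both reads and `arr[i] = x` go through the check.
    ref inout(T) opIndex(size_t i) inout
    {
        if (i >= data.length)
            throw new RangeException(
                text("index ", i, " out of bounds for length ", data.length));
        return data[i];
    }
}

void main()
{
    auto arr = CheckedArray!int([0, 0, 0]);
    try
    {
        arr[3] = 5; // the same mistake as in the original example
    }
    catch (RangeException e)
    {
        // A request handler could log this and answer with an error page.
        writeln("recovered: ", e.msg);
    }
}
```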

-Steve

June 02, 2017

Re: Bad array indexing is considered deadly

Posted by aberba
in reply to Steven Schveighoffer

On Thursday, 1 June 2017 at 10:13:25 UTC, Steven Schveighoffer wrote:
> On 5/31/17 6:42 PM, Jonathan M Davis via Digitalmars-d wrote:
>> On Wednesday, May 31, 2017 19:17:16 Moritz Maxeiner via Digitalmars-d wrote:

>
> It just means that D is an inferior platform for a web framework, unless you use the process-per-request model so the entire thing doesn't go down for one page request. But that obviously is going to cause performance problems.
>
> Which is unfortunate, because vibe.d is a great platform for web development, other than this. You could go Adam's route and just put the blinders on, but I think that's not a sustainable practice.
>
> -Steve

I'm glad I know enough to know this is an opinion...

Anyway, it's better to run a vibe.d instance under something like the daemonize package. You should also use the vibe.d error handlers.

June 01, 2017

Re: Bad array indexing is considered deadly

Posted by H. S. Teoh
in reply to cym13

On Thu, Jun 01, 2017 at 10:09:36PM +0000, cym13 via Digitalmars-d wrote:
> On Thursday, 1 June 2017 at 19:04:19 UTC, H. S. Teoh wrote:
> > I like Spolsky's idea of using separate types for tainted / verified input. Let the compiler statically verify that you at least made an attempt at validating your program's inputs (though obviously it can only go so far -- the compiler can't guarantee that your validation code is actually correct).  The problem, though, is that D currently doesn't have tainted types, so for example you can't tell at a glance whether a given string is untrusted user input or validated data, it's all just `string`.  I wonder if tainted types could be something worth adding either to the language or to Phobos.
> 
> I'm not familiar with the idea, do we need more than the following?
> 
> import std.stdio;
> 
> struct Tainted(T) {
>     T _basetype;
>     alias _basetype this;
> }
> 
> 
> void main(string[] args) {
>     auto ts = Tainted!string("Hello");
>     writeln(ts);
> }
> 
> It's a PoC, ok, but it lets you use ts like any variable of the base type, it lets you convert one easily to the other, but this conversion has to be explicit. So, real question, what more do we need?
[...]

Actually, I re-read Spolsky's blog post[1], and apparently he didn't recommend using the type system to enforce this, but rather a naming convention that makes code stick out when it's doing something funny.

[1] https://www.joelonsoftware.com/2005/05/11/making-wrong-code-look-wrong/

So, for example, you'd name all tainted strings with the prefix `us`, and all functions that return tainted strings are prefixed with `us`, including any string identifiers you might use to refer to the tainted data.  E.g.:

	string usName = usGetParam(httpRequest, "name");
	...
	database.cache("usName", usName);
	...
	string usData = database.read("usName");
	...
	// sEscapeHtmlUs means it converts unsafe data (...Us) to safe
	// data (s...) by escaping dangerous characters.
	string sData = sEscapeHtmlUs(usData);
	...
	// sWrite means it requires safe data
	sWrite(html, "<p>Your name is %s</p>", sData);

The idea is that if you see a line of code where the prefixes don't match, then you immediately know there's a problem. For example:

	// Uh-oh, we assigned unsafe data to a variable that should only
	// hold safe data.
	string sName = usGetParam(httpRequest, "name");

	// Uh-oh, we wrote unsafe data into a database field that should
	// only contain safe data.
	database.cache("sName", usName);

	// Uh-oh, we're printing unsafe data via a function that assumes
	// its input is safe.
	sWrite(html, "<p>Your name is %s</p>", usData);

This is not bad, since with some practice you could immediately identify code that's probably wrong (mixing s- and us- prefixes wrongly, or identifier with no prefix, meaning the code needs to be reviewed and the identifier renamed accordingly).

The problem is that this is still in the realm of coding by convention. What I had in mind was more along the lines of what you proposed, that you'd actually use the type system to enforce a distinction between safe and unsafe data, so that the compiler will reject code that tries to mix the two without an explicit conversion.

I haven't thought too deeply about how to actually implement this, but here's my initial idea: any function that reads data from external sources (network, filesystem, environment) will return Tainted!string or Tainted!(T[]) rather than string or T[]. Unlike what you proposed above, the Tainted wrapper will *not* allow implicit conversion to the underlying type, because that would defeat the purpose (pass Tainted!T to a function that expects T, and the compiler will automatically convert it to T for you: no good). So you cannot pass this data directly to a function that expects string or T[]. However, the wrapper will allow some way of accessing the wrapped data, so that a validation function can inspect the data to ensure that it's OK and then explicitly convert it to the underlying type.

Sketch of code:

	struct Tainted(T)
	{
		// Note: outside code cannot directly access payload.
		private T payload;

		T validate(alias isClean)()
			if (is(typeof(isClean(T.init)) == bool))
		{
			// Do not allow isClean to escape references to
			// payload (?is this correct usage?). Requires
			// -dip1000.
			scope _p = payload;

			if (isClean(_p))
				return payload;
			throw new Exception("Bad data");
		}

		T cleanse(alias cleaner)()
			if (is(typeof(cleaner(T.init)) == T))
		{
			// Prevent cleaner() from cheating and simply
			// returning the payload (?necessary?). Requires
			// -dip1000. The idea being to force the
			// creation of safe data from the payload, e.g.,
			// a HTML-escaped string from a raw string.
			scope _p = payload;

			return cleaner(_p);
		}
	}

	// Note: returns Tainted!T instead of T.
	Tainted!T readParam(T)(HttpRequest req, string paramName);

	// Note: requires string, not Tainted!string
	void writeToOutput(string s);

	void handleRequest(HttpRequest req)
	{
		string[7] daysOfWeek = [
			"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
		];

		// Returns Tainted!int
		auto day = req.readParam!int("dayOfWeek");

		// Compile error: cannot index array with Tainted!int
		//writeToOutput(daysOfWeek[day]);

		// Check range and return unwrapped int if OK, throw
		// Exception otherwise.
		auto checkedDay = day.validate!(d => d >= 0 && d < daysOfWeek.length);

		writeToOutput(daysOfWeek[checkedDay]); // OK

		// Returns Tainted!string
		auto name = req.readParam!string("name");

		// Compile error: cannot pass Tainted!string to writeToOutput.
		//writeToOutput(name);

		// Unwrap to string if does not contain meta-characters,
		// throw Exception otherwise.
		auto safeName = name.validate!hasNoMetaCharacters;

		writeToOutput(safeName); // OK

		// Cleanse the string by escaping metacharacters.
		auto escapedName = name.cleanse!escapeHtmlMetaChars;
		writeToOutput(escapedName); // OK
	}

This is just a rough sketch, of course. A more complete implementation would have to consider what to do about code that obtains unsafe data directly from OS interfaces like core.stdc.stdio.fread, which aren't wrapped by Tainted.

Also, it would have to address what to do about functions like File.rawRead(), which write to a user-provided buffer, since the caller could just read the tainted data directly from the buffer, bypassing any Tainted protections.
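
A reduced variant of the sketch that should compile without -dip1000, under a few assumptions: `taint` is a hypothetical helper standing in for readParam, `validate` is a free function called through UFCS rather than a member, and the `scope` annotations are dropped. Note that `private` in D is module-scoped, so the wrapper only protects its payload from code in other modules:

```d
import std.exception : enforce;

struct Tainted(T)
{
    // No alias this, so there is no implicit conversion back to T.
    private T payload;
}

/// Hypothetical helper that marks a value as untrusted.
Tainted!T taint(T)(T value)
{
    return Tainted!T(value);
}

/// Unwraps the payload if isClean accepts it, throws an Exception otherwise.
T validate(alias isClean, T)(Tainted!T t)
    if (is(typeof(isClean(T.init)) == bool))
{
    enforce(isClean(t.payload), "Bad data");
    return t.payload;
}

void main()
{
    string[7] daysOfWeek = [
        "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
    ];

    // Stands in for req.readParam!int("dayOfWeek").
    auto day = taint(3);

    // The tainted value cannot be used as an index directly.
    static assert(!__traits(compiles, daysOfWeek[day]));

    // After the range check the plain int comes out and indexing is fine.
    auto checkedDay = day.validate!(d => d >= 0 && d < daysOfWeek.length);
    assert(daysOfWeek[checkedDay] == "Thu");
}
```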

T

-- 
I'm still trying to find a pun for "punishment"...

June 02, 2017

Re: Bad array indexing is considered deadly

Posted by aberba
in reply to Paolo Invernizzi

On Thursday, 1 June 2017 at 21:55:55 UTC, Paolo Invernizzi wrote:
> On Thursday, 1 June 2017 at 18:54:51 UTC, Timon Gehr wrote:
>> [...]
>
> I really understand what is happening: I've a vibe.d server that's serving a US top 5 FMCG world company, and sometime it goes down for a crash.
>
> [...]

That's pretty much it. Containerising several stateless instances is the scalable approach going forward.

June 02, 2017

Re: Bad array indexing is considered deadly

Posted by aberba
in reply to aberba

On Friday, 2 June 2017 at 00:15:39 UTC, aberba wrote:
> On Thursday, 1 June 2017 at 10:13:25 UTC, Steven Schveighoffer wrote:
>> On 5/31/17 6:42 PM, Jonathan M Davis via Digitalmars-d wrote:
>>> [...]
>
>>
>> It just means that D is an inferior platform for a web framework, unless you use the process-per-request model so the entire thing doesn't go down for one page request. But that obviously is going to cause performance problems.
>>
>> Which is unfortunate, because vibe.d is a great platform for web development, other than this. You could go Adam's route and just put the blinders on, but I think that's not a sustainable practice.
>>
>> -Steve
>
> I'm glad I know enough to know this is an opinion...
>
> Anyway, it's better to run a vibe.d instance under something like the daemonize package. You should also use the vibe.d error handlers.

Here is daemonize (https://github.com/NCrashed/daemonize/blob/master/examples/03.Vibed/README.md) for running it as a daemon. It offers some control.

June 02, 2017

Re: Bad array indexing is considered deadly

Posted by Laeeth Isharc
in reply to Steven Schveighoffer

On Wednesday, 31 May 2017 at 13:34:25 UTC, Steven Schveighoffer wrote:
> On 5/31/17 9:21 AM, H. S. Teoh via Digitalmars-d wrote:
>> On Wed, May 31, 2017 at 09:04:52AM -0400, Steven Schveighoffer via Digitalmars-d wrote:
>>> I have discovered an annoyance in using vibe.d instead of another web
>>> framework. Simple errors in indexing crash the entire application.
>>>
>>> For example:
>>>
>>> int[3] arr;
>>> arr[3] = 5;
>>>
>>> Compare this to, let's say, a malformed unicode string (exception),
>>> malformed JSON data (exception), file not found (exception), etc.
>>>
>>> Technically this is a programming error, and a bug. But memory hasn't
>>> actually been corrupted. The system properly stopped me from
>>> corrupting memory. But my reward is that even though this fiber threw
>>> an Error, and I get an error message in the log showing me the bug,
>>> the web server itself is now out of commission. No other pages can be
>>> served. This is like the equivalent of having a guard rail on a road
>>> not only stop you from going off the cliff but proactively disable
>>> your car afterwards to prevent you from more harm.
>> [...]
>>
>> Isn't it customary to have the webserver launched by a script that
>> restarts it whenever it crashes (after logging a message in an emergency
>> logfile)?  Not an ideal solution, I know, but at least it minimizes
>> downtime.
>
> Yes, I can likely do this. This kills any existing connections being handled though, and is far far from ideal. It's also a hard crash, any operations such as writing DB data are killed mid-stream.
>
>..
> -Steve

Hi Steve.

We had similar problems early on. We used supervisord to automatically keep a pool of vibe.d applications running and put nginx in front as a load balancer. User session info is stored in redis, and a separate process for data communicates with the web server over nanomsg. ZeroMQ is more mature, but I found that a socket could sometimes get into an inconsistent state if servers crashed midway, and nanomsg doesn't have this problem. So a data update either succeeds or fails, but there is no corruption if the web server crashes.

There may be better ways, but it seems to be okay for us.

Laeeth

June 02, 2017

Re: Bad array indexing is considered deadly

Posted by cym13
in reply to H. S. Teoh

On Friday, 2 June 2017 at 00:30:39 UTC, H. S. Teoh wrote:
> [...]

Now that I think about it, what we really want going that way is an IO monad.

June 02, 2017

Re: Bad array indexing is considered deadly

Posted by aberba
in reply to Laeeth Isharc

On Friday, 2 June 2017 at 02:11:34 UTC, Laeeth Isharc wrote:
> On Wednesday, 31 May 2017 at 13:34:25 UTC, Steven Schveighoffer wrote:
>> [...]
>
> Hi Steve.
>
> Had similar problems early on.  We used supervisord to automatically keep a pool of vibed applications running and put nginx in front as a load balancer. User session info stored in redis.  And a separate process for data communicating with web server over nanomsg.  Zeromq is more mature but I found sometimes socket could get into an inconsistent state if servers crashed midway, and nanomsg doesn't have this problem. So data update either succeeds or fails but no corruption if Web server crashes.
>
> Maybe better ways but it seems to be okay for us.
>
>
> Laeeth

How does that setup affect response time? Do you cache large query results in redis?

Copyright © 1999-2021 by the D Language Foundation