Thread overview
interpolation proposals and safety
Dec 23, 2023: Bruce Carneal
Dec 23, 2023: Adam D Ruppe
Aug 22: kdevel
Aug 29: Grandoz
Aug 30: kdevel
Aug 30: kdevel
Aug 30: kdevel
Aug 31: kdevel

December 23, 2023

Are both of the interpolation proposals (1027 and 1036e) equally helpful when seeking to avoid SQL injection attacks, DOM XSS vulnerabilities and the like?

Is it really easy, trivial even, to button things up with either proposal or is one easier to use correctly than the other?

I am neither an SQL savant nor a Web programmer, but it seems like safety in this area would be a real plus for D going forward.

December 23, 2023

On Saturday, 23 December 2023 at 22:55:34 UTC, Bruce Carneal wrote:

> Is it really easy, trivial even, to button things up with either proposal or is one easier to use correctly than the other?

1027 makes it possible to do some cases correctly, but difficult to trust in the general case since it makes no attempt at type safety and its string cannot differentiate between user-injected strings and format string literals.

So, when you process a 1027 style format string, and see a %, was that part of the string or was that injected by the compiler to indicate a param placeholder? What if the user forgets to escape something, or passes the wrong syntax as a custom specifier? These are all unforced errors in the design of 1027, that led to its DIP being rejected by community review.

On the other hand, 1036e corrects these flaws, while adding the possibility for CTFE manipulation, aggregation, and verification of all string literals passed.

I encourage everyone to look at the sample repository here:

https://github.com/adamdruppe/interpolation-examples/

Several of the use cases selected for that specifically demonstrate how it gives the users the convenient syntax they expect from string interpolation, yet actually lowers to the correct semantics for each specialized problem domain.

Example #1, basics, shows that when a plain string is the right result, producing one is easy.

Example #2, formatting, shows how format strings can be attached and processed in library code, including compile-time verification associated with the data types passed.

Example #3, printf, shows how you can adapt the advanced usage D provides to be compatible with legacy functions in a zero-runtime-cost manner.

Example #4, internationalization, builds off the techniques shown in the previous examples to use the industry-standard GNU gettext library, coupled with automatic aggregation of translatable strings at compile time, to provide full context to non-developers to add new language packs at run time.

The next three examples are directly relevant to your question, and address common problems web developers face, where security problems are often introduced where strings are convenient, but no longer appropriate for correctness.

Example #5, urls, shows how you can build off the previously demonstrated techniques, to make a directly-manipulable high-level object out of what looks to be a simple, familiar string. Since it works at a high level, aware of the surrounding context, it ensures each injected component is encoded appropriately for that context.

Example #6, sql, directly avoids the trap of sql injection by separating code and data - delegating the recombination of them to the database engine to do it safely and correctly, yet appearing to the user to be a convenient mixture of the two! Notice how the usage example, at the top level of the repository, looks like string interpolation, yet the implementation, in the lib folder, actually binds the data to a prepared statement in a structured way, like the guides say you are supposed to!
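
To make the separation of code and data concrete, here is a minimal sketch of the idea. It is not the code from the repository: `Query` and `toQuery` are hypothetical names, the values are simply converted to strings instead of being bound through a real database driver, and it assumes a compiler that implements the 1036e lowering and provides `core.interpolation`.

```d
import core.interpolation;
import std.conv : to;

// Hypothetical result type: SQL text with `?` placeholders,
// plus the values that a driver would bind to them.
struct Query
{
    string sql;
    string[] params;
}

// Hypothetical helper: walks the interpolated sequence at compile time.
// Literal parts become SQL text; every interpolated value becomes a
// placeholder and is kept separately, so it is never spliced into the SQL.
Query toQuery(Args...)(Args args)
{
    Query q;
    foreach (arg; args)
    {
        alias T = typeof(arg);
        static if (is(T == InterpolationHeader) || is(T == InterpolationFooter))
        {
            // Markers at both ends of the sequence; they carry no data.
        }
        else static if (is(T == InterpolatedLiteral!lit, string lit))
        {
            q.sql ~= lit; // text written by the programmer
        }
        else static if (is(T == InterpolatedExpression!src, string src))
        {
            // Source text of the expression; not needed here.
        }
        else
        {
            q.sql ~= "?"; // placeholder for the runtime value
            q.params ~= arg.to!string;
        }
    }
    return q;
}

unittest
{
    string name = "x' OR '1'='1";
    int minAge = 30;
    auto q = toQuery(i"SELECT * FROM users WHERE name = $(name) AND age > $(minAge)");
    assert(q.sql == "SELECT * FROM users WHERE name = ? AND age > ?");
    assert(q.params == ["x' OR '1'='1", "30"]);
}
```

Compiling with `dmd -unittest -main` runs the check. The point of the sketch is only that the hostile `name` ends up in `params`, never in `sql`.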

Finally, example #7 directly avoids the trap of XSS holes by, again, separating HTML structure from added data and ensuring correct encoding and valid data positioning in all contexts. With CTFE validation, it prevents common mistakes that can manifest as bugs or exploitable holes in production, and by working at a high level, using object representations instead of raw strings, it ensures all semantic invariants are maintained from creation to consumption. It goes beyond just bringing web best practices to the D programming language: it also enables innovation by coupling these security guidelines and development best practices with D's unique features for static analysis and compile-time processing.

Similar examples could be written for shell scripting, json, and more, but I thought this was enough to make the point and demonstrate the relevant patterns.

By the end of this year, when this new feature is merged, D will cement its position as an innovating pioneer, learning the lessons from the past and applying their best libraries in a whole new way.

August 22

On Saturday, 23 December 2023 at 23:33:31 UTC, Adam D Ruppe wrote:

> 1027 makes it possible to do some cases correctly, but difficult to trust in the general case since it makes no attempt at type safety and its string cannot differentiate between user-injected strings and format string literals.

As I will point out below, the current implementation (DMD v2.109.1) doesn't do either. At least not in the HTML case.

> [...]
>
> On the other hand, 1036e corrects these flaws, while adding the possibility for CTFE manipulation, aggregation, and verification of all string literals passed.
>
> I encourage everyone to look at the sample repository here:
>
> https://github.com/adamdruppe/interpolation-examples/
>
> [...]
>
> Finally, example #7, directly avoids the trap of XSS holes by, again, separating HTML structure from added data and ensuring correct encodings and valid data positioning is done in all contexts. [...]

One way to "commit" a mistake is by omitting necessary parts. In a CGI context the webserver is reading the stdout of the CGI application. The original example (with comments stripped) is:

import lib.html;

void main() {
   string name = "<bar>";
   auto element = i"<foo>$(name)</foo>".html;
   assert(element.tagName == "foo");

   import std.stdio;
   writeln(element.toString());

}

Now I forget to import lib.html and to call html on the IES:

void main() {
	string name = "<script>alert(-1)</script>";
	auto element = i"<foo>$(name)</foo>";

	import std.stdio;
	writeln(element);
}

$ dmd htmli.d
$ ./htmli
<foo><script>alert(-1)</script></foo>

name may have been a URL parameter or may be part of the POST body. The important part is that it is attacker supplied and controlled.

writeln should not print unadorned interpolated string expressions.

August 29

On Thursday, 22 August 2024 at 19:34:32 UTC, kdevel wrote:

> On Saturday, 23 December 2023 at 23:33:31 UTC, Adam D Ruppe wrote:
>
> > [...]
>
> As I will point out below, the current implementation (DMD v2.109.1) doesn't do either. At least not in the HTML case.
>
> [...]

Thanks

August 29

On Thursday, 22 August 2024 at 19:34:32 UTC, kdevel wrote:

> Now I forget to import lib.html and to call html on the IES:
>
> void main() {
> 	string name = "<script>alert(-1)</script>";
> 	auto element = i"<foo>$(name)</foo>";
>
> 	import std.stdio;
> 	writeln(element);
> }
>
> $ dmd htmli.d
> $ ./htmli
> <foo><script>alert(-1)</script></foo>
>
> name may have been a URL parameter or may be part of the POST body. The important part is that it is attacker supplied and controlled.
>
> writeln should not print unadorned interpolated string expressions.

The real problem here is that the type system does not distinguish between strings that are controlled by the user (and thus may contain malicious data) and strings that are controlled by the programmer. If you define a separate type for user-controlled strings, the mistake is easily caught at compile time:

struct UserString
{
	string unwrap;
	@disable string toString();
}

void main() {
	auto name = UserString("<script>alert(-1)</script>");
	auto element = i"<foo>$(name)</foo>";

	import std.stdio;
	writeln(element);
	// Error: static assert:  "UserString cannot be formatted
	// because its `toString` is marked with `@disable`"
}

August 29

On Thursday, 22 August 2024 at 19:34:32 UTC, kdevel wrote:

> writeln should not print unadorned interpolated string expressions.

I find this argument unconvincing.

You can print anything with writeln. Even if an IES was something else that doesn't print nicely, writeln will still print it.

The point of making IES play nice with writeln is that it is a major expectation of any kind of interpolation setup. People just expect to log interpolated sequences that have their stuff in it.

These concerns don't translate to other domains. In most SQL libraries, there is not a best effort function that just attempts to translate all data to strings and executes it.

Assigning the thing to a string doesn't work either.

Basically, you found a very narrow example that is unlikely to exist, but that indeed might be confusing if an exact series of mistakes is made. Even without IES, a user is equally likely to use writef to make the same mistake.

-Steve

August 30

On Thursday, 29 August 2024 at 14:18:48 UTC, Paul Backus wrote:

> > [...]
> >
> > writeln should not print unadorned interpolated string expressions.
>
> The real problem here is that the type system does not distinguish between strings that are controlled by the user (and thus may contain malicious data) and strings that are controlled by the programmer. If you define a separate type for user-controlled strings, the mistake is easily caught at compile time:

Sure. But if you forget to do so, you have a "typesafe" implementation of XSS. Using the facilities of 1036e in a careless way is actually unsafe.

Ideally compilation of such unadorned writes would fail.
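
One way to approximate this in library code, without changing writeln itself, is a wrapper that refuses raw sequences. This is only a sketch: `strictWriteln` and `isSequenceMarker` are hypothetical names, not part of Phobos, and it assumes the marker types from `core.interpolation` that the compiler places at both ends of every IES.

```d
import core.interpolation;
import std.meta : anySatisfy;
import std.stdio : writeln;

// True for the marker types that delimit an interpolated sequence.
enum isSequenceMarker(T) =
    is(T == InterpolationHeader) || is(T == InterpolationFooter);

// Hypothetical wrapper: forwards to writeln, but fails to compile when it
// is handed an interpolated sequence that no encoder has processed yet.
void strictWriteln(Args...)(Args args)
{
    static assert(!anySatisfy!(isSequenceMarker, Args),
        "raw interpolated sequence: pass it through an encoder first");
    writeln(args);
}

unittest
{
    string name = "<script>alert(-1)</script>";
    // The unadorned IES is rejected at compile time ...
    static assert(!__traits(compiles, strictWriteln(i"<foo>$(name)</foo>")));
    // ... while ordinary arguments are still accepted.
    static assert(__traits(compiles, strictWriteln("plain text")));
}
```

This does not stop anyone from calling writeln directly; it only shows that the information needed to reject such a call is available to library code at compile time.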

August 30

On Thursday, 29 August 2024 at 14:21:24 UTC, Steven Schveighoffer wrote:

> On Thursday, 22 August 2024 at 19:34:32 UTC, kdevel wrote:
>
> > writeln should not print unadorned interpolated string expressions.
>
> I find this argument unconvincing.
>
> You can print anything with writeln. [...]

Not really, e.g. in the case of an object the class name will be printed instead of the potentially dangerous content:

import std.stdio;

class C {
   string s;
   this (string s)
   {
      this.s = s;
   }
}

void main ()
{
   auto c = new C ("<script>alert(-1)</script>");
   writeln (c);
}

$ dmd classprint.d
$ ./classprint
classprint.C

In a superior implementation of write(ln) this would simply not compile either. There is a difference between printing the data payload to the output channel and, on the other hand, dumping debug information for the developer.

> The point of making IES play nice with writeln is that it is a major expectation of any kind of interpolation setup. People just expect to log interpolated sequences that have their stuff in it.

I don't know if you noticed your own wording: We are expecting to "log" IES data but not to "print" them to the output channel.

> [SQL]
>
> Basically, you found just a very narrow example that is unlikely to exist, but indeed might be confusing if an exact series of mistakes are made. Even without IES, a user is equally likely to use writef to make the same mistake.

With post-1036e D the user now has three equally potent ways to shoot themselves in the foot:

     data = "alert (-1)";
     writeln ("<script>" ~ data ~ "</script>");
     data = "alert (-1)";
     writeln (format!"<script>%s</script>" (data));
     data = "alert (-1)";
     writeln (i"<script>$(data)</script>");

August 30

On Friday, 30 August 2024 at 12:07:47 UTC, kdevel wrote:

> [...]

The examples should read like

     auto data = "<script>alert (-1)</script>";
     writeln ("<div>" ~ data ~ "</div>"); // 1.
     writeln (format!"<div>%s</div>" (data)); // 2.
     writeln (i"<div>$(data)</div>"); // 3.

August 30

On Friday, 30 August 2024 at 11:18:10 UTC, kdevel wrote:

> On Thursday, 29 August 2024 at 14:18:48 UTC, Paul Backus wrote:
>
> > The real problem here is that the type system does not distinguish between strings that are controlled by the user (and thus may contain malicious data) and strings that are controlled by the programmer. If you define a separate type for user-controlled strings, the mistake is easily caught at compile time:
>
> Sure. But if you forget to do so, you have a "typesafe" implementation of XSS. Using the facilities of 1036e in a careless way is actually unsafe.
>
> Ideally compilation of such unadorned writes would fail.

Interpolation is just syntax sugar. You can use it to build safe APIs, or unsafe ones. writeln is not a safe API (w.r.t. XSS), and will not magically become one just because you used interpolation to pass your arguments to it.

If you were led to believe that interpolation would somehow make existing unsafe APIs safe (from injection attacks, XSS, etc.), then you were misled.
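
For illustration, here is a minimal sketch of what a safer API for the `<div>` case might look like. The names `escapeHtml` and `toHtmlText` are hypothetical, the escaping only covers text content and double-quoted attribute values rather than every HTML context, and it assumes the `core.interpolation` types that the compiler passes along with an IES.

```d
import core.interpolation;
import std.array : replace;
import std.conv : to;

// Hypothetical minimal encoder. `&` is replaced first so that the
// entities produced by the later replacements are not escaped again.
string escapeHtml(string s)
{
    return s.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace(`"`, "&quot;");
}

// Hypothetical helper: literal parts of the sequence are trusted markup
// written by the programmer; every interpolated value is escaped.
string toHtmlText(Args...)(Args args)
{
    string result;
    foreach (arg; args)
    {
        alias T = typeof(arg);
        static if (is(T == InterpolationHeader) || is(T == InterpolationFooter))
        {
            // Sequence markers; nothing to emit.
        }
        else static if (is(T == InterpolatedLiteral!lit, string lit))
        {
            result ~= lit;
        }
        else static if (is(T == InterpolatedExpression!src, string src))
        {
            // Source text of the expression; nothing to emit.
        }
        else
        {
            result ~= escapeHtml(arg.to!string);
        }
    }
    return result;
}

unittest
{
    auto data = "<script>alert (-1)</script>";
    assert(toHtmlText(i"<div>$(data)</div>")
        == "<div>&lt;script&gt;alert (-1)&lt;/script&gt;</div>");
}
```

The safety here comes from `toHtmlText`, not from the interpolation syntax: the same sequence handed straight to writeln is printed unescaped.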
