interpolation proposals and safety (page 2)

Settings

Help

Index » General » interpolation proposals and safety (page 2)

August 30

Re: interpolation proposals and safety

Posted by Adam D Ruppe
in reply to kdevel

Permalink

Adam D Ruppe

Posted in reply to kdevel

Permalink

On Friday, 30 August 2024 at 11:18:10 UTC, kdevel wrote:
> Ideally compilation of such unadorned writes would fail.

for what its worth, after starting to use this more, I'm actually inclined to agree with you.

The idea behind 1036e is that it allows the language to provide the pieces that library authors can use to make things that do the right thing. Two existing library functions work with it out of the box in a reasonably good way: std.conv.text and std.stdio.writeln.

I've seen people in the wild using std.conv.text to do things like make sql queries - exactly what you're not supposed to do. We want some way to do a toString, but it should be a bit of a pain: doing it the right thing should be the easy way. Some of this is just because library support is still a bit immature - I still haven't even put it in place in all the places arsd wants it - but part of it means committing to the change too.

So like we currently have `db.query(string, args...)` and... maybe that should be deprecated, so people stop trying to do `db.query(text(istring))`. Maybe text(istring) itself should be deprecated. It is some breakage though that requires users update their compiler so that makes it more complicated. Also, you sometimes DO want a runtime string, so it probably more of a rename than a remove.

I didn't think about using writeln directly in a cgi context. There's MUCH better ways to make cgi programs! And much better ways to construct html too. And those libraries tend to make you be more careful with these cases. But indeed, writeln bypasses all that. I do think this is a harder case though... outright banning writeln in these cases isn't super realistic so the question is where you balance people's capability of doing it wrong with the flexibility to build anything on top at all.

A lot of it comes back to making the better way the easy way: use my cgi library, and `cgi.write(i"istring")` is a compile error. But it still doesn't prevent you from bypassing the library and doing it wrong. Still, my feeling is if you make the right way just as easy as the wrong way, at least it helps point people the right way.

August 30

Re: interpolation proposals and safety

Posted by Steven Schveighoffer
in reply to kdevel

Permalink

Steven Schveighoffer

Posted in reply to kdevel

Permalink

On Friday, 30 August 2024 at 12:07:47 UTC, kdevel wrote:

On Thursday, 29 August 2024 at 14:21:24 UTC, Steven Schveighoffer wrote:

On Thursday, 22 August 2024 at 19:34:32 UTC, kdevel wrote:

writeln should not print unadorned interpolated string expressions.

I find this argument unconvincing.

You can print anything with writeln. [...]

Not really, e.g. in the case of an object the class name will be printed instead of the potentially dangerous content:

import std.stdio;

class C {
   string s;
   this (string s)
   {
      this.s = s;
   }
}

void main ()
{
   auto c = new C ("<script>alert(-1)</script>");
   writeln (c);
}

But it still prints something. I guarantee if someone could have figured out how to make it print all the fields via a runtime interface, this would have happened.

In other words, the expectation is that the toString value is useful for string representation logging by default. The class name is the best you can do with a runtime-only interface by default, so that's what it is. It wasn't a conscious choice with HTML injection in mind.

In a superior implementation of write(ln) this would simply also not compile. I mean there is a difference between printing the data payload to the output channel and OTOH dumping debug information to the developer.

If Object.toString() didn't exist, then writeln(someObject) would by default print the class name via typeid(someObject).toString(). There is no chance this would be an error by default.

> >

The point of making IES play nice with writeln is that it is a major expectation of any kind of interpolation setup. People just expect to log interpolated sequences that have their stuff in it.

I don't know if you noticed your own wording: We are expecting to "log" IES data but not to "print" them to the output channel.

Yes, I purposely said log, because that's the intention of writeln. It's logging the most useful data to the console. For structs, this is all the fields, and the struct name.

For IES, this is what it does for string representation, because that's what people would expect. Use any other language with interpolation, and you will find it does the same thing when logging. Can you imagine what the blowback would be if writeln(i"hello $(name)!"); was an error?

If anything, this draws attention to the pitfalls of CGI in general.

For instance, what if you call a function that logs something to the console? That gets included in your html.

> >

[SQL]

Basically, you found just a very narrow example that is unlikely to exist, but indeed might be confusing if an exact series of mistakes are made. Even without IES, a user is equally likely to use writef to make the same mistake.

With post-1036e D the user has now three equally potent ways to shoot theirself in the foot:

     data = "alert (-1)";
     writeln ("<script>" ~ data ~ "</script>");

     data = "alert (-1)";
     writeln (format!"<script>%s</script>" (data));

     data = "alert (-1)";
     writeln (i"<script>$(data)</script>");

writefln("<script>%s</script>", data);

When you have a footgun cabinet that is almost full, I'm not sure adding one more is something to worry about.

-Steve

August 30

Re: interpolation proposals and safety

Posted by H. S. Teoh
in reply to Steven Schveighoffer

Permalink

H. S. Teoh

Posted in reply to Steven Schveighoffer

Permalink

On Fri, Aug 30, 2024 at 02:36:56PM +0000, Steven Schveighoffer via Digitalmars-d wrote:
> On Friday, 30 August 2024 at 12:07:47 UTC, kdevel wrote:
[...]
> > With post-1036e D the user has now three equally potent ways to shoot theirself in the foot:
> > 
> > 1.
> > ```
> >      data = "alert (-1)";
> >      writeln ("<script>" ~ data ~ "</script>");
> > ```
> > 
> > 2.
> > ```
> >      data = "alert (-1)";
> >      writeln (format!"<script>%s</script>" (data));
> > ```
> > 
> > 3.
> > ```
> >      data = "alert (-1)";
> >      writeln (i"<script>$(data)</script>");
> > ```
> 
> 4.
> ```d
> writefln("<script>%s</script>", data);
> ```
[...]

Actually, the footgun here is just one: writeln + CGI.  The rest are merely specific instances of the same thing: writing unvetted data to your output stream.  No amount of cleverness is going to prevent this; you, as the programmer, are responsible for making sure the program logic properly vets all data before sending them to the output stream. If some coder wannabe can't figure out how to properly segregate their input data from their output data, no amount of programming language features will be able to help them.  The program logic has to be structured in such a way that *all* input data is properly escaped, or *all* output data is properly encoded.  The latter is much harder; recoding input data is recommended.  If the fundamental program logic is flawed, no magic feature will be able to fix the resulting problems.

T

-- 
Study gravitation, it's a field with a lot of potential.

August 30

Re: interpolation proposals and safety

Posted by Adam D Ruppe
in reply to Steven Schveighoffer

Permalink

Adam D Ruppe

Posted in reply to Steven Schveighoffer

Permalink

On Friday, 30 August 2024 at 14:36:56 UTC, Steven Schveighoffer wrote:
> Yes, I purposely said log, because that's the intention of `writeln`.

Gonna disagree with this: writeln's intention is to write a line to the stdout stream. It has no more semantic knowledge; it is fundamentally a low level function with a wide variety of high level uses.

You can use it for logging to the console, but it might also go to a file, but it might also go to a user for interactive use ("Enter your name: "), and it might also be structured output intended to be consumed by another program: ./yourapp | whatever.

CGI, on the low level, works as one of those unix pipelines. You ought to be more careful with that, but writeln has no idea how it is actually going to be used.

> For instance, what if you call a function that logs something to the console? That gets included in your html.

They're supposed to use the stderr stream for that kind of thing, which is traditionally redirected to the server's log file.

But yeah, writeln is a low level function and you have to treat it as such - limiting what it is allowed to do can help one use case but hurt another, so that's hard to say... but at the same time, users really shouldn't be encouraged to use it. There should be high level intent libs that do those jobs all better and easier.

August 30

Re: interpolation proposals and safety

Posted by Steven Schveighoffer
in reply to Adam D Ruppe

Permalink

Steven Schveighoffer

Posted in reply to Adam D Ruppe

Permalink

On Friday, 30 August 2024 at 15:21:49 UTC, Adam D Ruppe wrote:

On Friday, 30 August 2024 at 14:36:56 UTC, Steven Schveighoffer wrote:

Yes, I purposely said log, because that's the intention of writeln.

Gonna disagree with this: writeln's intention is to write a line to the stdout stream. It has no more semantic knowledge; it is fundamentally a low level function with a wide variety of high level uses.

writeln has a default mechanism for writing out data. What does this default mechanism do? It dumps the contents of a struct. This screams "logging".

If writeln only supported strings, or required opt-in to string conversion (i.e. erroring without explicit toString function), then it would be clearer. But I can't see any purpose for this default other than logging data when it's useful.

But technically, this job belongs to format, so writeln indeed is a mechanism to output formatted data to a stream. It's just that the combination of formatting data with the default format being "contents of object" is best suited to logging.

If one is worried about accidentally writing badly-formatted or accidentally revealing data, one shouldn't use writeln.

This reminds me of javascript when you get a field on the screen and the text is [Object object]

-Steve

August 31

Re: interpolation proposals and safety

Posted by kdevel
in reply to H. S. Teoh

Permalink

kdevel

Posted in reply to H. S. Teoh

Permalink

On Friday, 30 August 2024 at 15:09:39 UTC, H. S. Teoh wrote:
> The program logic has to be structured in such a way that *all* input data is properly escaped,

What does that mean? Can, after such input escaping, Bobby Tables first name still be stored in the database?

> or *all* output data is properly encoded.  The latter is much harder; recoding input data is recommended.

By whom? And "recode" (re-encode) into which code?

The web is full of content recommending the opposite:

- https://benhoyt.com/writings/dont-sanitize-do-escape/

"Every so often developers talk about “sanitizing user input” to prevent cross-site scripting attacks. This is well-intentioned, but leads to a false sense of security, and sometimes mangles perfectly good input."

- https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html

- https://security.stackexchange.com/questions/32394/when-to-escape-user-input (2013)

"In general, you want to keep strings as strings, and delegate any encoding or escaping to specialized functions which do that well. For instance, for SQL, you use prepared statements. With HTML from a PHP context, you would use htmlspecialchars()."

- https://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/

"The right way to handle issues with untrusted data is:

    Filter on input, escape on output

This means that you validate or limit data that comes in (filter), but only transform (escape or encode) it at the point you are sending it as output to another system that requires the encoding. It has been standard best practice since just about forever [citation required].

[...]

First of all, escape-on-input is just wrong – you've taken some input and applied some transformation that is totally irrelevant to that data. If, taking our example, you have some data collected by HTTP POST or GET parameters, applying HTML escaping to it is a layering violation – it mixes an output formatting concern into input handling. Layering violations make your code much harder to understand and maintain, because you have to take into account other layers instead of letting each component and layer do its own job.

Doing things ‘right’ is very important, even if doing them ‘wrong’ seems to work and you are tempted to be dismissive of ‘theoretical’ concerns about purity etc. When you have to maintain code, you will be very glad if things are in the right place, and not full of hacks and surprises."

Top | Forum index | About this forum

Forums