March 26, 2009
Walter Bright wrote:
> Daniel Keep wrote:
>> It should be noted that this is really no different to executing
>> arbitrary code on a machine.  That said, compiling a program is not
>> typically thought of as "executing" code, so some restrictions in this
>> case would probably be prudent.
> 
> Here's the scenario I'm concerned about. Let's say you set up a website that instead of supporting javascript, supports D used as a scripting language. The site thus must run the D compiler on the source code. When it executes the resulting code, that execution presumably will run in a "sandbox" at a low privilege level.
> 
> But the compiler itself will be part of the server software, and may run at a higher privilege. The import feature could possibly read any file in the system, inserting it into the executable being built. The running executable could then supply this information to the attacker, even though it is sandboxed.
> 
> This is why even using the import file feature must be explicitly enabled by a compiler switch, and which directories it can read must also be explicitly set with a compiler switch. Presumably, it's a lot easier for the server software to control the compiler switches than to parse the D code looking for obfuscated file imports.

Like almost everybody else here, I've maintained a couple of websites.

Using D to write CGI programs (that are compiled, real binaries) is appealing, but I'd never even think about having the web server itself use the D compiler!!!

I mean, how often do you see web sites where stuff is fed to a C compiler and the resulting programs run????? (Yes it's too slow, but that's hardly the point here.) That is simply not done.

Rdmd might get one thinking along those lines, but then, how many websites use dynamically created PHP? Dynamically created pages, yes, but with static PHP source.

I must be missing something big here...

March 26, 2009
Georg Wrede wrote:
> Walter Bright wrote:
>> Daniel Keep wrote:
>>> It should be noted that this is really no different to executing
>>> arbitrary code on a machine.  That said, compiling a program is not
>>> typically thought of as "executing" code, so some restrictions in this
>>> case would probably be prudent.
>>
>> Here's the scenario I'm concerned about. Let's say you set up a website that instead of supporting javascript, supports D used as a scripting language. The site thus must run the D compiler on the source code. When it executes the resulting code, that execution presumably will run in a "sandbox" at a low privilege level.
>>
>> But the compiler itself will be part of the server software, and may run at a higher privilege. The import feature could possibly read any file in the system, inserting it into the executable being built. The running executable could then supply this information to the attacker, even though it is sandboxed.
>>
>> This is why even using the import file feature must be explicitly enabled by a compiler switch, and which directories it can read must also be explicitly set with a compiler switch. Presumably, it's a lot easier for the server software to control the compiler switches than to parse the D code looking for obfuscated file imports.
> 
> Like almost everybody else here, I've maintained a couple of websites.
> 
> Using D to write CGI programs (that are compiled, real binaries) is appealing, but I'd never even think about having the web server itself use the D compiler!!!
> 
> I mean, how often do you see web sites where stuff is fed to a C compiler and the resulting programs run????? (Yes it's too slow, but that's hardly the point here.) That is simply not done.

Of course it is, probably just not in C. Last time I looked, there were two concepts around: "statically-generated dynamic pages" and "entirely dynamic pages". I know because I installed an Apache server, and at that time support for statically-generated dynamic pages was new.

What that means is this:

a) statically-generated dynamic = you generate the page once, it's good until the source of the page changes;

b) "really" dynamic page = you generate the page at each request.
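For illustration, strategy (a) amounts to a cache keyed on the page source's modification time, while (b) would simply call the renderer on every request. A minimal Python sketch, assuming a hypothetical `render` function that turns a source file into HTML (this is not how Apache itself implements it):

```python
import os
from typing import Callable, Dict, Tuple


class StaticDynamicCache:
    """Strategy (a): render a page once, reuse it until its source changes.

    Strategy (b) would call render() on every request instead.
    """

    def __init__(self, render: Callable[[str], str]):
        # Hypothetical page generator: source path -> HTML string.
        self.render = render
        # source path -> (mtime at render time, rendered HTML)
        self.cache: Dict[str, Tuple[float, str]] = {}

    def get(self, source_path: str) -> str:
        mtime = os.path.getmtime(source_path)
        cached = self.cache.get(source_path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # source unchanged: serve the stored page
        html = self.render(source_path)  # new or modified source: regenerate
        self.cache[source_path] = (mtime, html)
        return html
```

The generation step runs only when the source changes, which is where a compile-then-run tool like rdmd would fit.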

> Rdmd might get one thinking of such, but then, how many websites use dynamically created PHP? Dynamically created pages yes, but with static PHP source.
> 
> I must be missing something big here...

I think D with rdmd would be great for (a).


Andrei
March 26, 2009
Georg Wrede wrote:
> Like almost everybody else here, I've maintained a couple of websites.
> 
> Using D to write CGI programs (that are compiled, real binaries) is appealing, but I'd never even think about having the web server itself use the D compiler!!!
> 
> I mean, how often do you see web sites where stuff is fed to a C compiler and the resulting programs run????? (Yes it's too slow, but that's hardly the point here.) That is simply not done.

Similarly, how often do you do code generation in a PHP application? You can do it, and I'm sure people use eval for small things, but for anything bigger than that, it just becomes a mess.
March 26, 2009
Andrei Alexandrescu wrote:
> Georg Wrede wrote:
>> Walter Bright wrote:
>>> Daniel Keep wrote:
>>>> It should be noted that this is really no different to executing
>>>> arbitrary code on a machine.  That said, compiling a program is not
>>>> typically thought of as "executing" code, so some restrictions in this
>>>> case would probably be prudent.
>>>
>>> Here's the scenario I'm concerned about. Let's say you set up a website that instead of supporting javascript, supports D used as a scripting language. The site thus must run the D compiler on the source code. When it executes the resulting code, that execution presumably will run in a "sandbox" at a low privilege level.
>>>
>>> But the compiler itself will be part of the server software, and may run at a higher privilege. The import feature could possibly read any file in the system, inserting it into the executable being built. The running executable could then supply this information to the attacker, even though it is sandboxed.
>>>
>>> This is why even using the import file feature must be explicitly enabled by a compiler switch, and which directories it can read must also be explicitly set with a compiler switch. Presumably, it's a lot easier for the server software to control the compiler switches than to parse the D code looking for obfuscated file imports.
>>
>> Like almost everybody else here, I've maintained a couple of websites.
>>
>> Using D to write CGI programs (that are compiled, real binaries) is appealing, but I'd never even think about having the web server itself use the D compiler!!!
>>
>> I mean, how often do you see web sites where stuff is fed to a C compiler and the resulting programs run????? (Yes it's too slow, but that's hardly the point here.) That is simply not done.
> 
> Of course it is, probably just not in C. Last time I looked, there were two concepts around: "statically-generated dynamic pages" and "entirely dynamic pages". I know because I installed an Apache server, and at that time support for statically-generated dynamic pages was new.
> 
> What that means is this:
> 
> a) statically-generated dynamic = you generate the page once, it's good until the source of the page changes;
> 
> b) "really" dynamic page = you generate the page at each request.

Have you ever done web development? If so, did you actually do *code generation* on each page request? If so, I never want to work with you.

Web applications in compiled languages pretty much never invoke the compiler when they're running. Very few programs need a compiler on the machine they're deployed to. It's a security risk, it's an unneeded dependency, it pretty much guarantees maintenance and debugging problems, and it promises performance issues.
March 26, 2009
Andrei Alexandrescu wrote:
> Georg Wrede wrote:
>> Walter Bright wrote:
>>> Daniel Keep wrote:
>>>> It should be noted that this is really no different to executing
>>>> arbitrary code on a machine.  That said, compiling a program is not
>>>> typically thought of as "executing" code, so some restrictions in this
>>>> case would probably be prudent.
>>>
>>> Here's the scenario I'm concerned about. Let's say you set up a website that instead of supporting javascript, supports D used as a scripting language. The site thus must run the D compiler on the source code. When it executes the resulting code, that execution presumably will run in a "sandbox" at a low privilege level.
>>>
>>> But the compiler itself will be part of the server software, and may run at a higher privilege. The import feature could possibly read any file in the system, inserting it into the executable being built. The running executable could then supply this information to the attacker, even though it is sandboxed.
>>>
>>> This is why even using the import file feature must be explicitly enabled by a compiler switch, and which directories it can read must also be explicitly set with a compiler switch. Presumably, it's a lot easier for the server software to control the compiler switches than to parse the D code looking for obfuscated file imports.
>>
>> Like almost everybody else here, I've maintained a couple of websites.
>>
>> Using D to write CGI programs (that are compiled, real binaries) is appealing, but I'd never even think about having the web server itself use the D compiler!!!
>>
>> I mean, how often do you see web sites where stuff is fed to a C compiler and the resulting programs run????? (Yes it's too slow, but that's hardly the point here.) That is simply not done.
> 
> Of course it is, probably just not in C. Last time I looked, there were two concepts around: "statically-generated dynamic pages" and "entirely dynamic pages". I know because I installed an Apache server, and at that time support for statically-generated dynamic pages was new.
> 
> What that means is this:
> 
> a) statically-generated dynamic = you generate the page once, it's good until the source of the page changes;
> 
> b) "really" dynamic page = you generate the page at each request.
> 
>> Rdmd might get one thinking of such, but then, how many websites use dynamically created PHP? Dynamically created pages yes, but with static PHP source.
>>
>> I must be missing something big here...
> 
> I think D with rdmd would be great for (a).

I'm still not sure what you mean. I see it as static (as in plain HTML) vs. dynamic (as in Facebook, Wikipedia, etc.). These dynamic pages can be PHP pages that get their data from a database (I guess MediaWiki would be a good example), but neither case involves creating the server-side programs (as in *.php, *.cgi) dynamically.

Or sort of. Many PHP web applications contain pages that dynamically choose which sub-elements (say, a news ticker) to "show", but those are still just combinations of prewritten "mini-pages", if you will. (Some even keep them in an RDBMS.)

But a use case where one would need to generate CGI-BIN programs so variable as to warrant recompiling, I don't see. One would rather have a set of small D programs (binaries) that each do one small thing: one for the latest news, one for showing who else is online, etc.

--------

Of course there are sites where I can type D source code into a box and have it compiled and run. But I'm sure neither of us is talking about such sites? I mean, to run one of those, the administrator usually knows what he's doing! And he can take care of himself, which means we don't have to accommodate his needs.
March 27, 2009
Georg Wrede wrote:
> I mean, how often do you see web sites where stuff is fed to a C compiler and the resulting programs run????? (Yes it's too slow, but that's hardly the point here.) That is simply not done.

Consider the Java JVM. You've probably got one installed on your computer. It gets Java code from gawd knows where (as the result of web browsing), compiles it, and runs it on your machine unbeknownst to you.

.NET does that too.

Every day my browser downloads javascript code, compiles it, and runs it.

There's no reason in principle that D could not be used instead.

This means that we should think about security issues. Compiling untrusted code should not result in an attack on your system.

http://www.comeaucomputing.com lets you upload random C++ code, compile it on their system, and view the messages put out by their compiler. Suppose you did it with D, had it import some sensitive file, and put it out with a pragma msg statement?
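A minimal sketch of the switch-based control described earlier in the thread, assuming a hypothetical Python wrapper on the server (`build_dmd_command` is made up for illustration; `-c`, `-o-` and `-J` are real dmd switches). Since dmd rejects `import("file")` expressions unless a `-J` path is given, leaving `-J` off, or pointing it at one harmless directory, keeps something like `pragma(msg, import("secret.txt"));` from echoing arbitrary files into the compiler's messages:

```python
import os
from typing import List, Optional


def build_dmd_command(source_file: str,
                      string_import_dir: Optional[str] = None) -> List[str]:
    """Build the compiler command line on the server side.

    Hypothetical helper: the submitted D source never gets to choose a switch.
    """
    # A file name starting with '-' would be parsed by dmd as a switch
    # (e.g. a smuggled "-J/etc"), so refuse it outright.
    if source_file.startswith("-"):
        raise ValueError("source file name must not look like a switch")

    # -c: compile only, no linking; -o-: do not write an object file.
    # Only the compiler's diagnostics (including pragma(msg) output) remain.
    cmd = ["dmd", "-c", "-o-"]

    if string_import_dir is not None:
        # The only directory import("...") expressions may read from.
        cmd.append("-J" + os.path.abspath(string_import_dir))
    # With no -J at all, dmd rejects import("...") expressions entirely.

    cmd.append(source_file)
    return cmd
```

The server would then hand `cmd` to something like `subprocess.run` inside whatever sandbox or chroot it already uses; the point is only that the switches are fixed by the server rather than parsed out of the submitted code.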
March 27, 2009
Walter Bright wrote:
> Georg Wrede wrote:
>> I mean, how often do you see web sites where stuff is fed to a C compiler and the resulting programs run????? (Yes it's too slow, but that's hardly the point here.) That is simply not done.
> 
> Consider the Java JVM. You've probably got one installed on your computer. It gets java code from gawd knows where (as the result of web browsing), it compiles it, and runs it on your machine unbeknownst to you.
> 
> .NET does that too.
> 
> Every day my browser downloads javascript code, compiles it, and runs it.
> 
> There's no reason in principle that D could not be used instead.
> 
> This means that we should think about security issues. Compiling untrusted code should not result in an attack on your system.
> 
> http://www.comeaucomputing.com lets you upload random C++ code, compile it on their system, and view the messages put out by their compiler. Suppose you did it with D, had it import some sensitive file, and put it out with a pragma msg statement?

Your compiler can do the same:
http://codepad.org/hWC9hbPQ
March 27, 2009
grauzone wrote:
> Walter Bright wrote:
>> http://www.comeaucomputing.com lets you upload random C++ code, compile it on their system, and view the messages put out by their compiler. Suppose you did it with D, had it import some sensitive file, and put it out with a pragma msg statement?
> 
> Your compiler can do the same:
> http://codepad.org/hWC9hbPQ

That's awesome!
March 27, 2009
Walter Bright wrote:
> Georg Wrede wrote:
>> I mean, how often do you see web sites where stuff is fed to a C compiler and the resulting programs run????? (Yes it's too slow, but that's hardly the point here.) That is simply not done.
> 
> Consider the Java JVM. You've probably got one installed on your computer. It gets java code from gawd knows where (as the result of web browsing), it compiles it, and runs it on your machine unbeknownst to you.

I was talking server side. But now that you mention it, it would be kind of cool to have D within the browser.

> .NET does that too.
> 
> Every day my browser downloads javascript code, compiles it, and runs it.
> 
> There's no reason in principle that D could not be used instead.

True. But then, what would happen to D's image as a systems language in folks' minds if it ran in a browser next to JavaScript, Java, and who knows what "toy" languages? Would Phobos then have to be replaced with another library for running the app within a browser?

> This means that we should think about security issues. Compiling untrusted code should not result in an attack on your system.

Well, removing disk file ops, and OS APIs in general, would be the first step. And if you restrict some include paths, then, for symmetry, you should restrict all command-line file paths similarly. I think there's a lot to do here. A half-baked version would just give bad PR, while a proper, tight version is presumably quite some work. Is the reward worth it? Is it established that enough people would use it?

Are you thinking of having a parallel Phobos tree for this, or doing it with conditional compilation?

> http://www.comeaucomputing.com lets you upload random C++ code, compile it on their system, and view the messages put out by their compiler. Suppose you did it with D, had it import some sensitive file, and put it out with a pragma msg statement?

Well, even they don't let you run the compiled programs. (But that's probably mostly because kids would waste all the CPU cycles.) And I'd bet the compiler is chrooted, just to be safe. Chrooting would take care of that D pragma "exploit", too.

Ahhh, about links: it's the operator's responsibility to ensure there are no unnecessary links within the chroot tree. But some installations need (and others want, for specific reasons) soft links, so I don't think the compiler should frown upon them. It's really not the compiler's responsibility. Similarly, there may be hard links, and while even those can be "detected", doing so really isn't called for. (One could call this Compartmentalization of Responsibility, unheard of in Windows.)

---

But really, what I'm wondering here is: is this yet another "hey, let's do this" thing?? Can we go on like this till September? And where would we be then? Shouldn't there be a roadmap or something? Or priorities?
March 27, 2009
Walter Bright wrote:
> grauzone wrote:
>> Walter Bright wrote:
>>> http://www.comeaucomputing.com lets you upload random C++ code, compile it on their system, and view the messages put out by their compiler. Suppose you did it with D, had it import some sensitive file, and put it out with a pragma msg statement?
>>
>> Your compiler can do the same:
>> http://codepad.org/hWC9hbPQ
> 
> That's awesome!

And the system seems protected, too: http://codepad.org/mzAgmvZZ