December 18, 2010
"foobar" <foo@bar.com> wrote in message news:ieijt6$21mh$1@digitalmars.com...
> Don Wrote:
>
>> VladD2 wrote:
>> > Don Wrote:
>> >> Suppose the pre-compiled code, when run, asks what CPU it's on. What's the answer? Is it X or Y?
>> >
>> > Current: X
>> > Target: Y
>> >
>> > A macro is a plugin to the compiler. It works on the same platform as the
>> > compiler, but generates code through an API which abstracts the macro
>> > from the target platform. If you need to generate platform-specific code,
>> > you should handle it in the macro logic.
>> > In any case, a macro is a meta-program which generates and/or transforms
>> > code.
>>
>> Yes. But in D there's no distinction between code which is destined for a macro, versus any other function. You can call a function once at compile time, and the same function at compile time.

I think you mean "You can call a function once at compile time, and the same function at **runtime**."
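For example, here's a minimal sketch of what that looks like in D today -- the same ordinary function evaluated once by CTFE and once at runtime (the function name is just for illustration):

int square(int x) { return x * x; }    // an ordinary D function

enum atCompileTime = square(6);        // forces CTFE: evaluated while compiling

void main()
{
    int n = 6;
    auto atRuntime = square(n);        // the very same function, run at runtime
    assert(atCompileTime == atRuntime);
}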


>> My understanding of
>> Nemerle (which is quite likely to be wrong!) is that at least some
>> functions are callable only at compile-time.
>>

I'd be surprised. I would think that all you would have to do to use the same Nemerle function at both runtime and compile-time would be to include its module in both the "compile the compiler-plugin step" and in the "load compiler-plugins and compile the app" step.


>
> I don't see how there needs to be different code to accomplish the above use case. You have a function that tests the hardware it's being run on. When this function is called in the compiler context it would return X, when it's called from the target executable it returns Y.
>

The problem with that is, what if you're generating target-platform-specific code at compile-time? You'd be generating code for the wrong platform. I think VladD2 is right: you need to keep track of both the "current" system and the "target" system. Unfortunately, there is some information about the "target" system that the compile-time code wouldn't be able to discern without giving it the ability to run code (RPC? Virtualization? Really, really good emulator?) on the target system, but then again, that's a limitation with any cross-compiling scenario.
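For what it's worth, D's predefined version identifiers already describe the *target* of a compilation rather than the machine the compiler runs on, so purely CTFE-based queries like the rough sketch below still see the right platform; the host-vs-target mixup only shows up once compile-time code is allowed to run pre-compiled native binaries or probe the actual hardware.

// Rough sketch: version identifiers reflect the compilation *target*.
string targetArch()
{
    version (X86_64)    return "x86_64";
    else version (X86)  return "x86";
    else version (ARM)  return "arm";
    else                return "unknown";
}

enum arch = targetArch();               // evaluated by CTFE, still target-specific
pragma(msg, "compiling for " ~ arch);   // printed while compiling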


>> I'm also scared of the implications of allowing arbitrary code execution during compilation. Make a typo in your program, and then compilation may wipe files from your hard disk, or corrupt an external database, etc... On some platforms you may be able to sandbox it, but since it's running as part of the compilation process, rather than with the permissions it will eventually have, it just seems like a security nightmare.
>>

That's an interesting point. OTOH, it's common for software to come with its own build script or makefile, and those can certainly do literally anything like you describe above - but I haven't seen that as being a real problem.


>
> This is a void argument since templates are Turing complete.
> It's *already* possible to do all of the above in D at compile time, it's
> just a matter of how much code is required to accomplish this.
>

Not true. This is a frequent misconception about Turing-completeness. Just because something is Turing complete does *not* mean it can do anything that any other Turing-complete system can do (contrary to how many college profs explain it). It *does* mean that it can *calculate* anything that any other Turing-complete system can calculate. But it doesn't necessarily have *access* to everything that any other Turing-complete system has access to. And D's compile-time system, whether CTFE or templates, does not currently provide any way to access any I/O, launch any processes, or do any direct memory access.
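As a small sketch of how that plays out in practice, the same function is perfectly fine at runtime but gets rejected the moment you force it through CTFE, because file I/O simply isn't available to the compile-time interpreter (the file name is just for illustration):

import std.file : readText;

string readConfig() { return readText("config.txt"); }   // ordinary runtime I/O

void main()
{
    auto atRuntime = readConfig();     // fine (assuming the file exists)
    // enum atCompileTime = readConfig();
    // ^ rejected: CTFE has no file I/O, cannot launch processes,
    //   and cannot do arbitrary memory access.
}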



December 18, 2010
Nick Sabalausky wrote:
> "foobar" <foo@bar.com> wrote in message news:ieijt6$21mh$1@digitalmars.com...
>> Don Wrote:
>>
>>> VladD2 wrote:
>>>> Don Wrote:
>>>>> Suppose the pre-compiled code, when run, asks what CPU it's on. What's
>>>>> the answer? Is it X or Y?
>>>> Current: X
>>>> Target: Y
>>>>
>>>> A macro is a plugin to the compiler. It works on the same platform as the compiler, but generates code through an API which abstracts the macro from the target platform. If you need to generate platform-specific code, you should handle it in the macro logic.
>>>> In any case, a macro is a meta-program which generates and/or transforms code.
>>> Yes. But in D there's no distinction between code which is destined for
>>> a macro, versus any other function. You can call a function once at
>>> compile time, and the same function at compile time.
> 
> I think you mean "You can call a function once at compile time, and the same function at **runtime**."
> 
> 
>>> My understanding of
>>> Nemerle (which is quite likely to be wrong!) is that at least some
>>> functions are callable only at compile-time.
>>>
> 
> I'd be surprised. I would think that all you would have to do to use the same Nemerle function at both runtime and compile-time would be to include its module in both the "compile the compiler-plugin step" and in the "load compiler-plugins and compile the app" step.
> 
> 
>> I don't see how there needs to be different code to accomplish the above use case. You have a function that tests the hardware it's being run on. When this function is called in the compiler context it would return X, when it's called from the target executable it returns Y.
>>
> 
> The problem with that is, what if you're generating target-platform-specific code at compile-time? You'd be generating code for the wrong platform. I think VladD2 is right: You need to keep track of both "current" system and "target" system. Unfortunately, there is some information about the "target" system the compile-time code wouldn't be able discern without giving it the ability to run code (RPC? Virtualization? Really, really good emulator?) on the target system, but then again, that's a limitation with any cross-compiling scenario.

Note that for this to work at all, the compiler needs to be able to generate executable code for platform X as well as for Y -- that is, it needs to include two back-ends.

>>> I'm also scared of the implications of allowing arbitrary code execution
>>> during compilation. Make a typo in your program, and then compilation
>>> may wipe files from your hard disk, or corrupt an external database,
>>> etc... On some platforms you may be able to sandbox it, but since it's
>>> running as part of the compilation process, rather than with the
>>> permissions it will eventually have, it just seems like a security
>>> nightmare.
>>>
> 
> That's an interesting point. OTOH, it's common for software to come with its own build script or makefile, and those can certainly do literally anything like you describe above - but I haven't seen that as being a real problem.

I don't think it's quite the same. In a makefile, every executable is listed, and so you can have some degree of control over it. But in this scenario, the compiler is making calls to arbitrary shared libraries with arbitrary parameters.
It means the compiler cannot be trusted *at all*.

December 18, 2010
"Don" <nospam@nospam.com> wrote in message news:iej7eu$b1l$1@digitalmars.com...
> Nick Sabalausky wrote:
>> "foobar" <foo@bar.com> wrote in message news:ieijt6$21mh$1@digitalmars.com...
>>> I don't see how there needs to be different code to accomplish the above use case. You have a function that tests the hardware it's being run on. When this function is called in the compiler context it would return X, when it's called from the target executable it returns Y.
>>>
>>
>> The problem with that is, what if you're generating target-platform-specific code at compile-time? You'd be generating code for the wrong platform. I think VladD2 is right: You need to keep track of both "current" system and "target" system. Unfortunately, there is some information about the "target" system the compile-time code wouldn't be able discern without giving it the ability to run code (RPC? Virtualization? Really, really good emulator?) on the target system, but then again, that's a limitation with any cross-compiling scenario.
>
> Note that for this to work at all, the compiler needs to be able to generate exectuable code for platform X as well as for Y -- that is, it needs to include two back-ends.
>

But if the compiler doesn't have both backends then the whole question of "how does the user's compile-time code get handled if it's being cross-compiled?" becomes irrelevant - you'd have to compile it on the target platform anyway, so X == Y. Or am I missing your point?

>>>> I'm also scared of the implications of allowing arbitrary code
>>>> execution
>>>> during compilation. Make a typo in your program, and then compilation
>>>> may wipe files from your hard disk, or corrupt an external database,
>>>> etc... On some platforms you may be able to sandbox it, but since it's
>>>> running as part of the compilation process, rather than with the
>>>> permissions it will eventually have, it just seems like a security
>>>> nightmare.
>>>>
>>
>> That's an interesting point. OTOH, it's common for software to come with its own build script or makefile, and those can certainly do literally anything like you describe above - but I haven't seen that as being a real problem.
>
> I don't think it's quite the same. In a makefile, every executable is
> listed, and so you can have some degree of control over it. But in this
> scenario, the compiler is making calls to arbitrary shared libraries with
> arbitrary parameters.
> It means the compiler cannot be trusted *at all*.

I suppose that's a reasonable point.



December 19, 2010
Don Wrote:

> > I think VladD2 is right: You need to keep track of both "current" system and "target" system. Unfortunately, there is some information about the "target" system the compile-time code wouldn't be able discern without giving it the ability to run code (RPC? Virtualization? Really, really good emulator?) on the target system, but then again, that's a limitation with any cross-compiling scenario.
> 
> Note that for this to work at all, the compiler needs to be able to generate exectuable code for platform X as well as for Y -- that is, it needs to include two back-ends.

If the macros have already been compiled and are in binary (executable) form, the compiler only needs to be able to generate code for platform X, and run the macros (execute code from a DLL). This is exactly what the Nemerle compiler does.

In this case, compiling the same macros looks like any other compilation process (on platform X, for platform Y).


> I don't think it's quite the same. In a makefile, every executable is listed, and so you can have some degree of control over it.

Trust to rmdir ... lol!
And what about NAnt or MSBuild, which can have binary extensions?

I think you are completely wrong.

> But in this
> scenario, the compiler is making calls to arbitrary shared libraries
> with arbitrary parameters.
> It means the compiler cannot be trusted *at all*.

The experience of Lisp (50 years!) and Nemerle (about 6 years) shows that the ability to access any library is not a problem. This is a huge advantage.

And to limit what a macro can do, you can simply forbid it from using certain libraries.

December 19, 2010
"Don" <nospam@nospam.com> wrote:
> I don't think it's quite the same. In a makefile, every executable is
> listed, and so you can have some degree of control over it. But in this
> scenario, the compiler is making calls to arbitrary shared libraries with
> arbitrary parameters.
> It means the compiler cannot be trusted *at all*.

You are only partially right - it's unsafe for a browser language, where code is taken from an untrusted source. But this feature gives so much power to the macro system that I think it is worth considering. IMO, compiled code is usually run just after compilation (with the same permissions as the compiler) - so compiled code can do dangerous things and can't be trusted at all either, yet no one worries about that. Yes, the compiler can't be *trusted* with these features, but if one knows what he is doing, why prevent him - add an option --enable-ctfe-DANGEROUS-features to allow potentially dangerous features, and then it wouldn't be so unexpected. Are those features hard to add to the current implementation?


December 20, 2010
VladD2 wrote:
> Don Wrote:
> 
>>> I think VladD2 is right: You need to keep track of both "current" system and "target" system. Unfortunately, there is some information about the "target" system the compile-time code wouldn't be able discern without giving it the ability to run code (RPC? Virtualization? Really, really good emulator?) on the target system, but then again, that's a limitation with any cross-compiling scenario.
>> Note that for this to work at all, the compiler needs to be able to generate exectuable code for platform X as well as for Y -- that is, it needs to include two back-ends.
> 
> If the macros have already been compiled and are in binary (executable) form, the compiler only needs to be able to generate code for platform X,

Yes, but it's not a compiler for platform X! It's only a compiler for platform Y.

> and run the macros (execute code from a DLL). This is exactly what the Nemerle compiler does.

The .NET system always has a compiler for the platform it's running on. That's not necessarily true for D compilers.

> In this case, compiling the same macros looks like any other compilation process (on platform X, for platform Y).
> 
> 
>> I don't think it's quite the same. In a makefile, every executable is listed, and so you can have some degree of control over it. 
> 
> Trust to rmdir ... lol!
> And what about NAnt or MSBuild which can have binary extensions?
> 
> I think, you are completely wrong.
> 
>> But in this scenario, the compiler is making calls to arbitrary shared libraries with arbitrary parameters.
>> It means the compiler cannot be trusted *at all*.
> 
> The experience of Lisp (50 years!) and Nemerle (about 6 years) shows that the ability to access any library is not a problem.

I don't think Nemerle has been sufficiently widely used to be able to draw strong conclusions from it. The Lisp argument is strong, though.

> This is a huge advantage.
> 
> And to limit what a macro can do, you can simply forbid it from using certain libraries.

I hope you're right, because it's indeed a powerful feature. But I'd want to hear the opinion of a security expert.
In particular, if it can be shown that it's exactly the same as Lisp, I would be convinced.

December 20, 2010
Alex_Dovhal wrote:
> "Don" <nospam@nospam.com> wrote:
>> I don't think it's quite the same. In a makefile, every executable is listed, and so you can have some degree of control over it. But in this scenario, the compiler is making calls to arbitrary shared libraries with arbitrary parameters.
>> It means the compiler cannot be trusted *at all*.
> 
> You are only partially right - it's unsafe for a browser language, where code is taken from an untrusted source. But this feature gives so much power to the macro system that I think it is worth considering. IMO, compiled code is usually run just after compilation (with the same permissions as the compiler) - so compiled code can do dangerous things and can't be trusted at all either, yet no one worries about that. Yes, the compiler can't be *trusted* with these features, but if one knows what he is doing, why prevent him - add an option --enable-ctfe-DANGEROUS-features to allow potentially dangerous features, and then it wouldn't be so unexpected. Are those features hard to add to the current implementation?

In order for CTFE code to call pre-compiled code, three things are required:
(1) the compiler needs to be able to find the file (.obj/.lib/shared library) containing the compiled code;
(2) the compiler needs to be able to load the module and call it. This requires some form of dynamic linking.
(3) We need a marshalling step, to convert from compiler literal to compiled data, and back.


Step (3) is straightforward. The challenge is step(2), although note that it's a general "allow the compiler to load a plugin" problem, and doesn't have much to do with CTFE.
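For concreteness, here is a rough sketch of what step (2) might look like on a POSIX host, assuming a hypothetical plugin library that exports a single extern(C) entry point called ctfe_plugin_entry (both the library layout and the names are made up for illustration):

import core.sys.posix.dlfcn : dlopen, dlsym, RTLD_NOW;
import std.exception : enforce;
import std.string : toStringz;

alias PluginFn = extern(C) int function(int);

PluginFn loadPluginEntry(string libPath)
{
    // step (1): find and load the shared library containing the compiled code
    void* handle = dlopen(libPath.toStringz, RTLD_NOW);
    enforce(handle !is null, "cannot load plugin library: " ~ libPath);

    // step (2): resolve the entry point so the compiler can call it
    void* sym = dlsym(handle, "ctfe_plugin_entry");
    enforce(sym !is null, "plugin entry point not found");
    return cast(PluginFn) sym;
}

Step (3) would then sit on either side of the actual call: marshal the CTFE literals into real arguments before invoking the function pointer, and turn the returned value back into a literal afterwards.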


December 20, 2010
"Don" <nospam@nospam.com> wrote:
> In order for CTFE code to call pre-compiled code, three things are
> required:
> (1) the compiler needs to be able to find the file (.obj/.lib/shared
> library) containing the compiled code;
> (2) the compiler needs to be able to load the module and call it. This
> requires some form of dynamic linking.
> (3) We need a marshalling step, to convert from compiler literal to
> compiled data, and back.
>
>
> Step (3) is straightforward. The challenge is step(2), although note that it's a general "allow the compiler to load a plugin" problem, and doesn't have much to do with CTFE.

I understand. So it would have to be dynamically loaded; the compiler would have to know which D library to load for a used function, plus that function's name mangling, and Phobos would also have to be a dynamic library so that its functions can be called in a macro. This is non-trivial stuff, and the compiler itself is written in C++, so this plugin architecture would have to work in C++ too. Also, when cross-compiling, you need a compiler for both the X and Y architectures, or two compilers communicating with each other, so that the compiler for Y, when it finds a macro, calls the compiler for X and dynamically loads the produced function into itself. OK, IMO it's too complex and experimental to be of any priority in the near future.


December 21, 2010
"Don" <nospam@nospam.com> wrote:
> In order for CTFE code to call pre-compiled code, three things are
> required:
> (1) the compiler needs to be able to find the file (.obj/.lib/shared
> library) containing the compiled code;
> (2) the compiler needs to be able to load the module and call it. This
> requires some form of dynamic linking.
> (3) We need a marshalling step, to convert from compiler literal to
> compiled data, and back.
>
>
> Step (3) is straightforward. The challenge is step(2), although note that it's a general "allow the compiler to load a plugin" problem, and doesn't have much to do with CTFE.

I thought it over, and here is what I came up with:
(1) can be solved by adding a compiler option, --macro-libs=<list_of_libs>, so
the compiler knows which dynamic libraries to search for the plugins;
(2) plugin library functions should be *stdcall*, so the C++ compiler can load
them. The library should implement a function like this:

PluginInfo* getFunctionsInfo(void);
where:
typedef struct _PluginInfo {
    struct _PluginInfo* nextPlugin;
    char* fcnName;      // short name of a function, e.g. "fcn"
    char* params;       // parameter list, e.g. "(int, float, char*)"
    char* mangledName;  // name of the function in the library, e.g. "_fcn@12"
    char* returnType;   // e.g. "int"
    bool  isNothrow;    // if useful?
    bool  isPure;       // ditto
    // ... etc ...
} PluginInfo;

The library should also implement all the functions that getFunctionsInfo()
returns info about.
The D compiler calls getFunctionsInfo from each plugin library and then loads
all the implemented functions with full info about them.
If one plugin function name is found in two or more libraries, D raises a
compiler error.
Not a perfect solution, but at least a straightforward one that solves the
given problem.

One more note: this approach makes CTFE functions that call such plugins uncallable at runtime, which IMO is OK, as one should not be calling them at runtime in any case.
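To make the proposal a bit more concrete, here is a rough sketch of what a plugin library might look like on the D side, exporting the proposed getFunctionsInfo; the field values, the example function and the build command are illustrative only (a real plugin would have to match whatever calling convention the compiler actually expects - extern(C) is used here just to keep the sketch simple):

// Build as a shared library, e.g.: dmd -shared -fPIC ctfeplugin.d (illustrative)
extern(C):

struct PluginInfo
{
    PluginInfo*  nextPlugin;
    const(char)* fcnName;      // e.g. "fcn"
    const(char)* params;       // e.g. "(int, float, char*)"
    const(char)* mangledName;  // e.g. "_fcn@12"
    const(char)* returnType;   // e.g. "int"
    bool         isNothrow;
    bool         isPure;
}

// The actual plugin function the compiler would call on behalf of CTFE code.
int fcn(int a, float b, const(char)* c) { return a; }

__gshared PluginInfo fcnInfo = PluginInfo(null, "fcn".ptr,
    "(int, float, char*)".ptr, "_fcn@12".ptr, "int".ptr, false, false);

// Entry point the compiler calls to discover the available plugin functions.
PluginInfo* getFunctionsInfo() { return &fcnInfo; }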

Is that suitable?


December 22, 2010
Alex_Dovhal wrote:
> "Don" <nospam@nospam.com> wrote:
>> In order for CTFE code to call pre-compiled code, three things are required:
>> (1) the compiler needs to be able to find the file (.obj/.lib/shared library) containing the compiled code;
>> (2) the compiler needs to be able to load the module and call it. This requires some form of dynamic linking.
>> (3) We need a marshalling step, to convert from compiler literal to compiled data, and back.
>>
>>
>> Step (3) is straightforward. The challenge is step(2), although note that it's a general "allow the compiler to load a plugin" problem, and doesn't have much to do with CTFE.
> 
> I thought it over, and got:
> (1) can be solved by adding compiler option: --macro-libs=<list_of_libs>, so the compiler knows in which dynamic libraries to search for the plugins ;
> (2) plugin library functions should be *stdcall*, so C++ compiler can load them. That library should implement function like that:
> 
> PluginInfo* getFunctionsInfo () ;
> where:
> typedef struct _PluginInfo{
>  struct _PluginInfo * nextPlugin ;
>  char* fcnName ; // short name of a function, e.g. "fcn"
>  char* params ; // param list, e.g. "(int, float, char*)"
>  char* mangledName;//name of funct. in the library, e.g. "_fcn@12"
>  char* returnType ; // e.g. "int"
>  bool  isNothrow ; // if useful?
>  bool  isPure ;    // ditto
>  //... etc ...
> } PluginInfo ;
> 
> And also should implement all the functions, info about which is returned by getFunctionsInfo().
> D compiler calls getFunctionsInfo from each plugin library - and then loads all the implemented functions with full info about them.
> If one plugin function name found in two or more libraries - D throws compiler error.
> Not a perfect solution, but at least straightforward one and solves given problem.
> 
> One more note: this approach makes CTFE functions calling such plugins uncallable at runtime, which IMO is OK, as one should not call it in any case.
> 
> Is that suitable? 

It's not that complicated. Once you can load the library and call *one* function in it, the problem is solved.
But at a deeper level, I'm not sure what you'd hope to achieve with this.
I mean (excluding some not-yet implemented features), CTFE allows you to execute any pure + safe function which you have source code for.

So, if you had such a plugin, it could do things like make database queries. Everything else you can do already.