December 18, 2012
On 12/18/2012 5:34 AM, Iain Buclaw wrote:
> I'll let you all brood over that though, and would welcome any feedback to get
> some sort of plan rolling (even if it's just a DIP).

I'm open to pull requests which abstract away some of the differences.

December 18, 2012
On 12/18/2012 7:06 AM, Walter Bright wrote:
> On 12/18/2012 5:34 AM, Iain Buclaw wrote:
>> I'll let you all brood over that though, and would welcome any feedback to get
>> some sort of plan rolling (even if it's just a DIP).
>
> I'm open to pull requests which abstract away some of the differences.


This would be similar to the way port.h/port.c attempt to abstract away the differences between host compilers.
December 18, 2012
On Tue, Dec 18, 2012 at 06:55:34AM -0800, Walter Bright wrote:
> On 12/18/2012 3:43 AM, foobar wrote:
> >Honest question - If D already has all the semantic info in COMDAT sections,
> 
> It doesn't. COMDATs are object file sections. They do not contain type info, for example.
> 
> >  * provide a byte-code solution to support the portability case. e.g
> >  Java byte-code or Google's pNaCL solution that relies on LLVM
> >  bit-code.
> 
> There is no advantage to bytecodes. Putting them in a zip file does not make them produce better results.
[...]

An idea occurred to me while reading this. What if, when compiling a module, say, the compiler emits not only object code, but also information like which functions are inferred to be strongly pure, weakly pure, @safe, etc., as well as some kind of symbol dependency information? Basically, any derived information that isn't immediately obvious from the code is saved.

Then when importing the module, the compiler doesn't have to re-derive all of this information, but it is immediately available.

One can also include information like whether a function actually throws an exception (regardless of whether it's marked nothrow), which exception(s) it throws, etc. This may open up the possibility of doing some things with the language that are currently infeasible, regardless of the obfuscation issue.
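
To make that concrete, here is a rough sketch of what such emitted derived information might look like for a small D function (the summary format below is purely hypothetical, not anything any compiler produces today):

    // m.d -- the programmer writes no attributes at all
    int parse(string s)
    {
        if (s.length == 0)
            throw new Exception("empty input");
        return cast(int) s.length;
    }

    // hypothetical derived-info record emitted alongside the object code:
    //   m.parse(string):
    //       inferred attributes : @safe, not nothrow
    //       may throw           : object.Exception
    //       symbol dependencies : object.Exception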


T

-- 
There are three kinds of people in the world: those who can count, and those who can't.
December 18, 2012
On Tue, Dec 18, 2012 at 07:01:28AM -0800, Walter Bright wrote:
> On 12/18/2012 1:43 AM, Dmitry Olshansky wrote:
> >Compared to doing computations on AST tries (and looking up every name in symbol table?), creating fake nodes when the result is computed etc?
> 
> CTFE does not look up every (or any) name in the symbol table. I don't see any advantage to interpreting bytecode over interpreting ASTs. In fact, all the Java bytecode is is a serialized AST.

I've always thought that CTFE should run native code.

Yes, I'm aware of the objections related to cross-compiling, etc., but honestly, how many people actually use a cross-compiler (or know what it is)? To me, interpreted CTFE should be a fallback, not the default mode of implementation.


T

-- 
Acid falls with the rain; with love comes the pain.
December 18, 2012
On 12/18/2012 7:51 AM, H. S. Teoh wrote:
> An idea occurred to me while reading this. What if, when compiling a
> module, say, the compiler not only emits object code, but also
> information like which functions are implied to be strongly pure, weakly
> pure, @safe, etc., as well as some kind of symbol dependency
> information. Basically, any derived information that isn't immediately
> obvious from the code is saved.
>
> Then when importing the module, the compiler doesn't have to re-derive
> all of this information, but it is immediately available.
>
> One can also include information like whether a function actually throws
> an exception (regardless of whether it's marked nothrow), which
> exception(s) it throws, etc.. This may open up the possibility of doing
> some things with the language that are currently infeasible, regardless
> of the obfuscation issue.

This is a binary import. It offers negligible advantages over .di files.

December 18, 2012
On 12/18/12 10:01 AM, Walter Bright wrote:
> On 12/18/2012 1:43 AM, Dmitry Olshansky wrote:
>> Compared to doing computations on AST tries (and looking up every name
>> in symbol
>> table?), creating fake nodes when the result is computed etc?
>
> CTFE does not look up every (or any) name in the symbol table. I don't
> see any advantage to interpreting bytecode over interpreting ASTs. In
> fact, all the Java bytecode is is a serialized AST.

My understanding is that Java bytecode is somewhat lowered, e.g. using a stack machine for arithmetic, jumps, etc., which makes it more amenable to interpretation than what an AST walker would do. Also, bytecode is more directly streamable because you don't need any pointer fixups.

In brief, I agree there's an isomorphism between bytecode and AST representations, but there are a few differences that may be important to certain applications.
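
As a rough illustration of that lowering (schematic only, not literal JVM bytecode): the expression a + b * c is a small pointer-linked tree in AST form, but a flat sequence of stack operations once lowered, which is what makes it streamable without fixups:

    //  AST form:              lowered, stack-machine form:
    //        +                    load a
    //       / \                   load b
    //      a   *                  load c
    //         / \                 mul
    //        b   c                add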


Andrei
December 18, 2012
On 12/18/2012 7:01 PM, Walter Bright wrote:
> On 12/18/2012 1:43 AM, Dmitry Olshansky wrote:
>> Compared to doing computations on AST tries (and looking up every name
>> in symbol
>> table?), creating fake nodes when the result is computed etc?
>
> CTFE does not look up every (or any) name in the symbol table.

I stand corrected - ditch the "looking up every name in the symbol table" part.

Honestly, I deduced that from your statement:
>>>the type information and AST trees and symbol table.

Note the symbol table. Looking inside the code, I cannot immediately tell whether CTFE ever uses it. I see that, e.g., variables are tied to the nodes that represent their declarations, and values to expression nodes of the already-processed AST.

> I don't
> see any advantage to interpreting bytecode over interpreting ASTs. In
> fact, all the Java bytecode is is a serialized AST.
>
We need no stinkin' Java ;)

But bytecode designed for interpreters (see e.g. Lua's) is built for fast execution. The way CTFE is done now* is a polymorphic call per AST node that does a lot of analysis that could be decided once and stored in ... *ehm* ... IR. Currently it's also somewhat mixed with semantic analysis (thus raising the complexity).

Another point is that pointer-chasing data structures are not a recipe for fast repeated execution.

To provide an analogy: evaluating an expression recursively over its AST is bound to be slower than running the same calculation straight over a sanely encoded, flat reverse-Polish notation.
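
A toy sketch of that contrast in D (hypothetical types, nothing to do with dmd's actual interpreter): the first version chases pointers and recurses per node, the second runs a tight loop over a flat postfix stream:

    class Node { char op; int value; Node lhs, rhs; }   // op == 0 marks a leaf

    int evalTree(Node n)    // recursive walk, two pointer dereferences per node
    {
        if (n.op == 0)
            return n.value;
        int l = evalTree(n.lhs);
        int r = evalTree(n.rhs);
        return n.op == '+' ? l + r : l * r;
    }

    struct RpnOp { char op; int value; }                // flat postfix encoding

    int evalRpn(const(RpnOp)[] code)    // iterative loop over contiguous data
    {
        int[] stack;
        foreach (c; code)
        {
            if (c.op == 0) { stack ~= c.value; continue; }
            int r = stack[$ - 1], l = stack[$ - 2];
            stack = stack[0 .. $ - 2];
            stack ~= (c.op == '+' ? l + r : l * r);
        }
        return stack[0];
    }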

A hit below the belt: also peek at your own DMDScript - why bother with a plain IR (_bytecode_!) for JavaScript if it could just as well be interpreted as-is on ASTs?

*I'm judging by a cursory look at the source and the bits that Don sometimes shares about it.

-- 
Dmitry Olshansky
December 18, 2012
On 12/18/2012 8:54 AM, Andrei Alexandrescu wrote:
> On 12/18/12 10:01 AM, Walter Bright wrote:
>> On 12/18/2012 1:43 AM, Dmitry Olshansky wrote:
>>> Compared to doing computations on AST tries (and looking up every name
>>> in symbol
>>> table?), creating fake nodes when the result is computed etc?
>>
>> CTFE does not look up every (or any) name in the symbol table. I don't
>> see any advantage to interpreting bytecode over interpreting ASTs. In
>> fact, all the Java bytecode is is a serialized AST.
>
> My understanding is that Java bytecode is somewhat lowered e.g. using a stack
> machine for arithmetic, jumps etc. which makes it more amenable to
> interpretation than what an AST walker would do.

The Java bytecode is indeed a stack machine, and a stack machine *is* a serialized AST.


> Also bytecode is more directly
> streamable because you don't need any pointer fixups.

A stack machine *is* a streamable representation of an AST. They are trivially convertible back and forth between each other, and I mean trivially.
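
For what it's worth, here is a minimal sketch of that round trip in D (hypothetical node types, not dmd's): a post-order walk of the tree produces the stack code, and replaying the stack code against a stack of subtrees rebuilds the tree:

    enum Op { push, add, mul }
    struct Instr { Op op; int value; }      // value is used only by Op.push

    class Expr {}
    class Leaf : Expr { int value; this(int v) { value = v; } }
    class Binary : Expr
    {
        Op op; Expr lhs, rhs;
        this(Op o, Expr l, Expr r) { op = o; lhs = l; rhs = r; }
    }

    // AST -> stack code: a plain post-order traversal.
    void serialize(Expr e, ref Instr[] code)
    {
        if (auto leaf = cast(Leaf) e)
            code ~= Instr(Op.push, leaf.value);
        else if (auto bin = cast(Binary) e)
        {
            serialize(bin.lhs, code);
            serialize(bin.rhs, code);
            code ~= Instr(bin.op, 0);
        }
    }

    // stack code -> AST: replay the instructions on a stack of subtrees.
    Expr deserialize(const(Instr)[] code)
    {
        Expr[] stack;
        foreach (ins; code)
        {
            if (ins.op == Op.push)
                stack ~= new Leaf(ins.value);
            else
            {
                auto rhs = stack[$ - 1], lhs = stack[$ - 2];
                stack = stack[0 .. $ - 2];
                stack ~= new Binary(ins.op, lhs, rhs);
            }
        }
        return stack[0];
    }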

December 18, 2012
On 12/18/2012 8:57 AM, Dmitry Olshansky wrote:
> But adequate bytecode designed for interpreters (see e.g. Lua) are designed for
> faster execution. The way CTFE is done now* is a polymorphic call per AST-node
> that does a lot of analysis that could be decided once and stored in ... *ehm*
> ... IR. Currently it's also somewhat mixed with semantic analysis (thus rising
> the complexity).

The architectural failings of CTFE are primarily my fault: I took an implementation shortcut and built it by enhancing the constant folding code.

They are not a demonstration of the inherent superiority of one scheme or another. Nor do CTFE's problems indicate that modules should be represented as bytecode externally.

> Another point is that pointer chasing data-structures is not a recipe for fast
> repeated execution.
>
> To provide an analogy: executing calculation recursively on AST tree of
> expression is bound to be slower then running the same calculation straight on
> sanely encoded flat reverse-polish notation.
>
> A hit below belt: also peek at your own DMDScript - why bother with plain IR
> (_bytecode_!) for JavaScript if it could just fine be interpreted as is on AST-s?

Give me some credit for learning something over the last 12 years! I'm not at all convinced I'd use the same design if I were doing it now.

If I were doing it, and speed were paramount, I'd probably have it generate native code instead of bytecode and execute that directly. Even simple JITs dramatically sped up the early Java VMs.


December 18, 2012
On Tue, Dec 18, 2012 at 08:12:51AM -0800, Walter Bright wrote:
> On 12/18/2012 7:51 AM, H. S. Teoh wrote:
> >An idea occurred to me while reading this. What if, when compiling a module, say, the compiler not only emits object code, but also information like which functions are implied to be strongly pure, weakly pure, @safe, etc., as well as some kind of symbol dependency information. Basically, any derived information that isn't immediately obvious from the code is saved.
> >
> >Then when importing the module, the compiler doesn't have to re-derive all of this information, but it is immediately available.
> >
> >One can also include information like whether a function actually throws an exception (regardless of whether it's marked nothrow), which exception(s) it throws, etc.. This may open up the possibility of doing some things with the language that are currently infeasible, regardless of the obfuscation issue.
> 
> This is a binary import. It offers negligible advantages over .di files.

I was thinking more along the lines of fully automatic purity, safety, and exception inference. Every function body eventually has to be processed by the compiler, so if a particular function is inferred to throw exception X, that fact can be propagated to its callers when they are compiled. Doing this for the whole program at once might be infeasible due to the sheer size of things, but if a module records, for each function exposed by its API, a list of all thrown exceptions, then that information is available up front when the module is imported and can be propagated further up the call chain. The same goes for purity and @safe.

This may even allow us to make pure/@safe/nothrow fully automated so that you don't have to explicitly state them (except when you want the compiler to verify that what you wrote is actually pure, safe, etc.).
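
A rough sketch of the propagation being described, with hypothetical module names (DMD's attribute inference is currently limited to cases like template functions, and does not span separately compiled modules):

    // leaf.d -- no attributes written by the programmer
    module leaf;
    int decode(string s)
    {
        if (s.length == 0)
            throw new Exception("empty input");
        return 42;
    }

    // caller.d
    module caller;
    import leaf;
    int use(string s) { return decode(s); }
    // If leaf's emitted summary recorded that decode may throw Exception and
    // is @safe, the compiler could draw the same conclusions about use()
    // without re-analyzing decode's body -- and without use() being a template.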


T

-- 
The slower you go, the farther you'll get. (Russian proverb)