December 19, 2012
On Tuesday, December 18, 2012 17:57:50 Brad Roberts wrote:
> On Tue, 18 Dec 2012, Andrei Alexandrescu wrote:
> > On 12/18/12 7:29 PM, H. S. Teoh wrote:
> > > Which right now suffers from some silly things like writefln not being able to be made @safe, just because some obscure formatting parameter is un@safe. Which is exactly how @safe was designed, of course. Except that it makes SafeD ... a bit of a letdown, shall we say? - when it comes to practical real-world applications.
> > > 
> > > (And just to be clear, I'm all for SafeD, but it does still have a ways
> > > to go.)
> > 
> > Yes, there are several bugs related to SafeD.
> > 
> > Andrei
> 
> Are the remaining issues at the compiler, runtime, or phobos levels (or what combination of the three)? Are the bugs filed?

Quite a few are, but it wouldn't surprise me at all if there are quite a few which aren't. For instance, AFAIK, no one ever brought up the issue of slicing static arrays being unsafe until just a couple of months ago:

http://d.puremagic.com/issues/show_bug.cgi?id=8838

Such operations should be @system but are currently considered @safe. Who knows how many others we've missed beyond what's currently in bugzilla.
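For the curious, the problem boils down to something like this (a minimal sketch, not the exact code from the report):

```d
int[] escape() @safe
{
    int[3] buf = [1, 2, 3];
    return buf[]; // slicing a static array yields a reference to stack
                  // memory; the compiler currently accepts this as @safe,
                  // but the slice dangles once escape() returns
}
```

Slicing a static array should require @system (or @trusted) precisely because the resulting dynamic array can outlive the stack frame it points into.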

- Jonathan M Davis
December 19, 2012
On Wednesday, 19 December 2012 at 01:58:54 UTC, Jonathan M Davis wrote:
>> Are the remaining issues at the compiler, runtime, or phobos levels (or
>> what combination of the three)? Are the bugs filed?
>
> Quite a few are, but it wouldn't surprise me at all if there are quite a few
> which aren't. For instance, AFAIK, no one ever brought up the issue of slicing
> static arrays being unsafe until just a couple of months ago:
>
> http://d.puremagic.com/issues/show_bug.cgi?id=8838
>
> Such operations should be @system but are currently considered @safe. Who
> knows how many others we've missed beyond what's currently in bugzilla.
>

This is a chicken-and-egg issue. Due to limitations, enforcing @safe is hard to do in a lot of code that actually is safe. So you don't notice when something is considered @safe or @system when it shouldn't be.
December 19, 2012
On 12/18/2012 5:58 PM, Jonathan M Davis wrote:
> On Tuesday, December 18, 2012 17:57:50 Brad Roberts wrote:
>> On Tue, 18 Dec 2012, Andrei Alexandrescu wrote:
>>> On 12/18/12 7:29 PM, H. S. Teoh wrote:
>>>> Which right now suffers from some silly things like writefln not being able to be made @safe, just because some obscure formatting parameter is un@safe. Which is exactly how @safe was designed, of course. Except that it makes SafeD ... a bit of a letdown, shall we say? - when it comes to practical real-world applications.
>>>>
>>>> (And just to be clear, I'm all for SafeD, but it does still have a ways
>>>> to go.)
>>>
>>> Yes, there are several bugs related to SafeD.
>>>
>>> Andrei
>>
>> Are the remaining issues at the compiler, runtime, or phobos levels (or what combination of the three)? Are the bugs filed?
> 
> Quite a few are, but it wouldn't surprise me at all if there are quite a few which aren't. For instance, AFAIK, no one ever brought up the issue of slicing static arrays being unsafe until just a couple of months ago:
> 
> http://d.puremagic.com/issues/show_bug.cgi?id=8838
> 
> Such operations should be @system but are currently considered @safe. Who knows how many others we've missed beyond what's currently in bugzilla.
> 
> - Jonathan M Davis
> 

The part I'm particularly interested in is the compiler layer.
December 19, 2012
On Wednesday, 19 December 2012 at 01:09:14 UTC, F i L wrote:
> Without bytecode, the entire compiler becomes a dependency of an AOT/JIT-compiled program.. not only does bytecode allow for faster on-site compilations, it also means half the compiler can be stripped away (so I'm told, I'm not claiming to be an expert here).
>
> I'm actually kinda surprised there hasn't been more of an AOT/JIT compiling push within the D community.. D's the best there is at code specialization, but half of that battle seems to be hardware specifics only really known on-site... like SIMD for example. I've been told many game companies compile against SIMD 3.1 because that's the base-line x64 instruction set. If you could query the hardware post-distribution (vs pre-distribution) without any performance loss or code complication (to the developer), that would be incredibly ideal. (ps. I acknowledge that this would probably _require_ the full compiler, so there's probably not too much value in a D bytecode).
>
> The D compiler is small enough for distribution I think (only ~10mb compressed?), but the back-end license restricts it right?

I'm not claiming to be an expert in this area either, however it seems obvious that there are significant theoretical and practical advantages with using the bytecode concept.

My understanding is that with bytecode and a suitable VM to process it, one can abstract away the high-level language that was used to produce the bytecode; it therefore becomes possible to use alternate high-level languages with front ends that compile to the same common bytecode instruction set. This is much the same as what is done with the D front end and the other front ends for GCC, except that the machine code produced needs a physical CPU to process it, and there is no machine code instruction set that is common across all architectures.

Effectively, the bytecode serves as the common native machine code for a standardized virtualized CPU (the VM), and the VM can sit on top of any given architecture (more or less).

Of course, there are significant execution inefficiencies with this method; however, bytecode can also be compiled into native code, keeping in mind that you did not have to transport the high-level language source that was compiled into the bytecode for this to be possible.

So, in summary, the primary purpose of bytecode is to serve as an intermediate common language that can be run directly on a VM or compiled directly into native machine code. There's no need to transport, or even know, what language was used to produce the bytecode.
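As a rough illustration of the idea (a toy sketch of my own, not any real VM's instruction set), a stack-based bytecode interpreter can be little more than a loop over opcodes, and it never needs to know which front end emitted the code:

```d
import std.stdio;

// opcode set for a toy stack machine
enum Op : ubyte { push, add, mul, print, halt }

void run(const(ubyte)[] code)
{
    int[] stack;
    size_t pc = 0;
    while (pc < code.length)
    {
        final switch (cast(Op) code[pc++])
        {
            case Op.push:
                stack ~= code[pc++];            // operand follows the opcode
                break;
            case Op.add:
                stack[$ - 2] += stack[$ - 1];
                stack = stack[0 .. $ - 1];
                break;
            case Op.mul:
                stack[$ - 2] *= stack[$ - 1];
                stack = stack[0 .. $ - 1];
                break;
            case Op.print:
                writeln(stack[$ - 1]);
                break;
            case Op.halt:
                return;
        }
    }
}

void main()
{
    // encodes (2 + 3) * 4, regardless of which source language produced it
    ubyte[] code = [Op.push, 2, Op.push, 3, Op.add,
                    Op.push, 4, Op.mul, Op.print, Op.halt];
    run(code);
}
```

A real VM would add a verifier, a richer type system, and likely a JIT, but the shape is the same: the bytecode is the contract between front ends and the execution engine.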

As a reminder, this is what "my understanding is", which may be incorrect in one or more areas, so if I'm wrong, I'd like to be corrected.

Thanks

--rt
December 19, 2012
On Wednesday, 19 December 2012 at 01:58:54 UTC, Jonathan M Davis wrote:
> Such operations should be @system but are currently considered @safe. Who
> knows how many others we've missed beyond what's currently in bugzilla.
>
> - Jonathan M Davis

Unfortunately, fixing these will break existing code. Or can the behavior be deprecated?

--rt
December 19, 2012
On 12/18/2012 11:04 PM, Rob T wrote:
> I'm not claiming to be an expert in this area either, however it seems obvious
> that there are significant theoretical and practical advantages with using the
> bytecode concept.

Evidently you've dismissed all of my posts in this thread on that topic :-)

December 19, 2012
On Tuesday, 18 December 2012 at 18:11:37 UTC, Walter Bright wrote:
> An interesting datapoint in regards to bytecode is Javascript. Note that Javascript is not distributed in bytecode form. There is no Javascript VM. It is distributed as source code. Sometimes, that source code is compressed and obfuscated, nevertheless it is still source code.
>
> How the end system chooses to execute the js is up to that end system, and indeed there are a great variety of methods in use.
>
> Javascript proves that bytecode is not required for "write once, run everywhere", which was one of the pitches for bytecode.
>
> What is required for w.o.r.e. is a specification for the source code that precludes undefined and implementation defined behavior.
>
> Note also that Typescript compiles to Javascript. I suspect there are other languages that do so, too.

True, however JavaScript's case is similar to C's.

Many compilers make use of C as a high-level assembler, and JavaScript, like it or not, is the C of the Internet.

--
Paulo
December 19, 2012
On Wednesday, 19 December 2012 at 07:22:45 UTC, Walter Bright wrote:
> On 12/18/2012 11:04 PM, Rob T wrote:
>> I'm not claiming to be an expert in this area either, however it seems obvious
>> that there are significant theoretical and practical advantages with using the
>> bytecode concept.
>
> Evidently you've dismissed all of my posts in this thread on that topic :-)

As you dismissed all points in favor of bytecode. Such as it being a standardized AST representation for multiple languages. CLI is all about that, which is reflected in its name. LLVM is used almost exclusively for that purpose (clang is great).

Not advocating bytecode here but you claiming it is completely useless is so D-ish :).
December 19, 2012
On 12/19/2012 12:19 AM, Max Samukha wrote:
>> Evidently you've dismissed all of my posts in this thread on that topic :-)
> As you dismissed all points in favor of bytecode.

And I gave detailed reasons why.

> Such as it being a
> standardized AST representation for multiple languages. CLI is all about that,
> which is reflected in its name. LLVM is used almost exclusively for that purpose
> (clang is great).

My arguments were all based on the idea of distributing "compiled" source code in bytecode format.

The idea of using some common intermediate format to tie together multiple front ends and multiple back ends is something completely different.

And, surprise (!), I've done that, too. The original C compiler I wrote was, for many years, a multipass affair that communicated the data from one pass to the next via an intermediate file. I was forced into such a system because DOS just didn't have enough memory to combine the passes.

I dumped it when more memory became available, as it was the source of major slowdowns in the compilation process.

Note that such a system need not be *bytecode* at all, it can just hand the data structure off from one pass to the next. In fact, an actual bytecode requires a serialization of the data structures and then a reconstruction of them - rather pointless.


> Not advocating bytecode here but you claiming it is completely useless is so
> D-ish :).

I'm not without experience doing everything bytecode is allegedly good at.

As for CLI, it is great for implementing C#. For other languages, not so much. There turned out to be no way to efficiently represent D slices in it, for example.
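For context (my sketch, not anything from the CLI discussion itself), a D slice is just a pointer/length pair that aliases existing storage without copying, and it is that cheap aliasing view that has no efficient CLI counterpart:

```d
void main()
{
    int[] a = [1, 2, 3, 4, 5];
    int[] s = a[1 .. 4];   // no copy: s is a pointer/length pair into a
    s[0] = 42;
    assert(a[1] == 42);    // the slice aliases the original storage
}
```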
December 19, 2012
On Wednesday, 19 December 2012 at 07:22:45 UTC, Walter Bright wrote:
> On 12/18/2012 11:04 PM, Rob T wrote:
>> I'm not claiming to be an expert in this area either, however it seems obvious
>> that there are significant theoretical and practical advantages with using the
>> bytecode concept.
>
> Evidently you've dismissed all of my posts in this thread on that topic :-)

I really am trying to understand your POV, but I'm having a difficult time with the point concerning performance.

Using the JS code as an example, you are stating that the JS source code itself could just as well be viewed as the "bytecode", and therefore, given what I previously wrote concerning the "advantages", I could replace "bytecode" with "JS source code" and achieve the exact same result. Am I correct?

I will agree that the bytecode could be encoded as JS (or as another language) and used as a common base for interpretation or compilation to machine code. I can also agree that other languages can be "compiled" into the common "bytecode" language provided that it is versatile enough, so from that POV I will agree that you are correct.

I thought that transforming source code into bytecode was an optimization technique intended to improve interpretation performance while preserving portability across architectures; i.e., the bytecode language was designed specifically to improve interpretation performance. But you say that the cost of transforming a high-level language into an optimized bytecode language outweighs whatever is gained over leaving the source as-is; i.e., the performance gains from the transformation are not significant enough to justify its cost.

Is my understanding of your POV correct?

What I'm having trouble understanding is this:

If the intention of something like the Java VM was to create a portable virtualized machine that could be used to execute any language, then would it not make sense to select a common bytecode language that was optimized for execution performance, rather than using another common language that was not specifically designed for that purpose?

Do you have a theory or insight that can explain why a situation like the Java bytecode VM came to be, and why it persists despite your suggestion that it is neither required nor enough of an advantage to justify using it (one may as well use Java source directly)?

--rt