Jump to page: 1 25  
Page
Thread overview
Labels as values and threaded-code interpretation
Jun 01, 2013
Manu
Jun 01, 2013
bearophile
Jun 02, 2013
Walter Bright
Jun 02, 2013
Paulo Pinto
Jun 02, 2013
Brad Roberts
Jun 02, 2013
Paulo Pinto
Jun 02, 2013
Brad Roberts
Jun 02, 2013
Walter Bright
Jun 02, 2013
Brad Roberts
Jun 03, 2013
Jacob Carlborg
Jun 02, 2013
Manu
Jun 01, 2013
bearophile
Jun 01, 2013
Timon Gehr
Jun 02, 2013
bearophile
Jun 02, 2013
Andrej Mitrovic
Jun 01, 2013
bearophile
Jun 01, 2013
Diggory
Jun 01, 2013
Timon Gehr
Jun 01, 2013
Adam D. Ruppe
Jun 01, 2013
Rob T
Jun 02, 2013
Carl Sturtivant
Jun 02, 2013
Dmitry Olshansky
Jun 02, 2013
Dmitry Olshansky
Jun 02, 2013
Dmitry Olshansky
Jun 02, 2013
Dmitry Olshansky
Jun 02, 2013
Dmitry Olshansky
Jun 02, 2013
Walter Bright
Jun 02, 2013
Dmitry Olshansky
Jun 02, 2013
Walter Bright
Jun 02, 2013
nazriel
June 01, 2013
Hi,

I'm sure this has been brought up before, but I feel I need to bring it up again (because I'm going to be writing a threaded-code interpreter): http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

This is an incredibly important extension. The final switch statement is not a replacement because it doesn't allow the programmer to store a label address directly into a code stream, which is what's essential to write a threaded-code interpreter.

The Erlang folks went through hell just to use this feature; see the 5th Q at: http://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html#Frequently-Asked-Questions

The idea is to be able to write code like this:

----

import std.algorithm;

enum Op : ubyte
{
    imm,
    add,
    sub,
    // ...
    ret,
}

final class Insn
{
    Op op;
    size_t[] args;
    void* lbl;
    Insn next;
}

final class State
{
    Insn pc;
    size_t[64] regs;
}

size_t interp(Insn[] code)
{
    // Set up the instruction stream with label addresses
    // the first time that it is executed. Label addresses
    // are stable, so we only do this once.

    foreach (insn; code.filter!(x => !x.lbl)())
    {
        void* lbl;

        with (Op)
        {
            final switch (insn.op)
            {
                case imm: lbl = &&handle_imm; break;
                case add: lbl = &&handle_add; break;
                case sub: lbl = &&handle_sub; break;
                // ...
                case ret: lbl = &&handle_ret; break;
            }
        }

        insn.lbl = lbl;
    }

    // Start interpreting the entry instruction.

    auto state = new State;
    state.pc = code[0];
    goto *state.pc.lbl;

    // Interpreter logic follows...

    // The whole point is to avoid unnecessary function
    // calls, so we use a mixin template for this stuff.
    enum advance = "state.pc = state.pc.next;" ~
                   "goto *state.pc.lbl;";

    handle_imm:
    {
        state.regs[state.pc.args[0]] = state.pc.args[1];
        mixin(advance);
    }

    handle_add:
    {
        state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] + state.regs[state.pc.args[2]];
        mixin(advance);
    }

    handle_sub:
    {
        state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] - state.regs[state.pc.args[2]];
        mixin(advance);
    }

    // ...

    handle_ret:
    {
        return state.regs[state.pc.args[0]];
    }

    assert(false);
}

----

Notice that there are no function calls going on. Just plain jumps all the way through. This is a technique that many real world interpreters use to get significant speedups, and I for one think we desperately need it.

Implementing it should be trivial as both LLVM and GCC support taking the address of a block. I'm sure the DMD back end could be extended to support it too.

-- 
Alex Rønne Petersen
alex@alexrp.com / alex@lycus.org
http://alexrp.com / http://lycus.org
June 01, 2013
GCC has a non-standard extension to do this, I've used it to get great
performance wins writing emulators. I love this feature, love to see it in
D!
On 1 Jun 2013 15:30, "Alex Rønne Petersen" <alex@lycus.org> wrote:

> Hi,
>
> I'm sure this has been brought up before, but I feel I need to bring it up
> again (because I'm going to be writing a threaded-code interpreter):
> http://gcc.gnu.org/onlinedocs/**gcc/Labels-as-Values.html<http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html>
>
> This is an incredibly important extension. The final switch statement is not a replacement because it doesn't allow the programmer to store a label address directly into a code stream, which is what's essential to write a threaded-code interpreter.
>
> The Erlang folks went through hell just to use this feature; see the 5th Q
> at: http://www.erlang.org/doc/**installation_guide/INSTALL-**
> WIN32.html#Frequently-Asked-**Questions<http://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html#Frequently-Asked-Questions>
>
> The idea is to be able to write code like this:
>
> ----
>
> import std.algorithm;
>
> enum Op : ubyte
> {
>     imm,
>     add,
>     sub,
>     // ...
>     ret,
> }
>
> final class Insn
> {
>     Op op;
>     size_t[] args;
>     void* lbl;
>     Insn next;
> }
>
> final class State
> {
>     Insn pc;
>     size_t[64] regs;
> }
>
> size_t interp(Insn[] code)
> {
>     // Set up the instruction stream with label addresses
>     // the first time that it is executed. Label addresses
>     // are stable, so we only do this once.
>
>     foreach (insn; code.filter!(x => !x.lbl)())
>     {
>         void* lbl;
>
>         with (Op)
>         {
>             final switch (insn.op)
>             {
>                 case imm: lbl = &&handle_imm; break;
>                 case add: lbl = &&handle_add; break;
>                 case sub: lbl = &&handle_sub; break;
>                 // ...
>                 case ret: lbl = &&handle_ret; break;
>             }
>         }
>
>         insn.lbl = lbl;
>     }
>
>     // Start interpreting the entry instruction.
>
>     auto state = new State;
>     state.pc = code[0];
>     goto *state.pc.lbl;
>
>     // Interpreter logic follows...
>
>     // The whole point is to avoid unnecessary function
>     // calls, so we use a mixin template for this stuff.
>     enum advance = "state.pc = state.pc.next;" ~
>                    "goto *state.pc.lbl;";
>
>     handle_imm:
>     {
>         state.regs[state.pc.args[0]] = state.pc.args[1];
>         mixin(advance);
>     }
>
>     handle_add:
>     {
>         state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] +
> state.regs[state.pc.args[2]];
>         mixin(advance);
>     }
>
>     handle_sub:
>     {
>         state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] -
> state.regs[state.pc.args[2]];
>         mixin(advance);
>     }
>
>     // ...
>
>     handle_ret:
>     {
>         return state.regs[state.pc.args[0]];
>     }
>
>     assert(false);
> }
>
> ----
>
> Notice that there are no function calls going on. Just plain jumps all the way through. This is a technique that many real world interpreters use to get significant speedups, and I for one think we desperately need it.
>
> Implementing it should be trivial as both LLVM and GCC support taking the address of a block. I'm sure the DMD back end could be extended to support it too.
>
> --
> Alex Rønne Petersen
> alex@alexrp.com / alex@lycus.org
> http://alexrp.com / http://lycus.org
>


June 01, 2013
Alex Rønne Petersen:

> I'm sure this has been brought up before, but I feel I need to bring it up again (because I'm going to be writing a threaded-code interpreter): http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
>
> This is an incredibly important extension.

This was discussed more than once, and I think it's a valuable improvement for certain kinds of low level D programming.

Walter says that he has not added this feature to D because it's useful only to write interpreters and the like. I have found it useful in GCC to also write very fast finite state machines to analyse genomic strings. But even if Walter is right, writing interpreters (and emulators) is an important purpose for a system language as D. You can't write them efficient in scripting languages, and even Haskell/F# are not good at all to write them. Languages like C/D/C++ (and few others, as later probably Rust) are the only few natural languages to write them.

"Recently" the Python C interpreter was modified and speed up thanks to this non-standard feature. CPython source code has two versions, one with computed gotos and one without, to compile it even if your C compiler doesn't support them or their GNU-C syntax.


> Implementing it should be trivial as both LLVM and GCC support taking the address of a block. I'm sure the DMD back end could be extended to support it too.

Even if DMD back-end will never implement it, I think it's important to define it formally and officially in the D language and add the relative syntax to the front-end (plus a standard version identifier that allows to write programs that have both a routine that uses computed gotos and one that doesn't use them).

This has the big advantage that all future compilers that will want to implement it will use the same nice standard syntax. If D doesn't define this syntax, then maybe GDC will add it, and maybe later LDC will add it, and later maybe other future compilers will add it, and we can't be sure they will share the same syntax.

Another advantage of having this syntax in D, is that even if a compiler back-end doesn't support computed gotos, the program compiles without syntax errors when you use conditional compilation to disable the piece of code that uses computed gotos.

Bye,
bearophile
June 01, 2013
Alex Rønne Petersen:

>             final switch (insn.op)
>             {
>                 case imm: lbl = &&handle_imm; break;
>                 case add: lbl = &&handle_add; break;
>                 case sub: lbl = &&handle_sub; break;
>                 // ...
>                 case ret: lbl = &&handle_ret; break;

Regarding the syntax, why do you use "&&"? Isn't a single "&" enough?

                case imm: lbl = &handle_imm; break;

If such gotos become a natural part of the D syntax then it's not necessary to copy the GNU C syntax.

Bye,
bearophile
June 01, 2013
As example the very small but fast virtual machine written in GNU C++ from the International Contest on Functional Programming 2006:

http://codepad.org/iibBeWKw

It's faster than the same code without computed gotos.

Bye,
bearophile
June 01, 2013
On Saturday, 1 June 2013 at 11:50:23 UTC, bearophile wrote:
> As example the very small but fast virtual machine written in GNU C++ from the International Contest on Functional Programming 2006:
>
> http://codepad.org/iibBeWKw
>
> It's faster than the same code without computed gotos.
>
> Bye,
> bearophile

Would be cool if there was a platform independent way in D to construct a sequence of "call" instructions in memory. Then it would be possible to write a JIT compiler in the exact same style as that example.

It would be relatively easy to add to phobos although you'd have to be careful about DEP. Hmm, perhaps I will try writing something like this...
June 01, 2013
On 06/01/2013 11:43 AM, bearophile wrote:
> Alex Rønne Petersen:
>
>>             final switch (insn.op)
>>             {
>>                 case imm: lbl = &&handle_imm; break;
>>                 case add: lbl = &&handle_add; break;
>>                 case sub: lbl = &&handle_sub; break;
>>                 // ...
>>                 case ret: lbl = &&handle_ret; break;
>
> Regarding the syntax, why do you use "&&"? Isn't a single "&" enough?
>
>                  case imm: lbl = &handle_imm; break;
>
> If such gotos become a natural part of the D syntax then it's not
> necessary to copy the GNU C syntax.
>
> Bye,
> bearophile

D (like C) uses a different namespace for labels and symbols that are not labels.

This compiles today:

void main(){
    int foo;
    foo: auto b = &foo;
}

June 01, 2013
On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote:
> Hi,
>
> I'm sure this has been brought up before, but I feel I need to bring it
> up again (because I'm going to be writing a threaded-code interpreter):
> http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
>
> This is an incredibly important extension. The final switch statement is
> not a replacement because it doesn't allow the programmer to store a
> label address directly into a code stream, which is what's essential to
> write a threaded-code interpreter.
>

I'd also like to see this.

> The Erlang folks went through hell just to use this feature; see the 5th
> Q at:
> http://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html#Frequently-Asked-Questions
>
>
> The idea is to be able to write code like this:
>
> ----
>
> import std.algorithm;
>
> enum Op : ubyte
> {
>      imm,
>      add,
>      sub,
>      // ...
>      ret,
> }
>
> final class Insn
> {
>      Op op;
>      size_t[] args;
>      void* lbl;
>      Insn next;
> }
>
> final class State
> {
>      Insn pc;
>      size_t[64] regs;
> }
>
> size_t interp(Insn[] code)
> {
>      // Set up the instruction stream with label addresses
>      // the first time that it is executed. Label addresses
>      // are stable, so we only do this once.
>
>      foreach (insn; code.filter!(x => !x.lbl)())
>      {
>          void* lbl;
>
>          with (Op)
>          {
>              final switch (insn.op)
>              {
>                  case imm: lbl = &&handle_imm; break;
>                  case add: lbl = &&handle_add; break;
>                  case sub: lbl = &&handle_sub; break;
>                  // ...
>                  case ret: lbl = &&handle_ret; break;
>              }
>          }
>
>          insn.lbl = lbl;
>      }
> ...

(This must be supported at compile time!)

June 01, 2013
On Saturday, 1 June 2013 at 16:13:18 UTC, Timon Gehr wrote:
> I'd also like to see this.

Me too. Last time this came up I said no since there's some inline asm hacks you can do but turns out those hacks suck in capability and appearance compared to just having this feature.
June 01, 2013
On Saturday, 1 June 2013 at 16:23:11 UTC, Adam D. Ruppe wrote:
> On Saturday, 1 June 2013 at 16:13:18 UTC, Timon Gehr wrote:
>> I'd also like to see this.
>
> Me too. Last time this came up I said no since there's some inline asm hacks you can do but turns out those hacks suck in capability and appearance compared to just having this feature.

I wonder if this sort of thing would help solve another implementation problem, which is implementing very fast coroutines without the overhead seen with similar features such as fibers.

Some discussion here:
http://forum.dlang.org/thread/pcexsqbwefjueyylyyoz@forum.dlang.org

--rt
« First   ‹ Prev
1 2 3 4 5