June 02, 2013 Re: Labels as values and threaded-code interpretation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | On 02-06-2013 19:43, Dmitry Olshansky wrote: > 02-Jun-2013 20:48, Alex Rønne Petersen пишет: >> On 02-06-2013 10:52, Dmitry Olshansky wrote: >>> 01-Jun-2013 20:13, Timon Gehr пишет: >>>> On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote: >>>>> Hi, >>>>> >>>>> I'm sure this has been brought up before, but I feel I need to >>>>> bring it >>>>> up again (because I'm going to be writing a threaded-code >>>>> interpreter): >>>>> http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html >>>>> >>>>> This is an incredibly important extension. The final switch >>>>> statement is >>>>> not a replacement because it doesn't allow the programmer to store a >>>>> label address directly into a code stream, which is what's >>>>> essential to >>>>> write a threaded-code interpreter. >>>>> >>>> >>>> I'd also like to see this. >>>> >>> >>> Same here. >>> >>> Though I believe a way to force tail-call can support the same use case >>> also helping functional programming. >>> >>> Say: >>> >>> goto Call-Expression; //forced tail call >>> >>> instead of: >>> >>> return Call-Expression; >>> >> >> I'm not sure that can support threaded-code interpretation. > > Why not: > > alias OpCode = function void(); > > OpCode[] opcodes = [ opcode_1, ... ]; > > int pc; > ... > > void opcode_1() > { > ... //pick operands do whatever > pc = pc + num_operands; //skip over arguments > OpCode next = cast(OpCode)bytecode[pc]; > goto next(); //this is baked-in threaded dispatch > } > > void opcode_2(){ ... } > > //say bytecode contains operands and addresses of fucntions > void execute(size_t[] bytecode, int pc) > { > OpCode start = cast(OpCode)bytecode[pc]; > pc++; > goto start(); > } > > One can get away without casting if data is in a separate array. > Then this solution is perfectly safe in this limited form. Call would > only point to a function hence no problem with jumping who knows where > and less problems for optimizer. > The problem here is assuming the interpreter state can be global. Once you make it non-global (and thus have to pass it in the goto start(...) call) you get all the overhead of a regular function call. -- Alex Rønne Petersen alex@alexrp.com / alex@lycus.org http://alexrp.com / http://lycus.org |
June 02, 2013 Re: Labels as values and threaded-code interpretation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 02-06-2013 19:44, Walter Bright wrote: > On 6/2/2013 12:49 AM, Brad Roberts wrote: >> Many if not most of those non-unix platforms use gcc, which is >> included in the >> set of compilers that does support it. The point is, which you didn't >> change, >> is that's it's a defacto standard even if not technically in the c or c++ >> language specs. > > The curious question is why it never gets into the newer C and C++ > Standards. > That one's easy: It makes 'too many' assumptions about the capabilities of the target machine. Sort of like assuming signed integers overflow. In practice, though, any relevant target can support computed gotos. -- Alex Rønne Petersen alex@alexrp.com / alex@lycus.org http://alexrp.com / http://lycus.org |
June 02, 2013 Re: Labels as values and threaded-code interpretation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Alex Rønne Petersen | 02-Jun-2013 21:49, Alex Rønne Petersen пишет: > On 02-06-2013 19:43, Dmitry Olshansky wrote: >> 02-Jun-2013 20:48, Alex Rønne Petersen пишет: >>> On 02-06-2013 10:52, Dmitry Olshansky wrote: >>>> 01-Jun-2013 20:13, Timon Gehr пишет: >>>>> On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote: >>>>>> Hi, >>>>>> >>>>>> I'm sure this has been brought up before, but I feel I need to >>>>>> bring it >>>>>> up again (because I'm going to be writing a threaded-code >>>>>> interpreter): >>>>>> http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html >>>>>> >>>>>> This is an incredibly important extension. The final switch >>>>>> statement is >>>>>> not a replacement because it doesn't allow the programmer to store a >>>>>> label address directly into a code stream, which is what's >>>>>> essential to >>>>>> write a threaded-code interpreter. >>>>>> >>>>> >>>>> I'd also like to see this. >>>>> >>>> >>>> Same here. >>>> >>>> Though I believe a way to force tail-call can support the same use case >>>> also helping functional programming. >>>> >>>> Say: >>>> >>>> goto Call-Expression; //forced tail call >>>> >>>> instead of: >>>> >>>> return Call-Expression; >>>> >>> >>> I'm not sure that can support threaded-code interpretation. >> >> Why not: >> >> alias OpCode = function void(); >> >> OpCode[] opcodes = [ opcode_1, ... ]; >> >> int pc; >> ... >> >> void opcode_1() >> { >> ... //pick operands do whatever >> pc = pc + num_operands; //skip over arguments >> OpCode next = cast(OpCode)bytecode[pc]; >> goto next(); //this is baked-in threaded dispatch >> } >> >> void opcode_2(){ ... } >> >> //say bytecode contains operands and addresses of fucntions >> void execute(size_t[] bytecode, int pc) >> { >> OpCode start = cast(OpCode)bytecode[pc]; >> pc++; >> goto start(); >> } >> >> One can get away without casting if data is in a separate array. >> Then this solution is perfectly safe in this limited form. Call would >> only point to a function hence no problem with jumping who knows where >> and less problems for optimizer. >> > > The problem here is assuming the interpreter state can be global. Once > you make it non-global (and thus have to pass it in the goto start(...) > call) you get all the overhead of a regular function call. > One pointer to a struct MyInterpreterState. Think of it as 'this' pointer :) Anyhow cramming the whole interpreter in one huge function too has downsides (and hopelessly inefficent register allocation is one). And you still access most locals via stack pointer. And then there got to be something beyond locals - 'context', that is still passed from one bytecode execution to another. BTW Globals are actually quite expansive to access anyway, this was a mock example. -- Dmitry Olshansky |
June 02, 2013 Re: Labels as values and threaded-code interpretation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 6/2/13 10:44 AM, Walter Bright wrote:
> On 6/2/2013 12:49 AM, Brad Roberts wrote:
>> Many if not most of those non-unix platforms use gcc, which is included in the
>> set of compilers that does support it. The point is, which you didn't change,
>> is that's it's a defacto standard even if not technically in the c or c++
>> language specs.
>
> The curious question is why it never gets into the newer C and C++ Standards.
I can only surmise since I've never been a part of those processes, but:
1) since the support is already there for the people that want it, why go through the hassle?
2) going through the hassle just to get it more broadly supported is a lot of hassle.
3) people are generally fundamentally allergic to hassle.
Which also lines up with why people are asking for it to be added to d, because it's not already available, so it's worth the hassle.
|
June 02, 2013 Re: Labels as values and threaded-code interpretation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | On 02-06-2013 19:58, Dmitry Olshansky wrote: > 02-Jun-2013 21:49, Alex Rønne Petersen пишет: >> On 02-06-2013 19:43, Dmitry Olshansky wrote: >>> 02-Jun-2013 20:48, Alex Rønne Petersen пишет: >>>> On 02-06-2013 10:52, Dmitry Olshansky wrote: >>>>> 01-Jun-2013 20:13, Timon Gehr пишет: >>>>>> On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I'm sure this has been brought up before, but I feel I need to >>>>>>> bring it >>>>>>> up again (because I'm going to be writing a threaded-code >>>>>>> interpreter): >>>>>>> http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html >>>>>>> >>>>>>> This is an incredibly important extension. The final switch >>>>>>> statement is >>>>>>> not a replacement because it doesn't allow the programmer to store a >>>>>>> label address directly into a code stream, which is what's >>>>>>> essential to >>>>>>> write a threaded-code interpreter. >>>>>>> >>>>>> >>>>>> I'd also like to see this. >>>>>> >>>>> >>>>> Same here. >>>>> >>>>> Though I believe a way to force tail-call can support the same use >>>>> case >>>>> also helping functional programming. >>>>> >>>>> Say: >>>>> >>>>> goto Call-Expression; //forced tail call >>>>> >>>>> instead of: >>>>> >>>>> return Call-Expression; >>>>> >>>> >>>> I'm not sure that can support threaded-code interpretation. >>> >>> Why not: >>> >>> alias OpCode = function void(); >>> >>> OpCode[] opcodes = [ opcode_1, ... ]; >>> >>> int pc; >>> ... >>> >>> void opcode_1() >>> { >>> ... //pick operands do whatever >>> pc = pc + num_operands; //skip over arguments >>> OpCode next = cast(OpCode)bytecode[pc]; >>> goto next(); //this is baked-in threaded dispatch >>> } >>> >>> void opcode_2(){ ... } >>> >>> //say bytecode contains operands and addresses of fucntions >>> void execute(size_t[] bytecode, int pc) >>> { >>> OpCode start = cast(OpCode)bytecode[pc]; >>> pc++; >>> goto start(); >>> } >>> >>> One can get away without casting if data is in a separate array. >>> Then this solution is perfectly safe in this limited form. Call would >>> only point to a function hence no problem with jumping who knows where >>> and less problems for optimizer. >>> >> >> The problem here is assuming the interpreter state can be global. Once >> you make it non-global (and thus have to pass it in the goto start(...) >> call) you get all the overhead of a regular function call. >> > > One pointer to a struct MyInterpreterState. Think of it as 'this' > pointer :) That's one load and store for every single step in the interpreter. Sounds harmless, but every cycle matters when every goto start(...) amounts to a single instruction in the program you're running. > Anyhow cramming the whole interpreter in one huge function too has > downsides (and hopelessly inefficent register allocation is one). I'd frankly be surprised if that was the case. But without any investigation, it's just word against word. > And you still access most locals via stack pointer. And then there got > to be something beyond locals - 'context', that is still passed from one > bytecode execution to another. It depends on what you mean by context. Can you elaborate? > > BTW Globals are actually quite expansive to access anyway, this was a > mock example. > Sure. Either way, evidence suggests that this style of interpreter has performance advantages in practice. -- Alex Rønne Petersen alex@alexrp.com / alex@lycus.org http://alexrp.com / http://lycus.org |
June 02, 2013 Re: Labels as values and threaded-code interpretation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | On 6/2/2013 10:58 AM, Dmitry Olshansky wrote:
> Anyhow cramming the whole interpreter in one huge function too has downsides
> (and hopelessly inefficent register allocation is one).
This should not be the case with register allocation using live ranges.
You can also help with this by organizing declarations so the variables have the shortest scope.
|
June 02, 2013 Re: Labels as values and threaded-code interpretation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | 02-Jun-2013 22:54, Walter Bright пишет: > On 6/2/2013 10:58 AM, Dmitry Olshansky wrote: >> Anyhow cramming the whole interpreter in one huge function too has >> downsides >> (and hopelessly inefficent register allocation is one). > > This should not be the case with register allocation using live ranges. > The problem is that if every goto *get_me_new_opcode compile may have no idea of where it will through it with computed labels. Hence some conservativeness of what registers contain and reloads and in effect all labels end up as isolated "functions". Plus a big stack frame is liability. And then there is a fair share of obstacles in making the kind of threaded interpreter code optimized properly, this post I find interesting: http://article.gmane.org/gmane.comp.lang.lua.general/75426 > You can also help with this by organizing declarations so the variables > have the shortest scope. Then splitting opcodes as functions separates their timespan even more. -- Dmitry Olshansky |
June 02, 2013 Re: Labels as values and threaded-code interpretation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Alex Rønne Petersen | 02-Jun-2013 22:25, Alex Rønne Petersen пишет: > On 02-06-2013 19:58, Dmitry Olshansky wrote: >>> >>> The problem here is assuming the interpreter state can be global. Once >>> you make it non-global (and thus have to pass it in the goto start(...) >>> call) you get all the overhead of a regular function call. >>> >> >> One pointer to a struct MyInterpreterState. Think of it as 'this' >> pointer :) > > That's one load and store for every single step in the interpreter. > Sounds harmless, but every cycle matters when every goto start(...) > amounts to a single instruction in the program you're running. > >> Anyhow cramming the whole interpreter in one huge function too has >> downsides (and hopelessly inefficent register allocation is one). > > I'd frankly be surprised if that was the case. But without any > investigation, it's just word against word. See my reply to Walter. Again I personally have not done any measurements but what that post by LuaJIt author makes a lot of sense to me. > >> And you still access most locals via stack pointer. And then there got >> to be something beyond locals - 'context', that is still passed from one >> bytecode execution to another. > > It depends on what you mean by context. Can you elaborate? > Context is what a virtual thread represents in the global world. Any outside hooks into it like argv in C's main are there, function tables (=extensions) and whatnot. -- Dmitry Olshansky |
June 02, 2013 Re: Labels as values and threaded-code interpretation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Alex Rønne Petersen | 02-Jun-2013 22:25, Alex Rønne Petersen пишет: > On 02-06-2013 19:58, Dmitry Olshansky wrote: >> >> One pointer to a struct MyInterpreterState. Think of it as 'this' >> pointer :) > > That's one load and store for every single step in the interpreter. > Sounds harmless, but every cycle matters when every goto start(...) > amounts to a single instruction in the program you're running. > If the instruction set is RISC-y then you are in trouble anyway (writing interpreter not in ASM). One way out is fusing multiple simple instructions into macro-instructions and interpreting those. > Either way, evidence suggests that this style of interpreter has > performance advantages in practice. > Agreed. Though this day I'd go for what I call "bastardized JIT". In essense just produce a chunk of "call XYZ" instructions, comnditionals would have to have a test + jmp. And then there is 'ret' at the end of this chunk ~4 instructions to have to deal with. What you get is branch prediction done on the hardware level and no instruction counter to worry about. -- Dmitry Olshansky |
June 02, 2013 Re: Labels as values and threaded-code interpretation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | On 6/2/2013 1:10 PM, Dmitry Olshansky wrote:
> The problem is that if every goto *get_me_new_opcode compile may have no idea of
> where it will through it with computed labels. Hence some conservativeness of
> what registers contain and reloads and in effect all labels end up as isolated
> "functions". Plus a big stack frame is liability.
>
> And then there is a fair share of obstacles in making the kind of threaded
> interpreter code optimized properly, this post I find interesting:
> http://article.gmane.org/gmane.comp.lang.lua.general/75426
Thanks for the interesting article.
|
Copyright © 1999-2021 by the D Language Foundation