July 25, 2012 Re: Computed gotos on Reddit | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 25-Jul-12 23:58, Walter Bright wrote: > On 7/25/2012 12:52 PM, Dmitry Olshansky wrote: >> On 25-Jul-12 21:47, Walter Bright wrote: >>> On 7/25/2012 10:29 AM, Dmitry Olshansky wrote: >>>>> Is it possible you could code it up and test it using inline asm? >> ... >> >> >>>> Any tips on which spare registers to use (I guess ecx is no go, as >>>> there is >>>> 'this' pointer present) ? >>> >>> I wouldn't worry about it. EAX is good. >>> >> OK. I'm almost there, here is what I have: >> //my byte code sets 8-bit to discern code/data >> uint c = re.ir[t.pc].code - 128; >> //no idea how to code the above in asm >> //.code is { return x & mask; } property >> asm{ >> mov EAX, c; >> lea EAX, L_jumptable[EAX][EAX*4]; >> jmp EAX; >> } >> L_jumptable: >> mixin(`asm{` >> ~ genJumpTable() >> ~ `} `); >> >> So I have proper table generated and it all goes fine untill I get: >> >> std\regex.d(5118): Error: undefined identifier 'L_jumptable' >> > > I was afraid of that. You may have to approximate it by loading the > address of L_jumptable into a register and adding it in instead of using > the addressing mode. like this ? mov EDX, L_jumpable move EAX, EDX[EAX][EAX*4] doesn't work. Seems like label is nonexistent anywhere but jump instruction. Will this one do it: lea EAX, $[EAX+5][EAX*4]; jmp EAX; Compiles. Maybe I miscalculated though. Indirect jump has size of ? >Then, add those extra instructions as dummies into > the other path. Ehm? Other path like what? You mean compare it with jump table that uses plain addresses? Please expand a bit. -- Dmitry Olshansky |
July 25, 2012 Re: Computed gotos on Reddit | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | On 26-Jul-12 00:06, Dmitry Olshansky wrote: > On 25-Jul-12 23:58, Walter Bright wrote: >> On 7/25/2012 12:52 PM, Dmitry Olshansky wrote: >>> On 25-Jul-12 21:47, Walter Bright wrote: >>>> On 7/25/2012 10:29 AM, Dmitry Olshansky wrote: >>>>>> Is it possible you could code it up and test it using inline asm? >>> ... >>> >>> >>>>> Any tips on which spare registers to use (I guess ecx is no go, as >>>>> there is >>>>> 'this' pointer present) ? >>>> >>>> I wouldn't worry about it. EAX is good. >>>> >>> OK. I'm almost there, here is what I have: >>> //my byte code sets 8-bit to discern code/data >>> uint c = re.ir[t.pc].code - 128; >>> //no idea how to code the above in asm >>> //.code is { return x & mask; } property >>> asm{ >>> mov EAX, c; >>> lea EAX, L_jumptable[EAX][EAX*4]; >>> jmp EAX; >>> } >>> L_jumptable: >>> mixin(`asm{` >>> ~ genJumpTable() >>> ~ `} `); >>> >>> So I have proper table generated and it all goes fine untill I get: >>> >>> std\regex.d(5118): Error: undefined identifier 'L_jumptable' >>> >> >> I was afraid of that. You may have to approximate it by loading the >> address of L_jumptable into a register and adding it in instead of using >> the addressing mode. > like this ? > mov EDX, L_jumptable > mov EAX, EDX[EAX][EAX*4] Corrected typos. Still it doesn't take off. -- Dmitry Olshansky |
July 25, 2012 Re: Computed gotos on Reddit | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | On 07/25/2012 01:06 PM, Dmitry Olshansky wrote: > On 25-Jul-12 23:58, Walter Bright wrote: >> I was afraid of that. You may have to approximate it by loading the >> address of L_jumptable into a register and adding it in instead of using >> the addressing mode. > like this ? > mov EDX, L_jumpable I hope it is just that typo! :p Ali |
July 25, 2012 Re: Computed gotos on Reddit | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | >>>> std\regex.d(5118): Error: undefined identifier 'L_jumptable' >>>> >>> >>> I was afraid of that. You may have to approximate it by loading the >>> address of L_jumptable into a register and adding it in instead of using >>> the addressing mode. I failed to load it in any register or interact with it in any way. I think I've stalled. There has to be a way to get label address somehow, I got tired of guess and shot :( BTW that would allow us to do computed gotos but only in inline asm. -- Dmitry Olshansky |
July 25, 2012 Re: Computed gotos on Reddit | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | On 7/25/2012 2:55 PM, Dmitry Olshansky wrote:
>>>>> std\regex.d(5118): Error: undefined identifier 'L_jumptable'
>>>>>
>>>>
>>>> I was afraid of that. You may have to approximate it by loading the
>>>> address of L_jumptable into a register and adding it in instead of using
>>>> the addressing mode.
>
> I failed to load it in any register or interact with it in any way.
> I think I've stalled. There has to be a way to get label address somehow, I got
> tired of guess and shot :(
>
> BTW that would allow us to do computed gotos but only in inline asm.
>
How to get an address:
call jump_table
L1:
...
jump_table:
pop EAX
jmp L1
.. the rest of the jump table ...
Yes, it's awful, but we're just trying to take some measurements here, so it doesn't matter.
What I meant was add these extra instructions in to the switch version as dummies in order to make the extra time they take irrelevant to looking at the difference.
|
July 26, 2012 Re: Computed gotos on Reddit | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 7/25/2012 10:19 AM, Walter Bright wrote: > Is it possible you could code it up and test it using inline asm? I dummied up some code to do it: int test(int i) { switch (i) { case 3: i += 3; break; case 4: i += 4; break; case 5: i += 5; break; case 6: i += 6; break; case 7: i += 7; break; case 8: i += 8; break; default: i += 100; break; } return i; } enter 4,0 push EBX mov -4[EBP],EAX mov EBX,EAX sub EBX,3 cmp EBX,5 ja L5D lea ECX,_D3foo4testFiZi[01Bh][EBX*4][EBX] jmp ECX jmp near ptr L39 jmp near ptr L3F jmp near ptr L45 jmp near ptr L4B jmp near ptr L51 jmp near ptr L57 L39: add dword ptr -4[EBP],3 jmp short L61 L3F: add dword ptr -4[EBP],4 jmp short L61 L45: add dword ptr -4[EBP],5 jmp short L61 L4B: add dword ptr -4[EBP],6 jmp short L61 L51: add dword ptr -4[EBP],7 jmp short L61 L57: add dword ptr -4[EBP],8 jmp short L61 L5D: add dword ptr -4[EBP],064h L61: mov EAX,-4[EBP] pop EBX leave ret =============================================== Sadly, it's significantly slower than: enter 4,0 push EBX mov -4[EBP],EAX mov EBX,EAX sub EBX,3 cmp EBX,5 ja L3D jmp dword ptr FLAT:_DATA[00h][EBX*4] add dword ptr -4[EBP],3 jmp short L41 add dword ptr -4[EBP],4 jmp short L41 add dword ptr -4[EBP],5 jmp short L41 add dword ptr -4[EBP],6 jmp short L41 add dword ptr -4[EBP],7 jmp short L41 add dword ptr -4[EBP],8 jmp short L41 L3D: add dword ptr -4[EBP],064h L41: mov EAX,-4[EBP] pop EBX leave ret Maybe the branch prediction logic doesn't work for JMP ECX. |
July 26, 2012 Re: Computed gotos on Reddit | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 7/25/2012 11:24 PM, Walter Bright wrote:
> jmp near ptr L39
> jmp near ptr L3F
> jmp near ptr L45
> jmp near ptr L4B
> jmp near ptr L51
> jmp near ptr L57
I tried replacing these with 2 byte jmp instructions, instead of 5 byte ones, but no improvement.
|
July 26, 2012 Re: Computed gotos on Reddit | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 26/07/12 00:46, Walter Bright wrote:
> On 7/25/2012 2:55 PM, Dmitry Olshansky wrote:
>>>>>> std\regex.d(5118): Error: undefined identifier 'L_jumptable'
>>>>>>
>>>>>
>>>>> I was afraid of that. You may have to approximate it by loading the
>>>>> address of L_jumptable into a register and adding it in instead of
>>>>> using
>>>>> the addressing mode.
>>
>> I failed to load it in any register or interact with it in any way.
>> I think I've stalled. There has to be a way to get label address
>> somehow, I got
>> tired of guess and shot :(
>>
>> BTW that would allow us to do computed gotos but only in inline asm.
>>
>
> How to get an address:
>
> call jump_table
> L1:
>
> ...
> jump_table:
> pop EAX
> jmp L1
> .. the rest of the jump table ...
>
> Yes, it's awful, but we're just trying to take some measurements here,
> so it doesn't matter.
>
> What I meant was add these extra instructions in to the switch version
> as dummies in order to make the extra time they take irrelevant to
> looking at the difference.
But doing that screws up the CPU"s stack prediction so badly that it will dominate the timing
At least do something like:
jump_table:
move EAX, [ESP]
ret
|
July 26, 2012 Re: Computed gotos on Reddit | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 26-Jul-12 10:24, Walter Bright wrote: > On 7/25/2012 10:19 AM, Walter Bright wrote: >> Is it possible you could code it up and test it using inline asm? > > I dummied up some code to do it: > > int test(int i) > { > switch (i) > { > case 3: i += 3; break; > case 4: i += 4; break; > case 5: i += 5; break; > case 6: i += 6; break; > case 7: i += 7; break; > case 8: i += 8; break; > default: i += 100; break; > } > return i; > } > Do the above in loop. And more cases of course. Something around 40 should do. I'm still figuring out why my inline asm version segfaults :) > enter 4,0 > push EBX > mov -4[EBP],EAX > mov EBX,EAX > sub EBX,3 > cmp EBX,5 > ja L5D > lea ECX,_D3foo4testFiZi[01Bh][EBX*4][EBX] > jmp ECX > jmp near ptr L39 > jmp near ptr L3F > jmp near ptr L45 > jmp near ptr L4B > jmp near ptr L51 > jmp near ptr L57 > L39: add dword ptr -4[EBP],3 > jmp short L61 > L3F: add dword ptr -4[EBP],4 > jmp short L61 > L45: add dword ptr -4[EBP],5 > jmp short L61 > L4B: add dword ptr -4[EBP],6 > jmp short L61 > L51: add dword ptr -4[EBP],7 > jmp short L61 > L57: add dword ptr -4[EBP],8 > jmp short L61 > L5D: add dword ptr -4[EBP],064h > L61: mov EAX,-4[EBP] > pop EBX > leave > ret > > =============================================== > Sadly, it's significantly slower than: > > enter 4,0 > push EBX > mov -4[EBP],EAX > mov EBX,EAX > sub EBX,3 > cmp EBX,5 > ja L3D > jmp dword ptr FLAT:_DATA[00h][EBX*4] > add dword ptr -4[EBP],3 > jmp short L41 > add dword ptr -4[EBP],4 > jmp short L41 > add dword ptr -4[EBP],5 > jmp short L41 > add dword ptr -4[EBP],6 > jmp short L41 > add dword ptr -4[EBP],7 > jmp short L41 > add dword ptr -4[EBP],8 > jmp short L41 > L3D: add dword ptr -4[EBP],064h > L41: mov EAX,-4[EBP] > pop EBX > leave > ret > > Maybe the branch prediction logic doesn't work for JMP ECX. -- Dmitry Olshansky |
July 26, 2012 Re: Computed gotos on Reddit | ||||
---|---|---|---|---|
| ||||
Posted in reply to Don Clugston | On 26-Jul-12 11:56, Don Clugston wrote: > But doing that screws up the CPU"s stack prediction so badly that it > will dominate the timing > At least do something like: > > jump_table: > move EAX, [ESP] > ret > BTW this seems to be a roundabout way to get address of label that I can use do a threaded code interpreter. Basically every branch is: L_opcode_1: asm { mov EAX, [ESP]; ret; } ... real code here L_opcode_2: asm { mov EAX, [ESP]; ret; } ... real code here Then "compile" step looks like this: while(not_last_opcode(code[pc]){ size_t c = code[pc]; switch(c){ case op_1: asm{ call L_opcode_1; add EAX, 4; mov c, EAX; } break; case op_2: ... } code[pc] = c; //now we have label address pc++; } //interpret: pc = 0; size_t op = code[pc]; asm { mov EAX, op; jump eax; } //here we go, almost computed goto Obviously it's backwards and awful. Makes me wonder why can't we take it directly, what's limitation ? How about allowing it, at least in inline assembly? -- Dmitry Olshansky |
Copyright © 1999-2021 by the D Language Foundation