Jump to page: 1 2
Thread overview
Fascinating new switch mechanism in assembler
Mar 18, 2006
Dan Lewis
Mar 18, 2006
Sean Kelly
Re: Fascinating new switch mechanism in assembler (jump table)
Mar 18, 2006
Dan Lewis
Re: jump table (improved?)
Mar 18, 2006
Dan Lewis
Mar 18, 2006
Walter Bright
Mar 20, 2006
Lionello Lunesu
Mar 21, 2006
Walter Bright
Apr 15, 2006
Dan Lewis
Apr 15, 2006
Lionello Lunesu
Apr 22, 2006
Dan
Apr 26, 2006
Lionello Lunesu
March 18, 2006
Hi guys,
I've been working on a lexical analyzer for my new scripting engine, and I
stumbled upon an interesting algorithm.

Essentially:

goto ((charCode << 2)+subroutinePointerArray);

The advantage of this strategy is the elimination of some 90% of the conditional testing for each character in a string being interpreted (Yay for code that runs in 40% the time!).  The disadvantage is that you need a table of pointers; if your cache requirements get too big you can suffer cache thrashing in the scanner loop which will slow it down even more (-900%?) so you need to keep the rest of the scanner relatively compact.

Provided with a table of 256 pointers, we would have a single-jump unconditional branch to get to the correct handler for each character.  Further refining the process, I'm skipping characters >0x7F, and masking 0x60-0x7F onto 0x40-0x5F; which gives you essentially the same handler functions (provided you still preserve the original character somewhere, it makes no difference)  Then instead of taking 1kb of cache, it only takes 380 bytes for the table at a cost of introducing two conditional branches; one of which is often mispredicted.

So I have:

// ...
asm
{
naked;
cld;
mov	ECX,	len;
mov	ESI,	p;
xor	EAX,	EAX;
L1:	lodsb;
jecxz	short LX;
mov	EBX,	EAX;
and	EBX,	0xFFFF_FF80;
jnz	BAIL;
mov	EDX,	EAX;
mov	EBX,	EAX;
shr	EDX,	5;
cmp	EDX,	3;
je	short J1;
J2:	shl	EBX,	2;
add	EBX,	jumpGateP;
jmp	[EBX];
J1:	sub	EBX,	byte 0x20;
jmp	short J2; // ...

I'm hoping for some feedback and progression of the idea; and if it's already been out there for ages, perhaps a link or two on the subject?

I'm just waddling through D, so right now I'm struggling to figure out how to get my jumpGate table variables to point to functions.  :p  Once I'm done that, I'll write the rest of it.

(c) Dan Lewis, 2006 (contact me, I'm typically cooperative)
http://murpsoft.com
March 18, 2006
Dan Lewis wrote:
> Hi guys,
> I've been working on a lexical analyzer for my new scripting engine, and I
> stumbled upon an interesting algorithm.
> 
> Essentially:
> 
> goto ((charCode << 2)+subroutinePointerArray);
...
> I'm hoping for some feedback and progression of the idea; and if it's already
> been out there for ages, perhaps a link or two on the subject?

Sounds like you're describing a Jump Table.  Here's a PDF I dug up on the subject: http://www.inf.ethz.ch/personal/muellren/sysprog/ws05/session2-ifwd42rm2.pdf


Sean
March 18, 2006
>
>Sounds like you're describing a Jump Table.  Here's a PDF I dug up on the subject: http://www.inf.ethz.ch/personal/muellren/sysprog/ws05/session2-ifwd42rm2.pdf
>
>
>Sean

Oh.  I figured it was a little too simple to be a new idea.  From the material I've read on the 'net, I'm getting the idea that I should let the 128 bytes sit there to avoid the unpredictable branch; but that my jump table is otherwise alot faster than what most C compilers would generate.  : )

I was thinking I might be able to squeeze an extra few cycles out of each character if I knew what I was doing... but I'll save that for version 1.0.



March 18, 2006
Hi folks.  Here's a gimme for anyone out there writing a lexer.  Give it three variables, and you have a tight jump table switch.  I'm about to port this over to nasm to avoid the GC/Phobos overhead.  :p

asm
{
naked;
even;
cld;
mov	ECX,	strLength;
mov	ESI,	charP;
xor	EAX,	EAX;
L1:	lodsb;
jecxz	short EXIT_SUCCESS;
test	AL,	0x80;
jnz	short EXIT_BAD_CHAR;
lea	EBX,	[AL*4+jumpTableP]
jmp	EBX;
}


March 18, 2006
"Dan Lewis" <Dan_member@pathlink.com> wrote in message news:dvfnur$a9$1@digitaldaemon.com...
> Hi guys,
> I've been working on a lexical analyzer for my new scripting engine, and I
> stumbled upon an interesting algorithm.
>
> Essentially:
>
> goto ((charCode << 2)+subroutinePointerArray);
>
> The advantage of this strategy is the elimination of some 90% of the
> conditional
> testing for each character in a string being interpreted (Yay for code
> that runs
> in 40% the time!).

Given:

int test(int i)
{
    switch (i)
    {
 case 1:
 case 2:
 case 3:
 case 4:
 case 5:
 case 6:
 case 7:
     return i;
 default:
     return i + 1;
    }
}

The assembler produced is:

?test@@YAHH@Z:
  push EBX
  mov EBX,8[ESP]
  sub EBX,1
  cmp EBX,6
  ja L1A
  jmp dword ptr FLAT:_DATA[00h][EBX*4]
  mov EAX,8[ESP]
  pop EBX
  ret
L1A:  mov EAX,8[ESP]
  inc EAX
  pop EBX
  ret
_TEXT ends
_DATA segment
 dd offset FLAT:?test@@YAHH@Z[014h]
 dd offset FLAT:?test@@YAHH@Z[014h]
 dd offset FLAT:?test@@YAHH@Z[014h]
 dd offset FLAT:?test@@YAHH@Z[014h]
 dd offset FLAT:?test@@YAHH@Z[014h]
 dd offset FLAT:?test@@YAHH@Z[014h]
 dd offset FLAT:?test@@YAHH@Z[014h]
_DATA ends



March 20, 2006
> ?test@@YAHH@Z:
>  push EBX
>  mov EBX,8[ESP]
>  sub EBX,1
>  cmp EBX,6
>  ja L1A
>  jmp dword ptr FLAT:_DATA[00h][EBX*4]
>  mov EAX,8[ESP]
>  pop EBX
>  ret
> L1A:  mov EAX,8[ESP]
>  inc EAX
>  pop EBX
>  ret
> _TEXT ends
> _DATA segment
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> _DATA ends

In fact, it's even better. I got this code (dmd -O -release -inline):

00402010 push ebx
00402011 mov ebx,eax
00402013 sub ebx,1
00402016 mov ecx,eax
00402018 cmp ebx,6
0040201B ja 00402026
0040201D jmp dword ptr [ebx*4+411080h]
00402024 pop ebx
00402025 ret
00402026 pop ebx
00402027 lea eax,[ecx+1]
0040202A ret

Although the point is clear (the compiler already uses jump tables) the code does not seem optimal. Of course, my assembly knowledge is kind-of rusty (instruction pairing for the pentium 1 is the latest optimization I know of).

First of all the jmp is useless (in fact, the jump table is useless). The code already compares the "default:" case, so there's no need to further differentiate between the actual values. I would have written the switch as follows:

  push ebx
  mov ebx,eax
  dec ebx
  mov ecx,eax
  cmp ebx,6
  jbe L1A
  lea eax,[ecx+1]
L1A:
  pop ebx
  ret

What's the reason for the "sub ebx, 1", intead of "dec ebx"? Isn't an instruction using an immediate value slower (larger instruction => less instructions in cache) than one without? (At least that's what I remember from the pentium 1).

Would the usage of the instructions setbe/seta improve the above code even more? It'll get rid of the conditional jump, but I have no idea what the performance of those set* instructions are.

L.


March 21, 2006
"Lionello Lunesu" <lio@remove.lunesu.com> wrote in message news:dvm0dd$2bhp$1@digitaldaemon.com...
> What's the reason for the "sub ebx, 1", intead of "dec ebx"? Isn't an instruction using an immediate value slower (larger instruction => less instructions in cache) than one without? (At least that's what I remember from the pentium 1).

Not necessarilly. It depends on which version of the CPU.


April 15, 2006
>Given:
>
>int test(int i)
>{
>    switch (i)
>    {
> case 1:
> case 2:
> case 3:
> case 4:
> case 5:
> case 6:
> case 7:
>     return i;
> default:
>     return i + 1;
>    }
>}
>
>The assembler produced is:
>
>?test@@YAHH@Z:
>  push EBX
>  mov EBX,8[ESP]
>  sub EBX,1
>  cmp EBX,6
>  ja L1A
>  jmp dword ptr FLAT:_DATA[00h][EBX*4]
>  mov EAX,8[ESP]
>  pop EBX
>  ret
>L1A:  mov EAX,8[ESP]
>  inc EAX
>  pop EBX
>  ret
>_TEXT ends
>_DATA segment
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
>_DATA ends
>

Walter - I *think* your jump gate is jumping *into* the table.  If I'm wrong, then disregard all this...

~~~

This is bad because you're flushing the code cash twice to get where you're going, and the jump instruction itself is being stored repeatedly.  Instead you want to load the 'final' jump address into a register and jump to that address. Then the table is essentially a void*[].

so:

mov eax, table[x];
jmp [eax];

dd a;
dd b;

is better than

mov eax, table[x];
jmp eax;

table:
jmp a;
jmp b;

This will roughly half the memory size.  That said, I plan to use a D switch and trust you'll optimize it well enough.  :)  That way my code is more portable and clean.  Thanks for showing that!

Sincerely, Dan.


April 15, 2006
I'm no Walter, but I think he had it right.

>> jmp dword ptr FLAT:_DATA[00h][EBX*4]

It's not jumping to "ebx" but to "[ebx*4], which means the value (dword) at (ebx*4) is read first.

The *4 also shows that the table contains dwords: the offsets of the code jumped to for each case.
>> dd offset FLAT:?test@@YAHH@Z[014h]

L.
April 22, 2006
In article <e1qf9o$1me0$1@digitaldaemon.com>, Lionello Lunesu says...
>
>I'm no Walter, but I think he had it right.
>
> >> jmp dword ptr FLAT:_DATA[00h][EBX*4]
>
>It's not jumping to "ebx" but to "[ebx*4], which means the value (dword) at (ebx*4) is read first.
>
>The *4 also shows that the table contains dwords: the offsets of the code jumped to for each case.
> >> dd offset FLAT:?test@@YAHH@Z[014h]
>
>L.

Lionello, I have no idea what "dd offset FLAT:?test@@YAHH@Z[014h]" means.  I can read assembly, even a touch of hex, but I skipped gibberish class.  Anyways, [EBX*4] makes sense to me, so I know you're right.  : )



« First   ‹ Prev
1 2