Table lookups - this is pretty definitive (page 7) - D Programming Language Discussion Forum

On Monday, 7 April 2014 at 11:35:37 UTC, Martin Nowak wrote: > On 04/01/2014 08:35 PM, Walter Bright wrote: >> Try this benchmark comparing various classification schemes: > > Counter example: > https://github.com/D-Programming-Language/phobos/commit/8599d6b4acde0eff886876a56f54ecd8e6b528e9 It's all a matter of how complicated the "non-table" implementation is I guess. table-lookup has a "steep entry cost", but is constant, regardless of complexity. Table lookups were "recently" removed from std.ascii too. The implementation walter is benching "against" is particularly complicated here.

On 4/7/2014 2:41 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com>" wrote: >> on. Unfortunately for your small program, gcc is too smart and >> optimises everything away. > > Haha, that's funny. But why don't you put the data in a separate compilation unit? True, it's pretty easy to defeat that optimization.

I added a lookup scheme of my own, its not as fast as Walters (in fact its the slowest without -inline - release -O) but it uses 1 bit per entry in the table instead of a whole byte so you can have lots and lots of different tables. I'm even reasonably sure that it works correctly! =============================== import core.stdc.stdlib; import core.stdc.string; import std.algorithm; import std.array; import std.ascii; import std.datetime; import std.range; import std.stdio; import std.traits; bool isIdentifierChar0(ubyte c) { return isAlphaNum(c) || c == '_' || c == '$'; } bool isIdentifierChar1(ubyte c) { return ((c >= '0' || c == '$') && (c <= '9' || c >= 'A') && (c <= 'Z' || c >= 'a' || c == '_') && (c <= 'z')); } immutable bool[256] tab2; static this() { for (size_t u = 0; u < 0x100; ++u) { tab2[u] = isIdentifierChar0(cast(ubyte)u); } } bool isIdentifierChar2(ubyte c) { return tab2[c]; } immutable ulong[4] tab3; static this() { for (size_t u = 0; u < 0x100; ++u) { if (isIdentifierChar0(cast(ubyte)u)) { auto sub = u >>> 6; auto b = u & 0x3f; auto mask = 0x01L << b; tab3[sub] |= mask; } } } bool isIdentifierChar3(ubyte c) { auto sub = c >>> 6; c &= 0x3f; auto mask = 0x01L << c; return (tab3[sub] & mask) > 0; } int f0() { int x; for (uint u = 0; u < 0x100; ++u) { x += isIdentifierChar0(cast(ubyte)u); } return x; } int f1() { int x; for (uint u = 0; u < 0x100; ++u) { x += isIdentifierChar1(cast(ubyte)u); } return x; } int f2() { int x; for (uint u = 0; u < 0x100; ++u) { x += isIdentifierChar2(cast(ubyte)u); } return x; } int f3() { int x; for (uint u = 0; u < 0x100; ++u) { x += isIdentifierChar3(cast(ubyte)u); } return x; } void main() { auto r = benchmark!(f0, f1, f2, f3)(10_000); writefln("Milliseconds %s %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs, r[3].msecs); }

April 17, 2014

Re: Table lookups - this is pretty definitive

Posted by ixid
in reply to Alix Pexton

Permalink

ixid

Posted in reply to Alix Pexton

Permalink

On Thursday, 17 April 2014 at 16:27:26 UTC, Alix Pexton wrote:
> I added a lookup scheme of my own, its not as fast as Walters (in fact its the slowest without -inline - release -O) but it uses 1 bit per entry in the table instead of a whole byte so you can have lots and lots of different tables. I'm even reasonably sure that it works correctly!
>
>
> ===============================
> import core.stdc.stdlib;
> import core.stdc.string;
>
> import std.algorithm;
> import std.array;
> import std.ascii;
> import std.datetime;
> import std.range;
> import std.stdio;
> import std.traits;
>
> bool isIdentifierChar0(ubyte c)
> {
>     return isAlphaNum(c) || c == '_' || c == '$';
> }
>
> bool isIdentifierChar1(ubyte c)
> {
>     return ((c >= '0' || c == '$') &&
>             (c <= '9' || c >= 'A')  &&
>             (c <= 'Z' || c >= 'a' || c == '_') &&
>             (c <= 'z'));
> }
>
> immutable bool[256] tab2;
> static this()
> {
>     for (size_t u = 0; u < 0x100; ++u)
>     {
>         tab2[u] = isIdentifierChar0(cast(ubyte)u);
>     }
> }
>
> bool isIdentifierChar2(ubyte c)
> {
>     return tab2[c];
> }
>
> immutable ulong[4] tab3;
> static this()
> {
>     for (size_t u = 0; u < 0x100; ++u)
>     {
>         if (isIdentifierChar0(cast(ubyte)u))
>         {
>             auto sub = u >>> 6;
>             auto b = u & 0x3f;
>             auto mask = 0x01L << b;
>             tab3[sub] |= mask;
>         }
>     }
> }
>
> bool isIdentifierChar3(ubyte c)
> {
> 	auto sub = c >>> 6;
> 	c &= 0x3f;
> 	auto mask = 0x01L << c;
> 	return (tab3[sub] & mask) > 0;
> }
>
> int f0()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar0(cast(ubyte)u);
>     }
>     return x;
> }
>
> int f1()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar1(cast(ubyte)u);
>     }
>     return x;
> }
>
> int f2()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar2(cast(ubyte)u);
>     }
>     return x;
> }
>
> int f3()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar3(cast(ubyte)u);
>     }
>     return x;
> }
>
> void main()
> {
>     auto r = benchmark!(f0, f1, f2, f3)(10_000);
>     writefln("Milliseconds %s %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs, r[3].msecs);
> }

I feel like there must be a way of making a fast bit look up but my version is only moderate in speed. You can get all the bits you need on two 64 bit registers or one SSE register. I haven't tried bt, does that work with a 64 bit register?

On Thursday, 17 April 2014 at 18:07:24 UTC, ixid wrote: > I feel like there must be a way of making a fast bit look up but my version is only moderate in speed. You can get all the bits you need on two 64 bit registers or one SSE register. I haven't tried bt, does that work with a 64 bit register? http://dlang.org/phobos/core_bitop.html#.bt ? Note it can be applied to the table in general, rather than the byte themselves. EG: ubyte[256] buf; auto b = bt(buf.ptr, 428);

On 17/04/2014 8:41 PM, monarch_dodra wrote: > On Thursday, 17 April 2014 at 18:07:24 UTC, ixid wrote: >> I feel like there must be a way of making a fast bit look up but my >> version is only moderate in speed. You can get all the bits you need >> on two 64 bit registers or one SSE register. I haven't tried bt, does >> that work with a 64 bit register? > > http://dlang.org/phobos/core_bitop.html#.bt > > ? > > Note it can be applied to the table in general, rather than the byte > themselves. EG: > > ubyte[256] buf; > auto b = bt(buf.ptr, 428); I didn't think to look in core.bitop for a faster way to check bits, I'll check if it closes the gap to Walter's version. A...

I tested this against the others... (with "-inline -release -O" of course) === uint[8] tab4; // bitop function only work on uints static this() { for (size_t u = 0; u < 0x100; ++u) { if (isIdentifierChar0(cast(ubyte)u)) { bts(tab4.ptr, u); } } } bool isIdentifierChar4(ubyte c) { return bt(tab4.ptr, c) != 0; } === it takes about 3 times as long (about the same as the original isIdentifierChar1, that used lots of &&s and ||s). So 2 shifts, 2 ands and a compare to zero, beats core.bitop. for clarity, the output of my benchmark for all 5 version was Milliseconds 21 15 2 5 15 the 2 (which on some runs was a 3) is Walter's table of bools, the 5 is my table of ulongs and the 15 on the end is core.bitop. Short of delving into the asm, which I'm trying to avoid, any other suggestions for speed-ups? A...

Forums