April 01, 2014 Table lookups - this is pretty definitive | ||||
---|---|---|---|---|
| ||||
Try this benchmark comparing various classification schemes: --------------------------------- import core.stdc.stdlib; import core.stdc.string; import std.algorithm; import std.array; import std.ascii; import std.datetime; import std.range; import std.stdio; import std.traits; bool isIdentifierChar0(ubyte c) { return isAlphaNum(c) || c == '_' || c == '$'; } bool isIdentifierChar1(ubyte c) { return ((c >= '0' || c == '$') && (c <= '9' || c >= 'A') && (c <= 'Z' || c >= 'a' || c == '_') && (c <= 'z')); } immutable bool[256] tab2; static this() { for (size_t u = 0; u < 0x100; ++u) { tab2[u] = isIdentifierChar0(cast(ubyte)u); } } bool isIdentifierChar2(ubyte c) { return tab2[c]; } /*********************************************/ int f0() { int x; for (uint u = 0; u < 0x100; ++u) { x += isIdentifierChar0(cast(ubyte)u); } return x; } int f1() { int x; for (uint u = 0; u < 0x100; ++u) { x += isIdentifierChar1(cast(ubyte)u); } return x; } int f2() { int x; for (uint u = 0; u < 0x100; ++u) { x += isIdentifierChar2(cast(ubyte)u); } return x; } void main() { auto r = benchmark!(f0, f1, f2)(10_000); writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs); } |
April 01, 2014 Re: Table lookups - this is pretty definitive | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | >Milliseconds 22 14 10
Yeah, table lookup is the way to go for ASCII, it seems.
|
April 01, 2014 Re: Table lookups - this is pretty definitive | ||||
---|---|---|---|---|
| ||||
Posted in reply to w0rp | On Tuesday, 1 April 2014 at 18:41:03 UTC, w0rp wrote: >>Milliseconds 22 14 10 > > Yeah, table lookup is the way to go for ASCII, it seems. > Milliseconds 13 8 2 After I remember to use -inline -release -O |
April 01, 2014 Re: Table lookups - this is pretty definitive | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | 01.04.2014 22:35, Walter Bright пишет: > Try this benchmark comparing various classification schemes: > --------------------------------- > import core.stdc.stdlib; > import core.stdc.string; > > import std.algorithm; > import std.array; > import std.ascii; > import std.datetime; > import std.range; > import std.stdio; > import std.traits; > > bool isIdentifierChar0(ubyte c) > { > return isAlphaNum(c) || c == '_' || c == '$'; > } > > bool isIdentifierChar1(ubyte c) > { > return ((c >= '0' || c == '$') && > (c <= '9' || c >= 'A') && > (c <= 'Z' || c >= 'a' || c == '_') && > (c <= 'z')); > } > > immutable bool[256] tab2; > static this() > { > for (size_t u = 0; u < 0x100; ++u) > { > tab2[u] = isIdentifierChar0(cast(ubyte)u); > } > } > > bool isIdentifierChar2(ubyte c) > { > return tab2[c]; > } > > /*********************************************/ > > int f0() > { > int x; > for (uint u = 0; u < 0x100; ++u) > { > x += isIdentifierChar0(cast(ubyte)u); > } > return x; > } > > int f1() > { > int x; > for (uint u = 0; u < 0x100; ++u) > { > x += isIdentifierChar1(cast(ubyte)u); > } > return x; > } > > int f2() > { > int x; > for (uint u = 0; u < 0x100; ++u) > { > x += isIdentifierChar2(cast(ubyte)u); > } > return x; > } > > void main() > { > auto r = benchmark!(f0, f1, f2)(10_000); > writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs); > } Some regular benchmark notes: 1. The first one passed to `benchmark` is always slower (e.g. pass `f2` and see). 2. Unexpected program behaviour changes: Let's use `benchmark!(f1, f1, f1)(1_000_000)`: Milliseconds 928 889 888 Then copy `isAlphaNum` in file (benchmark still shows same results) and change `dchar c` to `ubyte c`, result changes: Milliseconds 849 815 827 The difference is sufficient but `isAlphaNum` not called in benchmark. Also `f0` is faster than `f1`, benchmark!(f0, f0, f0)(1_000_000): Milliseconds 731 694 702 And `f2` wins, it's the only obvious thing, benchmark!(f2, f2, f2)(1_000_000): Milliseconds 227 184 184 Compiler used: dmd -O -inline -release -- Денис В. Шеломовский Denis V. Shelomovskij |
April 01, 2014 Re: Table lookups - this is pretty definitive | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote:
> Try this benchmark comparing various classification schemes:
> ---------------------------------
> import core.stdc.stdlib;
> import core.stdc.string;
>
> import std.algorithm;
> import std.array;
> import std.ascii;
> import std.datetime;
> import std.range;
> import std.stdio;
> import std.traits;
>
> bool isIdentifierChar0(ubyte c)
> {
> return isAlphaNum(c) || c == '_' || c == '$';
> }
>
> bool isIdentifierChar1(ubyte c)
> {
> return ((c >= '0' || c == '$') &&
> (c <= '9' || c >= 'A') &&
> (c <= 'Z' || c >= 'a' || c == '_') &&
> (c <= 'z'));
> }
>
> immutable bool[256] tab2;
> static this()
> {
> for (size_t u = 0; u < 0x100; ++u)
> {
> tab2[u] = isIdentifierChar0(cast(ubyte)u);
> }
> }
>
> bool isIdentifierChar2(ubyte c)
> {
> return tab2[c];
> }
>
> /*********************************************/
>
> int f0()
> {
> int x;
> for (uint u = 0; u < 0x100; ++u)
> {
> x += isIdentifierChar0(cast(ubyte)u);
> }
> return x;
> }
>
> int f1()
> {
> int x;
> for (uint u = 0; u < 0x100; ++u)
> {
> x += isIdentifierChar1(cast(ubyte)u);
> }
> return x;
> }
>
> int f2()
> {
> int x;
> for (uint u = 0; u < 0x100; ++u)
> {
> x += isIdentifierChar2(cast(ubyte)u);
> }
> return x;
> }
>
> void main()
> {
> auto r = benchmark!(f0, f1, f2)(10_000);
> writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs);
> }
It's not really possible to tell anything from that benchmark, especially with fancy modern optimisers and branch predictors.
1) ldc and gdc both optimise away some/all of the tests respectively.
2) once isIdentifierCharX is inlined, the compiler can determine what the results will be before-hand.
3) Even without 1) and 2), the results are unrealistically (very!) correlated, leading to a branch-prediction bonanza. I'm sure you know how significant that can be on modern CPUs. It is also very friendly to pre-fetching of the lookup table*
Perhaps have the same benchmark, but working on realistic data from a file.
*In my experience, the cost of using lookup tables often only appears in real code, where cache pressure and predictability becomes worse. This is especially true of lazy, range-based code. If several layers of a range use different lookup tables you can quickly end up ruining cache performance compared to an eager approach.
|
April 01, 2014 Re: Table lookups - this is pretty definitive | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote: > immutable bool[256] tab2; > static this() > { > for (size_t u = 0; u < 0x100; ++u) > { > tab2[u] = isIdentifierChar0(cast(ubyte)u); > } > } Table lookups are awesome. While tab2 there lends itself to being populated in a static this() it would really lower the bar if we could define static immutable AA literals at compile-time. Or at least CTFE-fill them. Having to rely on static this() for such is unfortunate and hacky. Relevant is http://forum.dlang.org/thread/oxnwtojtdymopgvanbwz@forum.dlang.org in which I get a speed increase of >3.5x by doing string-to-enum translation/mapping using someEnum[string] AA lookups, instead of using someString.to!someEnum. (Given CtAAs, std.conv.to could then generate such for various translations where cheap -- assuming we still like AAs?) |
April 01, 2014 Re: Table lookups - this is pretty definitive | ||||
---|---|---|---|---|
| ||||
Posted in reply to Denis Shelomovskij | 01.04.2014 23:22, Denis Shelomovskij пишет: > 01.04.2014 22:35, Walter Bright пишет: >> Try this benchmark comparing various classification schemes: >> --------------------------------- >> import core.stdc.stdlib; >> import core.stdc.string; >> >> import std.algorithm; >> import std.array; >> import std.ascii; >> import std.datetime; >> import std.range; >> import std.stdio; >> import std.traits; >> >> bool isIdentifierChar0(ubyte c) >> { >> return isAlphaNum(c) || c == '_' || c == '$'; >> } >> >> bool isIdentifierChar1(ubyte c) >> { >> return ((c >= '0' || c == '$') && >> (c <= '9' || c >= 'A') && >> (c <= 'Z' || c >= 'a' || c == '_') && >> (c <= 'z')); >> } >> >> immutable bool[256] tab2; >> static this() >> { >> for (size_t u = 0; u < 0x100; ++u) >> { >> tab2[u] = isIdentifierChar0(cast(ubyte)u); >> } >> } >> >> bool isIdentifierChar2(ubyte c) >> { >> return tab2[c]; >> } >> >> /*********************************************/ >> >> int f0() >> { >> int x; >> for (uint u = 0; u < 0x100; ++u) >> { >> x += isIdentifierChar0(cast(ubyte)u); >> } >> return x; >> } >> >> int f1() >> { >> int x; >> for (uint u = 0; u < 0x100; ++u) >> { >> x += isIdentifierChar1(cast(ubyte)u); >> } >> return x; >> } >> >> int f2() >> { >> int x; >> for (uint u = 0; u < 0x100; ++u) >> { >> x += isIdentifierChar2(cast(ubyte)u); >> } >> return x; >> } >> >> void main() >> { >> auto r = benchmark!(f0, f1, f2)(10_000); >> writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, >> r[2].msecs); >> } > > Some regular benchmark notes: > > 1. The first one passed to `benchmark` is always slower (e.g. pass `f2` > and see). > > 2. Unexpected program behaviour changes: > > Let's use `benchmark!(f1, f1, f1)(1_000_000)`: > Milliseconds 928 889 888 > > Then copy `isAlphaNum` in file (benchmark still shows same results) and > change `dchar c` to `ubyte c`, result changes: > Milliseconds 849 815 827 > The difference is sufficient but `isAlphaNum` not called in benchmark. > > Also `f0` is faster than `f1`, benchmark!(f0, f0, f0)(1_000_000): > Milliseconds 731 694 702 > > And `f2` wins, it's the only obvious thing, benchmark!(f2, f2, > f2)(1_000_000): > Milliseconds 227 184 184 > > > Compiler used: dmd -O -inline -release > Few more words about `dmd`: "Hey, silly, you still use `== x` to compare with `x`? Use table lookups! And never cast `size_t` to `ubyte`! See: --- import std.datetime; import std.stdio; immutable bool[256] tab2; static this() { foreach(size_t u; 0 .. 0x100) tab2[u] = u == '_'; } /*********************************************/ int f0() { int x; foreach(uint u; 0 .. 0x100) x += u == '_'; return x; } int f2() { int x; foreach(uint u; 0 .. 0x100) x += tab2[cast(ubyte)u]; return x; } int f2plus() { int x; foreach(size_t u; 0 .. 0x100) x += tab2[u]; return x; } void main() { auto r = benchmark!(f0, f0, f0)(1_000_000); writefln("Milliseconds %s %s %s (f0)", r[0].msecs, r[1].msecs, r[2].msecs); r = benchmark!(f2, f2, f2)(1_000_000); writefln("Milliseconds %s %s %s (f2)", r[0].msecs, r[1].msecs, r[2].msecs); r = benchmark!(f2plus, f2plus, f2plus)(1_000_000); writefln("Milliseconds %s %s %s (f2plus)", r[0].msecs, r[1].msecs, r[2].msecs); } --- Milliseconds 323 274 281 (f0) Milliseconds 185 185 185 (f2) Milliseconds 105 97 105 (f2plus) --- P.S. I don't want to say anything with these results. I just have no idea what happens here and I don't want to investigate dmd's way of optimizing things now. -- Денис В. Шеломовский Denis V. Shelomovskij |
April 01, 2014 Re: Table lookups - this is pretty definitive | ||||
---|---|---|---|---|
| ||||
Posted in reply to Denis Shelomovskij | On 4/1/2014 12:22 PM, Denis Shelomovskij wrote:
> Compiler used: dmd -O -inline -release
For me, rewriting main() as:
void main()
{
{
auto r = benchmark!(f0, f1, f2)(100_000);
writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs);
}
{
auto r = benchmark!(f0, f1, f2)(100_000);
writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs);
}
}
Gives:
Milliseconds 139 99 15
Milliseconds 122 93 13
|
April 01, 2014 Re: Table lookups - this is pretty definitive | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote:
> Try this benchmark comparing various classification schemes:
> ---------------------------------
Original program... 3 4 1
10x as many iterations... 36 47 19
I think this benchmark is flawed...
1) Apparently there are rounding issues... assuming linear scaling, 1.9 ms yields ~1ms? Probably better to use usecs...
2) Table lookups will nearly always win in isolated 1 function tests... but not necessarily in real apps.
|
April 01, 2014 Re: Table lookups - this is pretty definitive | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | 01-Apr-2014 22:35, Walter Bright пишет: > Try this benchmark comparing various classification schemes: > int f0() > { > int x; > for (uint u = 0; u < 0x100; ++u) > { > x += isIdentifierChar0(cast(ubyte)u); > } > return x; > } > > int f1() > { > int x; > for (uint u = 0; u < 0x100; ++u) > { > x += isIdentifierChar1(cast(ubyte)u); > } > return x; > } > > int f2() > { > int x; > for (uint u = 0; u < 0x100; ++u) > { > x += isIdentifierChar2(cast(ubyte)u); > } > return x; > } Would strongly suggest to use 2 kinds of data - randomized and some text. And, of course, it must be loaded at run-time. -- Dmitry Olshansky |
Copyright © 1999-2021 by the D Language Foundation