Jump to page: 1 27  
Page
Thread overview
Table lookups - this is pretty definitive
Apr 01, 2014
Walter Bright
Apr 01, 2014
w0rp
Apr 01, 2014
w0rp
Apr 01, 2014
Denis Shelomovskij
Apr 01, 2014
Denis Shelomovskij
Apr 01, 2014
Walter Bright
Apr 01, 2014
Walter Bright
Apr 01, 2014
John Colvin
Apr 01, 2014
Walter Bright
Apr 01, 2014
Peter Alexander
Apr 01, 2014
Walter Bright
Apr 01, 2014
JR
Apr 01, 2014
Artur Skawina
Apr 01, 2014
bearophile
Apr 01, 2014
Andrej Mitrovic
Apr 01, 2014
Walter Bright
Apr 01, 2014
bearophile
Apr 02, 2014
Brad Anderson
Apr 02, 2014
bearophile
Apr 03, 2014
Brad Anderson
Apr 01, 2014
Walter Bright
Apr 01, 2014
bearophile
Apr 01, 2014
Tove
Apr 01, 2014
Dmitry Olshansky
Apr 01, 2014
Walter Bright
Apr 02, 2014
Dmitry Olshansky
Apr 01, 2014
Ivan Kazmenko
Apr 01, 2014
Ivan Kazmenko
Apr 01, 2014
Vladimir Panteleev
Apr 01, 2014
Walter Bright
Apr 02, 2014
Marco Leise
Apr 02, 2014
Daniel N
Apr 02, 2014
Dmitry Olshansky
Apr 02, 2014
bearophile
Apr 02, 2014
Dmitry Olshansky
Apr 02, 2014
Walter Bright
Apr 02, 2014
Dmitry Olshansky
Apr 02, 2014
Walter Bright
Apr 02, 2014
bearophile
Apr 03, 2014
Walter Bright
Apr 03, 2014
bearophile
Apr 03, 2014
Walter Bright
Apr 03, 2014
bearophile
Apr 03, 2014
Artur Skawina
Apr 03, 2014
Dmitry Olshansky
Apr 06, 2014
Marco Leise
Apr 03, 2014
Dmitry Olshansky
Apr 02, 2014
monarch_dodra
Apr 02, 2014
Daniel N
Apr 02, 2014
Dmitry Olshansky
Apr 02, 2014
Walter Bright
Apr 02, 2014
Dmitry Olshansky
Apr 03, 2014
monarch_dodra
Apr 03, 2014
monarch_dodra
Apr 03, 2014
Dmitry Olshansky
Apr 02, 2014
Walter Bright
Apr 07, 2014
Iain Buclaw
Apr 07, 2014
Walter Bright
Apr 07, 2014
Martin Nowak
Apr 07, 2014
monarch_dodra
Apr 07, 2014
Walter Bright
Apr 17, 2014
Alix Pexton
Apr 17, 2014
ixid
Apr 17, 2014
monarch_dodra
Apr 18, 2014
Alix Pexton
Apr 18, 2014
Alix Pexton
Apr 18, 2014
Alix Pexton
April 01, 2014
Try this benchmark comparing various classification schemes:
---------------------------------
import core.stdc.stdlib;
import core.stdc.string;

import std.algorithm;
import std.array;
import std.ascii;
import std.datetime;
import std.range;
import std.stdio;
import std.traits;

bool isIdentifierChar0(ubyte c)
{
    return isAlphaNum(c) || c == '_' || c == '$';
}

bool isIdentifierChar1(ubyte c)
{
    return ((c >= '0' || c == '$') &&
            (c <= '9' || c >= 'A')  &&
            (c <= 'Z' || c >= 'a' || c == '_') &&
            (c <= 'z'));
}

immutable bool[256] tab2;
static this()
{
    for (size_t u = 0; u < 0x100; ++u)
    {
        tab2[u] = isIdentifierChar0(cast(ubyte)u);
    }
}

bool isIdentifierChar2(ubyte c)
{
    return tab2[c];
}

/*********************************************/

int f0()
{
    int x;
    for (uint u = 0; u < 0x100; ++u)
    {
        x += isIdentifierChar0(cast(ubyte)u);
    }
    return x;
}

int f1()
{
    int x;
    for (uint u = 0; u < 0x100; ++u)
    {
        x += isIdentifierChar1(cast(ubyte)u);
    }
    return x;
}

int f2()
{
    int x;
    for (uint u = 0; u < 0x100; ++u)
    {
        x += isIdentifierChar2(cast(ubyte)u);
    }
    return x;
}

void main()
{
    auto r = benchmark!(f0, f1, f2)(10_000);
    writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs);
}
April 01, 2014
>Milliseconds 22 14 10

Yeah, table lookup is the way to go for ASCII, it seems.
April 01, 2014
On Tuesday, 1 April 2014 at 18:41:03 UTC, w0rp wrote:
>>Milliseconds 22 14 10
>
> Yeah, table lookup is the way to go for ASCII, it seems.

> Milliseconds 13 8 2

After I remember to use -inline -release -O
April 01, 2014
01.04.2014 22:35, Walter Bright пишет:
> Try this benchmark comparing various classification schemes:
> ---------------------------------
> import core.stdc.stdlib;
> import core.stdc.string;
>
> import std.algorithm;
> import std.array;
> import std.ascii;
> import std.datetime;
> import std.range;
> import std.stdio;
> import std.traits;
>
> bool isIdentifierChar0(ubyte c)
> {
>      return isAlphaNum(c) || c == '_' || c == '$';
> }
>
> bool isIdentifierChar1(ubyte c)
> {
>      return ((c >= '0' || c == '$') &&
>              (c <= '9' || c >= 'A')  &&
>              (c <= 'Z' || c >= 'a' || c == '_') &&
>              (c <= 'z'));
> }
>
> immutable bool[256] tab2;
> static this()
> {
>      for (size_t u = 0; u < 0x100; ++u)
>      {
>          tab2[u] = isIdentifierChar0(cast(ubyte)u);
>      }
> }
>
> bool isIdentifierChar2(ubyte c)
> {
>      return tab2[c];
> }
>
> /*********************************************/
>
> int f0()
> {
>      int x;
>      for (uint u = 0; u < 0x100; ++u)
>      {
>          x += isIdentifierChar0(cast(ubyte)u);
>      }
>      return x;
> }
>
> int f1()
> {
>      int x;
>      for (uint u = 0; u < 0x100; ++u)
>      {
>          x += isIdentifierChar1(cast(ubyte)u);
>      }
>      return x;
> }
>
> int f2()
> {
>      int x;
>      for (uint u = 0; u < 0x100; ++u)
>      {
>          x += isIdentifierChar2(cast(ubyte)u);
>      }
>      return x;
> }
>
> void main()
> {
>      auto r = benchmark!(f0, f1, f2)(10_000);
>      writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs);
> }

Some regular benchmark notes:

1. The first one passed to `benchmark` is always slower (e.g. pass `f2` and see).

2. Unexpected program behaviour changes:

Let's use `benchmark!(f1, f1, f1)(1_000_000)`:
Milliseconds 928 889 888

Then copy `isAlphaNum` in file (benchmark still shows same results) and change `dchar c` to `ubyte c`, result changes:
Milliseconds 849 815 827
The difference is sufficient but `isAlphaNum` not called in benchmark.

Also `f0` is faster than `f1`, benchmark!(f0, f0, f0)(1_000_000):
Milliseconds 731 694 702

And `f2` wins, it's the only obvious thing, benchmark!(f2, f2, f2)(1_000_000):
Milliseconds 227 184 184


Compiler used: dmd -O -inline -release

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
April 01, 2014
On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote:
> Try this benchmark comparing various classification schemes:
> ---------------------------------
> import core.stdc.stdlib;
> import core.stdc.string;
>
> import std.algorithm;
> import std.array;
> import std.ascii;
> import std.datetime;
> import std.range;
> import std.stdio;
> import std.traits;
>
> bool isIdentifierChar0(ubyte c)
> {
>     return isAlphaNum(c) || c == '_' || c == '$';
> }
>
> bool isIdentifierChar1(ubyte c)
> {
>     return ((c >= '0' || c == '$') &&
>             (c <= '9' || c >= 'A')  &&
>             (c <= 'Z' || c >= 'a' || c == '_') &&
>             (c <= 'z'));
> }
>
> immutable bool[256] tab2;
> static this()
> {
>     for (size_t u = 0; u < 0x100; ++u)
>     {
>         tab2[u] = isIdentifierChar0(cast(ubyte)u);
>     }
> }
>
> bool isIdentifierChar2(ubyte c)
> {
>     return tab2[c];
> }
>
> /*********************************************/
>
> int f0()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar0(cast(ubyte)u);
>     }
>     return x;
> }
>
> int f1()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar1(cast(ubyte)u);
>     }
>     return x;
> }
>
> int f2()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar2(cast(ubyte)u);
>     }
>     return x;
> }
>
> void main()
> {
>     auto r = benchmark!(f0, f1, f2)(10_000);
>     writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs);
> }

It's not really possible to tell anything from that benchmark, especially with fancy modern optimisers and branch predictors.

1) ldc and gdc both optimise away some/all of the tests respectively.
2) once isIdentifierCharX is inlined, the compiler can determine what the results will be before-hand.
3) Even without 1) and 2), the results are unrealistically (very!) correlated, leading to a branch-prediction bonanza. I'm sure you know how significant that can be on modern CPUs. It is also very friendly to pre-fetching of the lookup table*

Perhaps have the same benchmark, but working on realistic data from a file.


*In my experience, the cost of using lookup tables often only appears in real code, where cache pressure and predictability becomes worse. This is especially true of lazy, range-based code. If several layers of a range use different lookup tables you can quickly end up ruining cache performance compared to an eager approach.
April 01, 2014
On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote:
> immutable bool[256] tab2;
> static this()
> {
>     for (size_t u = 0; u < 0x100; ++u)
>     {
>         tab2[u] = isIdentifierChar0(cast(ubyte)u);
>     }
> }

Table lookups are awesome.

While tab2 there lends itself to being populated in a static this() it would really lower the bar if we could define static immutable AA literals at compile-time. Or at least CTFE-fill them. Having to rely on static this() for such is unfortunate and hacky.

Relevant is http://forum.dlang.org/thread/oxnwtojtdymopgvanbwz@forum.dlang.org in which I get a speed increase of >3.5x by doing string-to-enum translation/mapping using someEnum[string] AA lookups, instead of using someString.to!someEnum.

(Given CtAAs, std.conv.to could then generate such for various translations where cheap -- assuming we still like AAs?)
April 01, 2014
01.04.2014 23:22, Denis Shelomovskij пишет:
> 01.04.2014 22:35, Walter Bright пишет:
>> Try this benchmark comparing various classification schemes:
>> ---------------------------------
>> import core.stdc.stdlib;
>> import core.stdc.string;
>>
>> import std.algorithm;
>> import std.array;
>> import std.ascii;
>> import std.datetime;
>> import std.range;
>> import std.stdio;
>> import std.traits;
>>
>> bool isIdentifierChar0(ubyte c)
>> {
>>      return isAlphaNum(c) || c == '_' || c == '$';
>> }
>>
>> bool isIdentifierChar1(ubyte c)
>> {
>>      return ((c >= '0' || c == '$') &&
>>              (c <= '9' || c >= 'A')  &&
>>              (c <= 'Z' || c >= 'a' || c == '_') &&
>>              (c <= 'z'));
>> }
>>
>> immutable bool[256] tab2;
>> static this()
>> {
>>      for (size_t u = 0; u < 0x100; ++u)
>>      {
>>          tab2[u] = isIdentifierChar0(cast(ubyte)u);
>>      }
>> }
>>
>> bool isIdentifierChar2(ubyte c)
>> {
>>      return tab2[c];
>> }
>>
>> /*********************************************/
>>
>> int f0()
>> {
>>      int x;
>>      for (uint u = 0; u < 0x100; ++u)
>>      {
>>          x += isIdentifierChar0(cast(ubyte)u);
>>      }
>>      return x;
>> }
>>
>> int f1()
>> {
>>      int x;
>>      for (uint u = 0; u < 0x100; ++u)
>>      {
>>          x += isIdentifierChar1(cast(ubyte)u);
>>      }
>>      return x;
>> }
>>
>> int f2()
>> {
>>      int x;
>>      for (uint u = 0; u < 0x100; ++u)
>>      {
>>          x += isIdentifierChar2(cast(ubyte)u);
>>      }
>>      return x;
>> }
>>
>> void main()
>> {
>>      auto r = benchmark!(f0, f1, f2)(10_000);
>>      writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs,
>> r[2].msecs);
>> }
>
> Some regular benchmark notes:
>
> 1. The first one passed to `benchmark` is always slower (e.g. pass `f2`
> and see).
>
> 2. Unexpected program behaviour changes:
>
> Let's use `benchmark!(f1, f1, f1)(1_000_000)`:
> Milliseconds 928 889 888
>
> Then copy `isAlphaNum` in file (benchmark still shows same results) and
> change `dchar c` to `ubyte c`, result changes:
> Milliseconds 849 815 827
> The difference is sufficient but `isAlphaNum` not called in benchmark.
>
> Also `f0` is faster than `f1`, benchmark!(f0, f0, f0)(1_000_000):
> Milliseconds 731 694 702
>
> And `f2` wins, it's the only obvious thing, benchmark!(f2, f2,
> f2)(1_000_000):
> Milliseconds 227 184 184
>
>
> Compiler used: dmd -O -inline -release
>

Few more words about `dmd`:

"Hey, silly, you still use `== x` to compare with `x`? Use table lookups! And never cast `size_t` to `ubyte`! See:
---
import std.datetime;
import std.stdio;


immutable bool[256] tab2;

static this()
{
    foreach(size_t u; 0 .. 0x100)
        tab2[u] = u == '_';
}

/*********************************************/

int f0()
{
    int x;
    foreach(uint u; 0 .. 0x100)
        x += u == '_';
    return x;
}

int f2()
{
    int x;
    foreach(uint u; 0 .. 0x100)
        x += tab2[cast(ubyte)u];
    return x;
}

int f2plus()
{
    int x;
    foreach(size_t u; 0 .. 0x100)
        x += tab2[u];
    return x;
}

void main()
{
    auto r = benchmark!(f0, f0, f0)(1_000_000);
    writefln("Milliseconds %s %s %s (f0)", r[0].msecs, r[1].msecs, r[2].msecs);

    r = benchmark!(f2, f2, f2)(1_000_000);
    writefln("Milliseconds %s %s %s (f2)", r[0].msecs, r[1].msecs, r[2].msecs);

    r = benchmark!(f2plus, f2plus, f2plus)(1_000_000);
    writefln("Milliseconds %s %s %s (f2plus)", r[0].msecs, r[1].msecs, r[2].msecs);
}
---
Milliseconds 323 274 281 (f0)
Milliseconds 185 185 185 (f2)
Milliseconds 105 97 105 (f2plus)
---


P.S.
I don't want to say anything with these results. I just have no idea what happens here and I don't want to investigate dmd's way of optimizing things now.


-- 
Денис В. Шеломовский
Denis V. Shelomovskij
April 01, 2014
On 4/1/2014 12:22 PM, Denis Shelomovskij wrote:
> Compiler used: dmd -O -inline -release

For me, rewriting main() as:

void main()
{
  {
    auto r = benchmark!(f0, f1, f2)(100_000);
    writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs);
  }
  {
    auto r = benchmark!(f0, f1, f2)(100_000);
    writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs);
  }
}

Gives:

Milliseconds 139 99 15
Milliseconds 122 93 13


April 01, 2014
On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote:
> Try this benchmark comparing various classification schemes:
> ---------------------------------

Original program... 3 4 1
10x as many iterations... 36 47 19

I think this benchmark is flawed...

1) Apparently there are rounding issues... assuming linear scaling, 1.9 ms yields ~1ms? Probably better to use usecs...

2) Table lookups will nearly always win in isolated 1 function tests... but not necessarily in real apps.
April 01, 2014
01-Apr-2014 22:35, Walter Bright пишет:
> Try this benchmark comparing various classification schemes:

> int f0()
> {
>      int x;
>      for (uint u = 0; u < 0x100; ++u)
>      {
>          x += isIdentifierChar0(cast(ubyte)u);
>      }
>      return x;
> }
>
> int f1()
> {
>      int x;
>      for (uint u = 0; u < 0x100; ++u)
>      {
>          x += isIdentifierChar1(cast(ubyte)u);
>      }
>      return x;
> }
>
> int f2()
> {
>      int x;
>      for (uint u = 0; u < 0x100; ++u)
>      {
>          x += isIdentifierChar2(cast(ubyte)u);
>      }
>      return x;
> }

Would strongly suggest to use 2 kinds of data - randomized and some text. And, of course, it must be loaded at run-time.

-- 
Dmitry Olshansky
« First   ‹ Prev
1 2 3 4 5 6 7