April 07, 2014
On Monday, 7 April 2014 at 11:35:37 UTC, Martin Nowak wrote:
> On 04/01/2014 08:35 PM, Walter Bright wrote:
>> Try this benchmark comparing various classification schemes:
>
> Counter example:
> https://github.com/D-Programming-Language/phobos/commit/8599d6b4acde0eff886876a56f54ecd8e6b528e9

It's all a matter of how complicated the "non-table" implementation is I guess. table-lookup has a "steep entry cost", but is constant, regardless of complexity.

Table lookups were "recently" removed from std.ascii too. The implementation walter is benching "against" is particularly complicated here.
April 07, 2014
On 4/7/2014 2:41 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com>" wrote:
>> on.  Unfortunately for your small program, gcc is too smart and
>> optimises everything away.
>
> Haha, that's funny. But why don't you put the data in a separate compilation unit?

True, it's pretty easy to defeat that optimization.
April 07, 2014
On 4/7/2014 7:11 AM, monarch_dodra wrote:
> It's all a matter of how complicated the "non-table" implementation is I guess.

Yup. Memory lookups can be expensive. Once it's in the cache, it's cheap.
April 17, 2014
I added a lookup scheme of my own, its not as fast as Walters (in fact its the slowest without -inline - release -O) but it uses 1 bit per entry in the table instead of a whole byte so you can have lots and lots of different tables. I'm even reasonably sure that it works correctly!


===============================
import core.stdc.stdlib;
import core.stdc.string;

import std.algorithm;
import std.array;
import std.ascii;
import std.datetime;
import std.range;
import std.stdio;
import std.traits;

bool isIdentifierChar0(ubyte c)
{
    return isAlphaNum(c) || c == '_' || c == '$';
}

bool isIdentifierChar1(ubyte c)
{
    return ((c >= '0' || c == '$') &&
            (c <= '9' || c >= 'A')  &&
            (c <= 'Z' || c >= 'a' || c == '_') &&
            (c <= 'z'));
}

immutable bool[256] tab2;
static this()
{
    for (size_t u = 0; u < 0x100; ++u)
    {
        tab2[u] = isIdentifierChar0(cast(ubyte)u);
    }
}

bool isIdentifierChar2(ubyte c)
{
    return tab2[c];
}

immutable ulong[4] tab3;
static this()
{
    for (size_t u = 0; u < 0x100; ++u)
    {
        if (isIdentifierChar0(cast(ubyte)u))
        {
            auto sub = u >>> 6;
            auto b = u & 0x3f;
            auto mask = 0x01L << b;
            tab3[sub] |= mask;
        }
    }
}

bool isIdentifierChar3(ubyte c)
{
	auto sub = c >>> 6;
	c &= 0x3f;
	auto mask = 0x01L << c;
	return (tab3[sub] & mask) > 0;
}

int f0()
{
    int x;
    for (uint u = 0; u < 0x100; ++u)
    {
        x += isIdentifierChar0(cast(ubyte)u);
    }
    return x;
}

int f1()
{
    int x;
    for (uint u = 0; u < 0x100; ++u)
    {
        x += isIdentifierChar1(cast(ubyte)u);
    }
    return x;
}

int f2()
{
    int x;
    for (uint u = 0; u < 0x100; ++u)
    {
        x += isIdentifierChar2(cast(ubyte)u);
    }
    return x;
}

int f3()
{
    int x;
    for (uint u = 0; u < 0x100; ++u)
    {
        x += isIdentifierChar3(cast(ubyte)u);
    }
    return x;
}

void main()
{
    auto r = benchmark!(f0, f1, f2, f3)(10_000);
    writefln("Milliseconds %s %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs, r[3].msecs);
}
April 17, 2014
On Thursday, 17 April 2014 at 16:27:26 UTC, Alix Pexton wrote:
> I added a lookup scheme of my own, its not as fast as Walters (in fact its the slowest without -inline - release -O) but it uses 1 bit per entry in the table instead of a whole byte so you can have lots and lots of different tables. I'm even reasonably sure that it works correctly!
>
>
> ===============================
> import core.stdc.stdlib;
> import core.stdc.string;
>
> import std.algorithm;
> import std.array;
> import std.ascii;
> import std.datetime;
> import std.range;
> import std.stdio;
> import std.traits;
>
> bool isIdentifierChar0(ubyte c)
> {
>     return isAlphaNum(c) || c == '_' || c == '$';
> }
>
> bool isIdentifierChar1(ubyte c)
> {
>     return ((c >= '0' || c == '$') &&
>             (c <= '9' || c >= 'A')  &&
>             (c <= 'Z' || c >= 'a' || c == '_') &&
>             (c <= 'z'));
> }
>
> immutable bool[256] tab2;
> static this()
> {
>     for (size_t u = 0; u < 0x100; ++u)
>     {
>         tab2[u] = isIdentifierChar0(cast(ubyte)u);
>     }
> }
>
> bool isIdentifierChar2(ubyte c)
> {
>     return tab2[c];
> }
>
> immutable ulong[4] tab3;
> static this()
> {
>     for (size_t u = 0; u < 0x100; ++u)
>     {
>         if (isIdentifierChar0(cast(ubyte)u))
>         {
>             auto sub = u >>> 6;
>             auto b = u & 0x3f;
>             auto mask = 0x01L << b;
>             tab3[sub] |= mask;
>         }
>     }
> }
>
> bool isIdentifierChar3(ubyte c)
> {
> 	auto sub = c >>> 6;
> 	c &= 0x3f;
> 	auto mask = 0x01L << c;
> 	return (tab3[sub] & mask) > 0;
> }
>
> int f0()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar0(cast(ubyte)u);
>     }
>     return x;
> }
>
> int f1()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar1(cast(ubyte)u);
>     }
>     return x;
> }
>
> int f2()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar2(cast(ubyte)u);
>     }
>     return x;
> }
>
> int f3()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar3(cast(ubyte)u);
>     }
>     return x;
> }
>
> void main()
> {
>     auto r = benchmark!(f0, f1, f2, f3)(10_000);
>     writefln("Milliseconds %s %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs, r[3].msecs);
> }

I feel like there must be a way of making a fast bit look up but my version is only moderate in speed. You can get all the bits you need on two 64 bit registers or one SSE register. I haven't tried bt, does that work with a 64 bit register?
April 17, 2014
On Thursday, 17 April 2014 at 18:07:24 UTC, ixid wrote:
> I feel like there must be a way of making a fast bit look up but my version is only moderate in speed. You can get all the bits you need on two 64 bit registers or one SSE register. I haven't tried bt, does that work with a 64 bit register?

http://dlang.org/phobos/core_bitop.html#.bt

?

Note it can be applied to the table in general, rather than the byte themselves. EG:

ubyte[256] buf;
auto b = bt(buf.ptr, 428);
April 18, 2014
On 17/04/2014 8:41 PM, monarch_dodra wrote:
> On Thursday, 17 April 2014 at 18:07:24 UTC, ixid wrote:
>> I feel like there must be a way of making a fast bit look up but my
>> version is only moderate in speed. You can get all the bits you need
>> on two 64 bit registers or one SSE register. I haven't tried bt, does
>> that work with a 64 bit register?
>
> http://dlang.org/phobos/core_bitop.html#.bt
>
> ?
>
> Note it can be applied to the table in general, rather than the byte
> themselves. EG:
>
> ubyte[256] buf;
> auto b = bt(buf.ptr, 428);

I didn't think to look in core.bitop for a faster way to check bits, I'll check if it closes the gap to Walter's version.

A...
April 18, 2014
I tested this against the others...
(with "-inline -release -O" of course)

===

uint[8] tab4; // bitop function only work on uints
static this()
{
    for (size_t u = 0; u < 0x100; ++u)
    {
        if (isIdentifierChar0(cast(ubyte)u))
        {
            bts(tab4.ptr, u);
        }
    }
}

bool isIdentifierChar4(ubyte c)
{
	return bt(tab4.ptr, c) != 0;
}

===

it takes about 3 times as long (about the same as the original isIdentifierChar1, that used lots of &&s and ||s).

So 2 shifts, 2 ands and a compare to zero, beats core.bitop.

for clarity, the output of my benchmark for all 5 version was

Milliseconds 21 15 2 5 15

the 2 (which on some runs was a 3) is Walter's table of bools, the 5 is my table of ulongs and the 15 on the end is core.bitop.

Short of delving into the asm, which I'm trying to avoid, any other suggestions for speed-ups?

A...
April 18, 2014
PS

I discovered that core.bitop.bts() won't accept a pointer to immutable even when inside a static this. Not sure if that counts as a bug or not ><
1 2 3 4 5 6 7
Next ›   Last »