Thread overview
built-in string hash ?
Aug 28, 2010
Kevin Bailey
Aug 28, 2010
torhu
Aug 28, 2010
bearophile
Aug 28, 2010
Pelle
Aug 29, 2010
bearophile
Aug 29, 2010
BCS
August 28, 2010
So I have a class containing two strings:

class Foo
{
	string s1, s2;

	...

and I'd like to add a toHash() member, but I can't find
the built-in string hash function. These don't work:

s1.toHash()
s1.toHash
s1.hash
s1.hash()
hash(s1)

yet strings can clearly be the key in a map.

I see that xml.d writes its own string hash function
but that just doesn't seem right.

Is there a way to get to the built-in ?
August 28, 2010
On 28.08.2010 16:37, Kevin Bailey wrote:
> So I have a class containing two strings:
>
> class Foo
> {
> 	string s1, s2;
>
> 	...
>
> and I'd like to add a toHash() member, but I can't find
> the built-in string hash function. These don't work:
>
> s1.toHash()
> s1.toHash
> s1.hash
> s1.hash()
> hash(s1)
>
> yet strings can clearly be the key in a map.
>
> I see that xml.d writes its own string hash function
> but that just doesn't seem right.
>
> Is there a way to get to the built-in ?

You can get to it through TypeInfo:
http://www.digitalmars.com/d/2.0/phobos/object.html#getHash

string a = "abc";

auto hash = typeid(a).getHash(&a);
August 28, 2010
torhu:
> string a = "abc";
> auto hash = typeid(a).getHash(&a);

If higher performance is necessary, you may pre-compute part of that:

void main() {
    string a = "abc";
    auto hash1 = typeid(a).getHash(&a);
    auto stringHash = &(typeid(a).getHash);
    auto hash2 = stringHash(&a);
    assert(hash1 == hash2);
}

Bye,
bearophile
August 28, 2010
On 08/28/2010 10:25 PM, bearophile wrote:
> torhu:
>> string a = "abc";
>> auto hash = typeid(a).getHash(&a);
>
> If higher performance is necessary, you may pre-compute part of that:
>
> void main() {
>      string a = "abc";
>      auto hash1 = typeid(a).getHash(&a);
>      auto stringHash =&(typeid(a).getHash);
>      auto hash2 = stringHash(&a);
>      assert(hash1 == hash2);
> }
>
> Bye,
> bearophile

I doubt that gives any performance gains. typeid(a).getHash should be a constant expression anyway, and I don't see any gains in my tiny benchmark test.

Perhaps it works better if a was an Object, since typeid for objects does more.
August 29, 2010
Pelle:
> I doubt that gives any performance gains. typeid(a).getHash should be a constant expression anyway, and I don't see any gains in my tiny benchmark test.

My code shows a limited time difference:

import std.stdio: writeln;
import std.date: getUTCtime, ticksPerSecond;

void main() {
    enum int NLOOP = 30_000_000;
    string s = "and I'd like to add a toHash() member, but I can't find the built-in string hash function.";

    {
        auto t0 = getUTCtime();
        auto stringHash =&(typeid(s).getHash);
        hash_t tot;
        for (int i; i < NLOOP; i++)
            tot += stringHash(&s);
        auto t1 = getUTCtime();
        writeln(tot, " ", (t1 - t0) / cast(double)ticksPerSecond);
    }

    {
        auto t0 = getUTCtime();
        hash_t tot;
        for (int i; i < NLOOP; i++)
            tot += typeid(s).getHash(&s);
        auto t1 = getUTCtime();
        writeln(tot, " ", (t1 - t0) / cast(double)ticksPerSecond);
    }
}


And the asm shows the first loop to contain one less instruction (dmd 2.048, -O -release -inline), so the difference is small:

L42:    lea ECX,01Ch[ESP]
        mov EAX,02Ch[ESP]
        push    ECX
        call    EDI
        add ESI,EAX
        inc EBX
        cmp EBX,01C9C380h
        jb  L42


LA7:    lea EDX,01Ch[ESP]
        mov ECX,_D12TypeInfo_Aya6__initZ
        mov EAX,offset FLAT:_D12TypeInfo_Aya6__initZ
        push    EDX
        call    dword ptr 018h[ECX]
        add EDI,EAX
        inc EBX
        cmp EBX,01C9C380h
        jb  LA7

Bye,
bearophile
August 29, 2010
Hello bearophile,

> Pelle:
> 
>> I doubt that gives any performance gains. typeid(a).getHash should be
>> a constant expression anyway, and I don't see any gains in my tiny
>> benchmark test.
>> 
> My code shows a limited time difference:
> 
> import std.stdio: writeln;
> import std.date: getUTCtime, ticksPerSecond;
> void main() {
> enum int NLOOP = 30_000_000;
> string s = "and I'd like to add a toHash() member, but I can't
> find the built-in string hash function.";
> {
> auto t0 = getUTCtime();
> auto stringHash =&(typeid(s).getHash);
> hash_t tot;
> for (int i; i < NLOOP; i++)
> tot += stringHash(&s);
> auto t1 = getUTCtime();
> writeln(tot, " ", (t1 - t0) / cast(double)ticksPerSecond);
> }
> {
> auto t0 = getUTCtime();
> hash_t tot;
> for (int i; i < NLOOP; i++)
> tot += typeid(s).getHash(&s);
> auto t1 = getUTCtime();
> writeln(tot, " ", (t1 - t0) / cast(double)ticksPerSecond);
> }
> }
> And the asm shows the first loop to contain one less instruction (dmd
> 2.048, -O -release -inline), so the difference is small:
> 
> L42:    lea ECX,01Ch[ESP]
> mov EAX,02Ch[ESP]
> push    ECX
> call    EDI
> add ESI,EAX
> inc EBX
> cmp EBX,01C9C380h
> jb  L42
> LA7:    lea EDX,01Ch[ESP]
> mov ECX,_D12TypeInfo_Aya6__initZ

DMD seems to do a bad job of dead assignment removal at the ASM level. I've seen this befor where half the asm in large blocks was dead.

> mov EAX,offset FLAT:_D12TypeInfo_Aya6__initZ
> push    EDX
> call    dword ptr 018h[ECX]
> add EDI,EAX
> inc EBX
> cmp EBX,01C9C380h
> jb  LA7
> Bye,
> bearophile
-- 
... <IXOYE><