Thread overview
Array toHash()
Nov 27, 2014
David Held
Nov 27, 2014
Ali Çehreli
Nov 29, 2014
David Held
Nov 29, 2014
Ali Çehreli
Nov 30, 2014
David Held
November 27, 2014
I have a class which contains an int[] and some other stuff.  I want to use my class as the key for an AA, so I am overriding toHash().  But the int[] is the only part which should produce the hash code.  I know that int[].toHash() is defined somehow, because I can put int[] directly into an AA without writing any hash functions.  But I don't know how to spell this explicitly or force the compiler to generate it for me so that I can forward to it in my toHash().  For illustration:

class Foo
{
    override
    size_t toHash() @trusted pure const nothrow
    {
        // error: no property 'toHash' for type 'int[]'
        return importantStuff.toHash();
    }

    // override opEquals() too...

    int[] importantStuff;
    bool  notImportant;
    int   ignoreMe;
}

Any way to avoid re-implementing this hash function?

Dave
November 27, 2014
On 11/26/2014 04:25 PM, David Held wrote:

> class Foo
> {
>      override
>      size_t toHash() @trusted pure const nothrow
>      {
>          // error: no property 'toHash' for type 'int[]'
>          return importantStuff.toHash();
>      }

The getHash() member function of the particular TypeInfo can be used. However, it is not currently pure, so you must comment that out from your toHash:

    override
    size_t toHash() @trusted /* pure */ const nothrow
    {
        return typeid(importantStuff).getHash(&importantStuff);

    }

If a function can safely be casted to pure, you can use the following yet-missing-in-phobos function template:

import std.traits;

auto assumePure(T)(T t)
    if (isFunctionPointer!T || isDelegate!T)
{
    enum attrs = functionAttributes!T | FunctionAttribute.pure_;
    return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}

// ...

    override
    size_t toHash() @trusted pure const nothrow
    {
        auto func = assumePure(&(typeid(importantStuff).getHash));
        return func(&importantStuff);
    }

Note that now your toHash can be pure.

Ali

November 29, 2014
On 11/26/2014 4:40 PM, Ali Çehreli wrote:
> [...]
>      override
>      size_t toHash() @trusted pure const nothrow
>      {
>          auto func = assumePure(&(typeid(importantStuff).getHash));
>          return func(&importantStuff);
>      }

Very helpful, thanks!  Am I right in assuming that there is some runtime cost here, as we are calling through a function pointer?  Is the lack of inlining the only cost, or is getting at the typeinfo also costly?

Dave


November 29, 2014
On 11/29/2014 12:45 PM, David Held wrote:

> On 11/26/2014 4:40 PM, Ali Çehreli wrote:
>> [...]
>>      override
>>      size_t toHash() @trusted pure const nothrow
>>      {
>>          auto func = assumePure(&(typeid(importantStuff).getHash));
>>          return func(&importantStuff);
>>      }
>

> Am I right in assuming that there is some runtime
> cost here, as we are calling through a function pointer?

I think so. However, I think the compiler can eliminate dereferencing the function pointer if it can prove that it is not necessary.

> Is the lack of inlining the only cost, or is getting at the
> typeinfo also costly?

typeid() is a runtime function. I think it will be costly every time toHash is called. The function pointer can be initialized once.

    // I've "deduced" the type from an error message. ;)
    static const ulong delegate(const(void*)) const pure nothrow @trusted func;

    // This will be executed once per thread:
    static this() {
        func = assumePure(&(typeid(Foo.init.importantStuff).getHash));
    }

    override
    size_t toHash() @trusted pure const nothrow
    {
        return func(&importantStuff);
    }

Now there is no TypeInfo lookup after thread initialization time.

Ali

November 30, 2014
On 11/29/2014 3:59 PM, Ali Çehreli wrote:
> [...]
> typeid() is a runtime function. I think it will be costly every time
> toHash is called. The function pointer can be initialized once.
>
>      // I've "deduced" the type from an error message. ;)
>      static const ulong delegate(const(void*)) const pure nothrow
> @trusted func;
>
>      // This will be executed once per thread:
>      static this() {
>          func = assumePure(&(typeid(Foo.init.importantStuff).getHash));
>      }
>
>      override
>      size_t toHash() @trusted pure const nothrow
>      {
>          return func(&importantStuff);
>      }
>
> Now there is no TypeInfo lookup after thread initialization time.

Nice!  This is getting very fancy, but also a bit unfortunate.  It would be appropriate if the type of importantStuff were arbitrary, but it's not.  It's int[].  In fact, the fact that I can get a compiler-generated hash function for any type (I presume) surprises me.  Not entirely, since the compiler can just treat types as binary blobs, and define a hash function that operates as if it were byte[].

Although it is nice that we can get at stuff through reflection, this is not quite the same as the ostensibly static [and inlinable] call to int[].toHash() which logically occurs when int[] is used directly as a key.

Also, would it be possible to avoid spelling the delegate type by making func a function static and using an auto return on a named method?

Finally, I find it a little surprising that there's no standard module which implements a quality hash function builder (using something like MurmurHash or CityHash).  This seems like it would be a fairly common requirement for a lot of users.  Are there any non-standard libraries which do this?

Dave