Thread overview
MurmurHash3 behaviour
Aug 19, 2016
Cauterite
Aug 19, 2016
Seb
Aug 19, 2016
Cauterite
Aug 20, 2016
Seb
Aug 20, 2016
Ilya Yaroshenko
Aug 20, 2016
Ilya Yaroshenko
August 19, 2016
Regarding the MurmurHash3 implementation in core.internal.hash, it is my understanding that:
    // assuming a and b are uints
    bytesHash([a, b], 0) == bytesHash([b], bytesHash([a], 0))
Is this correct?
I'm just not quite certain of this property when I try to read the code myself, and I don't know much about hash algorithms.
August 19, 2016
On Friday, 19 August 2016 at 20:28:13 UTC, Cauterite wrote:
> Regarding the MurmurHash3 implementation in core.internal.hash, it is my understanding that:
>     // assuming a and b are uints
>     bytesHash([a, b], 0) == bytesHash([b], bytesHash([a], 0))
> Is this correct?
> I'm just not quite certain of this property when I try to read the code myself, and I don't know much about hash algorithms.

Phobos nightly contains MurmurHash3 in std.digest (I don't know why there are two implementations though), but I think this one will be easier to use ;-)

http://dlang.org/phobos-prerelease/std_digest_murmurhash.html

The API for bytesHash is

size_t bytesHash(const(void)* buf, size_t len, size_t seed)

so shouldn't be?

auto arr = [a,b];
bytesHash(&arr, arr.length, 42);


> I'm just not quite certain of this property when I try to read

AFAIK the output of a hash isn't generally equal to the its upcoming seed as pre & postpreprocessing may apply. In the case of Murmurhash3 its post-processing (h1 ^= len):

void main() {
    import std.stdio;
    import core.internal.hash;
    uint a = 4, b = 23;
    auto arr = [a, b];

    auto b1 = bytesHash(&arr, arr.length, 42);

    auto l = arr[0..1], r = arr[1..$];
    auto b2a = bytesHash(&l, 1, 42);
    auto b2b= bytesHash(&r, 1, b2a);
    assert(b1 == b2b);
}

However with std.digest you can "save" the current state of a hash function and continue to give new items to the hasher:

void main()
{

    import std.digest.murmurhash; // required Phobos nightly
    uint a = 4, b = 23;
    auto arr = [a, b];
    auto l = arr[0..1], r = arr[1..$];

    MurmurHash3!32 hasher;
    hasher.put(cast(ubyte[]) l);
    hasher.put(cast(ubyte[]) r);

    auto v1 = hasher.finish();

    hasher = MurmurHash3!32();
    hasher.put(cast(ubyte[]) arr);
    auto v2 = hasher.finish();

    assert(v1 == v2);
}


August 19, 2016
On Friday, 19 August 2016 at 21:03:22 UTC, Seb wrote:
> http://dlang.org/phobos-prerelease/std_digest_murmurhash.html

Ah great, I just finished writing my own murmurhash digest module ( https://github.com/Cauterite/phobos/blob/murmur/std/digest/murmur.d ), and now I discover someone's already done it.
Glad I wasted the last 3 hours of my life ;_;
August 20, 2016
On Friday, 19 August 2016 at 21:29:43 UTC, Cauterite wrote:
> On Friday, 19 August 2016 at 21:03:22 UTC, Seb wrote:
>> http://dlang.org/phobos-prerelease/std_digest_murmurhash.html
>
> Ah great, I just finished writing my own murmurhash digest module ( https://github.com/Cauterite/phobos/blob/murmur/std/digest/murmur.d ), and now I discover someone's already done it.
> Glad I wasted the last 3 hours of my life ;_;

Well, we all learn for life?
For example you could start to keep your fork up to date with upstream. It was merged two months ago ;-)
August 20, 2016
On Friday, 19 August 2016 at 20:28:13 UTC, Cauterite wrote:
> Regarding the MurmurHash3 implementation in core.internal.hash, it is my understanding that:
>     // assuming a and b are uints
>     bytesHash([a, b], 0) == bytesHash([b], bytesHash([a], 0))
> Is this correct?
> I'm just not quite certain of this property when I try to read the code myself, and I don't know much about hash algorithms.

DRuntime has Murmurhash2, not 3.
August 20, 2016
On Saturday, 20 August 2016 at 09:15:00 UTC, Ilya Yaroshenko wrote:
> On Friday, 19 August 2016 at 20:28:13 UTC, Cauterite wrote:
>> Regarding the MurmurHash3 implementation in core.internal.hash, it is my understanding that:
>>     // assuming a and b are uints
>>     bytesHash([a, b], 0) == bytesHash([b], bytesHash([a], 0))
>> Is this correct?
>> I'm just not quite certain of this property when I try to read the code myself, and I don't know much about hash algorithms.
>
> DRuntime has Murmurhash2, not 3.

Oh, maybe I am wrong. Anyway 128bit Murmurhash3 hash from std.digest would much faster on 64 bit CPUs.