Struct hash issues with string fields

May 26, 2012

Jonathan M Davis

Jun 04, 2012

On Saturday, May 26, 2012 21:53:07 Andrej Mitrovic wrote:
> I don't understand this:
> 
> import std.stdio;
> 
> struct Symbol { string val; }
> 
> void main()
> {
>     int[string] hash1;
>     hash1["1".idup] = 1;
>     hash1["1".idup] = 2;
>     writeln(hash1);  // writes "["1":2]"
> 
>     int[Symbol] hash2;
>     Symbol sym1 = Symbol("1".idup);
>     Symbol sym2 = Symbol("1".idup);
>     hash2[sym1] = 1;
>     hash2[sym2] = 1;
>     writeln(hash2);  // writes "[Symbol("1"):1, Symbol("1"):1]"
> }
> 
> Why are sym1 and sym2 unique keys in hash2? Because the hash implementation checks the array pointer instead of its contents? But then why does hash1 not have the same issue?
> 
> I can't override toHash() in a struct, so what am I supposed to do in order to make "sym1" and "sym2" be stored into the same hash key?

Why can't you have a toHash in your struct? Your struct should be able to have a toHash just like it can have a toString. If anything, I find it disturbing that the code compiles _without_ a definition for opHash in the struct.

Now, that aside, the results with hash2 definitely look like a bug to me. It's probably just the result of one more of the many issues with the current AA implementation.

- Jonathan M Davis

On 5/27/12, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> Why can't you have a toHash in your struct?

I mean it doesn't seem to make any difference:

import std.stdio;

struct Foo
{
    string x;
    size_t toHash() { return 1; }
}

void main()
{
    int[Foo] hash;
    Foo foo1 = Foo("a".idup);
    Foo foo2 = Foo("a".idup);
    hash[foo1] = 1;
    hash[foo2] = 2;

    writeln(hash);  // "[Foo("a"):2, Foo("a"):1]"
}

> If anything, I find it disturbing
> that the code compiles _without_ a definition for opHash in the struct.

Wait, is it toHash or opHash? I can't find any documentation of opHash, all I see is a mention of toHash in object.d.

On Saturday, 26 May 2012 at 22:02:10 UTC, Jonathan M Davis wrote:
> Now, that aside, the results with hash2 definitely look like a bug to me. It's probably just the result of one more of the many issues with the current AA implementation.

This is what I'm guessing too. I've made toHashes in my own version testing this, but the AA problem remains. Problem goes away when not idup-ing, but likely that is the compiler saving space and assigning the same pointer address (which makes sense).

On Sunday, May 27, 2012 00:08:01 Andrej Mitrovic wrote:
> On 5/27/12, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> > Why can't you have a toHash in your struct?
> 
> I mean it doesn't seem to make any difference:

Yeah. I don't know what the deal is. There's definitely at least one bug here, if not several.

> import std.stdio;
> 
> struct Foo
> {
>     string x;
>     size_t toHash() { return 1; }
> }
> 
> void main()
> {
>     int[Foo] hash;
>     Foo foo1 = Foo("a".idup);
>     Foo foo2 = Foo("a".idup);
>     hash[foo1] = 1;
>     hash[foo2] = 2;
> 
>     writeln(hash);  // "[Foo("a"):2, Foo("a"):1]"
> }
> 
> > If anything, I find it disturbing
> > that the code compiles _without_ a definition for opHash in the struct.
> 
> Wait, is it toHash or opHash? I can't find any documentation of opHash, all I see is a mention of toHash in object.d.

toHash. Sorry. All the others are op, so it's easy to forget.

- Jonathan M Davis

On 5/27/12, Era Scarecrow <rtcvb32@yahoo.com> wrote:
> Problem goes
> away when not idup-ing, but likely that is the compiler saving
> space and assigning the same pointer address (which makes sense).

Yes, the .idup was done on purpose here for demonstration.

June 04, 2012

Re: Struct hash issues with string fields

Posted by H. S. Teoh

On Sat, May 26, 2012 at 09:53:07PM +0200, Andrej Mitrovic wrote:
> I don't understand this:
> 
> import std.stdio;
> 
> struct Symbol { string val; }
> 
> void main()
> {
>     int[string] hash1;
>     hash1["1".idup] = 1;
>     hash1["1".idup] = 2;
>     writeln(hash1);  // writes "["1":2]"
> 
>     int[Symbol] hash2;
>     Symbol sym1 = Symbol("1".idup);
>     Symbol sym2 = Symbol("1".idup);
>     hash2[sym1] = 1;
>     hash2[sym2] = 1;
>     writeln(hash2);  // writes "[Symbol("1"):1, Symbol("1"):1]"
> }
> 
> Why are sym1 and sym2 unique keys in hash2? Because the hash implementation checks the array pointer instead of its contents? But then why does hash1 not have the same issue?

Sorry for the very late reply (I'm on vacation and haven't had time to reply to emails), but this bug is one of the infelicities of the current AA implementation. The problem is that strings have a custom hash function that's distinct from the generic hashing function used for arrays.

Furthermore, the default struct hash function hashes the binary representation of the struct, _not_ the contents of its fields.  For reference types like string, the struct hash function only hashes the string pointer and length, not the string contents.
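
To make the pointer-versus-contents distinction concrete, here is a minimal sketch (not code from the thread; the variable names are only illustrative). It shows that two strings produced by .idup compare equal by contents while pointing at different memory, which is why a hash over the raw pointer and length of a struct field would differ between them.

```d
import std.stdio;

struct Symbol { string val; }

void main()
{
    // .idup allocates a fresh copy each time, so a and b hold the
    // same characters at two different addresses.
    string a = "1".idup;
    string b = "1".idup;

    writeln(a == b);          // compares contents
    writeln(a.ptr is b.ptr);  // compares addresses

    // The raw bytes of a Symbol are just the string's length and pointer,
    // so s1 and s2 differ bitwise even though their strings are equal.
    auto s1 = Symbol(a);
    auto s2 = Symbol(b);
    writeln(s1.val.ptr is s2.val.ptr);
}
```

The first comparison looks at the characters and the other two look only at addresses, which is the same difference as between hashing the string contents and hashing the struct's bytes.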

So there are multiple things wrong on multiple levels here. Taken individually, I can see why things are this way: the string hash function uses a faster hashing algorithm that takes advantage of the assumption that strings contain Unicode data, not generic binary data. Struct hashes are computed only from the binary representation of the struct since, in general, structs are supposed to be small value types, so it's faster to hash the binary representation and be done with it than to hash member by member.

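However, taken as a whole, this is inconsistent and doesn't make any sense. When the struct contains reference types like strings, two structs whose strings have equal contents but live at different addresses hash differently, so the hash no longer agrees with a comparison of the contents.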

> I can't override toHash() in a struct, so what am I supposed to do in order to make "sym1" and "sym2" be stored into the same hash key?

You should be able to simply define toHash() in the struct and it should
work (I think?). But there may be bugs in this area as well that cause
it not to work.
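
For what it's worth, here is a sketch of what such a struct might look like. It is not code from this thread and rests on a few assumptions: a recent compiler and runtime where the bug discussed here no longer applies, the hashOf helper from object.d being available, and the const/nothrow/@safe qualifiers that the language documentation asks for on AA key types. It also defines opEquals alongside toHash, since keys that land in the same bucket still have to be compared.

```d
import std.stdio;

struct Symbol
{
    string val;

    // Hash the string contents rather than the struct's raw bytes.
    // hashOf is assumed to be provided by object.d in a recent runtime.
    size_t toHash() const nothrow @safe
    {
        return hashOf(val);
    }

    // Equality must agree with toHash: equal contents, equal keys.
    bool opEquals(ref const Symbol other) const nothrow @safe
    {
        return val == other.val;
    }
}

void main()
{
    int[Symbol] hash2;
    auto sym1 = Symbol("1".idup);
    auto sym2 = Symbol("1".idup);
    hash2[sym1] = 1;
    hash2[sym2] = 2;

    // With content-based hashing and equality, both insertions
    // are expected to refer to the same key.
    writeln(hash2.length);
}
```

Whether this works on the compiler version discussed in this thread is exactly what is in doubt above, so treat it as the intended usage rather than a confirmed workaround.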

T

-- 
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
