Thread overview
how hash_t toHash() works?
Apr 28, 2013
gedaiu
Apr 28, 2013
Ivan Kazmenko
Apr 29, 2013
gedaiu
Apr 29, 2013
Ivan Kazmenko
Apr 30, 2013
gedaiu
Apr 30, 2013
Ivan Kazmenko
May 04, 2013
gedaiu
May 04, 2013
Ivan Kazmenko
April 28, 2013
hi,

I have a class which I want to use as key in an assoc array like
this:

string["KeyString"] myArray;

What i want is to preserve the order in the array. I want always
to have "1" before "2" if the string is a numeric value.

Can anyone help me to understand how const hash_t toHash() should
work?

Here is the KeyString class:

struct KeyString {
	string str;
	
	KeyString opAssign(string val) {
		str = val;
		return this;
	}
	
	const hash_t toHash() {
		try {
			return to!long(str);
		} catch(Exception e) {
			hash_t hash; foreach (char c; str) hash = (hash * 9) + c;
			return hash;
		}
	}
	
	const int opCmp(ref const KeyString s) {
		//return std.string.cmp(this.str, s.str);
		
		try {
			auto val1 = to!long(this.str);
			auto val2 = to!long(s.str);
			
			if(val1 < val2) {
				return -1;
			}
			
			if(val1 == val2) {
				return 0;
			}
			
		} catch(Exception e) {
			return std.string.cmp(str, s.str);
		}
		
		return 1;
	}
}


Thanks,
Bogdan
April 28, 2013
Hi,

> I have a class which I want to use as key in an assoc array like
> this:
>
> string["KeyString"] myArray;
>
> What i want is to preserve the order in the array. I want always
> to have "1" before "2" if the string is a numeric value.
>
> Can anyone help me to understand how const hash_t toHash() should
> work?

Associative arrays in D are implemented as hash tables.  Hashing sacrifices ordering capabilities for greater speed.  Creating a hash table will most certainly take your toHash value modulo some integer you can not control (small for small tables, large for larger ones).  It would be crippling, if at all possible, to have foreach on an associative array always produce a sorted result.

I'd suggest using std.container.RedBlackTree instead.  It doesn't provide syntactic sugar (like "container[key] = value") but preserves the ordering.

Also note that sorting items as strings should be separated from sorting as integers.  For example, a value convertible to integer should be always less (or always greater) than a non-convertible value.  Otherwise, it would quickly lead to inconsistent comparisons, as in "22" <s "22a" <s "4" <n "22".  Here, "<s" compares values as strings and "<n" as integers.

Below is an example of using RedBlackTree close to what you might need:

-----
import std.container;
import std.conv;
import std.stdio;

bool doConv(out long val, string s) {
	try {
		val = to!long(s);
		return true;
	}
	catch(ConvException) {
		return false;
	}
}

struct KeyString {
	string str;

	KeyString opAssign(string val) {
		str = val;
		return this;
	}

	const int opCmp(ref const KeyString s) {
		long v1, v2;
		auto b1 = doConv(v1, str);
		auto b2 = doConv(v2, s.str);
		if (b1 != b2) return b2 - b1;
		if (b1 && b2) return (v1 > v2) - (v1 < v2);
		return std.string.cmp(str, s.str);
	}
}

struct MyRecord {
	KeyString key;
	int val;

	this(string nkey, int nval) {
		key = nkey;
		val = nval;
	}
}

void main () {
	auto cont = redBlackTree !("a.key < b.key", true, MyRecord) ();
	cont.insert(MyRecord("1", 1));
	cont.insert(MyRecord("111", 2));
	cont.insert(MyRecord("25", 3));
	cont.insert(MyRecord("53", 4));
	cont.insert(MyRecord("a", 5));
	cont.insert(MyRecord(" ", 6));
	cont.insert(MyRecord("1234567890123456789", 7));
	cont.insert(MyRecord("12345678901234567890", 8));
	foreach (ref const cur; cont) {
		writefln("cont[\"%s\"] = %s", cur.key.str, cur.val);
	}
}
-----

The example prints:

-----
cont["1"] = 1
cont["25"] = 3
cont["53"] = 4
cont["111"] = 2
cont["1234567890123456789"] = 7
cont[" "] = 6
cont["12345678901234567890"] = 8
cont["a"] = 5
-----

The last three entries are strings not convertible to a long, so they are ordered lexicographically.

Ivan Kazmenko.
April 29, 2013
On Sunday, 28 April 2013 at 13:18:04 UTC, Ivan Kazmenko wrote:
> Hi,
>
>> I have a class which I want to use as key in an assoc array like
>> this:
>>
>> string["KeyString"] myArray;
>>
>> What i want is to preserve the order in the array. I want always
>> to have "1" before "2" if the string is a numeric value.
>>
>> Can anyone help me to understand how const hash_t toHash() should
>> work?
>
> Associative arrays in D are implemented as hash tables.  Hashing sacrifices ordering capabilities for greater speed.  Creating a hash table will most certainly take your toHash value modulo some integer you can not control (small for small tables, large for larger ones).  It would be crippling, if at all possible, to have foreach on an associative array always produce a sorted result.
>
> I'd suggest using std.container.RedBlackTree instead.  It doesn't provide syntactic sugar (like "container[key] = value") but preserves the ordering.
>
> Also note that sorting items as strings should be separated from sorting as integers.  For example, a value convertible to integer should be always less (or always greater) than a non-convertible value.  Otherwise, it would quickly lead to inconsistent comparisons, as in "22" <s "22a" <s "4" <n "22".  Here, "<s" compares values as strings and "<n" as integers.
>
> Below is an example of using RedBlackTree close to what you might need:
>
> -----
> import std.container;
> import std.conv;
> import std.stdio;
>
> bool doConv(out long val, string s) {
> 	try {
> 		val = to!long(s);
> 		return true;
> 	}
> 	catch(ConvException) {
> 		return false;
> 	}
> }
>
> struct KeyString {
> 	string str;
>
> 	KeyString opAssign(string val) {
> 		str = val;
> 		return this;
> 	}
>
> 	const int opCmp(ref const KeyString s) {
> 		long v1, v2;
> 		auto b1 = doConv(v1, str);
> 		auto b2 = doConv(v2, s.str);
> 		if (b1 != b2) return b2 - b1;
> 		if (b1 && b2) return (v1 > v2) - (v1 < v2);
> 		return std.string.cmp(str, s.str);
> 	}
> }
>
> struct MyRecord {
> 	KeyString key;
> 	int val;
>
> 	this(string nkey, int nval) {
> 		key = nkey;
> 		val = nval;
> 	}
> }
>
> void main () {
> 	auto cont = redBlackTree !("a.key < b.key", true, MyRecord) ();
> 	cont.insert(MyRecord("1", 1));
> 	cont.insert(MyRecord("111", 2));
> 	cont.insert(MyRecord("25", 3));
> 	cont.insert(MyRecord("53", 4));
> 	cont.insert(MyRecord("a", 5));
> 	cont.insert(MyRecord(" ", 6));
> 	cont.insert(MyRecord("1234567890123456789", 7));
> 	cont.insert(MyRecord("12345678901234567890", 8));
> 	foreach (ref const cur; cont) {
> 		writefln("cont[\"%s\"] = %s", cur.key.str, cur.val);
> 	}
> }
> -----
>
> The example prints:
>
> -----
> cont["1"] = 1
> cont["25"] = 3
> cont["53"] = 4
> cont["111"] = 2
> cont["1234567890123456789"] = 7
> cont[" "] = 6
> cont["12345678901234567890"] = 8
> cont["a"] = 5
> -----
>
> The last three entries are strings not convertible to a long, so they are ordered lexicographically.
>
> Ivan Kazmenko.

Thanks!

it looks great!

one more questionWhat is the type of cont?

auto cont = redBlackTree !("a.key < b.key", true, MyRecord) ();

I want to use this as a property in a class and i can't use there auto keyword... I tried different types but it did not work.

Bogdan
April 29, 2013
> one more question
> What is the type of cont?
>
> auto cont = redBlackTree !("a.key < b.key", true, MyRecord) ();
>
> I want to use this as a property in a class and i can't use there auto keyword... I tried different types but it did not work.

For me, the following declaration works:

-----
import std.functional;
...
	RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true) cont;
...
	cont = redBlackTree !("a.key < b.key", true, MyRecord) ();
-----

I admit it could have been easier to figure out.  For example, "writeln (typeof (cont).stringof);" just prints "RedBlackTree" which is not enough for a proper declaration.  I've inferred the right declaration from the following lines in container.d:

-----
/++ Ditto +/
auto redBlackTree(alias less, bool allowDuplicates, E)(E[] elems...)
    if(is(typeof(binaryFun!less(E.init, E.init))))
{
    //We shouldn't need to instantiate less here, but for some reason,
    //dmd can't handle it if we don't (even though the template which
    //takes less but not allowDuplicates works just fine).
    return new RedBlackTree!(E, binaryFun!less, allowDuplicates)(elems);
}
-----
April 30, 2013
On Monday, 29 April 2013 at 16:01:15 UTC, Ivan Kazmenko wrote:
>> one more question
>> What is the type of cont?
>>
>> auto cont = redBlackTree !("a.key < b.key", true, MyRecord) ();
>>
>> I want to use this as a property in a class and i can't use there auto keyword... I tried different types but it did not work.
>
> For me, the following declaration works:
>
> -----
> import std.functional;
> ...
> 	RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true) cont;
> ...
> 	cont = redBlackTree !("a.key < b.key", true, MyRecord) ();
> -----
>
> I admit it could have been easier to figure out.  For example, "writeln (typeof (cont).stringof);" just prints "RedBlackTree" which is not enough for a proper declaration.  I've inferred the right declaration from the following lines in container.d:
>
> -----
> /++ Ditto +/
> auto redBlackTree(alias less, bool allowDuplicates, E)(E[] elems...)
>     if(is(typeof(binaryFun!less(E.init, E.init))))
> {
>     //We shouldn't need to instantiate less here, but for some reason,
>     //dmd can't handle it if we don't (even though the template which
>     //takes less but not allowDuplicates works just fine).
>     return new RedBlackTree!(E, binaryFun!less, allowDuplicates)(elems);
> }
> -----

Error: template instance RedBlackTree!(ValueRecord, binaryFun, true) RedBlackTree!(ValueRecord, binaryFun, true) does not match template declaration RedBlackTree(T, alias less = "a < b", bool allowDuplicates = false) if (is(typeof(binaryFun!(less)(T.init, T.init))))
Error: RedBlackTree!(ValueRecord, binaryFun, true) is used as a type

Do you know what this error means?

Thank,
Bogdan
April 30, 2013
>> -----
>> import std.functional;
>> ...
>> 	RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true) cont;
>> ...
>> 	cont = redBlackTree !("a.key < b.key", true, MyRecord) ();
>> -----

> Error: template instance RedBlackTree!(ValueRecord, binaryFun, true) RedBlackTree!(ValueRecord, binaryFun, true) does not match template declaration RedBlackTree(T, alias less = "a < b", bool allowDuplicates = false) if (is(typeof(binaryFun!(less)(T.init, T.init))))
> Error: RedBlackTree!(ValueRecord, binaryFun, true) is used as a type

I am able to reproduce it if I write
 	RedBlackTree !(MyRecord, binaryFun, true)
instead of
 	RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true)

If you are using a plain regular function instead of "a.key < b.key" there, consider the following form:

-----
bool less (T) (auto ref T a, auto ref T b)
{
	return a.key < b.key;
}
...
	RedBlackTree !(MyRecord, less, true) cont;
...
	cont = redBlackTree !(less, true, MyRecord) ();
-----

Note that the straightforward notation does *not* work yet in 2.062 if you want ref parameters:

-----
bool less (ref MyRecord a, ref MyRecord b)
{
	return a.key < b.key;
}
-----

The current condition of binaryFun is too tight.  So, for now, we have to create a non-ref version too to pass it.  See (and perhaps comment) the following issue in BugZilla:  http://d.puremagic.com/issues/show_bug.cgi?id=9513

Ivan Kazmenko.
May 04, 2013
On Tuesday, 30 April 2013 at 17:48:08 UTC, Ivan Kazmenko wrote:
>>> -----
>>> import std.functional;
>>> ...
>>> 	RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true) cont;
>>> ...
>>> 	cont = redBlackTree !("a.key < b.key", true, MyRecord) ();
>>> -----
>
>> Error: template instance RedBlackTree!(ValueRecord, binaryFun, true) RedBlackTree!(ValueRecord, binaryFun, true) does not match template declaration RedBlackTree(T, alias less = "a < b", bool allowDuplicates = false) if (is(typeof(binaryFun!(less)(T.init, T.init))))
>> Error: RedBlackTree!(ValueRecord, binaryFun, true) is used as a type
>
> I am able to reproduce it if I write
>  	RedBlackTree !(MyRecord, binaryFun, true)
> instead of
>  	RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true)
>
> If you are using a plain regular function instead of "a.key < b.key" there, consider the following form:
>
> -----
> bool less (T) (auto ref T a, auto ref T b)
> {
> 	return a.key < b.key;
> }
> ...
> 	RedBlackTree !(MyRecord, less, true) cont;
> ...
> 	cont = redBlackTree !(less, true, MyRecord) ();
> -----
>
> Note that the straightforward notation does *not* work yet in 2.062 if you want ref parameters:
>
> -----
> bool less (ref MyRecord a, ref MyRecord b)
> {
> 	return a.key < b.key;
> }
> -----
>
> The current condition of binaryFun is too tight.  So, for now, we have to create a non-ref version too to pass it.  See (and perhaps comment) the following issue in BugZilla:  http://d.puremagic.com/issues/show_bug.cgi?id=9513
>
> Ivan Kazmenko.

One more question... why associative arrays in D can't be implemented like here?

http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_hash.h?view=markup

it seems that php arrays uses hash tables too but they preserve orders.

Thanks,
Bogdan
May 04, 2013
> One more question... why associative arrays in D can't be implemented like here?
>
> http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_hash.h?view=markup
>
> it seems that php arrays uses hash tables too but they preserve orders.

From what I see there, this code traverses a hash table in the order in which the elements were inserted, not any other particular order (such as lexicographical).  Much like Java's LinkedHashSet: http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashSet.html

This however has a cost of maintaining a linked list which is at least one pointer per entry in size.  If that was the implemented behavior, the performance would suffer for users who don't need any particular order.

Slightly off topic:  by the way, polynomial hashing modulo a power of two (as in the link you gave) is fast but has a rather general corner case regardless of the multiplier.  And multiplying by small integers such as 33 gives another easy way to produce plenty of strings with identical hashes, this time regardless of the modulo.  So, hashing like "hash(i) = hash(i-1) * 33 + str[i]" is a piece of cake for an adversary who wants your running time to be quadratic instead of linear.

Ivan Kazmenko.