November 11, 2012
Re: Performance of hashes and associative arrays
On 11/11/2012 01:11 AM, Joseph Rushton Wakeling wrote:

> So where classes are concerned there's a definite need to
> override toHash if you want the _values_ to be the basis of the
> association, rather than the object itself ... this would also have
> potential implications for memory blow-up where a class is used as the
> key type, no?

I am not sure that I understand.

The associative array would still contain just class variables, not 
class objects. Every time the hash needed to be calculated, the AA would
call toHash() on the variable, which in turn would use the values of the 
members of the object.

Put another way, a single class object is owned by the runtime, but many
class variables referring to it can exist in multiple AAs.
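
For concreteness, here is a minimal sketch of what I mean. The class and
its hash function are made up for illustration; both toHash() and
opEquals() look only at the member values, so two distinct objects with
equal members act as the same key:

import std.stdio;

class Point
{
    int x, y;

    this(int x, int y) { this.x = x; this.y = y; }

    // hash is computed from the member values, not the identity
    override size_t toHash() @safe nothrow
    {
        return cast(size_t)(x * 31 + y);
    }

    // equality must agree with toHash for AA lookups to work
    override bool opEquals(Object o)
    {
        auto rhs = cast(Point) o;
        return rhs !is null && x == rhs.x && y == rhs.y;
    }
}

void main()
{
    string[Point] names;
    names[new Point(1, 2)] = "first";

    // a different object with the same member values finds the entry
    writeln(names[new Point(1, 2)]);    // prints "first"
}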

> The more pressing reason for avoiding associative arrays as it happens
> was simply that I was getting unacceptably large memory usage as a
> result: I'd got some data stored in the form of byte[size_t] (really
> this should have been a set of size_t's, but this seemed theoretically
an acceptable stopgap while there's no set class in Phobos),

I think std.container.RedBlackTree can take that responsibility, but if
you don't need the elements to be in any particular order, it may be
seen as overkill as well.
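
For example, a quick sketch of using it as a set (untested):

import std.container;
import std.stdio;

void main()
{
    // RedBlackTree used purely as a set of size_t
    auto set = redBlackTree!size_t(1, 5, 42);
    set.insert(7);

    if (42 in set)              // membership test, O(log n)
        writeln("42 is present");

    writeln(set[]);             // sorted: [1, 5, 7, 42]
}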

> and for
> some reason that was generating memory consumption twice that of having
> instead a size_t[] regular array. I guess perhaps because although a
> byte should mean just that, in practice each value in the byte[size_t]
> associative array was taking up a whole word?

I don't know the actual implementation of AAs by dmd, but the hash 
buckets are probably singly-linked lists. Even if your data had no 
collisions, each byte would have to live in a singly-linked list.
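
Just as an illustration, a hypothetical entry layout like the following
would explain the blow-up; this is not dmd's actual layout, only a guess
at the general shape:

// NOT dmd's actual layout -- just a plausible shape for one
// chained-bucket entry
struct Entry
{
    Entry* next;    // chain link within the bucket
    size_t hash;    // cached hash of the key
    size_t key;     // the size_t key itself
    byte   value;   // the 1-byte payload
}
// Entry.sizeof is 32 on a 64-bit target (25 bytes of fields rounded
// up to 8-byte alignment), versus just 8 bytes per element in a
// size_t[].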

> It's a shame, because functionally it was useful to be able to do things
> of the form if(i in list), but memory-wise it just didn't work for the
> size of data I'm dealing with.

If all you need is 'if(i in list)', then a HashSet may help. I haven't
used one myself, but such data structures apparently exist in other
languages like Java and C#.
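
If no ready-made one is available, here is a minimal hand-rolled sketch
of the idea (linear probing, one word per slot, so none of the
bucket-node overhead discussed above; the names and details are mine and
untested):

import std.stdio;

// Open-addressing set of size_t. 0 serves as the "empty slot"
// sentinel, so the key 0 is tracked by a separate flag.
struct SizeTSet
{
    private size_t[] slots;   // length is always a power of two
    private size_t used;
    private bool hasZero;

    void insert(size_t key)
    {
        if (key == 0) { hasZero = true; return; }
        // keep the load factor at or below 1/2
        if (slots.length == 0 || (used + 1) * 2 > slots.length)
            grow();
        size_t i = key & (slots.length - 1);
        while (slots[i] != 0 && slots[i] != key)
            i = (i + 1) & (slots.length - 1);
        if (slots[i] == 0) { slots[i] = key; ++used; }
    }

    bool contains(size_t key) const
    {
        if (key == 0) return hasZero;
        if (slots.length == 0) return false;
        size_t i = key & (slots.length - 1);
        while (slots[i] != 0)
        {
            if (slots[i] == key) return true;
            i = (i + 1) & (slots.length - 1);
        }
        return false;
    }

    private void grow()
    {
        auto old = slots;
        slots = new size_t[old.length ? old.length * 2 : 16];
        used = 0;
        foreach (k; old)
            if (k != 0) insert(k);   // cannot trigger another grow
    }
}

void main()
{
    SizeTSet set;
    foreach (i; 0 .. 1000)
        set.insert(i * 3);
    writeln(set.contains(42));   // true
    writeln(set.contains(43));   // false
}

Each element costs exactly one word plus the empty slots (at most half
the table), which should stay well below the AA's per-entry overhead.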

Ali
November 11, 2012
Re: Performance of hashes and associative arrays
On 11/11/2012 07:40 PM, Ali Çehreli wrote:
> I think std.container.RedBlackTree can take that responsibility, but if you
> don't need the elements to be in any particular order, it may be seen as
> overkill as well.

I know, and I probably should give that a go, but as I've got working code right 
now I'm moving on to something else and will come back to it later.

> If all you need is 'if(i in list)', then a HashSet may help. I haven't used one
> myself but apparently there are such data structures in other languages like
> Java and C#.

Indeed, and I was wondering whether it might be worth writing an implementation 
for std.container.  As it stands I don't really feel familiar enough with the 
underlying concepts, but I may revisit that idea once I'm done with a few other 
things.
November 11, 2012
Re: Performance of hashes and associative arrays
Ali Çehreli wrote:

> but the hash buckets are probably singly-linked lists

They are unbalanced binary search trees, which indeed degenerate into
singly-linked lists on sorted input.

-manfred
November 17, 2012
Re: Performance of hashes and associative arrays
On 11/11/2012 12:38, Manfred Nowak wrote:
> Jakse wrote:
>
>> It would also be interesting to have ideas for the general
>> case
>
> Yes, ideas might be interesting. :-)
>
> A root of "good" hashing is incorporated in algorithm2 of your
> posting:
>
> spread the values produced by the hash function uniformly over
> the interval  size_t.min .. size_t.max.
>
> Therefore:
>
> Compute the hash value for a given key by multiplying each byte
> of the key by a factor that increases from the byte with the
> lowest significance to the byte with the highest significance;
> then add the results.
>
> Remarks:
> a)
> Because of the general case: assume that the original key
> contains much redundancy. Therefore do not use the bytes of the
> original key but compute an (at least close to) lossless
> compression of the original key.
>
> b)
> The factors in the factor list should be primes, for spreading
> the hash value uniformly over the intended range.
>
> c)
> The quotient of two consecutive primes in the factor list should
> be greater than 256, because the key used might not contain any
> redundancy.
>
> -manfred
>

Thanks for this interesting answer.
Is compressing for performance reasons? Is it more interesting to
compress and then hash than to just hash?
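
To check my understanding of the hashing part, here is a rough sketch of
how I read the scheme. I have simplified the factor list to successive
powers of a single prime base greater than 256, which is my own
shortcut; the arithmetic simply wraps modulo 2^64:

// each byte is multiplied by a successively larger factor and the
// products are summed; here factor i is 257^i
size_t factorHash(const(ubyte)[] key)
{
    enum size_t base = 257;     // smallest prime greater than 256
    size_t result = 0;
    size_t factor = 1;
    foreach (b; key)
    {
        result += b * factor;
        factor *= base;
    }
    return result;
}

Is that what you had in mind?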
November 17, 2012
Re: Performance of hashes and associative arrays
Raphaël Jakse wrote:

> Is compressing for performance reasons?
Hopefully. Because in the general case the distribution of the 
keys is unknown, no function used for computing the hash value 
can be guaranteed to spread the hash values uniformly 
over the hash interval.

Compressing would have a negative effect on performance if
compressing and decompressing took more time than would otherwise
be lost resolving conflicting hash values.

> Is it more interesting to compress and then hash than to just
> hash?
The only purpose of compressing is to create the special case of
close to no redundancy in the keys used for hashing. If this
special case is already confirmed, or if it is known that no
compression will diminish the computing time: just hash.

-manfred