Jump to page: 1 2 3
Thread overview
I submitted my container library to code.dlang.org
Mar 29, 2015
w0rp
Mar 29, 2015
Martin Nowak
Mar 29, 2015
w0rp
Mar 30, 2015
Johannes Pfau
Mar 31, 2015
Martin Nowak
Mar 31, 2015
Martin Nowak
Apr 01, 2015
ketmar
Apr 01, 2015
Johannes Pfau
Apr 02, 2015
Kagamin
Apr 03, 2015
Martin Nowak
Apr 04, 2015
w0rp
Apr 04, 2015
Kagamin
Apr 05, 2015
Martin Nowak
Mar 30, 2015
thedeemon
Mar 31, 2015
Martin Nowak
Mar 31, 2015
Martin Nowak
Apr 01, 2015
thedeemon
Apr 03, 2015
w0rp
Apr 03, 2015
w0rp
Apr 03, 2015
thedeemon
Apr 03, 2015
Martin Nowak
Apr 04, 2015
w0rp
Mar 31, 2015
Martin Nowak
March 29, 2015
I just submitted my container library I was working on ages ago to code.dlang.org. It's currently waiting in a queue, but it should be available from there soon enough.

I submitted my library with the version "0.1.0" to indicate, "Do not expect this to work properly." I have a number of unit tests available for it, and documentation for the library hosted on my website.

https://w0rp.com/project/dstruct/

It is available on GitHub here.

https://github.com/w0rp/dstruct

If anyone is interested in this stuff at all, let me know what you think. If you hate it and want to burn me alive, let me know. A few points follow.

1. This library uses the GC a lot. std.allocator still hasn't made it out into the real world, so I can't use that yet. I would like to offer different allocation schemes at some point with that, but for now the GC default will do. If you absolutely hate the GC and can't use it, this library isn't for you. (For now.)
2. Again, I don't guarantee that everything will work. I would really like people to submit issues so I can cover any errors with more tests.
3. The API for just about anything can be subject to change at this point. I need more feedback from actual usage of my library.
4. I ended up writing my own library hashmap, which when I tested ages ago competed with the standard associative array in terms of performance. This allows me to mark many things @safe pure nothrow. Also, I was able to implement setDefault in a hopefully efficient manner. I have tried to push setDefault as an awesome thing a few times, but I've never been able to explain why it's good eloquently.

Destroy!
March 29, 2015
On 03/29/2015 05:19 PM, w0rp wrote:
> 4. I ended up writing my own library hashmap, which when I tested ages ago competed with the standard associative array in terms of performance. This allows me to mark many things @safe pure nothrow.
> 
> Destroy!

Nice docs.

Always use open addressing when implementing a hash table. https://github.com/D-Programming-Language/dmd/pull/4088 https://github.com/higgsjs/Higgs/pull/170

> Also, I was able to implement setDefault in a hopefully efficient manner. I have tried to push setDefault as an awesome thing a few times, but I've never been able to explain why it's good eloquently.

I guess that's the counterpart to `get(key, default)` that also inserts the default value. That's a very much needed primitive for our AA, because currently you end up doing at least 1 unnecessary lookup.

Often I even write this.

auto p = key in aa;
if (p is null)
{
    aa[key] = default;
    p = key in aa;
}

It's called getOrElseUpdate in Scala, but I'd prefer getOrSet. http://www.scala-lang.org/api/2.10.1-RC1/scala/collection/mutable/HashMap.html#getOrElseUpdate(A,⇒B):B
March 29, 2015
On Sunday, 29 March 2015 at 22:32:34 UTC, Martin Nowak wrote:
> On 03/29/2015 05:19 PM, w0rp wrote:
>> 4. I ended up writing my own library hashmap, which when I tested ages
>> ago competed with the standard associative array in terms of
>> performance. This allows me to mark many things @safe pure nothrow.
>> 
>> Destroy!
>
> Nice docs.
>
> Always use open addressing when implementing a hash table.
> https://github.com/D-Programming-Language/dmd/pull/4088
> https://github.com/higgsjs/Higgs/pull/170

You probably beat my implementation in terms of speed now. I think the only thing
I did quite differently from the standard implementation was use & 2 for computing the hash index instead of a divide, as that was the method OpenJDK used for its hashmap.

>> Also, I was able to implement setDefault in a hopefully efficient
>> manner. I have tried to push setDefault as an awesome thing a few times,
>> but I've never been able to explain why it's good eloquently.
>
> I guess that's the counterpart to `get(key, default)` that also inserts
> the default value. That's a very much needed primitive for our AA,
> because currently you end up doing at least 1 unnecessary lookup.
>
> Often I even write this.
>
> auto p = key in aa;
> if (p is null)
> {
>     aa[key] = default;
>     p = key in aa;
> }
>
> It's called getOrElseUpdate in Scala, but I'd prefer getOrSet.
> http://www.scala-lang.org/api/2.10.1-RC1/scala/collection/mutable/HashMap.html#getOrElseUpdate(A,⇒B):B

Yeah, that's the one. I personally don't really care what the name is. 'setdefault' is the Python name for it. Only we can do better than Python, because we have the lazy keyword. For which I'll say, "One magical day, the compiler can generate special code for this." As you say, it's special in that it has the ability to only compute the hash once when implemented with direct access to the hashtable.
March 30, 2015
Am Mon, 30 Mar 2015 00:32:12 +0200
schrieb Martin Nowak <code+news.digitalmars@dawg.eu>:

> On 03/29/2015 05:19 PM, w0rp wrote:
> > 4. I ended up writing my own library hashmap, which when I tested ages ago competed with the standard associative array in terms of performance. This allows me to mark many things @safe pure nothrow.
> > 
> > Destroy!
> 
> Nice docs.
> 
> Always use open addressing when implementing a hash table. https://github.com/D-Programming-Language/dmd/pull/4088 https://github.com/higgsjs/Higgs/pull/170
> 

Another thing to worry about with hash tables is this:

http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html https://www.youtube.com/watch?v=wGYj8fhhUVA
March 30, 2015
On Sunday, 29 March 2015 at 22:32:34 UTC, Martin Nowak wrote:
> Always use open addressing when implementing a hash table.
> https://github.com/D-Programming-Language/dmd/pull/4088
> https://github.com/higgsjs/Higgs/pull/170


Here's a variant of a open addressing hash table (Robin Hood one) that uses std.container.Array instead of relying on GC:
https://bitbucket.org/infognition/robinhood/src/
(see rbhash.d, the rest is just tests)
Not documented yet, sadly.

Here are some reflections in this, describing the approach and comparing with other implementations (built-in AAs and linear probing hashtable from vibe.d):
http://www.infognition.com/blog/2014/on_robin_hood_hashing.html

Apparently with default hash functions for strings such hash tables get very slow due to clustering. With other key types they are pretty fast.
March 31, 2015
On Monday, 30 March 2015 at 09:31:26 UTC, Johannes Pfau wrote:
> Another thing to worry about with hash tables is this:
>
> http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html
> https://www.youtube.com/watch?v=wGYj8fhhUVA

Mmh, that's a topic for specialized hash tables IMO.
A generic hash table should do a best effort to avoid collisions, but you can't expect it to use a cryptographic quality hash by default if that is considerably slower.
Anyone benchmarked that SipHash?
March 31, 2015
On Monday, 30 March 2015 at 11:23:13 UTC, thedeemon wrote:
> Here's a variant of a open addressing hash table (Robin Hood one) that uses std.container.Array instead of relying on GC:
> https://bitbucket.org/infognition/robinhood/src/
> (see rbhash.d, the rest is just tests)
> Not documented yet, sadly.

You shouldn't write when you read. Robin Hood sounds like a good idea, but it really isn't. Keep your load factor reasonable and distribute values evenly, then you don't need a LRU lookup.

> Apparently with default hash functions for strings such hash tables get very slow due to clustering. With other key types they are pretty fast.

That's one reason why you should use quadratic probing for hash tables, but you should also use a good hash function. MurmurHash2 (not 3) is particularly fast, even for very small keys.

The simplest and most efficient way to do quadratic probing is to use triangular numbers and a power of 2 sized hash table.
March 31, 2015
On Tuesday, 31 March 2015 at 21:17:04 UTC, Martin Nowak wrote:
> The simplest and most efficient way to do quadratic probing is to use triangular numbers and a power of 2 sized hash table.

I'd be glad if someone implemented that for our current AA.
https://issues.dlang.org/show_bug.cgi?id=14385
March 31, 2015
On Sunday, 29 March 2015 at 22:32:34 UTC, Martin Nowak wrote:
> I guess that's the counterpart to `get(key, default)` that also inserts
> the default value. That's a very much needed primitive for our AA,
> because currently you end up doing at least 1 unnecessary lookup.

https://issues.dlang.org/show_bug.cgi?id=14386
March 31, 2015
On 03/31/2015 11:12 PM, Martin Nowak wrote:
> Anyone benchmarked that SipHash?

No way we use this for druntime. https://github.com/rurban/smhasher#readme

I think we should have a flexible Hash in std.container though, that should allow to use a customized hash function.
« First   ‹ Prev
1 2 3