I submitted my container library to code.dlang.org - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » I submitted my container library to code.dlang.org

Thread overview

I submitted my container library to code.dlang.org
Mar 29, 2015 w0rp
Mar 29, 2015 Martin Nowak
Mar 29, 2015 w0rp
Mar 30, 2015 Johannes Pfau
Mar 31, 2015 Martin Nowak
Mar 31, 2015 Martin Nowak
Apr 01, 2015 ketmar
Apr 01, 2015 Johannes Pfau
Apr 02, 2015 Kagamin
Apr 03, 2015 Martin Nowak
Apr 04, 2015 w0rp
Apr 04, 2015 Kagamin
Apr 05, 2015 Martin Nowak
Mar 30, 2015 thedeemon
Mar 31, 2015 Martin Nowak
Mar 31, 2015 Martin Nowak
Apr 01, 2015 thedeemon
Apr 03, 2015 w0rp
Apr 03, 2015 w0rp
Apr 03, 2015 thedeemon
Apr 03, 2015 Martin Nowak
Apr 04, 2015 w0rp
Mar 31, 2015 Martin Nowak

March 29, 2015

I submitted my container library to code.dlang.org

Posted by w0rp

w0rp

I just submitted my container library I was working on ages ago to code.dlang.org. It's currently waiting in a queue, but it should be available from there soon enough.

I submitted my library with the version "0.1.0" to indicate, "Do not expect this to work properly." I have a number of unit tests available for it, and documentation for the library hosted on my website.

https://w0rp.com/project/dstruct/

It is available on GitHub here.

https://github.com/w0rp/dstruct

If anyone is interested in this stuff at all, let me know what you think. If you hate it and want to burn me alive, let me know. A few points follow.

1. This library uses the GC a lot. std.allocator still hasn't made it out into the real world, so I can't use that yet. I would like to offer different allocation schemes at some point with that, but for now the GC default will do. If you absolutely hate the GC and can't use it, this library isn't for you. (For now.)
2. Again, I don't guarantee that everything will work. I would really like people to submit issues so I can cover any errors with more tests.
3. The API for just about anything can be subject to change at this point. I need more feedback from actual usage of my library.
4. I ended up writing my own library hashmap, which when I tested ages ago competed with the standard associative array in terms of performance. This allows me to mark many things @safe pure nothrow. Also, I was able to implement setDefault in a hopefully efficient manner. I have tried to push setDefault as an awesome thing a few times, but I've never been able to explain why it's good eloquently.

Destroy!

March 29, 2015

Re: I submitted my container library to code.dlang.org

Posted by Martin Nowak
in reply to w0rp

Martin Nowak

Posted in reply to w0rp

On 03/29/2015 05:19 PM, w0rp wrote:
> 4. I ended up writing my own library hashmap, which when I tested ages ago competed with the standard associative array in terms of performance. This allows me to mark many things @safe pure nothrow.
> 
> Destroy!

Nice docs.

Always use open addressing when implementing a hash table. https://github.com/D-Programming-Language/dmd/pull/4088 https://github.com/higgsjs/Higgs/pull/170

> Also, I was able to implement setDefault in a hopefully efficient manner. I have tried to push setDefault as an awesome thing a few times, but I've never been able to explain why it's good eloquently.

I guess that's the counterpart to `get(key, default)` that also inserts the default value. That's a very much needed primitive for our AA, because currently you end up doing at least 1 unnecessary lookup.

Often I even write this.

auto p = key in aa;
if (p is null)
{
    aa[key] = default;
    p = key in aa;
}

It's called getOrElseUpdate in Scala, but I'd prefer getOrSet. http://www.scala-lang.org/api/2.10.1-RC1/scala/collection/mutable/HashMap.html#getOrElseUpdate(A,⇒B):B

March 29, 2015

Re: I submitted my container library to code.dlang.org

Posted by w0rp
in reply to Martin Nowak

w0rp

Posted in reply to Martin Nowak

On Sunday, 29 March 2015 at 22:32:34 UTC, Martin Nowak wrote:
> On 03/29/2015 05:19 PM, w0rp wrote:
>> 4. I ended up writing my own library hashmap, which when I tested ages
>> ago competed with the standard associative array in terms of
>> performance. This allows me to mark many things @safe pure nothrow.
>> 
>> Destroy!
>
> Nice docs.
>
> Always use open addressing when implementing a hash table.
> https://github.com/D-Programming-Language/dmd/pull/4088
> https://github.com/higgsjs/Higgs/pull/170

You probably beat my implementation in terms of speed now. I think the only thing
I did quite differently from the standard implementation was use & 2 for computing the hash index instead of a divide, as that was the method OpenJDK used for its hashmap.

>> Also, I was able to implement setDefault in a hopefully efficient
>> manner. I have tried to push setDefault as an awesome thing a few times,
>> but I've never been able to explain why it's good eloquently.
>
> I guess that's the counterpart to `get(key, default)` that also inserts
> the default value. That's a very much needed primitive for our AA,
> because currently you end up doing at least 1 unnecessary lookup.
>
> Often I even write this.
>
> auto p = key in aa;
> if (p is null)
> {
>     aa[key] = default;
>     p = key in aa;
> }
>
> It's called getOrElseUpdate in Scala, but I'd prefer getOrSet.
> http://www.scala-lang.org/api/2.10.1-RC1/scala/collection/mutable/HashMap.html#getOrElseUpdate(A,⇒B):B

Yeah, that's the one. I personally don't really care what the name is. 'setdefault' is the Python name for it. Only we can do better than Python, because we have the lazy keyword. For which I'll say, "One magical day, the compiler can generate special code for this." As you say, it's special in that it has the ability to only compute the hash once when implemented with direct access to the hashtable.

March 30, 2015

Re: I submitted my container library to code.dlang.org

Posted by Johannes Pfau
in reply to Martin Nowak

Johannes Pfau

Posted in reply to Martin Nowak

Am Mon, 30 Mar 2015 00:32:12 +0200
schrieb Martin Nowak <code+news.digitalmars@dawg.eu>:

> On 03/29/2015 05:19 PM, w0rp wrote:
> > 4. I ended up writing my own library hashmap, which when I tested ages ago competed with the standard associative array in terms of performance. This allows me to mark many things @safe pure nothrow.
> > 
> > Destroy!
> 
> Nice docs.
> 
> Always use open addressing when implementing a hash table. https://github.com/D-Programming-Language/dmd/pull/4088 https://github.com/higgsjs/Higgs/pull/170
> 

Another thing to worry about with hash tables is this:

http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html https://www.youtube.com/watch?v=wGYj8fhhUVA

March 30, 2015

Re: I submitted my container library to code.dlang.org

Posted by thedeemon
in reply to Martin Nowak

thedeemon

Posted in reply to Martin Nowak

On Sunday, 29 March 2015 at 22:32:34 UTC, Martin Nowak wrote:
> Always use open addressing when implementing a hash table.
> https://github.com/D-Programming-Language/dmd/pull/4088
> https://github.com/higgsjs/Higgs/pull/170

Here's a variant of a open addressing hash table (Robin Hood one) that uses std.container.Array instead of relying on GC:
https://bitbucket.org/infognition/robinhood/src/
(see rbhash.d, the rest is just tests)
Not documented yet, sadly.

Here are some reflections in this, describing the approach and comparing with other implementations (built-in AAs and linear probing hashtable from vibe.d):
http://www.infognition.com/blog/2014/on_robin_hood_hashing.html

Apparently with default hash functions for strings such hash tables get very slow due to clustering. With other key types they are pretty fast.

March 31, 2015

Re: I submitted my container library to code.dlang.org

Posted by Martin Nowak
in reply to Johannes Pfau

Martin Nowak

Posted in reply to Johannes Pfau

On Monday, 30 March 2015 at 09:31:26 UTC, Johannes Pfau wrote:
> Another thing to worry about with hash tables is this:
>
> http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html
> https://www.youtube.com/watch?v=wGYj8fhhUVA

Mmh, that's a topic for specialized hash tables IMO.
A generic hash table should do a best effort to avoid collisions, but you can't expect it to use a cryptographic quality hash by default if that is considerably slower.
Anyone benchmarked that SipHash?

March 31, 2015

Re: I submitted my container library to code.dlang.org

Posted by Martin Nowak
in reply to thedeemon

Martin Nowak

Posted in reply to thedeemon

On Monday, 30 March 2015 at 11:23:13 UTC, thedeemon wrote:
> Here's a variant of a open addressing hash table (Robin Hood one) that uses std.container.Array instead of relying on GC:
> https://bitbucket.org/infognition/robinhood/src/
> (see rbhash.d, the rest is just tests)
> Not documented yet, sadly.

You shouldn't write when you read. Robin Hood sounds like a good idea, but it really isn't. Keep your load factor reasonable and distribute values evenly, then you don't need a LRU lookup.

> Apparently with default hash functions for strings such hash tables get very slow due to clustering. With other key types they are pretty fast.

That's one reason why you should use quadratic probing for hash tables, but you should also use a good hash function. MurmurHash2 (not 3) is particularly fast, even for very small keys.

The simplest and most efficient way to do quadratic probing is to use triangular numbers and a power of 2 sized hash table.

March 31, 2015

Re: I submitted my container library to code.dlang.org

Posted by Martin Nowak
in reply to Martin Nowak

Martin Nowak

Posted in reply to Martin Nowak

On Tuesday, 31 March 2015 at 21:17:04 UTC, Martin Nowak wrote:
> The simplest and most efficient way to do quadratic probing is to use triangular numbers and a power of 2 sized hash table.

I'd be glad if someone implemented that for our current AA.
https://issues.dlang.org/show_bug.cgi?id=14385

March 31, 2015

Re: I submitted my container library to code.dlang.org

Posted by Martin Nowak
in reply to Martin Nowak

Martin Nowak

Posted in reply to Martin Nowak

On Sunday, 29 March 2015 at 22:32:34 UTC, Martin Nowak wrote:
> I guess that's the counterpart to `get(key, default)` that also inserts
> the default value. That's a very much needed primitive for our AA,
> because currently you end up doing at least 1 unnecessary lookup.

https://issues.dlang.org/show_bug.cgi?id=14386

March 31, 2015

Re: I submitted my container library to code.dlang.org

Posted by Martin Nowak
in reply to Martin Nowak

Martin Nowak

Posted in reply to Martin Nowak

On 03/31/2015 11:12 PM, Martin Nowak wrote:
> Anyone benchmarked that SipHash?

No way we use this for druntime. https://github.com/rurban/smhasher#readme

I think we should have a flexible Hash in std.container though, that should allow to use a customized hash function.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation