March 08, 2016 Memory Efficient HashSet
Has anybody put together a memory-efficient D implementation of a HashSet?

Something like

sparse_hash_set<> contained in

https://github.com/sparsehash/sparsehash

but in D.
March 08, 2016 Re: Memory Efficient HashSet
Posted in reply to Nordlöw

On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
> sparse_hash_set<> contained in
>
> https://github.com/sparsehash/sparsehash
It appears to be very slow?
What do you need it for?
March 08, 2016 Re: Memory Efficient HashSet
Posted in reply to Nordlöw

On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
> Has anybody put together a memory-efficient D implementation of a HashSet
>
> Something like
>
> sparse_hash_set<> contained in
>
> https://github.com/sparsehash/sparsehash
>
> but in D.

There is an implementation in: code.dlang.org/packages/emsi_containers

But to be honest I got stuck trying to use it (the copy constructor is disabled), so I used this very minimal wrapper around an associative array:

private struct HashSet(E)
{
    // TODO: switch to a proper implementation (not using an AA)
    bool put(E el)
    {
        if (el in set)
            return false;
        set[el] = set.length;
        return true;
    }

    size_t[E] set;
}

(I only needed put, since I used it to check whether we had already encountered a value in a lazy, non-sorted implementation of uniq.)
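The same AA-as-set approach can shed the per-key size_t payload by using a zero-length static array as the value type, so only the keys occupy payload space. A minimal sketch along those lines (the AASet name is illustrative, not from any library in the thread, and it assumes the void[0] value-type idiom):

```d
// Sketch: an AA-backed set whose values take up no payload space.
// Assumes the void[0] zero-length value idiom; AASet is a made-up name.
struct AASet(E)
{
    void[0][E] impl; // zero-length values: only the keys are stored

    /// Returns true if el was newly inserted.
    bool put(E el)
    {
        if (el in impl)
            return false;
        impl[el] = []; // store the empty value
        return true;
    }

    bool contains(E el) const
    {
        return (el in impl) !is null;
    }

    size_t length() const
    {
        return impl.length;
    }
}
```

This still pays the AA's bucket and node overhead per entry, but drops the stored size_t (or bool) of the wrappers above.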
March 09, 2016 Re: Memory Efficient HashSet
Posted in reply to Ola Fosheim Grøstad

On Tuesday, 8 March 2016 at 12:25:36 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
>> sparse_hash_set<> contained in
>>
>> https://github.com/sparsehash/sparsehash
>
> It appears to be very slow?
> What do you need it for?
My knowledge database engine I'm building cannot afford the memory overhead of D's builtin associative arrays.
For instance, the following program

void main(string[] args)
{
    alias T = uint;
    bool[T] x; // emulates a HashSet with significant memory overhead
    import std.range : iota;
    const n = 10_000_000;
    foreach (const i; iota(0, n))
    {
        x[i] = true; // indicate that it's stored
    }
    import std.stdio : writeln, readln;
    writeln("Press return: ");
    readln;
}

consumes 842 MiB on my Ubuntu machine.
The consumption with Google's sparsehash unordered_set is about 50 MiB.
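For dense integer keys like the 0 .. 10_000_000 range in this benchmark, a plain bit array is hard to beat: one bit per possible key, about 1.25 MiB for this n. A sketch of that alternative (the DenseBitSet name is illustrative; it assumes all keys lie in a known range [0, universe)):

```d
import core.bitop : bt, btr, bts;

/// Sketch: fixed-universe bit set, one bit per possible key in [0, universe).
/// 10_000_000 uint keys cost ~1.25 MiB total instead of tens of bytes each.
struct DenseBitSet
{
    enum bitsPerWord = 8 * size_t.sizeof;
    size_t[] words;

    this(size_t universe)
    {
        words = new size_t[]((universe + bitsPerWord - 1) / bitsPerWord);
    }

    /// Returns true if key was newly inserted. key must be < universe.
    bool put(size_t key)
    {
        return bts(words.ptr, key) == 0; // bts returns the previous bit
    }

    bool contains(size_t key)
    {
        return bt(words.ptr, key) != 0;
    }

    void remove(size_t key)
    {
        btr(words.ptr, key);
    }
}
```

The trade-off is that memory scales with the key universe rather than with the number of stored elements, so this only pays off for dense or bounded key sets.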
March 10, 2016 Re: Memory Efficient HashSet
Posted in reply to Nordlöw

On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
[...]
> foreach (const i; iota(0, n))
> {
>     x[i] = true; // indicate that it's stored
> }
>
> import std.stdio : writeln, readln;
> writeln("Press return: ");
> readln;
> }
>
> consumes 842 MiB on my Ubuntu.
>
> The consumption with Google's sparsehash unordered_set is about 50 MiB.

Maybe the discussion associated with this post can help:

https://forum.dlang.org/post/mailman.217.1457590242.26339.digitalmars-d-bugs@puremagic.com

I tried your little program and played around with it. I tried x.rehash, but this resulted in even more memory consumption. 80 bytes for a 4-byte key and one bit of storage seems like a lot. With a smaller n it comes down to approximately 60 bytes. How many entries will you have to deal with in the end?
March 10, 2016 Re: Memory Efficient HashSet
Posted in reply to Nordlöw

On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> The consumption with Google's sparsehash unordered_set is about 50 MiB.

See also: http://smerity.com/articles/2015/google_sparsehash.html
March 10, 2016 Re: Memory Efficient HashSet
Posted in reply to Nordlöw

On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> My knowledge database engine I'm building cannot afford the memory overhead of D's builtin associative arrays.
Sounds like a cool project! You could also look into using a trie.
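To make the trie suggestion concrete, here is a sketch of a set over uint keys as a radix trie with an 8-bit stride (the TrieSet name is illustrative; a production version would compress the 256-wide nodes, which are kept deliberately naive here):

```d
/// Sketch: 4-level radix trie over uint keys, 8 bits per level.
/// Sparse key sets only allocate the branches they actually touch.
/// Nodes here are naive 256-wide arrays; real tries compress them.
final class TrieSet
{
    private static final class Node
    {
        Node[256] child; // used on the three inner levels
        bool[256] leaf;  // used on the last level
    }

    private Node root;

    /// Returns true if key was newly inserted.
    bool put(uint key)
    {
        if (root is null)
            root = new Node;
        auto n = root;
        foreach (shift; [24, 16, 8]) // walk the top three key bytes
        {
            const i = (key >> shift) & 0xFF;
            if (n.child[i] is null)
                n.child[i] = new Node;
            n = n.child[i];
        }
        const last = key & 0xFF;
        const wasPresent = n.leaf[last];
        n.leaf[last] = true;
        return !wasPresent;
    }

    bool contains(uint key)
    {
        auto n = root;
        foreach (shift; [24, 16, 8])
        {
            if (n is null)
                return false;
            n = n.child[(key >> shift) & 0xFF];
        }
        return n !is null && n.leaf[key & 0xFF];
    }
}
```

Compared with a hash set, lookups cost a fixed four indexed loads, and memory usage tracks how clustered the keys are rather than how many there are.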
March 10, 2016 Re: Memory Efficient HashSet
Posted in reply to Martin Tschierschke

On Thursday, 10 March 2016 at 10:15:10 UTC, Martin Tschierschke wrote:
> How many entries will you have to deal with in the end?
I need something close to the memory overhead of a sorted array of values. That is why I'm considering converting SparseHash's sparse_hash_set to D.
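For a read-mostly workload, that sorted array of values is itself usable as a set via binary search, with essentially zero per-element overhead. A sketch using Phobos (the SortedArraySet name is illustrative):

```d
import std.algorithm.sorting : sort;
import std.range : assumeSorted;

/// Sketch: read-mostly integer set. Sort once up front, then answer
/// membership queries by binary search. Memory cost is exactly the
/// key array: 4 bytes per uint element, nothing more.
struct SortedArraySet
{
    uint[] keys;

    /// Build from an arbitrary array (sorts it in place).
    this(uint[] values)
    {
        sort(values);
        keys = values;
    }

    bool contains(uint key)
    {
        // assumeSorted gives a SortedRange, whose contains() is a
        // binary search rather than a linear scan.
        return assumeSorted(keys).contains(key);
    }
}
```

The catch is that inserts are O(n), so this fits a build-once, query-many pattern rather than a live hash set.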
March 10, 2016 Re: Memory Efficient HashSet
Posted in reply to Nordlöw

On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> consumes 842 MiB on my Ubuntu.

Here's mine: https://bitbucket.org/infognition/robinhood/src (you just need the one file rbhash.d to use it in your apps)

The following test takes ~130 MB and can take less with some tweaks in the settings:

import std.stdio, rbhash;

alias Set(T) = RHHash!(T, void);

void main()
{
    auto set = new Set!uint;
    foreach (i; 0 .. 10_000_000)
        set.add(i);
    writeln(set.contains(444));
    readln;
}

It's a GC-free Robin Hood hash table implementation with special treatment for void as the value type, i.e. sets.
March 10, 2016 Re: Memory Efficient HashSet
Posted in reply to Nordlöw

On 3/8/16 3:12 AM, Nordlöw wrote:
> Has anybody put together a memory-efficient D-implementation of a HashSet
>
> Something like
>
> sparse_hash_set<> contained in
>
> https://github.com/sparsehash/sparsehash
>
> but in D.
D needs a way to use allocators for hash nodes.
In Dcollections, I was able to optimize both performance and memory usage by using a specialized allocator that allocates in pages and then divides up those pages.
-Steve
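To illustrate the page-allocator idea (a sketch of the general technique, not Dcollections code): a free list that carves fixed-size blocks out of page-sized chunks, so hash nodes avoid per-allocation header overhead:

```d
import core.stdc.stdlib : free, malloc;

/// Sketch: carves fixed-size blocks out of page-sized chunks and
/// recycles freed blocks through an intrusive free list, so each
/// node costs exactly blockSize bytes with no per-allocation header.
struct PagePool(size_t blockSize)
{
    static assert(blockSize >= (void*).sizeof,
        "a free block must be able to hold the free-list link");

    enum pageSize = 4096;
    enum blocksPerPage = pageSize / blockSize;

    private void*[] pages;   // every page ever allocated
    private void* freeList;  // singly linked list threaded through free blocks
    private size_t bump;     // next unused block index in the newest page

    void* allocate()
    {
        if (freeList !is null)
        {
            auto p = freeList;
            freeList = *cast(void**) p; // pop a recycled block
            return p;
        }
        if (pages.length == 0 || bump == blocksPerPage)
        {
            pages ~= malloc(pageSize); // grab a fresh page
            bump = 0;
        }
        auto p = cast(ubyte*) pages[$ - 1] + bump * blockSize;
        ++bump;
        return p;
    }

    void deallocate(void* p)
    {
        *cast(void**) p = freeList; // push onto the free list
        freeList = p;
    }

    /// Frees every page at once; all outstanding blocks become invalid.
    void release()
    {
        foreach (pg; pages)
            free(pg);
        pages = null;
        freeList = null;
        bump = 0;
    }
}
```

Bulk release is the other win of this layout: tearing down a container frees a handful of pages instead of one allocation per node.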
Copyright © 1999-2021 by the D Language Foundation