December 27, 2013
Am 27.12.2013 19:25, schrieb Daniel Kozak:
> using OrderedAA improves speed 3x
> https://github.com/Kozzi11/Trash/tree/master/util
>

A possible downside of this implementation is that, because it uses a doubly linked list per index, a read operation will incur more cache misses than a linear-probing hashmap. Did you try using an array per index instead of a linked list, and measure whether that makes any difference?
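To illustrate the point about cache behavior, here is a minimal sketch (not Benjamin's actual implementation) of a linear-probing lookup: on a collision it steps to the next slot in the same flat array, so probes walk contiguous memory instead of chasing per-bucket list pointers. The `Entry` layout is hypothetical.

```c
#include <stddef.h>

/* Hypothetical entry type; any POD key/value pair works the same way. */
typedef struct { unsigned key; unsigned value; int used; } Entry;

/* Linear probing: on a collision, advance to the next slot in the same
 * array, so successive probes often hit the same cache line. */
static Entry *lp_find(Entry *table, size_t capacity, unsigned key)
{
    size_t i = key % capacity;
    while (table[i].used) {
        if (table[i].key == key)
            return &table[i];          /* hit */
        i = (i + 1) % capacity;        /* next slot in contiguous memory */
    }
    return NULL;                       /* reached an empty slot: miss */
}
```

A chained table would instead dereference a list node per collision, each of which may live anywhere on the heap.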

Kind Regards
Benjamin Thaut
December 27, 2013
On Friday, 27 December 2013 at 17:26:42 UTC, Benjamin Thaut wrote:
> Also, Gordon, what is keeping you from converting the textual representation to a binary one? I mean, if you need to read that file in often, you can easily preprocess it into a binary blob. You could even modify my hashmap implementation to support binary serialization, by just writing the entire array to a binary file and loading it again.
> That would be fast...

This is intended to be a general-purpose CLI program to handle certain text files (from other sources), not an in-house application where I can restructure my data.

Text files are still the most ubiquitous way to share data (especially without limiting the languages/tools others will use to process them).

But out of curiosity - how would you serialize it?
December 27, 2013
Am 27.12.2013 20:55, schrieb Gordon:
> On Friday, 27 December 2013 at 17:26:42 UTC, Benjamin Thaut wrote:
>> Also, Gordon, what is keeping you from converting the textual
>> representation to a binary one? I mean, if you need to read that file
>> in often, you can easily preprocess it into a binary blob. You could
>> even modify my hashmap implementation to support binary serialization,
>> by just writing the entire array to a binary file and loading it again.
>> That would be fast...
>
> This is intended to be a general-purpose CLI program to handle certain
> text files (from other sources), not a in-house application where I can
> restructure my data.
>
> Text files are still the most ubiquitous way to share data (especially
> without limiting the languages/tools other will use to process them).
>
> But out of curiosity - how would you serialize it?

Given my hashmap implementation I would do something like

// method of hashmap
void serializeTo(FILE* f)
{
  fwrite(&m_FullCount, typeof(m_FullCount).sizeof, 1, f);
  ulong len = m_Data.length;
  fwrite(&len, typeof(len).sizeof, 1, f);
  fwrite(m_Data.ptr, Pair.sizeof, len, f);
}

// construct from file
this(FILE* f) // D constructors are declared without a return type
{
  fread(&m_FullCount, typeof(m_FullCount).sizeof, 1, f);
  ulong len;
  fread(&len, typeof(len).sizeof, 1, f);
  m_Data = (cast(Pair*)malloc(Pair.sizeof * len))[0 .. cast(size_t)len];
  fread(m_Data.ptr, Pair.sizeof, len, f);
}

The deserialization would be just one allocation and one big binary read from a file, which should be quite fast. Note that this only works because the entries are plain data: if they contained pointers (e.g. to heap-allocated strings), the raw bytes would not survive a round trip.
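The same scheme, sketched in plain C for anyone who wants to try it outside D (the `Pair` layout and file names here are illustrative, not Benjamin's actual types):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical POD entry; the scheme only works when entries contain
 * no pointers, since raw bytes are written as-is. */
typedef struct { unsigned key; unsigned value; } Pair;

/* Write the element count followed by the raw array. */
static int save_pairs(const char *path, const Pair *data, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    unsigned long long n = len;       /* fixed-width count for portability */
    fwrite(&n, sizeof n, 1, f);
    fwrite(data, sizeof(Pair), len, f);
    return fclose(f);
}

/* Deserialization: one small read, one allocation, one bulk read. */
static Pair *load_pairs(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    unsigned long long n = 0;
    if (fread(&n, sizeof n, 1, f) != 1) { fclose(f); return NULL; }
    Pair *data = malloc(sizeof(Pair) * (size_t)n);
    if (data && fread(data, sizeof(Pair), (size_t)n, f) != n) {
        free(data);
        data = NULL;
    }
    fclose(f);
    *len = (size_t)n;
    return data;
}
```

One caveat the D version shares: the file format is tied to the machine's endianness and struct layout, which is fine for a local preprocessing cache.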

Kind Regards
Benjamin Thaut
December 28, 2013
Because I was always curious to do some hashmap profiling with real data, I did some more. Here are the results:

My implementation. Power of two size (25% free):
building hashmap: 8 seconds 28880605 ticks
looking entries up: 0 seconds 31685 ticks

My implementation. Prime number size (25% free):
building hashmap: 8 seconds 26802252 ticks
looking entries up: 0 seconds 27719 ticks

My implementation. Prime number size (10% free):
building hashmap: 8 seconds 29323952 ticks
looking entries up: 0 seconds 32768 ticks

My implementation. Prime number size (20% free):
26877474 entries
building hashmap: 8 seconds 28895211 ticks
sum 74128
looking entries up: 0 seconds 33222 ticks

My implementation. Prime number size (50% free):
building hashmap: 8 seconds 27738475 ticks
looking entries up: 0 seconds 25696 ticks

OrderedAA (implementation of Daniel Kozak):
building array: 13 seconds 45663219 ticks
lookup: 0 seconds 236323 ticks

Builtin AA:
building array: 14 seconds 47175035 ticks
lookup: 0 seconds 33692 ticks


You can see that both my implementation and Daniel Kozak's outperform the built-in AA when filling the AA. Both versions preallocate the internal array, although Daniel Kozak's does one additional allocation per entry to build the doubly linked lists.
When not preallocating, my implementation still outperforms the built-in AA.

When looking up data, my implementation outperforms both others. It is only slightly faster than the built-in one, while Daniel Kozak's version is 8 times slower during lookup than mine; I think this is mostly due to cache misses caused by the linked-list storage of the entries.
It is also interesting to note that using prime-number sizes for the internal array of the AA slightly improved lookup performance in my implementation. A certain portion of the entries in the internal array is always kept free. Reducing this amount hurts lookup performance; increasing it helps. You can gain a few percent of speed if you are willing to waste 50% of the memory. My measurements show that 25% free entries seems to be the sweet spot.
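A small sketch of why prime table sizes can help (independent of any particular implementation): a power-of-two table derives the bucket index from only the low bits of the hash, so keys whose hashes share low bits all collide, whereas a prime modulus mixes in all the bits.

```c
#include <stddef.h>

/* Power-of-two table: the index keeps only the low bits of the hash. */
static size_t index_pow2(size_t hash, size_t capacity /* 2^k */)
{
    return hash & (capacity - 1);
}

/* Prime-sized table: the modulo depends on every bit of the hash. */
static size_t index_prime(size_t hash, size_t capacity /* prime */)
{
    return hash % capacity;
}
```

For example, hashes that are all multiples of 8 land in a single bucket of an 8-slot power-of-two table, but spread across a 7-slot prime table. The trade-off is that `%` by a runtime prime is slower than a mask, which is why the difference only shows up as a slight edge here.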
December 30, 2013
On Saturday, 28 December 2013 at 20:20:36 UTC, Benjamin Thaut wrote:
> Because I was always curious to do some hashmap profiling with real data, I did some more. Here are the results:
>
> My implementation. Power of two size (25% free):
> building hashmap: 8 seconds 28880605 ticks
> looking entries up: 0 seconds 31685 ticks
>
> My implementation. Prime number size (25% free):
> building hashmap: 8 seconds 26802252 ticks
> looking entries up: 0 seconds 27719 ticks
>
> My implementation. Prime number size (10% free):
> building hashmap: 8 seconds 29323952 ticks
> looking entries up: 0 seconds 32768 ticks
>
> My implementation. Prime number size (20% free):
> 26877474 entries
> building hashmap: 8 seconds 28895211 ticks
> sum 74128
> looking entries up: 0 seconds 33222 ticks
>
> My implementation. Prime number size (50% free):
> building hashmap: 8 seconds 27738475 ticks
> looking entries up: 0 seconds 25696 ticks
>
> OrderedAA (implementation of Daniel Kozak):
> building array: 13 seconds 45663219 ticks
> lookup: 0 seconds 236323 ticks
>
> Builtin AA:
> building array: 14 seconds 47175035 ticks
> lookup: 0 seconds 33692 ticks
>
>
> You can see that both my implementation and Daniel Kozak's outperform the built-in AA when filling the AA. Both versions preallocate the internal array, although Daniel Kozak's does one additional allocation per entry to build the doubly linked lists.
> When not preallocating, my implementation still outperforms the built-in AA.
>
> When looking up data, my implementation outperforms both others. It is only slightly faster than the built-in one, while Daniel Kozak's version is 8 times slower during lookup than mine; I think this is mostly due to cache misses caused by the linked-list storage of the entries.
> It is also interesting to note that using prime-number sizes for the internal array of the AA slightly improved lookup performance in my implementation. A certain portion of the entries in the internal array is always kept free. Reducing this amount hurts lookup performance; increasing it helps. You can gain a few percent of speed if you are willing to waste 50% of the memory. My measurements show that 25% free entries seems to be the sweet spot.

This is cool!

Can you post your code and the data you used for this test somewhere? I would really like to try it on my new AA implementation :)
December 30, 2013
Am 30.12.2013 15:06, schrieb Daniel Kozak:
>
> This is cool!
>
> Can you post your code and the data you used for this test somewhere? I
> would really like to try it on my new AA implementation :)

Here is the post that describes how to get the data: http://forum.dlang.org/post/yglwnmawrvbhpszdsiqs@forum.dlang.org

And here the post how to convert it to tabular text data:
http://forum.dlang.org/post/rhyycuwlxsoqbtpzpgzd@forum.dlang.org

After dumping the "relationship" table from MySQL, I just removed the first column, because it only contains the ID of the entry. I then used the remaining two columns for profiling. You will have to get the data yourself, as I can't repost it here (because it's not my data).

Here is the post that describes my implementation:
http://forum.dlang.org/post/l9kcm2$2i4i$1@digitalmars.com

For profiling the lookups, I generated 100_000 random numbers in the range (0, 200_000] and saved them to a file. I then profile the lookup of those 100_000 numbers. Roughly 75% of them are valid keys and the rest are invalid, so both successful and unsuccessful lookups are profiled.
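The key-generation step can be sketched like this (a C sketch of the approach described above, not Benjamin's actual script; the count, range, and file name are just placeholders): drawing keys from a range wider than the valid key space gives a predictable mix of hits and misses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write `count` random keys in (0, max_key] to a text file, one per
 * line. With a fixed seed, every profiling run looks up the same keys,
 * so timings stay comparable across implementations. */
static void write_random_keys(const char *path, int count, int max_key)
{
    FILE *f = fopen(path, "w");
    if (!f) return;
    srand(12345);                        /* fixed seed: reproducible runs */
    for (int i = 0; i < count; i++)
        fprintf(f, "%d\n", 1 + rand() % max_key);
    fclose(f);
}
```

Saving the keys to a file, rather than regenerating them, is what makes the comparison between the three AA implementations fair.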

For profiling your implementation I just used the code you posted.

Kind Regards
Benjamin Thaut