December 03, 2005
Manfred Nowak wrote:
> Sean Kelly wrote:
> 
> [...]
>>> why word counting on an example text
>>> from a natural language could be a measure for the performance
>>> of an AA in the general case.
>> Only because it's an example of typical use, so it seems a
>> reasonable metric to use.
> 
> What other examples of typical usage do you know, and how should the measures taken for those examples be weighted?

I suppose I should have qualified that.  It's an example of how *I* typically use hashtables.  For an informal test, I'd rather have one that was meaningful to me.


Sean
December 03, 2005
Sean Kelly wrote:

[...]
> It's an example of how *I* typically use hashtables.
[...]

Thx.

You suggest a word count for a text taken from a natural language.
Dave suggests some examples from the shootout.
I suggest an abstract sizeable problem.
Walter surely has some more examples.


How, then, should I build up a scenario out of the examples individuals personally prefer, and how should I weight them to arrive at performance differences most agree upon?

-manfred
December 03, 2005
Manfred Nowak wrote:
> Sean Kelly wrote:
> 
> [...]
>> If you have a succinct test example I could run
>> it on I'll give it a shot, but it'll have a bit of an unfair
>> advantage until I get around to finishing it.
> 
> I don't understand you fully. If you mean to give your implementation a try on my framework, here it is:

That's what I was asking for :-)  I'll try to produce a C++ equivalent this weekend.  And for what it's worth, I mocked up a quick word-count test for std::map vs. my AA implementation.  Here are the results of a run on a 5+ meg text file (the KJ Bible).  Assume my unordered_map implementation is fixed at using 4097 hash buckets:

Testing std::map...
Execution Time: 1.752000
Unique Words: 61918
Page Faults: 607
Memory Used: 2490368

Testing unordered_map...
Execution Time: 0.621000
Unique Words: 61918
Min Bucket Size: 3
Max Bucket Size: 34
Average Bucket Size: 15
Page Faults: 338
Memory Used: 1384448

Not bad at roughly 1/3 the time of std::map, and it would be better if I added more buckets.  For what it's worth, the main test program is available here:

http://home.f4.ca/sean/d/aa_test.cpp
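
In outline, the std::map half of that test boils down to something like this (a simplified sketch, not the exact code at the link; the page-fault and memory measurements are omitted):

#include <ctime>
#include <fstream>
#include <iostream>
#include <map>
#include <string>

int main( int argc, char** argv )
{
    if( argc < 2 )
    {
        std::cerr << "usage: wordcount <textfile>\n";
        return 1;
    }
    std::ifstream in( argv[1] );
    std::map<std::string, unsigned> counts;

    std::clock_t begin = std::clock();
    std::string word;
    while( in >> word )     // whitespace-delimited words
        ++counts[word];     // insert-or-increment
    double elapsed = double( std::clock() - begin ) / CLOCKS_PER_SEC;

    std::cout << "Execution Time: " << elapsed << "\n";
    std::cout << "Unique Words: " << counts.size() << "\n";
    return 0;
}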


Sean
December 03, 2005
Sean Kelly wrote:

[...]
> Not bad at roughly 1/3 the time of std::map
[...]

The test looks biased, because the burden of allocating memory for the process might fall on the std::map run, which executes first.

-manfred
December 03, 2005
Manfred Nowak wrote:
> Sean Kelly wrote:
> 
> [...]
>> Not bad at roughly 1/3 the time of std::map
> [...]
> 
> The test looks biased, because the burden of allocating memory for the process might fall on the std::map run, which executes first.

I just tried reversing the tests so the std::map test is performed second, and the results were the same.  But as I said earlier, my AA implementation has a somewhat unfair advantage in that it doesn't rehash automatically.


Sean
December 04, 2005
Manfred Nowak wrote:

[...]
> Preliminary results after some tuning of the parameters of my scalable implementation:

Some more results for using variable-sized keys:

P.size      Ins      Acc     t.W    t.me   Cal     Gain
     4       16  9999984    4873    4773  3495   -7.26%
     5       32  9999968    4902    5032  3646   10.35%
     6       64  9999936    5098    5174  3775    5.74%
     7      130  9999870    5237    5410  4000   13.99%
    10     1000  9999000    5897    6065  4477   11.83%
    11     2000  9998000    6271    6398  4733    8.26%
    12     4100  9995900    6429    6502  4827    4.56%
    13     8200  9991800    6774    6821  5051    2.73%
    14    16000  9984000    7225    7139  5279   -4.42%
    17   130000  9870000   13056    8606  5844  -61.70%
    20   970000  9030000   17572   10993  6300  -58.37%
    22  2400000  7600000   44757   13709  6517  -81.19%
    24  4900000  5100000  138197   30409  7011  -82.16%
    26  8000000  2000000  245284  165341  7275  -33.59%
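
Here t.W is the time for the built-in (Walter's) AA, t.me the time for my implementation, and Cal the calibration time shared by both runs. The Gain column relates the two after subtracting Cal:

   Gain = (t.me - Cal) / (t.W - Cal) - 1

e.g. for P.size 17: (8606 - 5844) / (13056 - 5844) - 1 = 2762 / 7212 - 1 = -61.70%, so negative values mean my implementation is faster than the built-in one.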


As one can see, this implementation is up to 15% slower in the range up to 10,000 elements, and its advantage collapses at about 8,200,000 elements.

In the range from 20,000 to 8,000,000 elements it is significantly faster.

How should this outcome be interpreted if fine tuning does not bring any improvement in the range up to 10,000 elements?

-manfred
December 04, 2005
Manfred Nowak wrote:
> Preliminary results after some tuning of the parameters of my scalable implementation:

After fixing some bugs the gain table looks like this:

size     Gain
   4  -31.57%
   5  -28.42%
   6  -29.10%
   7  -14.63%
  10  -14.15%
  11  -22.89%
  12  -24.09%
  13  -22.29%
  14  -24.36%
  17  -68.07%
  20  -61.68%
  22  -82.17%
  24  -82.36%
  26  -33.69%

An additional problem is left for discussion:

Does it suffice to rely on the swapping algorithm of the underlying OS, or is it the role of the language to provide a solution when memory gets tight?

Usually out-of-memory errors and the like are thrown, but this forces every implementor to distrust the built-in language features and build a custom implementation.

-manfred
December 04, 2005
Manfred Nowak wrote:
> 
> An additional problem is left for discussion:
> 
> Does it suffice to rely on the swapping algorithm of the underlying OS, or is it the role of the language to provide a solution when memory gets tight?

Language-level functionality should provide the basic or strong exception guarantees, depending on context.  Frankly, so should library code.  I ran into this problem with my unordered_map implementation--the rehash function is required to provide the strong guarantee (that the contents of the table will be unchanged if an exception is thrown), and since I was relying on a std::list to hold all the table elements, this meant that I would have to temporarily double memory usage by creating copies of all stored values to ensure I could fall back on the original list if an insertion failed.  The alternative would be to use a custom linked list where I could manipulate node pointers directly and therefore avoid constructor calls when creating/copying stored objects.
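
Roughly, that splicing approach would look like the following (a sketch under simplifying assumptions--a single element list plus a bucket count--not my actual implementation):

#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Rehash a table whose elements live in a single std::list, without
// copying any stored values.  std::list::splice just relinks nodes,
// so once the temporary bucket vector has been allocated, nothing
// below can throw--which is what makes the "contents unchanged on
// exception" guarantee cheap to provide.
template <typename Key, typename Val, typename Hash>
void rehash_nocopy( std::list< std::pair<Key, Val> >& elems,
                    std::size_t buckets, Hash hash )
{
    typedef std::list< std::pair<Key, Val> > List;

    // The only throwing operation; 'elems' is untouched if it fails.
    std::vector<List> tmp( buckets );

    // No-throw from here on: move each node into its new bucket...
    while( !elems.empty() )
    {
        std::size_t b = hash( elems.front().first ) % buckets;
        tmp[b].splice( tmp[b].end(), elems, elems.begin() );
    }
    // ...then stitch the buckets back together in bucket order.
    for( std::size_t i = 0; i < buckets; ++i )
        elems.splice( elems.end(), tmp[i] );
}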

> Usually out-of-memory errors and the like are thrown, but this forces every implementor to distrust the built-in language features and build a custom implementation.

See above.  I consider this to be a documentation issue, as there should be explicit guarantees about data structure stability.  However, since D objects are accessed via handles, the only issue should be OutOfMemory exceptions, and it's generally not too hard to preallocate needed memory before modifying data.
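
For instance, the preallocation pattern in miniature (an illustrative fragment, not code from any implementation discussed here):

#include <string>

// "Preallocate, then mutate": everything that can throw (the node
// allocation and the key copy) happens while the list is still
// untouched, so a failure leaves the structure exactly as it was.
struct Node
{
    std::string key;
    int         value;
    Node*       next;
    Node( const std::string& k, int v ) : key(k), value(v), next(0) {}
};

void pushFront( Node*& head, const std::string& key, int value )
{
    Node* n = new Node( key, value ); // may throw; 'head' untouched
    n->next = head;                   // pointer writes cannot throw
    head    = n;                      // list updated in a single step
}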


Sean
December 04, 2005
In article <Xns9722ABBE9EA2Fsvv1999hotmailcom@63.105.9.61>, Manfred Nowak says...
>
>Manfred Nowak wrote:
>> Preliminary results after some tuning of the parameters of my scalable implementation:
>
>After fixing some bugs the gain table looks like this:
>

Fixed key length like the first results, or variable like the second? If variable, what does the set look like? Is it a 'word frequency' test similar to Sean's test?

How about some numbers on the shootout benchmarks -- is it too time-consuming to run those? There's no need to instrument them to isolate just the AA operations, because it's important and meaningful to see how your implementation runs in the context of the other runtime overhead as well (you mentioned that it consumes quite a bit more memory than the current AA impl).

The code was posted publicly earlier, so everyone in this group (or is it just you, me, and Sean who are interested in this?) could look at the numbers *and* the code and get a better idea of exactly what the benefits of your implementation are for different sets.

>size     Gain
>   4  -31.57%
>   5  -28.42%
>   6  -29.10%
>   7  -14.63%
>  10  -14.15%
>  11  -22.89%
>  12  -24.09%
>  13  -22.29%
>  14  -24.36%
>  17  -68.07%
>  20  -61.68%
>  22  -82.17%
>  24  -82.36%
>  26  -33.69%
>
>An additional problem is left for discussion:
>
>Does it suffice to rely on the swapping algorithm of the underlying OS, or is it the role of the language to provide a solution when memory gets tight?
>
>Usually out-of-memory errors and the like are thrown, but this forces every implementor to distrust the built-in language features and build a custom implementation.
>

Does your impl. rely on the GC now? If so, I'd say you can leave these concerns to the language and GC implementors, which would be consistent with the rest of the D RT lib.

>-manfred


December 05, 2005
Dave wrote:

[...]
> Fixed key length like the first results, or variable like the second? If variable, what does the set look like? Is it a 'word frequency' test similar to Sean's test?

Variable length keys.

The framework used is posted here: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/30849

For the variable-length-key test I replaced the assignments

!    r= rand();
!    aaa[ (r)%tsmall]= i;

with

!    r= rand();
!    sprintf( buf, toStringz("%d"), r%tsmall);
!    // .dup is needed here: toString merely slices buf, which is
!    // overwritten on the next iteration, so the key must be copied
!    aaa[ toString( buf).dup]= i;

and then adjusted the other calls and declarations.

The results from the framework show that the underlying abstract problem, which was chosen without regard to any particular implementation, transitions smoothly from mostly accesses to mostly insertions.


> How about some numbers on the shootout benchmarks -- is it too time-consuming to run those? There's no need to instrument them to isolate just the AA operations, because it's important and meaningful to see how your implementation runs in the context of the other runtime overhead as well (you mentioned that it consumes quite a bit more memory than the current AA impl).

I will run the shootout benchmarks next. The test runs show that the
average memory consumption increases by a factor of about 7, while the
time consumption decreases by a factor of up to 5. Therefore, even
under the optimization criterion S*T**2 it seems to be a gain, since
7 * (1/5)**2 = 0.28 < 1 ;-)

[...]
> Does your impl. rely on the GC now? If so, I'd say you can leave these concerns to the language and GC implementors, which would be consistent with the rest of the D RT lib.

I do not understand this remark. Are you talking about the management of real memory, whilst I was asking about the management of virtual memory?

Hmmmm... AAs are at the core of virtual memory management, aren't they?

-manfred