November 30, 2005
Refining AA's
As Walter stated, the current implementation is good for general 
usage.

But what are the parameters for general usage?

I ask because I just implemented a faster version according to the 
parameters suggested by me.

-manfred
November 30, 2005
Re: Refining AA's
On Wed, 30 Nov 2005 08:02:18 +0000 (UTC), Manfred Nowak
<svv1999@hotmail.com> wrote:
>As Walter stated, the current implementation is good for general 
>usage.
>
>But what are the parameters for general usage?
>
>I ask because I just implemented a faster version according to the 
>parameters suggested by me.
>
>-manfred

I believe he said that his hashtable was good enough for general use,
but that it also scales up better than most. I don't recall for sure
though, and I haven't searched.

Chris
November 30, 2005
Re: Refining AA's
Manfred Nowak wrote:
> As Walter stated, the current implementation is good for general 
> usage.
> 
> But what are the parameters for general usage?

I would say it's insertion and lookup of datasets of reasonable size 
(less than perhaps 50MB as a shot in the dark).  For example, I 
typically use a word occurrence count for AA performance testing, and my 
"large" dataset is the full text of the Bible (as it's the largest 
non-random text file I could find).
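
Just to be concrete, a minimal sketch of that kind of word-count loop 
(illustrative only, not my actual test program; the input file is taken 
from the command line and words are split on whitespace):

import std.array : split;
import std.stdio : File, writefln;

void main(string[] args)
{
    // Word occurrence count using the built-in associative array.
    uint[string] counts;
    foreach (line; File(args[1]).byLineCopy())
        foreach (word; line.split())
        {
            if (auto p = word in counts)  // lookup returns a pointer or null
                ++*p;
            else
                counts[word] = 1;         // first occurrence: insert
        }

    writefln("%s distinct words", counts.length);
}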


Sean
November 30, 2005
Re: Refining AA's
In article <dmknqb$11aa$1@digitaldaemon.com>, Sean Kelly says...
>
>Manfred Nowak wrote:
>> As Walter stated, the current implementation is good for general 
>> usage.
>> 
>> But what are the parameters for general usage?
>
>I would say it's insertion and lookup of datasets of reasonable size 
>(less than perhaps 50MB as a shot in the dark).  For example, I 
>typically use a word occurrence count for AA performance testing, and my 
>"large" dataset is the full text of the Bible (as it's the largest 
>non-random text file I could find).
>

Sean,

On a whim, I tried looking for something bigger. As it turns out, "War and
Peace" weighs in at 3.13 MB (uncompressed ASCII) over at Project Gutenberg:

http://www.gutenberg.org/etext/2600

But it loses out to the complete King James Bible (4.24 MB):

http://www.gutenberg.org/etext/10

Alas, there's no way to search based on the size of an ebook.

I also just learned that they support texts in different languages and
encodings.  They have an "online recoding service" that can change the ASCII
into UTF-8 or some random ISO codepage and so forth.  Provided their converter
is accurate, it could make for some nice test data.

- EricAnderton at yahoo
November 30, 2005
Re: Refining AA's
"Manfred Nowak" <svv1999@hotmail.com> wrote in message 
news:Xns971E5BF112D92svv1999hotmailcom@63.105.9.61...
> As Walter stated, the current implementation is good for general
> usage.
>
> But what are the parameters for general usage?
>
> I ask because I just implemented a faster version according to the
> parameters suggested by me.
>
> -manfred

Attached are the "Shootout Benchmark" tests involving AA's, both past and 
present. They've been written to minimize GC thrashing, so they're a good 
test of just the AA implementation for common AA use, and they exercise 
different parts of the ABI, which should give a good indication of overall 
performance, I think.

Plus, by looking at how D currently does compared to other languages on the 
Shootout site, we could extrapolate how your new implementation stacks up 
against the other languages too.

Look at http://shootout.alioth.debian.org/ to find out how they are 
currently run and to get the input files.

Please post your results too.

(P.S.: v138 and v139 of phobos were apparently compiled without 
optimizations and/or with debug symbols, so you need to use v140 for a fair 
comparison).

Thanks,

- Dave
November 30, 2005
Re: Refining AA's
"Sean Kelly" <sean@f4.ca> wrote in message 
news:dmknqb$11aa$1@digitaldaemon.com...
> Manfred Nowak wrote:
>> As Walter stated, the current implementation is good for general usage.
>>
>> But what are the parameters for general usage?
>
> I would say it's insertion and lookup of datasets of reasonable size (less 
> than perhaps 50MB as a shot in the dark).  For example, I typically use a 
> word occurrence count for AA performance testing, and my "large" dataset 
> is the full text of the Bible (as it's the largest non-random text file I 
> could find).
>
>
> Sean

That's a good one as a 'stress test', IMO.

FWIW, the way I generally use them is at most 100 or so insertions followed 
by lots and lots of lookups and/or value updates. That would be a good 
"common case" test, maybe: it stresses the most important part, lookups 
(since lookups are used for insertions and updates too), and it leaves the 
GC out of the results.
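
Something of that shape, roughly (illustrative only; the key names, value 
type, and iteration counts below are made up, not from any real benchmark):

import std.conv : to;
import std.stdio : writefln;

void main()
{
    // "Common case": around 100 insertions up front...
    long[string] table;
    string[] keys;
    foreach (i; 0 .. 100)
    {
        keys ~= "key" ~ i.to!string;
        table[keys[$ - 1]] = 0;
    }

    // ...then lots and lots of lookups and value updates.
    foreach (round; 0 .. 100_000)
        foreach (k; keys)
            if (auto p = k in table)  // lookup dominates the cost here
                ++*p;

    writefln("key0 = %s", table["key0"]);
}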
November 30, 2005
Re: Refining AA's
pragma wrote:
> 
> On a whim, I tried looking for something bigger. As it turns out, "War and
> Peace" weighs in at 3.13 MB (uncompressed ASCII) over at Project Gutenberg:
> 
> http://www.gutenberg.org/etext/2600
> 
> But it loses out to the complete King James Bible (4.24 MB):
> 
> http://www.gutenberg.org/etext/10
> 
> Alas, there's no way to search based on the size of an ebook.
> 
> I also just learned that they support texts in different languages and
> encodings.  They have an "online recoding service" that can change the ASCII
> into UTF-8 or some random ISO codepage and so forth.  Provided their converter
> is accurate, it could make for some nice test data.

Yeah, the King James Bible from Project Gutenberg is my standard test 
input :-)  Interesting about the UTF conversion though.  I hadn't 
realized they offered that feature.


Sean
December 02, 2005
Re: Refining AA's
Dave wrote:

[...]
> Please post your results too.

Preliminary results after some tuning of the parameters of my 
scalable implementation:

size  ins       acc       t.W     t.me    gain
4     16        9999984   1338    1137    -15.02%
7     130       9999870   1345    1136    -15.54%
10    1000      9999000   1395    1147    -17.78%
14    16000     9984000   1554    1140    -26.64%
17    130000    9870000   2092    1256    -39.96%
20    970000    9030000   3354    1763    -47.44%
22    2400000   7600000   7367    2183    -70.37%
24    4900000   5100000   21227   3282    -84.54%
26    8000000   2000000   58396   15691   -73.13%
27    8900000   1100000   108875  51929   -52.30%
28    9400000   600000    112160  193036  +72.11% *

legend:
size:= binary logarithm of the size of the abstract problem
ins:= number of insertions into the AA
acc:= number of accesses to elements of the AA
t.W:= total time in ms needed by Walter's implementation
t.me:= total time in ms needed by my implementation
gain:= relative time gain after subtracting 770 ms, which is the 
time needed with no accesses to AA's
*:= data insufficient

comments:

1. My implementation uses more space than Walter's. Therefore the 
time gain starts to stall at about 5M elements inserted, and it 
collapses when about 9M elements are inserted, given the 2 GB of RAM 
in my machine.

2. One try with an extreme setting of the implementation's parameters 
hinted that a reduction to about 15% of the time of Walter's 
implementation might be possible for up to 5M elements, but dmd needed 
nearly an hour to produce the executable for that one try :-(

3. According to the answers in the thread, my implementation is 
currently not suitable for general use, because the abstract problem 
I used was restricted to fixed-length keys. I am experimenting with 
an appropriate extension.
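
For illustration only, the general shape of such a fixed-length-key run, 
using the size-14 row above (16000 insertions, 9984000 lookups); the ulong 
keys and the multiplicative mixing constant are just an example, not my 
actual harness:

import std.stdio : writefln;

void main()
{
    // Fixed-length (ulong) keys: 16000 insertions followed by
    // 9984000 lookups, matching the size-14 row of the table.
    enum ulong ins = 16_000;
    enum ulong acc = 10_000_000 - ins;

    long[ulong] aa;
    foreach (ulong i; 0 .. ins)
        aa[i * 2_654_435_761UL] = cast(long) i;   // spread the key bits

    long sum;
    foreach (ulong i; 0 .. acc)
        if (auto p = ((i % ins) * 2_654_435_761UL) in aa)
            sum += *p;

    writefln("checksum = %s", sum);
}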

-manfred
December 02, 2005
Re: Refining AA's
Sean Kelly wrote:

[...]
> (as it's the largest non-random text file I could
> find). 

I do not understand why randomness is not suitable for general 
usage.

-manfred
December 02, 2005
Re: Refining AA's
Sean Kelly wrote:

[...]
> the King James Bible from Project Gutenberg is my standard
> test input
[...]

The standard wc example spends about 100 ms on the AA operations for 
the King James Bible. What are your criteria for performance, if this 
is indeed the largest test data available?
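
For reference, a sketch of how such a measurement can be isolated: tokenize 
up front so that file I/O and splitting stay outside the timed loop. This is 
illustrative only, not my exact harness, using a stopwatch around the 
counting loop:

import std.array : split;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : readText;
import std.stdio : writefln;

void main(string[] args)
{
    // Read and tokenize first, so only the AA operations are timed.
    auto words = readText(args[1]).split();

    auto sw = StopWatch(AutoStart.yes);
    uint[string] counts;
    foreach (word; words)
    {
        if (auto p = word in counts)
            ++*p;
        else
            counts[word] = 1;
    }
    sw.stop();

    writefln("%s distinct words, %s ms in the AA loop",
             counts.length, sw.peek.total!"msecs");
}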

-manfred