Refining AA's
November 30, 2005
As Walter stated, the current implementation is good for general usage.

But what are the parameters for general usage?

I ask because I just implemented a faster version according to the parameters suggested by me.

-manfred
November 30, 2005
On Wed, 30 Nov 2005 08:02:18 +0000 (UTC), Manfred Nowak
<svv1999@hotmail.com> wrote:
>As Walter stated, the current implementation is good for general usage.
>
>But what are the parameters for general usage?
>
>I ask because I just implemented a faster version according to the parameters suggested by me.
>
>-manfred

I believe he said his hashtable was good enough for general use, but it also scales up better than most. I don't recall for sure though, and I haven't searched.

Chris
November 30, 2005
Manfred Nowak wrote:
> As Walter stated, the current implementation is good for general usage.
> 
> But what are the parameters for general usage?

I would say it's insertion and lookup of datasets of reasonable size (less than perhaps 50MB as a shot in the dark).  For example, I typically use a word occurrence count for AA performance testing, and my "large" dataset is the full text of the Bible (as it's the largest non-random text file I could find).
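A word-occurrence count over the built-in AA takes only a few lines. The following is just a minimal sketch in present-day D syntax, not Sean's actual test harness; it assumes the file name is passed on the command line:

import std.file : readText;
import std.stdio : writefln;
import std.string : split;

void main(string[] args)
{
    // Count how often each whitespace-separated word occurs.
    uint[string] counts;                    // built-in associative array
    foreach (word; readText(args[1]).split())
        ++counts[word];                     // a missing key starts at 0

    writefln("%s distinct words", counts.length);
}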


Sean
November 30, 2005
In article <dmknqb$11aa$1@digitaldaemon.com>, Sean Kelly says...
>
>Manfred Nowak wrote:
>> As Walter stated, the current implementation is good for general usage.
>> 
>> But what are the parameters for general usage?
>
>I would say it's insertion and lookup of datasets of reasonable size (less than perhaps 50MB as a shot in the dark).  For example, I typically use a word occurrence count for AA performance testing, and my "large" dataset is the full text of the Bible (as it's the largest non-random text file I could find).
>

Sean,

On a whim, I tried looking for something bigger. As it turns out, "War and Peace" weighs in at 3.13MB (uncompressed ascii) over at Project Gutenberg:

http://www.gutenberg.org/etext/2600

But it loses out to the complete King James Bible (4.24 MB):

http://www.gutenberg.org/etext/10

Alas, there's no way to search based on the size of an ebook.

I also just learned that they support texts in different languages and encodings.  They have an "online recoding service" that can change the ASCII into UTF-8 or some random ISO codepage and so forth.  Provided their converter is accurate, it could make for some nice test data.

- EricAnderton at yahoo
November 30, 2005
"Manfred Nowak" <svv1999@hotmail.com> wrote in message news:Xns971E5BF112D92svv1999hotmailcom@63.105.9.61...
> As Walter stated, the current implementation is good for general usage.
>
> But what are the parameters for general usage?
>
> I ask because I just implemented a faster version according to the parameters suggested by me.
>
> -manfred

Attached are the "Shootout Benchmark" tests involving AA's, both past and present. They've been written to minimize GC thrashing so are a good test of just the AA implementation for common AA's, and generally emphasize different parts of the ABI so as to give a good indication of overall performance, I think.
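For readers without the attachment: the flavour of these tests is easy to sketch. Below is a rough reconstruction from memory of the Shootout "hash"-style test (hex-string keys on insertion, decimal-string keys on lookup), written in present-day D; it is an approximation, not the attached code:

import std.conv : to;
import std.format : format;
import std.stdio : writeln;

void main(string[] args)
{
    immutable n = args.length > 1 ? to!int(args[1]) : 1_000_000;

    // Insert n entries keyed by the hex spelling of i.
    int[string] table;
    foreach (i; 1 .. n + 1)
        table[format("%x", i)] = i;

    // Look up the decimal spelling of i, counting the accidental hits.
    int hits;
    foreach_reverse (i; 1 .. n + 1)
        if (to!string(i) in table)
            ++hits;

    writeln(hits);
}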

Plus, by looking at how D currently does compared to other languages on the Shootout site, we could extrapolate how your new implementation looks against the other languages too.

Look over at: http://shootout.alioth.debian.org/ to find out how they are currently run and for input files.

Please post your results too.

(P.S.: v138 and v139 of Phobos were apparently compiled without optimizations and/or with debug symbols, so you need to use v140 for a fair comparison).

Thanks,

- Dave



November 30, 2005
"Sean Kelly" <sean@f4.ca> wrote in message news:dmknqb$11aa$1@digitaldaemon.com...
> Manfred Nowak wrote:
>> As Walter stated, the current implementation is good for general usage.
>>
>> But what are the parameters for general usage?
>
> I would say it's insertion and lookup of datasets of reasonable size (less than perhaps 50MB as a shot in the dark).  For example, I typically use a word occurrence count for AA performance testing, and my "large" dataset is the full text of the Bible (as it's the largest non-random text file I could find).
>
>
> Sean

That's a good one as a 'stress test', IMO.

FWIW, the way I generally use them is at most 100 or so insertions followed by lots and lots of lookups and/or value updates. That would be a good "common case" test, maybe, and also stress the most important part - lookups (since obviously lookups are used for insertions and updates too) and leave the GC out of the results.
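A hypothetical sketch of such a "common case" test, not one of the attached benchmarks: build the table once from a small fixed key set, then hammer it with lookups and in-place updates, so the timed loop allocates nothing and the GC stays out of the picture.

import std.datetime.stopwatch : StopWatch;
import std.format : format;
import std.stdio : writefln;

void main()
{
    // ~100 insertions up front; keys are prebuilt so the timed loop allocates nothing.
    string[] keys;
    foreach (i; 0 .. 100)
        keys ~= format("key%03d", i);

    double[string] table;
    foreach (k; keys)
        table[k] = 0.0;

    enum iterations = 10_000_000;
    StopWatch sw;
    sw.start();
    foreach (i; 0 .. iterations)
    {
        if (auto p = keys[i % $] in table)   // lookup only, no insertion
            *p += 1.0;                       // in-place value update
    }
    sw.stop();
    writefln("%s lookups/updates in %s ms", iterations, sw.peek.total!"msecs");
}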


November 30, 2005
pragma wrote:
> 
> On a whim, I tried looking for something bigger. As it turns out, "War and
> Peace" weighs in at 3.13MB (uncompressed ascii) over at Project Gutenberg:
> 
> http://www.gutenberg.org/etext/2600
> 
> But it loses out to the complete King James Bible (4.24 MB):
> 
> http://www.gutenberg.org/etext/10
> 
> Alas, there's no way to search based on the size of an ebook.
> 
> I also just learned that they support texts in different languages and
> encodings.  They have an "online recoding service" that can change the ASCII
> into UTF-8 or some random ISO codepage and so forth.  Provided their converter
> is accurate, it could make for some nice test data.

Yeah, the King James Bible from Project Gutenberg is my standard test input :-)  Interesting about the UTF conversion though.  I hadn't realized they offered that feature.


Sean
December 02, 2005
Dave wrote:

[...]
> Please post your results too.

Preliminary results after some tuning of the parameters of my scalable implementation:

size       ins        acc      t.W     t.me      gain
   4        16    9999984     1338     1137   -15.02%
   7       130    9999870     1345     1136   -15.54%
  10      1000    9999000     1395     1147   -17.78%
  14     16000    9984000     1554     1140   -26.64%
  17    130000    9870000     2092     1256   -39.96%
  20    970000    9030000     3354     1763   -47.44%
  22   2400000    7600000     7367     2183   -70.37%
  24   4900000    5100000    21227     3282   -84.54%
  26   8000000    2000000    58396    15691   -73.13%
  27   8900000    1100000   108875    51929   -52.30%
  28   9400000     600000   112160   193036   +72.11% *

legend:
size := binary logarithm of the size of the abstract problem
ins  := number of insertions into the AA
acc  := number of accesses to elements of the AA
t.W  := total time in ms needed by Walter's implementation
t.me := total time in ms needed by my implementation
gain := relative time gain after subtracting 770 ms, which is the
        time needed with no accesses to AA's
*    := data insufficient

comments:

1. My implementation uses more space than Walter's. Therefore the time gain starts to stall at about 5M inserted elements, and it collapses at about 9M inserted elements, given the 2 GB of RAM in my machine.

2. One try with an extreme setting of the implementation's parameters hinted that a time reduction to about 15% of Walter's implementation might be possible for up to 5M elements, but dmd needed nearly an hour to produce the executable for that one try :-(

3. According to the answers in this thread, my implementation is currently not suitable for general use, because the abstract problem I used was restricted to fixed-length keys. I am experimenting with an appropriate extension.
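For illustration only, a hypothetical sketch of the general shape of such an abstract problem, not the benchmark behind the table above: fixed machine-word keys, with the roughly 10M operations per run split between insertions and accesses:

import std.datetime.stopwatch : StopWatch;
import std.stdio : writefln;

// One run of a hypothetical fixed-length-key workload:
// "size" is the binary logarithm of the key range, and insertions plus
// accesses add up to about 10M operations, as in the table above.
void run(int size)
{
    immutable ulong range = 1UL << size;
    uint[ulong] aa;                          // fixed-length (machine-word) keys

    StopWatch sw;
    sw.start();
    foreach (i; 0 .. 10_000_000UL)
    {
        immutable ulong key = (i * 2_654_435_761UL) % range;  // spread keys over the range
        if (auto p = key in aa)
            ++*p;                            // access to an existing element
        else
            aa[key] = 1;                     // insertion
    }
    sw.stop();
    writefln("size=%2s: %6s ms", size, sw.peek.total!"msecs");
}

void main()
{
    foreach (size; [4, 10, 17, 22])
        run(size);
}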

-manfred
December 02, 2005
Sean Kelly wrote:

[...]
> (as it's the largest non-random text file I could
> find).

I do not understand why randomness is not suitable for general usage.

-manfred
December 02, 2005
Sean Kelly wrote:

[...]
> the King James Bible from Project Gutenberg is my standard test input
[...]

The standard wc example spends about 100 ms on the AA operations for the King James Bible. What are your criteria for performance, if this is indeed the largest possible test data?

-manfred