Thread overview
November 30, 2005 Refining AA's
As Walter stated, the current implementation is good for general usage. But what are the parameters for general usage?

I ask because I just implemented a faster version according to the parameters I suggested.

-manfred
November 30, 2005 Re: Refining AA's
Posted in reply to Manfred Nowak

On Wed, 30 Nov 2005 08:02:18 +0000 (UTC), Manfred Nowak
<svv1999@hotmail.com> wrote:
>As walter stated, the current implementation is good for general usage.
>
>But what are the parameters for general usage?
>
>I ask because I just implemented a faster version according to the parameters suggested by me.
>
>-manfred
I believe he said his hashtable was good enough for general use, but it also scales up better than most. I don't recall for sure though, and I haven't searched.
Chris
November 30, 2005 Re: Refining AA's
Posted in reply to Manfred Nowak

Manfred Nowak wrote:
> As walter stated, the current implementation is good for general usage.
>
> But what are the parameters for general usage?
I would say it's insertion and lookup of datasets of reasonable size (less than perhaps 50MB, as a shot in the dark). For example, I typically use a word occurrence count for AA performance testing, and my "large" dataset is the full text of the bible (as it's the largest non-random text file I could find).
Sean
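The word-occurrence count Sean describes is easy to reproduce. A minimal sketch in Python for illustration (in the thread's context this would be D code, with the built-in AA playing the role of the dict; the sample text here is just a stand-in for a large input file):

```python
from collections import defaultdict

def word_counts(text):
    """Count occurrences of each whitespace-separated word.

    Every word triggers one hash-table lookup and possibly one
    insertion, which is exactly what this benchmark stresses.
    """
    counts = defaultdict(int)
    for word in text.lower().split():
        counts[word] += 1
    return counts

# Tiny stand-in for a large input such as the King James Bible.
sample = "in the beginning was the word and the word was with God"
counts = word_counts(sample)
print(counts["the"])   # 3
print(counts["word"])  # 2
```

Running this over a multi-megabyte text gives one lookup-or-insert per word, so the AA dominates the runtime once I/O is excluded.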
November 30, 2005 Re: Refining AA's
Posted in reply to Sean Kelly

In article <dmknqb$11aa$1@digitaldaemon.com>, Sean Kelly says...
>
>Manfred Nowak wrote:
>> As Walter stated, the current implementation is good for general usage.
>>
>> But what are the parameters for general usage?
>
>I would say it's insertion and lookup of datasets of reasonable size (less than perhaps 50MB, as a shot in the dark). For example, I typically use a word occurrence count for AA performance testing, and my "large" dataset is the full text of the bible (as it's the largest non-random text file I could find).

Sean,

On a whim, I tried looking for something bigger. As it turns out, "War and Peace" weighs in at 3.13MB (uncompressed ASCII) over at Project Gutenberg:

http://www.gutenberg.org/etext/2600

But it loses out to the complete King James Bible (4.24 MB):

http://www.gutenberg.org/etext/10

Alas, there's no way to search based on the size of an ebook.

I also just learned that they support texts in different languages and encodings. They have an "online recoding service" that can change the ASCII into UTF-8 or some random ISO codepage and so forth. Provided their converter is accurate, it could make for some nice test data.

- EricAnderton at yahoo
November 30, 2005 Re: Refining AA's
Posted in reply to Manfred Nowak
Attachments:

"Manfred Nowak" <svv1999@hotmail.com> wrote in message news:Xns971E5BF112D92svv1999hotmailcom@63.105.9.61...
> As walter stated, the current implementation is good for general usage.
>
> But what are the parameters for general usage?
>
> I ask because I just implemented a faster version according to the parameters suggested by me.
>
> -manfred

Attached are the "Shootout Benchmark" tests involving AA's, both past and present. They've been written to minimize GC thrashing, so they're a good test of just the AA implementation for common AA's, and they generally emphasize different parts of the ABI so as to give a good indication of overall performance, I think. Plus, by looking at how D currently does compared to other languages on the Shootout site, we could extrapolate how your new implementation looks against the other languages too.

Look over at: http://shootout.alioth.debian.org/ to find out how they are currently run and for input files.

Please post your results too.

(P.S.: v138 and v139 of phobos were apparently compiled without optimizations and/or with debug symbols, so you need to use v140 for a fair comparison.)

Thanks,

- Dave
November 30, 2005 Re: Refining AA's
Posted in reply to Sean Kelly

"Sean Kelly" <sean@f4.ca> wrote in message news:dmknqb$11aa$1@digitaldaemon.com...
> Manfred Nowak wrote:
>> As Walter stated, the current implementation is good for general usage.
>>
>> But what are the parameters for general usage?
>
> I would say it's insertion and lookup of datasets of reasonable size (less than perhaps 50MB, as a shot in the dark). For example, I typically use a word occurrence count for AA performance testing, and my "large" dataset is the full text of the bible (as it's the largest non-random text file I could find).
>
> Sean

That's a good one as a 'stress test', IMO. FWIW, the way I generally use them is at most 100 or so insertions followed by lots and lots of lookups and/or value updates. That would be a good "common case" test, maybe, and it would also stress the most important part, lookups (since obviously lookups are used for insertions and updates too), while leaving the GC out of the results.
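Dave's "common case" (a small table built once, then hammered with lookups and value updates) is straightforward to turn into a micro-benchmark. A sketch in Python for illustration; the key count (100) follows his description, while the key names and lookup count are arbitrary choices of mine:

```python
import time

def lookup_heavy_benchmark(n_keys, n_lookups):
    """Build a small table once, then time many lookups/updates.

    After the initial insertions no rehashing or allocation occurs,
    so the measurement isolates lookup/update cost (in D, this also
    keeps the GC out of the results).
    """
    table = {f"key{i}": 0 for i in range(n_keys)}  # ~100 insertions
    keys = list(table)
    start = time.perf_counter()
    for i in range(n_lookups):
        table[keys[i % n_keys]] += 1  # one lookup + one update
    elapsed = time.perf_counter() - start
    return table, elapsed

table, elapsed = lookup_heavy_benchmark(100, 100_000)
print(f"100000 updates over 100 keys in {elapsed:.3f}s")
```

Since every insertion and update begins with a lookup, a table like this measures the operation that dominates all three.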
November 30, 2005 Re: Refining AA's
Posted in reply to pragma

pragma wrote:
>
> On a whim, I tried looking for something bigger. As it turns out, "War and
> Peace" weighs in at 3.13MB (uncompressed ascii) over at Project Gutenberg:
>
> http://www.gutenberg.org/etext/2600
>
> But it loses out to the complete King James Bible (4.24 MB):
>
> http://www.gutenberg.org/etext/10
>
> Alas, there's no way to search based on the size of an ebook.
>
> I also just learned that they support texts in different languages and
> encodings. They have an "online recoding service" that can change the ASCII
> into UTF-8 or some random ISO codepage and so forth. Provided their converter
> is accurate, it could make for some nice test data.
Yeah, the King James Bible from Project Gutenberg is my standard test input :-) Interesting about the UTF conversion though. I hadn't realized they offered that feature.
Sean
December 02, 2005 Re: Refining AA's
Posted in reply to Dave

Dave wrote:
[...]
> Please post your results too.
Preliminary results after some tuning of the parameters of my scalable implementation:
| size | ins | acc | t.W | t.me | gain |
|------|-----|-----|-----|------|------|
| 4 | 16 | 9999984 | 1338 | 1137 | -15.02% |
| 7 | 130 | 9999870 | 1345 | 1136 | -15.54% |
| 10 | 1000 | 9999000 | 1395 | 1147 | -17.78% |
| 14 | 16000 | 9984000 | 1554 | 1140 | -26.64% |
| 17 | 130000 | 9870000 | 2092 | 1256 | -39.96% |
| 20 | 970000 | 9030000 | 3354 | 1763 | -47.44% |
| 22 | 2400000 | 7600000 | 7367 | 2183 | -70.37% |
| 24 | 4900000 | 5100000 | 21227 | 3282 | -84.54% |
| 26 | 8000000 | 2000000 | 58396 | 15691 | -73.13% |
| 27 | 8900000 | 1100000 | 108875 | 51929 | -52.30% |
| 28 | 9400000 | 600000 | 112160 | 193036 | 72.11%* |
legend:
- size := binary logarithm of the size of the abstract problem
- ins := number of insertions into the AA
- acc := number of accesses to elements of the AA
- t.W := total time in ms needed by Walter's implementation
- t.me := total time in ms needed by my implementation
- gain := relative time gain after subtracting 770 ms, which is the time needed with no accesses to AA's
- \* := data insufficient
comments:
1. My implementation uses more space than Walter's. Therefore the time gain starts to stall at about 5M elements inserted, and it collapses when about 9M elements are inserted, given the 2GB of RAM on my machine.
2. One try with an extreme setting of the implementation's parameters hinted that a reduction to about 15% of the time of Walter's implementation might be possible for up to 5M elements, but dmd needed nearly an hour to produce the executable for that one try :-(
3. According to the answers in this thread, my implementation is currently not suitable for general use, because the abstract problem I used was restricted to fixed-length keys. I am experimenting with an appropriate extension.
-manfred
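For readers reproducing the gain column: the printed percentages can be recomputed as the plain relative difference between the two timings. A quick check in Python against three rows of the table (the formula is my reading of the figures, not something Manfred spells out):

```python
def gain(t_w, t_me):
    """Relative time gain of the new implementation versus Walter's,
    as a percentage (negative means the new implementation is faster)."""
    return (t_me - t_w) / t_w * 100

# (size, t.W, t.me) triples taken from the table above.
rows = [(4, 1338, 1137), (17, 2092, 1256), (28, 112160, 193036)]
for size, t_w, t_me in rows:
    print(f"size {size}: gain = {gain(t_w, t_me):.2f}%")
# size 4:  -15.02%
# size 17: -39.96%
# size 28:  72.11%
```

The sign flip in the last row is the collapse Manfred describes in comment 1, where his implementation's larger memory footprint exhausts the 2GB of RAM.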
December 02, 2005 Re: Refining AA's
Posted in reply to Sean Kelly

Sean Kelly wrote:
[...]
> (as it's the largest non-random text file I could
> find).
I do not understand why randomness is not suitable for general usage.
-manfred
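One plausible reason random input differs from natural text (my own gloss, not something Sean states): words in real text have a highly skewed frequency distribution and many repeats, so the workload is lookup-dominated, whereas uniformly random fixed-length keys are nearly all distinct and make the workload insertion-dominated. Generating such keys is simple; a sketch in Python, with the key length and count chosen arbitrarily:

```python
import random
import string

def random_keys(n, length=8, seed=42):
    """Generate n random fixed-length lowercase ASCII keys.

    With 26**8 possible keys, a few thousand draws are almost
    certainly distinct, so inserting them stresses insertion and
    hash distribution rather than repeated lookups. The seed makes
    the benchmark input reproducible across runs.
    """
    rng = random.Random(seed)
    return ["".join(rng.choice(string.ascii_lowercase) for _ in range(length))
            for _ in range(n)]

keys = random_keys(1000)
print(len(keys), len(set(keys)))  # duplicates should be rare
```

Which workload counts as "general usage" is exactly the question Manfred is raising.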
December 02, 2005 Re: Refining AA's
Posted in reply to Sean Kelly

Sean Kelly wrote:
[...]
> the King James Bible from Project Gutenberg is my standard test input
[...]
The standard wc example spends about 100 ms on the AA operations for the King James Bible. What are your criteria for performance, if this is indeed the largest available test data?
-manfred
Copyright © 1999-2021 by the D Language Foundation