September 01, 2014
On Mon, 1 Sep 2014 12:38:52 +0300
ketmar via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> i found that slowdown is from _aaRange()
and i'm wrong again. adding `_aaFrontKey()` and `_aaFrontValue()` makes
no difference at all.

that's 'cause both hooks need to go thru the bucket array to find the first entry. unless we add some pointer to the 'first used bucket', getting the 'first' key and value will be slow.

so, to make this fast, we need to cache info about 'first used bucket'.

i hacked in VERY simple caching, and... voila: execution time for the 'byKey/byValue' variant drops from 3 seconds to 1.5 seconds. seems that it's as far as i can get without hurting AA performance in other cases.
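
to show what i mean: here's a toy chaining table with a cached 'first used bucket' index. names like `ToyAA` and `firstUsed` are made up for illustration; this is NOT the real druntime AA layout, just a sketch of the idea.

struct ToyAA(K, V)
{
    static struct Entry { K key; V val; }

    Entry[][] buckets;    // separate chaining, fixed bucket count
    size_t firstUsed;     // cached index of the first non-empty bucket
    size_t len;

    this(size_t nbuckets)
    {
        buckets.length = nbuckets;
        firstUsed = nbuckets;
    }

    void insert(K key, V val)
    {
        immutable i = hashOf(key) % buckets.length;
        foreach (ref e; buckets[i])
            if (e.key == key) { e.val = val; return; }
        buckets[i] ~= Entry(key, val);
        ++len;
        if (i < firstUsed) firstUsed = i;   // keep the cache valid on insert
    }

    // cheap 'front': the cached index only ever advances, never rescans from 0.
    // assumes length > 0.
    ref Entry front()
    {
        while (buckets[firstUsed].length == 0)
            ++firstUsed;
        return buckets[firstUsed][0];
    }

    void removeFront()
    {
        buckets[firstUsed] = buckets[firstUsed][1 .. $];
        --len;
    }

    @property size_t length() const { return len; }
}

unittest
{
    auto aa = ToyAA!(int, long)(64);
    foreach (i; 0 .. 1000)
        aa.insert(i, i);
    long total;
    while (aa.length)
    {
        auto e = aa.front;
        total += e.key + e.val;
        aa.removeFront();
    }
    assert(total == 999L * 1000);   // 2 * (0 + 1 + ... + 999)
}

the real patch does the same kind of bookkeeping inside druntime's bucket array instead of a toy struct.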


September 01, 2014
On Mon, 01 Sep 2014 09:16:04 +0000
bearophile via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> https://issues.dlang.org/show_bug.cgi?id=13410
i added a 'quick-hack' patch to your report which halves the execution time without significantly hurting performance in other AA uses.

now there is no difference between `long v = aa.byValue.front;` and `long v = aa[k];`.

yet aa.remove() still hurts badly, and i don't think that we can do any
better with it.


September 01, 2014
On Mon, 1 Sep 2014 12:38:52 +0300
ketmar via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> On Mon, 01 Sep 2014 09:22:50 +0000
> bearophile via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
> wrote:
> 
> > In theory the best solution is to improve the performance of the "byKey.front" and "byValue.front" idioms.
> i found that slowdown is from _aaRange(), not from delegates. the following code is slow even w/o delegate creation:
> 
>     import core.stdc.stdio;
> 
>     ref K firstKey(T : V[K], V, K)(T aa) @property
>     {
>         return *cast(K*)_aaRangeFrontKey(_aaRange(cast(void*)aa));
>     }
> 
>     ref V firstVal(T : V[K], V, K)(T aa) @property
>     {
>         return *cast(V*)_aaRangeFrontValue(_aaRange(cast(void*)aa));
>     }
> 
> 
>     int main() {
>         long[int] aa;
>         for (int i = 0; i < 50000; i++)
>             aa[i] = i;
> 
>         long total = 0;
> 
>         while (aa.length) {
>             //int k = aa.byKey.front;
>             int k = aa.firstKey;
>             //long v = aa.byValue.front;
>             long v = aa.firstVal;
>             aa.remove(k);
>             total += k + v;
>         }
> 
>         printf("%lld\n", total);
>         return 0;
>     }
> 
> 
> seems that we need two more hooks for AAs: `_aaFrontKey()` and
> `_aaFrontValue()`.
> 
> > x.pop() sounds nicer :-)
> ah, sure. i'm not very good at inventing names. ;-)
> 

Yep, I found out something similar with this ugly code:

for (i = 0; i < W; ++i) {
    y = null;
    while (x.length) {
        foreach (lj, lp; x) {
            j = lj;
            p = lp;
            x.remove(cast(int)j);
            break;
        }
        ...
    }

The problem is with searching for the first element.

September 01, 2014
On Mon, 1 Sep 2014 12:51:58 +0200
Daniel Kozak via Digitalmars-d-learn
<digitalmars-d-learn@puremagic.com> wrote:

> The problem is with searching for the first element.
yes. AAs aren't very good at "give me some so-called 'first' element", especially the current druntime AAs. i'm not sure it's worth heavy fixing though. i sped it up a little (see bugzilla), but it will remain painfully slow, alas. AAs just don't fit such use cases.
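
fwiw, if you control the loop, you can sidestep the 'first element' search entirely by snapshotting the keys once and walking that array (basically what the D program posted later in this thread does with `x.keys`). just a sketch of the workaround, not part of the patch:

import core.stdc.stdio;

void main()
{
    long[int] aa;
    foreach (i; 0 .. 50_000)
        aa[i] = i;

    long total = 0;
    foreach (k; aa.keys)    // .keys copies the keys once, O(n), no per-step bucket scan
    {
        total += k + aa[k];
        aa.remove(k);       // safe: we iterate the snapshot, not the AA itself
    }
    printf("%lld\n", total);
}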


September 01, 2014
On Mon, 01 Sep 2014 09:16:04 +0000
bearophile via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> https://issues.dlang.org/show_bug.cgi?id=13410
added another version of the patch. it doesn't significantly improve your original code's execution time, but it makes the synthetic tests run at the same speed.

aa.remove() is a little slower now, but i don't think it's even noticeable. but now we can use AAs in 'take the first item and immediately remove it' scenarios without a performance hit.
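
for reference, the 'take the first item and immediately remove it' pattern meant here is the pop-style loop from earlier in the thread, which with the cached first bucket no longer pays a full scan per step:

import core.stdc.stdio;

void main()
{
    long[int] aa;
    foreach (i; 0 .. 50_000)
        aa[i] = i;

    long total = 0;
    while (aa.length)
    {
        immutable k = aa.byKey.front;   // cheap once the first used bucket is cached
        immutable v = aa[k];            // aa.byValue.front costs the same after the patch
        aa.remove(k);
        total += k + v;
    }
    printf("%lld\n", total);
}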


September 01, 2014
The following D code runs over 2x faster than the C++ code (comparing dmd with no options to g++ with no options). It's not a fair comparison because it changes the order of operations.

import core.stdc.stdio;

const uint H = 9, W = 12;

const uint[3][6] g = [[7, 0, H - 3],
                      [1 + (1 << H) + (1 << (2 * H)), 0, H - 1],
                      [3 + (1 << H), 0, H - 2],
                      [3 + (2 << H), 0, H - 2],
                      [1 + (1 << H) + (2 << H), 0, H - 2],
                      [1 + (1 << H) + (1 << (H - 1)), 1, H - 1]];

int main() {
    ulong p, i, k;
    ulong[uint] x, y;   // state bitmask -> number of ways to reach it
    uint l;
    x[0] = 1;           // start from the single empty state

    for (i = 0; i < W; ++i) {      // process the board one column at a time
        y = null;
        while (x.length)
        foreach (j; x.keys) {      // drain the current states via a key snapshot
            p = x[j];
            x.remove(j);

            // find the lowest empty cell in the current column
            for (k = 0; k < H; ++k)
                if ((j & (1 << k)) == 0)
                    break;

            if (k == H)
                y[j >> H] += p;    // column full: shift it out, carry the count over
            else
                // try each piece from g that fits at row k and doesn't overlap
                for (l = 0; l < 6; ++l)
                    if (k >= g[l][1] && k <= g[l][2])
                        if ((j & (g[l][0] << k)) == 0)
                            x[j + (g[l][0] << k)] += p;
        }
        x = y;
    }

    printf("%lld\n", y[0]);
    return 0;
}

September 01, 2014
On Mon, 01 Sep 2014 09:16:04 +0000
bearophile via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> https://issues.dlang.org/show_bug.cgi?id=13410
ah, silly me. i should stop publishing patches until i've rewritten 'em at least three times.

we have a winner now: the new patch should not hurt AA performance in normal use cases, yet our cases are now significantly faster (3 to 3000 times, yeah! ;-).


October 01, 2014
You're comparing front removal on an ordered vs. an unordered container.
Anyhow, at least my C++ lib also caches the first used bucket.