March 27, 2006 Re: Robustness of the built in associative array

Posted in reply to Dave

In article <e06hpc$vb6$1@digitaldaemon.com>, Dave says...
>
> Just curious - what portion of the perf. diff. was that?

I don't remember exactly. I did a btree implementation, but when I saw that it was slightly slower than the (simpler) list implementation (and definitely not faster), I just scrapped it.

Note that from a theoretical viewpoint, the btrees only make sense when the average bucket size is larger than 2. In practice this only happens when the lookup table is too small compared to the number of elements, or when you use a bad hash function. Both will probably occur "in the wild", though, so the DMD AA should keep the btrees. I knew my data beforehand, so I didn't need them.

> What data types?

The speed tests were done with char[] keys and int values.

> And in your estimation what accounts for the rest of the diff.

For lookups, hard to say. I found that using a simple power-of-two table size and a bitmask for lookups was just as good as messing about with prime numbers, but it only gave a minuscule performance increase. Perhaps templated code just means less overhead. In any case, the difference was very small, and the builtin AA is very fast.

Inserts, however, are a different story. The builtin AA starts with a small table size (97) and resizes (rehashes) at regular intervals as you insert elements. Being able to specify the approximate number of elements _in advance_ (thus avoiding the rehashes) gave me a 3x-5x performance increase when inserting.

> (e.g. are you pre-allocating a bunch of space for the data (as opposed to the
> key) or not using the GC)?

Nope, the default allocator is the GC, and one new node struct is new'ed for each insert. (But with a malloc or region allocator it becomes much faster.)

Nick

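To make the pre-sizing and power-of-two points concrete, here is a minimal sketch (modern D, invented names, not Nick's actual code and not the DMD runtime) of a chained hash table whose capacity is fixed up front from an expected element count and whose slots are selected with a bitmask rather than a modulo by a prime:

    import std.stdio;

    // Minimal separate-chaining table: power-of-two bucket count, bitmask
    // slot selection, and capacity fixed up front so no rehash happens while
    // inserting. Illustrative only; not the DMD runtime implementation.
    struct PresizedMap(K, V)
    {
        static struct Node { K key; V value; Node* next; }

        Node*[] buckets;
        size_t mask;

        this(size_t expectedElements)
        {
            // Round the table size up to a power of two so that
            // "hash & mask" replaces "hash % primeTableSize".
            size_t n = 1;
            while (n < expectedElements)
                n *= 2;
            buckets = new Node*[n];
            mask = n - 1;
        }

        void opIndexAssign(V value, K key)
        {
            size_t slot = hashOf(key) & mask;
            for (Node* p = buckets[slot]; p; p = p.next)
                if (p.key == key) { p.value = value; return; }
            buckets[slot] = new Node(key, value, buckets[slot]);
        }

        V* opBinaryRight(string op : "in")(K key)
        {
            size_t slot = hashOf(key) & mask;
            for (Node* p = buckets[slot]; p; p = p.next)
                if (p.key == key) return &p.value;
            return null;
        }
    }

    void main()
    {
        auto m = PresizedMap!(string, int)(1000); // expected size known up front
        m["answer"] = 42;
        writeln(*("answer" in m));                // prints 42
    }

Because the bucket array is sized once from the expected element count, no rehash ever runs during insertion, which is the cost Nick's pre-sizing avoids.
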
March 27, 2006 Re: Robustness of the built in associative array

Posted in reply to Nick

In article <e08fi6$ouf$1@digitaldaemon.com>, Nick says...
>
> In article <e06hpc$vb6$1@digitaldaemon.com>, Dave says...
>>
>> Just curious - what portion of the perf. diff. was that?
>
> <snip, see: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/35954>
>
> Nick

Thanks for that info.

March 28, 2006 Re: Robustness of the built in associative array

Posted in reply to Oskar Linde

"Oskar Linde" <oskar.lindeREM@OVEgmail.com> wrote in message news:e00v0m$te2$3@digitaldaemon.com...
> It turns out it is neither. It is definitely not a value type. The above function will in most cases alter the original associative array and work as expected. But it is not a reference type either -- in the few cases where adding a new key to the AA invokes a rehash, all hell breaks loose. The original associative array is suddenly *corrupted*! (The way to make the function work as expected is to make the map an inout argument, but that is beside my point.)

I hadn't realized that could happen, and it clearly needs to be fixed.

> A simple fix is to change the AA implementation so that AAs become MyAA* instead of aaA*[], with MyAA defined as:
>
>     struct MyAA {
>         AANode[] buckets;
>         uint numberOfNodes;
>     }
>
> You get a very slight additional lookup cost (double dereferencing to get to the buckets), but make the AA a pure reference type without any caveats and problems.

I'll make that change.

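Why the extra indirection fixes the corruption can be illustrated with a small sketch (illustrative modern D; the names Impl and Handle are invented and this is not the druntime code): every copy of the AA handle points at the same heap-allocated implementation record, so a rehash that swaps in a new bucket array is visible through all copies, whereas with the old aaA*[] representation a copy made before a rehash could be left referring to the superseded bucket array.

    import std.stdio;

    // Sketch of the "AA handle points to one implementation struct" idea.
    // Impl and Handle are invented names; this is not the druntime code.
    struct Impl
    {
        int[] buckets;          // stands in for the real bucket array
        size_t numberOfNodes;
    }

    struct Handle
    {
        Impl* impl;

        void rehash(size_t newSize)
        {
            // The shared Impl is mutated in place, so every copy of the
            // handle observes the new bucket array.
            impl.buckets = new int[newSize];
        }
    }

    void main()
    {
        auto a = Handle(new Impl(new int[4], 0));
        auto b = a;                      // copying the handle copies one pointer
        b.rehash(16);
        writeln(a.impl.buckets.length);  // prints 16: 'a' sees the rehash done via 'b'
    }
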
March 28, 2006 Re: Robustness of the built in associative array

Posted in reply to Sean Kelly

This is just a perfect demonstration of the problem with D value types. A way to solve this would be to extend the inout keyword to function returns and variables, so that [] would look like:

    inout T opIndex(size_t i)

Hong

In article <e0169e$25hq$1@digitaldaemon.com>, Sean Kelly says...
>
> Nick wrote:
>> In article <e00v0m$te2$3@digitaldaemon.com>, Oskar Linde says...
>>> Hello,
>>>
>>> The latest "Implicit fn template instantiation" thread contains code
>>> that contains a hidden and possibly hard to find bug.
>>> [...]
>>
>> Good catch!
>
> Indeed. It seems roughly similar to how arrays are handled: in-place modifications persist, but anything causing a realloc does not. But the original should never be rendered unusable.
>
>>> 2) Why even keep the builtin AA? With the addition of an opIn() overload, you could get *exactly* the same functionality from a library implementation
>>
>> Nope, in fact, you couldn't. The kicker is that the builtin AA uses some syntax that is currently _impossible_ to duplicate in custom types. Eg. the following:
>>
>> # int[int] ii;
>> # ii[10]++;
>
> Yup, not having a bona fide reference type in D is problematic. Though you could always use pointers :-p
>
>> I find it rather confusing and frustrating. In my current project I ended up making my own template AA to get around some of the deficiencies of the builtin type. Mostly I wanted to avoid double lookups, and easily allow custom hashers without encapsulating the key in a struct/class (think case-insensitive char[] keys.) The result was also faster and more memory efficient than the builtin AA.
>
> This would also *almost* allow us to get rid of opCmp(Object) and opEquals(Object) in the Object base class, which would be very nice. I definitely do like having built-in AAs, but I do find some of the consequences irritating.
>
>
> Sean

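For what it's worth, the reference-return idea Hong sketches here is essentially what later versions of D provide through ref returns: once opIndex can return by reference, a library type supports obj[key]++ just like the builtin AA. A minimal sketch (the Counter type is invented for illustration):

    import std.stdio;

    // Hypothetical wrapper over the builtin AA, only to show that a
    // ref-returning opIndex lets a user type support "c[key]++" too.
    struct Counter(K)
    {
        int[K] data;

        // Inserts the key with 0 if missing, then returns a reference to it.
        ref int opIndex(K key)
        {
            if (auto p = key in data)
                return *p;
            data[key] = 0;
            return data[key];
        }
    }

    void main()
    {
        Counter!string c;
        c["apple"]++;          // works because opIndex returns by reference
        c["apple"]++;
        writeln(c["apple"]);   // prints 2
    }
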
March 28, 2006 Re: Robustness of the built in associative array

Posted in reply to Nick

Nick wrote:
> but I always thought this was the wrong way to do it anyway. (Eg. what should happen when the value is a struct and you use ii[3].foo?)
Raise an exception?
Wolfgang Draxinger
March 28, 2006 Re: Robustness of the built in associative array

Posted in reply to Nick

Nick wrote:
> I think the STL containers do a very good job at being both flexible and usable, covering most normal cases of use. The thing missing from D to make an STL-like library would be the reference return type.

Well, there's just the little problem that C++ '&' references returned from functions are not safe either. A C++ reference is IMHO nothing better than a strongly typed pointer. For example:

    class foo
    {
        foo(int bar) { a = new int[bar]; }
        int *a;

        // ignore the possible bounds error ;-)
        int & operator [] (int i) { return a[i]; }

        int & something()
        {
            int hello = 123;
            return hello;
        }
    };

This isn't a bit better than using a pointer instead of a reference.

The real problem currently is that the index expression can be used as an lvalue, but opIndex does not yet have support for that. But why not solve it this way: the code

    int[int] aa;
    aa[i]++;

compiles into execution of

    aa.opIndexAssign(aa.opIndex(i) + 1, i);

Now, for the support of user-defined AAs, I'd like to suggest a new overload of opIndex and opIndexAssign, or new operator functions opKey and opKeyAssign, which take a parameter of the key type as their first parameter. Also, I'd introduce a base AA type, so that one can easily derive from it. To use such a user-defined AA class I'd like to suggest the following syntax:

    int[char[]]!MyAA an_AA;

The idea is that any sort of AA is in fact an implicit instantiation of the builtin AA template, i.e.

    int[char[]] an_AA;

is the same as

    int[char[]]!__builtinAA an_AA;

Instead of a struct I'd use classes for this. AAs are dynamic and thus allocated on the heap anyway, and making them a class would easily allow catching such ambiguous things as the one Oskar found. So logically an AA class would be declared as

    class __builtinAA(T, K) { }

a derived class as

    class MyAA(T, K) : __builtinAA!(T, K) { }

or even

    class IntToStringAA : __builtinAA!(char[], int) { }

Eventually, __builtinAA could derive from a common, non-templated base AA type, so that RTTI clearly recognizes it as an AA without having to check against a templated class.

How about this?

Wolfgang Draxinger

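The lowering Wolfgang proposes can be written out by hand; the sketch below (MiniMap and bump are invented names) spells out the rewrite of m[key]++ into m.opIndexAssign(m.opIndex(key) + 1, key), and also shows why it costs two lookups per increment:

    import std.stdio;

    // A deliberately minimal map with only opIndex/opIndexAssign, to spell
    // out the read-modify-write lowering suggested for "m[key]++".
    struct MiniMap
    {
        int[string] data;   // backed by the builtin AA for brevity

        int opIndex(string key) { return data.get(key, 0); }
        void opIndexAssign(int value, string key) { data[key] = value; }
    }

    // The proposed lowering written out by hand: m[key]++ becomes
    // m.opIndexAssign(m.opIndex(key) + 1, key), i.e. one lookup to read
    // the old value and a second one to store the new value.
    void bump(ref MiniMap m, string key)
    {
        m.opIndexAssign(m.opIndex(key) + 1, key);
    }

    void main()
    {
        MiniMap m;
        bump(m, "x");
        bump(m, "x");
        writeln(m["x"]);   // prints 2
    }

The double lookup is the same inefficiency Nick mentions wanting to avoid in his own template AA.
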