Thread overview
assoc array performance
Apr 17, 2004
Ben Hinkle
Apr 18, 2004
Kris
Apr 18, 2004
Ben Hinkle
Apr 18, 2004
Kris
Apr 18, 2004
Scott Egan
Apr 18, 2004
Kris
April 17, 2004
well, based on Serge's eariler post I've been playing around with different assoc array implementations and believe it or not the current one is the only one that behaves well on both Windows and Linux under various tests that vary the table size and GC interactions.

The only trick that significantly improved performance was preallocating the node pointer array. I hadn't realized this before but you can already do this as follows:

 int[char[]] X; // an assoc array of ints indexed by strings
 void*[] Y;
 Y.length = 1000000;     // allocate array of void*
 X = cast(int[char[]])Y; // voila, a pre-allocated assoc array

Another performance boost came by using stdup with malloc to duplicate strings instead of .dup but if you do that you have to manage the dup'ed strings so they don't leak.

I tried all kinds of combinations of pre-allocating the data nodes as well as the node pointers and messing with the hashing function but they all had drawbacks when mixed with either the GC or the default pointer array (the default assoc array allocates 10 pointers no matter how many entries you eventually put in). Go figure. Either Walter got lucky or he actually tested this stuff out ;-)

-Ben

April 18, 2004
Thanks Ben!

I've been tearing my hair out trying to figure a way to preallocate AA's. Was just about to post to the "What's needed for 1.0" thread regarding this. Would be nice/appropriate if there was a cleaner 'syntax' that the example below <g>

BTW, did you find a quick way to delete all entries?

- Kris


"Ben Hinkle" <bhinkle4@juno.com> wrote in message news:c5sgbp$2p8j$1@digitaldaemon.com...
> well, based on Serge's eariler post I've been playing around with different assoc array implementations and believe it or not the current one is the only one that behaves well on both Windows and Linux under various tests that vary the table size and GC interactions.
>
> The only trick that significantly improved performance was preallocating the node pointer array. I hadn't realized this before but you can already do this as follows:
>
>   int[char[]] X; // an assoc array of ints indexed by strings
>   void*[] Y;
>   Y.length = 1000000;     // allocate array of void*
>   X = cast(int[char[]])Y; // voila, a pre-allocated assoc array
>
> Another performance boost came by using stdup with malloc to duplicate strings instead of .dup but if you do that you have to manage the dup'ed strings so they don't leak.
>
> I tried all kinds of combinations of pre-allocating the data nodes as well as the node pointers and messing with the hashing function but they all had drawbacks when mixed with either the GC or the default pointer array (the default assoc array allocates 10 pointers no matter how many entries you eventually put in). Go figure. Either Walter got lucky or he actually tested this stuff out ;-)
>
> -Ben
>


April 18, 2004
> BTW, did you find a quick way to delete all entries?

either

  void*[] Y;
  Y.length = 0; // not really needed since Y has 0 length by default
  X = cast(int[char[]])Y;

to dump the node pointer array entirely. If you want to preserve the array (for instance you had preallocated the pointer array) and just clear out all the pointers try

  void*[] Y = cast(void*[])X;
  Y[] = null;   // set all the node pointers to null
  X = cast(int[char[]])Y;

Isolating code like that would be nice since it depends on implementation details of assoc arrays.

-Ben



April 18, 2004
Cheers! Really appreciate that.

Let's hope Walter will add some properties for doing this ...


"Ben Hinkle" <bhinkle4@juno.com> wrote in message news:c5sr12$7q4$1@digitaldaemon.com...
>
> > BTW, did you find a quick way to delete all entries?
>
> either
>
>    void*[] Y;
>    Y.length = 0; // not really needed since Y has 0 length by default
>    X = cast(int[char[]])Y;
>
> to dump the node pointer array entirely. If you want to preserve the array (for instance you had preallocated the pointer array) and just clear out all the pointers try
>
>    void*[] Y = cast(void*[])X;
>    Y[] = null;   // set all the node pointers to null
>    X = cast(int[char[]])Y;
>
> Isolating code like that would be nice since it depends on implementation details of assoc arrays.
>
> -Ben
>
>
>


April 18, 2004
Isn't this dangerous?  You're relying on the semantics of how an Assoc array is implemented by the compiler.

I would have thought the X.length = 100000 would work, but I'm obviously missing a subtle point.

Regards,

Scott


"Ben Hinkle" <bhinkle4@juno.com> wrote in message news:c5sgbp$2p8j$1@digitaldaemon.com...
> well, based on Serge's eariler post I've been playing around with different assoc array implementations and believe it or not the current one is the only one that behaves well on both Windows and Linux under various tests that vary the table size and GC interactions.
>
> The only trick that significantly improved performance was preallocating the node pointer array. I hadn't realized this before but you can already do this as follows:
>
>   int[char[]] X; // an assoc array of ints indexed by strings
>   void*[] Y;
>   Y.length = 1000000;     // allocate array of void*
>   X = cast(int[char[]])Y; // voila, a pre-allocated assoc array
>
> Another performance boost came by using stdup with malloc to duplicate strings instead of .dup but if you do that you have to manage the dup'ed strings so they don't leak.
>
> I tried all kinds of combinations of pre-allocating the data nodes as well as the node pointers and messing with the hashing function but they all had drawbacks when mixed with either the GC or the default pointer array (the default assoc array allocates 10 pointers no matter how many entries you eventually put in). Go figure. Either Walter got lucky or he actually tested this stuff out ;-)
>
> -Ben
>


April 18, 2004
"Scott Egan" <scotte@tpg.com.aux> wrote in message news:c5tels$175v$1@digitaldaemon.com...
> Isn't this dangerous?  You're relying on the semantics of how an Assoc
array
> is implemented by the compiler.

Sure it is, Scott. But at least this is a workaround, until Walter is "encouraged" by enough of us to add such properties to AAs <g>