Jump to page: 1 2
Thread overview
Cost of assoc array?
May 14, 2014
Chris
May 14, 2014
bearophile
May 14, 2014
Chris
May 14, 2014
dennis luehring
May 14, 2014
bearophile
May 14, 2014
John Colvin
May 14, 2014
Chris
May 14, 2014
dennis luehring
May 14, 2014
Chris
May 14, 2014
John Colvin
May 14, 2014
Chris
May 14, 2014
bearophile
May 14, 2014
John Colvin
May 14, 2014
I have code that uses the following:

string[][size_t] myArray;

1. myArray = [0:["t", "o", "m"], 1:["s", "m", "i", "th"]];

However, I've found out that I never need an assoc array and a "linear" array would be just fine, as in

2. myArray = [["t", "o", "m"], ["s", "m", "i", "th"]];

Is there any huge difference as regards performance and memory footprint between the two? Or is 2. basically 1. under the hood?
May 14, 2014
Chris:

> Is there any huge difference as regards performance and memory footprint between the two? Or is 2. basically 1. under the hood?

An associative array is a rather more complex data structure, so if you don't need it, use something simpler. There is difference in both the amount of memory used and performance (in many cases such difference doesn't matter). In D there are also differences in the way they are initialized from a null or fat null.

Bye,
bearophile
May 14, 2014
On Wednesday, 14 May 2014 at 10:20:51 UTC, bearophile wrote:
> Chris:
>
>> Is there any huge difference as regards performance and memory footprint between the two? Or is 2. basically 1. under the hood?
>
> An associative array is a rather more complex data structure, so if you don't need it, use something simpler. There is difference in both the amount of memory used and performance (in many cases such difference doesn't matter). In D there are also differences in the way they are initialized from a null or fat null.
>
> Bye,
> bearophile

Thanks. Do you mean the difference is negligible in many cases? I'm not sure, because changing it would be a breaking change in the old code, meaning I would have to change various methods.
May 14, 2014
Am 14.05.2014 12:33, schrieb Chris:
> On Wednesday, 14 May 2014 at 10:20:51 UTC, bearophile wrote:
>> Chris:
>>
>>> Is there any huge difference as regards performance and memory
>>> footprint between the two? Or is 2. basically 1. under the
>>> hood?
>>
>> An associative array is a rather more complex data structure,
>> so if you don't need it, use something simpler. There is
>> difference in both the amount of memory used and performance
>> (in many cases such difference doesn't matter). In D there are
>> also differences in the way they are initialized from a null or
>> fat null.
>>
>> Bye,
>> bearophile
>
> Thanks. Do you mean the difference is negligible in many cases?
> I'm not sure, because changing it would be a breaking change in
> the old code, meaning I would have to change various methods.

a simple array would be faster because no access is generate for your key - just plain access, just don't use assoc arrays if you don't need key based access

read more manuals about hashmaps and stuff and how to do benchmarking - helps alot in the future


May 14, 2014
Chris:

> Do you mean the difference is negligible in many cases?

Yes, but you have to profile the code (or reason about it well, with knowledge of the data structures and their usage patterns) if you want to know what your case is.

Bye,
bearophile
May 14, 2014
On Wednesday, 14 May 2014 at 09:38:10 UTC, Chris wrote:
> I have code that uses the following:
>
> string[][size_t] myArray;
>
> 1. myArray = [0:["t", "o", "m"], 1:["s", "m", "i", "th"]];
>
> However, I've found out that I never need an assoc array and a "linear" array would be just fine, as in
>
> 2. myArray = [["t", "o", "m"], ["s", "m", "i", "th"]];
>
> Is there any huge difference as regards performance and memory footprint between the two? Or is 2. basically 1. under the hood?

If you don't need the features of associative arrays, don't use them.

Normal arrays are much simpler, faster and (due to some outstanding problems with associative arrays in D) less bug-prone. Associative arrays, by definition, require a lot more work behind the scenes for both reading and writing.
May 14, 2014
On Wednesday, 14 May 2014 at 11:13:10 UTC, John Colvin wrote:
> On Wednesday, 14 May 2014 at 09:38:10 UTC, Chris wrote:
>> I have code that uses the following:
>>
>> string[][size_t] myArray;
>>
>> 1. myArray = [0:["t", "o", "m"], 1:["s", "m", "i", "th"]];
>>
>> However, I've found out that I never need an assoc array and a "linear" array would be just fine, as in
>>
>> 2. myArray = [["t", "o", "m"], ["s", "m", "i", "th"]];
>>
>> Is there any huge difference as regards performance and memory footprint between the two? Or is 2. basically 1. under the hood?
>
> If you don't need the features of associative arrays, don't use them.
>
> Normal arrays are much simpler, faster and (due to some outstanding problems with associative arrays in D) less bug-prone. Associative arrays, by definition, require a lot more work behind the scenes for both reading and writing.

The question is, if they are _much_ faster. With this type (string[][size_t]) I haven't encountered any bugs yet. On the other hand, it introduces some rather stilted logic sometimes, as in

foreach (size_t i; 0..myArray.length) {
  // do something with myArray[i];
}

because it's not sorted. This, or I sort it first. Anyway, there's always an overhead associated with associative arrays. I'll have to see how big this breaking change would be, and decide, if it's worth it.

Profiling is not really feasible, because for this to work properly, I would have to introduce the change first to be able to compare both. Nothing worse than carefully changing things only to find out, it doesn't really speed up things.
May 14, 2014
Am 14.05.2014 15:20, schrieb Chris:
> Profiling is not really feasible, because for this to work
> properly, I would have to introduce the change first to be able
> to compare both. Nothing worse than carefully changing things
> only to find out, it doesn't really speed up things.

why not using an alias for easier switch between the versions?

alias string[][size_t] my_array_type

or

alias string[][] my_array_type

an do an search&replace of "string[][size_t]" with "my_array_type" thats it - still too hard :)





May 14, 2014
On Wednesday, 14 May 2014 at 13:20:40 UTC, Chris wrote:
> On Wednesday, 14 May 2014 at 11:13:10 UTC, John Colvin wrote:
>> On Wednesday, 14 May 2014 at 09:38:10 UTC, Chris wrote:
>>> I have code that uses the following:
>>>
>>> string[][size_t] myArray;
>>>
>>> 1. myArray = [0:["t", "o", "m"], 1:["s", "m", "i", "th"]];
>>>
>>> However, I've found out that I never need an assoc array and a "linear" array would be just fine, as in
>>>
>>> 2. myArray = [["t", "o", "m"], ["s", "m", "i", "th"]];
>>>
>>> Is there any huge difference as regards performance and memory footprint between the two? Or is 2. basically 1. under the hood?
>>
>> If you don't need the features of associative arrays, don't use them.
>>
>> Normal arrays are much simpler, faster and (due to some outstanding problems with associative arrays in D) less bug-prone. Associative arrays, by definition, require a lot more work behind the scenes for both reading and writing.
>
> The question is, if they are _much_ faster. With this type (string[][size_t]) I haven't encountered any bugs yet. On the other hand, it introduces some rather stilted logic sometimes, as in
>
> foreach (size_t i; 0..myArray.length) {
>   // do something with myArray[i];
> }
>
> because it's not sorted. This, or I sort it first. Anyway, there's always an overhead associated with associative arrays. I'll have to see how big this breaking change would be, and decide, if it's worth it.
>
> Profiling is not really feasible, because for this to work properly, I would have to introduce the change first to be able to compare both. Nothing worse than carefully changing things only to find out, it doesn't really speed up things.

Yes, they are much faster. Normal array indexing is equivalent to *(myArray.ptr + index) plus an optional bounds check, whereas associative array indexing is a much, much larger job.

Why were you using associative arrays in the first place? Unless your keys are somehow sparse* or of a non-integer type there isn't any reason to.


* How I see that constraint in that context:

(maxKey - minKey) / nElements > 1 + epsilon
where epsilon is the maximum proportional wasted space you could afford in a normal array (emptyElements / usedElements). Bear in mind the memory overhead of associative arrays is itself non-zero.

Also, while normal arrays tend to be more cache friendly than associative arrays, this isn't true for very sparse arrays.
May 14, 2014
On Wednesday, 14 May 2014 at 13:31:53 UTC, dennis luehring wrote:
> Am 14.05.2014 15:20, schrieb Chris:
>> Profiling is not really feasible, because for this to work
>> properly, I would have to introduce the change first to be able
>> to compare both. Nothing worse than carefully changing things
>> only to find out, it doesn't really speed up things.
>
> why not using an alias for easier switch between the versions?
>
> alias string[][size_t] my_array_type
>
> or
>
> alias string[][] my_array_type
>
> an do an search&replace of "string[][size_t]" with "my_array_type" thats it - still too hard :)

Talking about not seeing the forest for the trees ... I'll try that. :)
« First   ‹ Prev
1 2