November 19, 2005
Manfred Nowak wrote:
> Sean Kelly wrote:
> 
> [...]
>> So AA operations should be O(1). 
> 
> A clear maybe.

Oops, I forgot the "amortized."

> Therefore the time bound for an access to an element in the AA is amortized O(1) if and only if the number of rehashes to the AA can be bounded by a constant.

The number of rehashes/reallocations/whatever in any amortized constant-time algorithm tends to be a (slowly growing) function of N (which isn't a constant), so by that rationale such algorithms can't be amortized O(1).  What am I missing here?


Sean
November 19, 2005
Sean Kelly wrote:
> 
> The number of rehashes/reallocations/whatever in any amortized constant-time algorithm tends to be a (slowly growing) function of N (which isn't a constant), so by that rationale such algorithms can't be amortized O(1).  What am I missing here?

I take it back :p  There is a constant bounding those functions.  I should have given it some thought before posting.


Sean
November 19, 2005
In article <dlmhh2$2o35$1@digitaldaemon.com>, Sean Kelly says...
>
>Sean Kelly wrote:
>> 
>> The number of rehashes/reallocations/whatever in any amortized constant-time algorithm tends to be a (slowly growing) function of N (which isn't a constant), so by that rationale such algorithms can't be amortized O(1).  What am I missing here?
>
>I take it back :p  There is a constant bounding those functions.  I should have given it some thought before posting.
>
>
>Sean

Hi,

I could be missing something here, but complexity information gives you an idea of how *scalable* an implementation is, not how *fast* it is.

It's not uncommon for lower-order solutions to a problem to be really slow and higher-order solutions to be very quick - it just depends on how big N is. I might have a great O(1) time algorithm for some application, but maybe it takes four days to run. If you have an O(N^2) algorithm for the same thing that runs in, let's say, 10ms + 2ms * N^2, N is going to have to get pretty big before you decide to switch over to the O(1) implementation.
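
Just to put numbers on that crossover (these are the made-up figures from the paragraph above, nothing measured), a few plugged-in values of N show how large it has to get:

import std.stdio;

void main()
{
    // Made-up figures from the text: a constant-time algorithm that always
    // takes 4 days, versus an O(N^2) algorithm at 10ms + 2ms * N^2.
    double constantMs = 4.0 * 24 * 3600 * 1000;   // roughly 345,600,000 ms
    foreach (n; [100, 1_000, 10_000, 13_146, 20_000])
    {
        double quadMs = 10.0 + 2.0 * cast(double) n * n;
        writefln("N = %6s: the O(N^2) algorithm needs %13.0f ms -> %s",
                 n, quadMs, quadMs < constantMs ? "still faster" : "now slower");
    }
}

With these particular constants the crossover only arrives around N = 13,000; with different constants it could land anywhere, which is why profiling (see below) is the only real answer.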

That's why you mostly only see the leading-order term given. The only point in going into more detail is to let you know about corner cases - it might just be that your particular application is going to push the wrong buttons and not end up with the typical performance (e.g. you might get O(N) instead of O(1)) - extra terms are not useful for comparing execution speeds between two algorithms.

If raw runtime execution speed is what you're interested in, you're just going to have to profile the code and see what works fastest for your application/target platform.

Take a look at this performance comparison, for example; it demonstrates that
even for hashmaps, performance varies quite widely among implementations:
http://www.tommti-systems.de/go.html?http://www.tommti-systems.de/main-Dateien/reviews/languages/benchmarks.html

Clearly they're all O(1) algorithms, but surprisingly the C++ implementation is the slowest even though it has the potential to be fastest.

Hope that's useful

Cheers

Munch


November 19, 2005
Munchgreeble wrote:
[...]
> If raw runtime execution speed is what you're interested in, you're just going to have to profile the code and see what works fastest for your application/target platform.
[...]

Very true, but if and only if you know that platform.

This is not the case with a language that is intended to provide bare-metal access to the underlying hardware, where the kinds of hardware it will run on are in general unknown.

Therefore there might be only two choices:
1) restrict the target hardware or
2) provide a mechanism to adapt the implementation of the critical
language features to the hardware.

Number two is the way to go, but this feature is missing in D.

-manfred
November 19, 2005
Sean Kelly wrote:

[...]
> There is a constant bounding those functions.
[...]

I do not agree.

The amortized upper bound for a single access to an element in the AA is O( 1 + r/ae) as shown.

The number of rehashes is the sum of the number ra of automatic rehashes and the number rm of manual rehashes: r = ra + rm.

From the implementation no bound can be shown for rm. But even if rm == 0 is assumed, it can be deduced from the implementation that ra is lower bounded by Omega(log n).

From the implementation no lower bound can be shown for ae either, except that an element must at least be inserted once; therefore ae is lower bounded by Omega(1).

Because a quotient is upper bounded by the upper bound of its dividend divided by the lower bound of its divisor, the quotient ra/ae is upper bounded by O(log n).

To reach an amortized upper bound of O(1), the minimal number ae of accesses to an element would have to be shown to be lower bounded by Omega(log n), which is impossible from the implementation.
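
Spelling the chain out in one place (r = number of rehashes, ae = number of accesses to the element; ra is also upper bounded by O(log n), because each automatic rehash enlarges the table by a constant factor, so there can only be logarithmically many of them):

amortized access time = O( 1 + r/ae )
                      = O( 1 + ra/ae )        [assuming rm == 0]
                     <= O( 1 + log n / 1 )    [ra = O(log n), ae = Omega(1)]
                      = O( log n )

so the bound collapses to O(1) only if ae itself grows as Omega(log n).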

-manfred
November 19, 2005
Manfred Nowak wrote:
> Sean Kelly wrote:
> 
> [...]
>> There is a constant bounding those functions.
> [...]
> 
> I do not agree.
> 
> The amortized upper bound for a single access to an element in the AA is O( 1 + r/ae) as shown.
> 
> The number of rehashes is the sum of the number ra of automatic rehashes and the number rm of manual rehashes: r = ra + rm.
> 
> From the implementation no bound can be shown for rm. But even if rm == 0 is assumed, it can be deduced from the implementation that ra is lower bounded by Omega(log n).

Oops, I was unclear.  I meant that there is a constant bound for amortized constant time functions in general.  And while there are hashing algorithms that I'm fairly certain could be bounded, I don't know enough about the DMD implementation to say.  My question was about the concept in general, not DMD's AAs in particular.  So what controls when a rehashing occurs in this implementation?


Sean
November 19, 2005
Sean Kelly wrote:

[...]
> And while there are hashing algorithms that I'm fairly certain could be bounded,

Using hashing, the time for an access to an element can be bounded by a constant if and only if the number of elements to be stored in the table is known beforehand. This seems to be true for every machine that will ever exist, because the number of atoms in the universe also seems to be bounded by a constant as time goes by. But I do not believe that you meant that bound.


> So what controls when a rehashing occurs in this implementation?

An automatic rehash occurs whenever the number of elements exceeds the table size by a factor of four. Currently an address space of 2**32 is assumed, which results in at most 14 automatic rehashes.
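
For concreteness, a sketch of that kind of growth rule (illustrative only, not the DMD source; the starting size and the growth factor are invented here, only the load-factor-of-four trigger is taken from the description above):

struct GrowingTable
{
    size_t nBuckets = 97;     // invented starting size
    size_t nElems;
    size_t nRehashes;

    void noteInsert()
    {
        ++nElems;
        if (nElems > 4 * nBuckets)   // more than four entries per bucket on average
        {
            nBuckets *= 4;           // invented growth factor
            ++nRehashes;
            // ...at this point every stored element would be re-bucketed...
        }
    }
}

Because the table grows by a constant factor each time and the address space caps the number of elements, the count of automatic rehashes stays small - hence the 14 above.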

-manfred
November 19, 2005
"Oskar Linde" <oskar.lindeREM@OVEgmail.com> wrote in message news:dll7fq$1nn2$1@digitaldaemon.com...
> Associative arrays are implemented as hash-tables, i.e. O(1).
> (IIRC, the hash table nodes are sorted for each bucket, so even if all
> indices hash to the same value, you get O(log n). But doing binary search
> is probably reducing the performance when you have a good hash function.)

The binary tree part is only for dealing with collisions. The rest is O(1),
and as you say, the better the hash function, the more O(1) it will be.
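
Roughly, the shape is an array of buckets where each bucket degenerates into a small unbalanced binary tree only when keys collide. A sketch (illustrative only, not the actual runtime code; the names and the string key type are just for the example):

struct Node
{
    size_t hash;
    string key;
    int value;
    Node* left, right;     // the per-bucket unbalanced binary tree
}

struct Table
{
    Node*[] buckets;

    // The common case is one hash, one index, one node - O(1).
    // Only colliding keys pay for the tree walk, here ordered by hash then key.
    int* lookup(string key)
    {
        if (buckets.length == 0)
            return null;
        size_t h = typeid(string).getHash(&key);
        Node* n = buckets[h % buckets.length];
        while (n !is null)
        {
            if (n.hash == h && n.key == key)
                return &n.value;
            n = (h < n.hash || (h == n.hash && key < n.key)) ? n.left : n.right;
        }
        return null;
    }
}

With a decent hash function each tree holds about one node, so in practice you never see the O(log n) part.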


November 19, 2005
"Manfred Nowak" <svv1999@hotmail.com> wrote in message news:Xns9713199CF924Esvv1999hotmailcom@63.105.9.61...
> Although the
> unbalanced binary tree Walter has chosen for the resolution of
> collisions is bad as hell in theory, it performs very well under the
> constraints given.

I've been using variations of that for 25 years. Every once in a while, I try out some new table method that is theoretically supposed to be faster. They always turn out to be slower. So back to the binary tree!

I've also found it scales very well, from a few entries to tens of thousands of entries.


November 20, 2005
Manfred Nowak wrote:
> Sean Kelly wrote:
> 
> [...]
>> And while there are hashing algorithms that I'm fairly certain
>> could be bounded,
> 
> Using hashing, the time for an access to an element can be bounded by a constant if and only if the number of elements to be stored in the table is known beforehand. This seems to be true for every machine that will ever exist, because the number of atoms in the universe also seems to be bounded by a constant as time goes by. But I do not believe that you meant that bound.

I was thinking that if the size of the hash table doubles every time a rehash occurs (i.e. if there are log N rehashes for a table of size N), then the frequency of rehashes should approach zero as N approaches infinity.  Amortizing the cost of these rehashes over all N insertions seems like it should yield an amortized constant-time insert, just as push_back operations on std::vector in C++ are said to occur in amortized constant time.
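
A quick back-of-the-envelope check of that accounting (the counter below just stands in for whatever structure actually gets rebuilt during a rehash):

import std.stdio;

void main()
{
    size_t capacity = 1;      // current table size
    size_t moved = 0;         // total elements copied by all rehashes so far
    enum size_t N = 1_000_000;

    foreach (i; 0 .. N)
    {
        if (i == capacity)    // table is full: "rehash" into twice the space
        {
            moved += capacity;    // every element currently stored gets moved
            capacity *= 2;
        }
    }
    writefln("inserts: %s, moved by rehashes: %s, moved per insert: %.2f",
             N, moved, cast(double) moved / N);
}

The total moved stays under 2N no matter how big N gets, which is the same bookkeeping usually quoted for push_back on std::vector.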


Sean