AA implementation algo should be mentioned

without looking into phobos or join irc channel people would never know what algo is used by AA. and that would make people feel AA is not reliable, why not mention the algo used by AA and the complexity on the page? i think that would make the doc looks better

Davidl@126.com wrote: > without looking into phobos or join irc channel people would > never know what algo is used by AA. and that would make people > feel AA is not reliable, why not mention the algo used by AA > and the complexity on the page? i think that would make the > doc looks better I personally love the C++ spec in this respect. Everything in the library is treated as an algorithm and complexity information is defined for all operations. However, I'm not sure that specifics about the algorithm should be put into the D spec for AAs beyond that they must be O(1) for lookups, O(N) for iteration via foreach, and possibly some constraints on insert/remove performance (possibly amortized O(1) for inserts and maybe O(N) for removals?). Sean

Sean Kelly wrote: > Davidl@126.com wrote: >> without looking into phobos or join irc channel people would >> never know what algo is used by AA. and that would make people >> feel AA is not reliable, why not mention the algo used by AA >> and the complexity on the page? i think that would make the >> doc looks better > > I personally love the C++ spec in this respect. Everything in the library is treated as an algorithm and complexity information is defined for all operations. However, I'm not sure that specifics about the algorithm should be put into the D spec for AAs beyond that they must be O(1) for lookups, O(N) for iteration via foreach, and possibly some constraints on insert/remove performance (possibly amortized O(1) for inserts and maybe O(N) for removals?). I agree it would be nice if this were specified. However, I don't think inserts are currently amortized O(1), are they? IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s). Also, that would mean removals are currently O(log N) as well, but specifying O(N) would allow other standard library implementations to use linked lists instead of binary trees so perhaps that's for the best.

Frits van Bommel wrote: [about inserts] > IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s). Oh sorry, forgot about resizing. What does that make it, average amortized O(1)? :P

March 17, 2007

Re: AA implementation algo should be mentioned

Posted by Sean Kelly
in reply to Frits van Bommel

Permalink

Sean Kelly

Posted in reply to Frits van Bommel

Permalink

Frits van Bommel wrote:
> Sean Kelly wrote:
>> Davidl@126.com wrote:
>>> without looking into phobos or join irc channel people would
>>> never know what algo is used by AA. and that would make people
>>> feel AA is not reliable, why not mention the algo used by AA
>>> and the complexity on the page? i think that would make the
>>> doc looks better
>>
>> I personally love the C++ spec in this respect.  Everything in the library is treated as an algorithm and complexity information is defined for all operations.  However, I'm not sure that specifics about the algorithm should be put into the D spec for AAs beyond that they must be O(1) for lookups, O(N) for iteration via foreach, and possibly some constraints on insert/remove performance (possibly amortized O(1) for inserts and maybe O(N) for removals?).
> 
> I agree it would be nice if this were specified.
> 
> However, I don't think inserts are currently amortized O(1), are they? 

Oops, you're right.  They're typically O(N) with a best case (omega?) of 1.  Double hashing is O(1) for inserts I think, but removals tend to be somewhat messy.

> IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s).
> Also, that would mean removals are currently O(log N) as well, but specifying O(N) would allow other standard library implementations to use linked lists instead of binary trees so perhaps that's for the best.

Yup, which is correct IMO.  There should be no restriction on implementation other than that some form of hashing is desired :-)


Sean

Frits van Bommel wrote: > Frits van Bommel wrote: > [about inserts] >> IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s). > > Oh sorry, forgot about resizing. What does that make it, average amortized O(1)? :P I think so. Sean

Sean Kelly wrote: > Double hashing is O(1) for inserts I think, but removals tend to be somewhat messy. I don't think so. AFAIK, it's just a technique that attempts to avoid the clustering effect that linear probing suffers from. However, in a full table it can still take O(N) probes before an empty slot is found. Wikipedia seems to back me up on this: "Like all other forms of open addressing, double hashing becomes linear as the hash table approaches maximum capacity" (http://en.wikipedia.org/wiki/Double_hashing) > Frits van Bommel wrote: >> IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s). >> Also, that would mean removals are currently O(log N) as well, but specifying O(N) would allow other standard library implementations to use linked lists instead of binary trees so perhaps that's for the best. > > Yup, which is correct IMO. There should be no restriction on implementation other than that some form of hashing is desired :-) Right, I just wanted to mention that. (IIRC that's also the philosophy the C++ standard follows on this point)

Frits van Bommel wrote: > Sean Kelly wrote: >> Double hashing is O(1) for inserts I think, but removals tend to be somewhat messy. > > I don't think so. AFAIK, it's just a technique that attempts to avoid the clustering effect that linear probing suffers from. However, in a full table it can still take O(N) probes before an empty slot is found. > Wikipedia seems to back me up on this: "Like all other forms of open addressing, double hashing becomes linear as the hash table approaches maximum capacity" (http://en.wikipedia.org/wiki/Double_hashing) Could be, I suppose. Though I'd have to tell a Data Structures professor I know that his proof is incorrect. I suppose I could be mis-remembering the conclusion as well though. Hm... it may have been based on particular growth conditions, like never letting the table get more than half full before a rehash. Sean

On Sat, 17 Mar 2007 23:29:18 +0800, Davidl@126.com wrote: > without looking into phobos or join irc channel people would never know what algo is used by AA. and that would make people feel AA is not reliable, why not mention the algo used by AA and the complexity on the page? i think that would make the doc looks better Isn't the alogrithm used an implementation detail of the compiler and not the language? I would see that any description on the AA alogrithm should be placed in the docs for the compiler and not mandated into the language. -- Derek Parnell Melbourne, Australia "Justice for David Hicks!" skype: derek.j.parnell

Derek Parnell escribió: > On Sat, 17 Mar 2007 23:29:18 +0800, Davidl@126.com wrote: > >> without looking into phobos or join irc channel people would >> never know what algo is used by AA. and that would make people >> feel AA is not reliable, why not mention the algo used by AA >> and the complexity on the page? i think that would make the >> doc looks better > > Isn't the alogrithm used an implementation detail of the compiler and not > the language? I would see that any description on the AA alogrithm should > be placed in the docs for the compiler and not mandated into the language. I agree, but I think that time/space complexity IS mandatory in the language specification. -- Tom;

Forums