Thread overview
AA implementation algo should be mentioned
Mar 17, 2007
Davidl
Mar 17, 2007
Sean Kelly
Mar 17, 2007
Frits van Bommel
Mar 17, 2007
Frits van Bommel
Mar 17, 2007
Sean Kelly
Mar 17, 2007
Sean Kelly
Mar 17, 2007
Frits van Bommel
Mar 17, 2007
Sean Kelly
Mar 18, 2007
Derek Parnell
Mar 18, 2007
Tom
March 17, 2007
without looking into phobos or join irc channel people would
never know what algo is used by AA. and that would make people
feel AA is not reliable, why not mention the algo used by AA
and the complexity on the page? i think that would make the
doc looks better
March 17, 2007
Davidl@126.com wrote:
> without looking into phobos or join irc channel people would
> never know what algo is used by AA. and that would make people
> feel AA is not reliable, why not mention the algo used by AA
> and the complexity on the page? i think that would make the
> doc looks better

I personally love the C++ spec in this respect.  Everything in the library is treated as an algorithm and complexity information is defined for all operations.  However, I'm not sure that specifics about the algorithm should be put into the D spec for AAs beyond that they must be O(1) for lookups, O(N) for iteration via foreach, and possibly some constraints on insert/remove performance (possibly amortized O(1) for inserts and maybe O(N) for removals?).


Sean
March 17, 2007
Sean Kelly wrote:
> Davidl@126.com wrote:
>> without looking into phobos or join irc channel people would
>> never know what algo is used by AA. and that would make people
>> feel AA is not reliable, why not mention the algo used by AA
>> and the complexity on the page? i think that would make the
>> doc looks better
> 
> I personally love the C++ spec in this respect.  Everything in the library is treated as an algorithm and complexity information is defined for all operations.  However, I'm not sure that specifics about the algorithm should be put into the D spec for AAs beyond that they must be O(1) for lookups, O(N) for iteration via foreach, and possibly some constraints on insert/remove performance (possibly amortized O(1) for inserts and maybe O(N) for removals?).

I agree it would be nice if this were specified.

However, I don't think inserts are currently amortized O(1), are they? IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s).
Also, that would mean removals are currently O(log N) as well, but specifying O(N) would allow other standard library implementations to use linked lists instead of binary trees so perhaps that's for the best.
March 17, 2007
Frits van Bommel wrote:
[about inserts]
> IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s).

Oh sorry, forgot about resizing. What does that make it, average amortized O(1)? :P
March 17, 2007
Frits van Bommel wrote:
> Sean Kelly wrote:
>> Davidl@126.com wrote:
>>> without looking into phobos or join irc channel people would
>>> never know what algo is used by AA. and that would make people
>>> feel AA is not reliable, why not mention the algo used by AA
>>> and the complexity on the page? i think that would make the
>>> doc looks better
>>
>> I personally love the C++ spec in this respect.  Everything in the library is treated as an algorithm and complexity information is defined for all operations.  However, I'm not sure that specifics about the algorithm should be put into the D spec for AAs beyond that they must be O(1) for lookups, O(N) for iteration via foreach, and possibly some constraints on insert/remove performance (possibly amortized O(1) for inserts and maybe O(N) for removals?).
> 
> I agree it would be nice if this were specified.
> 
> However, I don't think inserts are currently amortized O(1), are they? 

Oops, you're right.  They're typically O(N) with a best case (omega?) of 1.  Double hashing is O(1) for inserts I think, but removals tend to be somewhat messy.

> IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s).
> Also, that would mean removals are currently O(log N) as well, but specifying O(N) would allow other standard library implementations to use linked lists instead of binary trees so perhaps that's for the best.

Yup, which is correct IMO.  There should be no restriction on implementation other than that some form of hashing is desired :-)


Sean
March 17, 2007
Frits van Bommel wrote:
> Frits van Bommel wrote:
> [about inserts]
>> IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s).
> 
> Oh sorry, forgot about resizing. What does that make it, average amortized O(1)? :P

I think so.


Sean
March 17, 2007
Sean Kelly wrote:
> Double hashing is O(1) for inserts I think, but removals tend to be somewhat messy.

I don't think so. AFAIK, it's just a technique that attempts to avoid the clustering effect that linear probing suffers from. However, in a full table it can still take O(N) probes before an empty slot is found.
Wikipedia seems to back me up on this: "Like all other forms of open addressing, double hashing becomes linear as the hash table approaches maximum capacity" (http://en.wikipedia.org/wiki/Double_hashing)

> Frits van Bommel wrote:
>> IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s).
>> Also, that would mean removals are currently O(log N) as well, but specifying O(N) would allow other standard library implementations to use linked lists instead of binary trees so perhaps that's for the best.
> 
> Yup, which is correct IMO.  There should be no restriction on implementation other than that some form of hashing is desired :-)

Right, I just wanted to mention that. (IIRC that's also the philosophy the C++ standard follows on this point)
March 17, 2007
Frits van Bommel wrote:
> Sean Kelly wrote:
>> Double hashing is O(1) for inserts I think, but removals tend to be somewhat messy.
> 
> I don't think so. AFAIK, it's just a technique that attempts to avoid the clustering effect that linear probing suffers from. However, in a full table it can still take O(N) probes before an empty slot is found.
> Wikipedia seems to back me up on this: "Like all other forms of open addressing, double hashing becomes linear as the hash table approaches maximum capacity" (http://en.wikipedia.org/wiki/Double_hashing)

Could be, I suppose.  Though I'd have to tell a Data Structures professor I know that his proof is incorrect.  I suppose I could be mis-remembering the conclusion as well though.  Hm... it may have been based on particular growth conditions, like never letting the table get more than half full before a rehash.



Sean
March 18, 2007
On Sat, 17 Mar 2007 23:29:18 +0800, Davidl@126.com wrote:

> without looking into phobos or join irc channel people would never know what algo is used by AA. and that would make people feel AA is not reliable, why not mention the algo used by AA and the complexity on the page? i think that would make the doc looks better

Isn't the alogrithm used an implementation detail of the compiler and not the language? I would see that any description on the AA alogrithm should be placed in the docs for the compiler and not mandated into the language.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
March 18, 2007
Derek Parnell escribió:
> On Sat, 17 Mar 2007 23:29:18 +0800, Davidl@126.com wrote:
> 
>> without looking into phobos or join irc channel people would
>> never know what algo is used by AA. and that would make people
>> feel AA is not reliable, why not mention the algo used by AA
>> and the complexity on the page? i think that would make the
>> doc looks better
> 
> Isn't the alogrithm used an implementation detail of the compiler and not
> the language? I would see that any description on the AA alogrithm should
> be placed in the docs for the compiler and not mandated into the language.

I agree, but I think that time/space complexity IS mandatory in the language specification.

--
Tom;