Jump to page: 1 2 3
Thread overview
Re: AA with complex keytype?
Feb 09, 2007
Stewart Gordon
Feb 09, 2007
Manfred Nowak
Feb 09, 2007
Frits van Bommel
Feb 09, 2007
Manfred Nowak
Feb 09, 2007
Frits van Bommel
Feb 09, 2007
Manfred Nowak
Feb 10, 2007
Frits van Bommel
Feb 10, 2007
Manfred Nowak
Feb 10, 2007
Frits van Bommel
Feb 11, 2007
Manfred Nowak
Feb 10, 2007
Stewart Gordon
Feb 10, 2007
Frits van Bommel
Feb 10, 2007
Bill Baxter
Feb 13, 2007
Charles D Hixson
Feb 14, 2007
Joel C. Salomon
Feb 14, 2007
Manfred Nowak
Feb 14, 2007
Joel Salomon
Feb 14, 2007
Manfred Nowak
Feb 14, 2007
Frits van Bommel
Feb 14, 2007
Nicolai Waniek
Feb 14, 2007
Manfred Nowak
February 09, 2007
Manfred Nowak Wrote:

<snip>
> Why are classes required to implement opCmp( Object) for AA's to function properly?

Apparently because Walter doesn't like the idea of providing an alternative implementation of AAs for unordered classes, but hasn't to my knowledge given a truly convincing reason.

My library implementation

http://pr.stewartsplace.org.uk/d/sutil/

uses only toHash and opEquals, thus works on unordered types.

> Currently AA's misbehave if opCmp(Object) is not overwritten for a class, but no error is thrown.

Currently?  Not under DMD 1.005 as I try it.  Please supply a code example that demonstrates this behaviour.

> For the transitivity of an ordering it suffices for opCmp(Object) to always give "==" if the objects are equal, and give "<" if they are not.

So if x != y, then both x < y and y < x?  That wouldn't make sense at all.

> It should be easy to implement this as the default opCmp(Object).

OUAT, Object.opCmp was set up to compare the memory addresses, but this behaviour was removed to prepare for copying/compacting GC, and probably partly to eliminate the confusing behaviour it created.

Stewart.

February 09, 2007
Stewart Gordon wrote

>> Currently AA's misbehave if opCmp(Object) is not overwritten for
>> a class, but no error is thrown.
> Currently?  Not under DMD 1.005 as I try it.  Please supply a code example that demonstrates this behaviour.

Following code shows under 1.005 that no compile time error is given.

class C{
  hash_t toHash() {
    return 0;
  }
}
void main(){
  bool[C] map;
  map[ new C]= true;
  map[ new C]= false;
}

A runtime error shows up only, when a comparison is actually needed. This might be far too late. In this example the runtime error is forced by implementing a worthless toHash.


> So if x != y, then both x < y and y < x?  That wouldn't make sense at all.

Enough sense for an AA: all colliding elements are put into a linear list and searched in sequence on retrieval. Without a good toHash this would natrally lead to a bad runtime behaviour.

-manfred
February 09, 2007
Manfred Nowak wrote:
>> So if x != y, then both x < y and y < x?  That wouldn't make sense
>> at all. 
> 
> Enough sense for an AA: all colliding elements are put into a linear list and searched in sequence on retrieval. Without a good toHash this would natrally lead to a bad runtime behaviour.

If they'd be put into a linear list you wouldn't even *need* opCmp, so no opCmp would be "enough sence" as well ;).
But the fact is they are put into a binary tree to keep acceptable behavior (i.e. O(log N) instead of O(N)) when the hash function sucks. Since hash functions can be user-defined, and not all users are experts at hashing, you need to consider that use case.
February 09, 2007
Frits van Bommel wrote

> Since hash functions can be user-defined, and not all users are experts at hashing, you need to consider that use case.
Currently implemented AA's need to have for a good runtime beahviour at
least one of 1) or 2)
where:
(1a) toHash implements a good distribution and
(1b) opCmp==!opEquals or better
or
(2a) toHash worse than described by (1a) and
(2b) opCmp implements an ordering suitable for use in unbalanced binary
trees

There is no evidence, that users that are no experts at hashing will be experts at implementing a suitable ordering, especially if the elements do not have a natural ordering.

-manfred
February 09, 2007
Manfred Nowak wrote:
> Frits van Bommel wrote
> 
>> Since hash functions can be user-defined, and not all users
>> are experts at hashing, you need to consider that use case.
> Currently implemented AA's need to have for a good runtime beahviour at least one of 1) or 2)
> where:
> (1a) toHash implements a good distribution and
> (1b) opCmp==!opEquals or better

I haven't inspected the current AA implementation in detail. Are you sure that an opCmp that never returns "greater" (or, equivalently, never returns "less") will always work?
It may well be that the current implementation's binary trees degenerate into a linked list in that case, but it may also very well be that some elements become irretrievable.

> or
> (2a) toHash worse than described by (1a) and
> (2b) opCmp implements an ordering suitable for use in unbalanced binary trees
> 
> There is no evidence, that users that are no experts at hashing will be experts at implementing a suitable ordering, especially if the elements do not have a natural ordering.

Orderings are easier to get right than hash functions, I think. Pretty much everybody knows what an ordering looks like, but hash functions need to have a good distribution which can be a rather complicated topic.

Of course, since you specified an ordering suitable for an _unbalanced_ binary tree that complicates things a bit. But I don't think good behavior in there depends much on the comparison function. It's more related to in which order you insert the elements. Which orders are good and bad _are_ of course dependent on the comparison...

(do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)
February 09, 2007
Frits van Bommel wrote

> Are
> you sure that an opCmp that never returns "greater" (or,
> equivalently, never returns "less") will always work?
Such an opCmp requires the absence of rebalancing operations. And there have to go some kudos to Walter for that being true.

> (do current AAs use unbalanced trees? I have no idea, I just know
> they use _some_ form of trees)
They are unbalanced. Look at phobos/internal/aaA.d:278 and following lines.

-manfred
February 10, 2007
Manfred Nowak wrote:
> Frits van Bommel wrote
> 
>> Are
>> you sure that an opCmp that never returns "greater" (or,
>> equivalently, never returns "less") will always work?
> Such an opCmp requires the absence of rebalancing operations. And there have to go some kudos to Walter for that being true. 

Doesn't it also require that objects are always passed in the same parameter order to the comparison function?
So when comparing two concrete objects a and b, you must always do compare(a, b), never compare(b, a). No matter which is the one you're looking for and which is the element in the tree.
February 10, 2007
Frits van Bommel wrote

> you must always do compare(a, b), never compare(b, a)

No. Because if you are at a node in the tree where a==b, you are done.
And if you are at a node in the tree where a!=b, you will go into the
direction dictated by opCmp, which is by construction of opCmp the same
for all permutations.
This holds for all operations on the tree.

-manfred
February 10, 2007
Manfred Nowak wrote:
> Frits van Bommel wrote
> 
>> you must always do compare(a, b), never compare(b, a)
> 
> No. Because if you are at a node in the tree where a==b, you are done. And if you are at a node in the tree where a!=b, you will go into the direction dictated by opCmp, which is by construction of opCmp the same for all permutations.
> This holds for all operations on the tree. 

Let's say a!=b. If compare(a,b) returns a!=b, as you proposed, that evaluates to true, which converts to integer 1, representing "greater". So it returns that a>b holds. So if b is the node in the tree, you would then continue searching in the *right* branch.
If on the other hand compare(b,a) is called, by the same process it returns 1 indicating b>a holds. If b is still the tree node, you would then continue your search in the *left* branch
So the implementation needs to be careful about parameter order, and I don't want to go dig into the source to find out if it is. Nor do I want to check every Phobos/Tango release I use to find out if it _still_ is. Since the API doesn't specify this, I think it would be a mistake to rely on it even if it _currently_ holds.
February 10, 2007
Manfred Nowak Wrote:

> Frits van Bommel wrote
<snip>
>> (do current AAs use unbalanced trees?  I have no idea, I just know they use _some_ form of trees)
> 
> They are unbalanced.  Look at phobos/internal/aaA.d:278 and following lines.

Balancing the trees on the fly would be inefficient.  But rehash ought to balance them while it's at it.

Stewart.
« First   ‹ Prev
1 2 3