Thread overview
object.d and hash_t confusion?
Jun 21, 2006
kris
Jun 26, 2006
James Pelcis
Jun 26, 2006
kris
Jun 27, 2006
James Pelcis
Jun 27, 2006
xs0
Jun 27, 2006
Lionello Lunesu
June 21, 2006
In object.d, there's an alias declaration for hash_t like so:

------------
alias size_t hash_t;
-----------

This indicates that the hash_t type will be 32bit on a 32bit system, and 64bit on a 64bit system; yes? Is this so that a pointer can be directly returned as a hash value?
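As a rough check of that reading, here is a minimal sketch, assuming a present-day D compiler (which postdates this thread). The alias is redeclared locally with the spelling quoted above so the snippet is self-contained, and `hashBits` is a hypothetical helper name, not something from object.d.

```d
import std.stdio : writefln;

// Redeclared locally (same spelling as the declaration quoted above)
// so that the sketch does not depend on what object.d provides.
alias size_t hash_t;

// Hypothetical helper: width of hash_t in bits on the compilation target.
size_t hashBits()
{
    return hash_t.sizeof * 8;
}

void main()
{
    // An alias introduces no new type: hash_t is exactly size_t.
    static assert(is(hash_t == size_t));
    // size_t is as wide as a pointer, so a pointer value fits in a hash_t.
    static assert(hash_t.sizeof == (void*).sizeof);
    writefln("hash_t is %s bits wide on this target", hashBits());
}
```

If both static asserts pass, the width of hash_t follows the pointer width of whatever target the code is compiled for.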

Then, also in object.d, we have the decl for class Object:

-----------
class Object
{
    void print();
    char[] toString();
    uint toHash();
    int opCmp(Object o);
    int opEquals(Object o);
}
-----------

Notice that the toHash() method returns a uint? Is that supposed to be hash_t instead?

For the moment, let's suppose it is meant to be hash_t. The rest of this post is based upon that notion, so if I'm wrong here, no harm done :)

Using hash_t as the return type would mean the toHash() method returns a different type depending upon which platform it's compiled upon. This may have some ramifications, so let's explore what they might be:

1) because an alias is used, type-safety does not come into play. Thus, when someone overrides Object.toHash like so:

------------
override uint toHash() {...}
------------

a 32bit compiler will be unlikely to complain (remember, hash_t is an alias).

When this code is compiled in 64bit land, luckily, the compiler will probably complain about the uint/ulong mismatch. However, because the keyword "override" is not mandatory, most programmers will do this instead (in a class):

-----------
uint toHash() {....}
-----------

the result will perhaps be a good compile but a bogus override? Or will the compiler flag this as not being covariant? Either way, shouldn't this be handled in a more suitable manner?

I suppose one way to ensure consistency is to use a typedef instead of an alias ... but will that cause errors when the result is used in an arithmetic expression? In this situation, is typedef too type-safe and alias not sufficient?
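The type-identity part of point 1 can be checked directly. The following is only a sketch, assuming a present-day D compiler; `uintMatchesHashT` is a hypothetical name, and the snippet says nothing about how the 2006 compiler actually reports a mismatched override. It only shows whether uint and the aliased hash_t are the same type on a given target.

```d
// Redeclared locally so the sketch is self-contained.
alias size_t hash_t;

// True when a method declared to return uint has exactly
// the same return type as one declared to return hash_t.
enum bool uintMatchesHashT = is(uint == hash_t);

void main()
{
    static if (size_t.sizeof == 4)
        static assert(uintMatchesHashT);   // 32-bit target: identical types
    else
        static assert(!uintMatchesHashT);  // 64-bit target: hash_t is ulong
}
```

So on a 32-bit target nothing distinguishes a uint return type from hash_t, and the difference only becomes visible once the same source is built for a 64-bit target.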


2) It's generally not a great idea to change the signature/types of overridable methods when moving platforms. You have to ensure there's absolute consistency in the types used, otherwise the vaguely brittle nature of the override mechanism can be tripped.

So the question here is "why does toHash() need to change across platforms?". Isn't 32bits sufficient?

If the answer to that indicates a 64bit value being more applicable (even for avoiding type-conversion warnings), then it would seem to indicate a new integral-type is required? One that has type-safety (a la typedef) but can be used in arithmetic expression without warnings or errors? This new type would be equivalent to size_t vis-a-vis byte size.

I know D is supposed to have fixed-size basic integer types across platforms, and for good reason. Yet here's a situation where it *seems* that the most fundamental class in the runtime is perhaps flouting that? Perhaps there are a few other corners where similar concerns may crop up?

I will note a vague distaste for the gazillion C++ style meta-types anyway; D does the right thing in making almost all of them entirely redundant. But, if there is indeed a problem with toHash(), then I suspect we need a more robust solution. What say you?
June 26, 2006
kris wrote:
> Notice that the toHash() method returns a uint? Is that supposed to be hash_t instead?
Yes.  In the internal\object.d file, it is hash_t.  This is now Bugzilla 225.

> 1) because an alias is used, type-safety does not come into play. Thus, when someone overrides Object.toHash like so:
> 
> ------------
> override uint toHash() {...}
> ------------
> 
> a 32bit compiler will be unlikely to complain (remember, hash_t is an alias).
The compiler would be right, too.  It is the same type (for 32 bits).

> 
> When this code is compiled in 64bit land, luckily, the compiler will probably complain about the uint/ulong mismatch. However, because the keyword "override" is not mandatory, most programmers will do this instead (in a class):
> 
> -----------
> uint toHash() {....}
> -----------
> 
> the result will perhaps be a good compile but a bogus override? Or will the compiler flag this as not being covariant? Either way, shouldn't this be handled in a more suitable manner?
This is a programmer error, not a language error.  Fortunately, it would be marked as not being covariant.

> I suppose one way to ensure consistency is to use a typedef instead of an alias ... but will that cause errors when the result is used in an arithmetic expression? In this situation, is typedef too type-safe and alias not sufficient?
If a typedef were used, hash_t could still be used in expressions, but the result would need to be cast to go back to hash_t.

> 
> 2) It's generally not a great idea to change the signature/types of overridable methods when moving platforms. You have to ensure there's absolute consistency in the types used, otherwise the vaguely brittle nature of the override mechanism can be tripped.
> 
> So the question here is "why does toHash() need to change across platforms?". Isn't 32bits sufficient?
toHash definitely needs to change across platforms.  Here's the current implementation:

# hash_t toHash()
#    {
#	// BUG: this prevents a compacting GC from working, needs to be fixed
#	return cast(uint)cast(void *)this;
#    }

Ignoring the fact that the function won't currently work on 64-bit either (since it is marked as having a bug, although for a different reason), the result needs to be big enough to return a pointer.  32-bits won't always do that.
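To make the truncation concern concrete, here is a small sketch, assuming a present-day D compiler. `addressHash` and `truncatedHash` are hypothetical helpers written for illustration and are not the runtime's code; they only mimic the casts in the snippet above.

```d
// Hypothetical helper: the object's address as a pointer-sized integer.
size_t addressHash(Object o)
{
    return cast(size_t) cast(void*) o;
}

// Hypothetical helper: the same address forced through uint,
// as the cast(uint) in the quoted implementation does.
uint truncatedHash(Object o)
{
    return cast(uint) addressHash(o);
}

void main()
{
    auto o = new Object;
    // Only the low 32 bits of the address survive the cast to uint.
    assert(truncatedHash(o) == (addressHash(o) & 0xFFFF_FFFF));
}
```

On a 32-bit target the two helpers return the same value; on a 64-bit target any high address bits are silently dropped by the uint version.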

> If the answer to that indicates a 64bit value being more applicable (even for avoiding type-conversion warnings), then it would seem to indicate a new integral-type is required? One that has type-safety (a la typedef) but can be used in arithmetic expression without warnings or errors? This new type would be equivalent to size_t vis-a-vis byte size.
On some platforms and at some time, even 64-bits won't be enough to handle toHash.

> I know D is supposed to have fixed-size basic integer types across platforms, and for good reason. Yet here's a situation where it *seems* that the most fundamental class in the runtime is perhaps flouting that? Perhaps there are a few other corners where similar concerns may crop up?
> 
> I will note a vague distaste for the gazillion C++ style meta-types anyway; D does the right thing in making almost all of them entirely redundant. But, if there is indeed a problem with toHash(), then I suspect we need a more robust solution. What say you?
Since the only non-bug problem I noticed here was a programmer error (using uint instead of hash_t), why should it be changed?

If a change does need to be made though, the alias could be changed into a typedef.  That would check for the problem regardless of the platform.
June 26, 2006
James Pelcis wrote:
> Since the only non-bug problem I noticed here was a programmer error (using uint instead of hash_t), why should it be changed?

Well, the hope was that such an easy-to-make 'mistake' would be caught by the compiler :)

> If a change does need to be made though, the alias could be changed into a typedef.  That would check for the problem regardless of the platform.

Yep, but probably requires casting. Walter has noted on a number of occasions that a cast is not exactly intended for general purposes. I just wonder if this should be considered a special-case or not.
June 27, 2006
kris wrote:
> James Pelcis wrote:
>> Since the only non-bug problem I noticed here was a programmer error (using uint instead of hash_t), why should it be changed?
> 
> Well, the hope was that such an easy-to-make 'mistake' would be caught by the compiler :)
Alas, no.  It's similar to (for example) using ubyte instead of GLubyte.  Both are legal.  In fact, we don't normally even want the compiler to complain about it.

>> If a change does need to be made though, the alias could be changed into a typedef.  That would check for the problem regardless of the platform.
> 
> Yep, but probably requires casting. Walter has noted on a number of occasions that a cast is not exactly intended for general purposes. I just wonder if this should be considered a special-case or not.
Casting wouldn't be necessary to use a typedef'ed hash_t in an expression, but it would still be needed whenever the result is assigned back to a hash_t variable. Personally, I don't think it's necessary, and it definitely isn't desirable to need casting for the Object class.  I vote to leave it as is (with the bug fixed).
June 27, 2006
> On some platforms and at some time, even 64-bits won't be enough to handle toHash.

Don't you think a hash of 64 (or even 32) bits should always be enough? If your hashing function is bad, no amount of bits will help, and if it's good, 32 bits is enough for most everything, and 64 is definitely enough for anything at all..


xs0
June 27, 2006
xs0 wrote:
> 
>> On some platforms and at some time, even 64-bits won't be enough to handle toHash.
> 
> Don't you think a hash of 64 (or even 32) bits should always be enough? If your hashing function is bad, no amount of bits will help, and if it's good, 32 bits is enough for most everything, and 64 is definitely enough for anything at all..

In fact, I think that a 32-bit hash should indeed be enough for anything. Even a 64-bit pointer should be hashable in 32 bits, by using some logical operations (hi ^ lo?).

L.
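The "hi ^ lo" idea might look something like the sketch below, assuming a present-day D compiler. `foldHash` is a hypothetical name, and this is just one simple way to fold 64 bits into 32, not a claim about what the runtime does or should do.

```d
// Hypothetical helper: fold a 64-bit value into 32 bits
// by XOR-ing its upper and lower halves.
uint foldHash(ulong value)
{
    uint hi = cast(uint)(value >> 32); // upper 32 bits
    uint lo = cast(uint) value;        // lower 32 bits
    return hi ^ lo;
}

void main()
{
    // Only the upper half is set: the result is that half.
    assert(foldHash(0x1234_5678_0000_0000UL) == 0x1234_5678);
    // Identical halves cancel each other out.
    assert(foldHash(0x0000_0001_0000_0001UL) == 0);
    // Only the lower half is set: the result is that half.
    assert(foldHash(0xDEAD_BEEF) == 0xDEAD_BEEF);
}
```

The second assertion shows the obvious weakness: values whose two halves are equal all collapse to zero, so a real hash would probably want to mix the bits more thoroughly.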