Associative arrays in D and default comparators (page 3) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Issues » Associative arrays in D and default comparators (page 3)

September 08, 2006

Re: Associative arrays in D and default comparators

Posted by Walter Bright
in reply to Sean Kelly

Walter Bright

Posted in reply to Sean Kelly

Sean Kelly wrote:
> Walter Bright wrote:
>> Can you give an example of a class that could not have a meaningful opCmp implementation that one would want to put into an AA?
> 
> The one I've been wrestling with is a Thread class, as it has no data that is static and unique for the life of the object.  The thread id is only guaranteed to be unique while the thread is active, and the id isn't even assigned until Thread.start is called.  Threads simply have no features which inherently support ordering, but it makes sense to compare two Thread objects for equality simply by comparing object addresses.

One way to solve this is to have the Thread constructor add a unique sequence number.

September 08, 2006

Re: Associative arrays in D and default comparators

Posted by Walter Bright
in reply to Sean Kelly

Walter Bright

Posted in reply to Sean Kelly

Sean Kelly wrote:
> Walter Bright wrote:
>> Can you give an example of a class that could not have a meaningful opCmp implementation that one would want to put into an AA?
> 
> The one I've been wrestling with is a Thread class, as it has no data that is static and unique for the life of the object.  The thread id is only guaranteed to be unique while the thread is active, and the id isn't even assigned until Thread.start is called.  Threads simply have no features which inherently support ordering, but it makes sense to compare two Thread objects for equality simply by comparing object addresses.

How do you do a hash for it if there's no constant data?

September 08, 2006

Re: Associative arrays in D and default comparators

Posted by Steve Horne
in reply to Oskar Linde

Steve Horne

Posted in reply to Oskar Linde

On Fri, 08 Sep 2006 11:15:55 +0200, Oskar Linde <olREM@OVEnada.kth.se> wrote:

>The advantage of having binary trees at the nodes is that with a poor hash function, the AA degrades gracefully into O(log n).

Ah yes - this is important.

One reason I'm less than keen on hashtables is that I know that 99% of custom hash functions are, well, less than good. And I also know that I'm no expert on hashing. This is not a good combination.

>You would still need to handle the case of identical object hashes.

And now I understand what is happening, I feel very silly for suggesting it.

-- 
Remove 'wants' and 'nospam' from e-mail.

September 08, 2006

Re: Associative arrays in D and default comparators

Posted by xs0
in reply to Walter Bright

xs0

Posted in reply to Walter Bright

Walter Bright wrote:
> Ivan Senji wrote:
>> Yes: here is a suggestion: remove opCmp from Object. I think the only reason it is there is that when AAs where first implemented templates weren't where they are now so there was no way to check if an object has opCmp. These days a template version of AAs would be much better, and it would (if I'm not mistaken) remove the need for opCmp to be in Object.
> 
> While it'd be fun to offer a templated version of AAs, I feel the core capabilities should be available to the user without needing templates. This is because many programmers are not comfortable with them.

But if you hide the implementation via AA syntax sugar, those many programmers won't have to deal with templates at all?

> Can you give an example of a class that could not have a meaningful opCmp implementation that one would want to put into an AA?

Object :)

OK, that may be a bit Java-ish, but the simplest way to obtain a unique key/identifier is simply "new Object()". It works in HashMaps, is equal only to itself, and that's about everything you need in some cases.

----

I think opCmp should be removed from Object, it causes nothing but grief:
- it errors at runtime if not implemented, instead of at compile time
- it requires you to cast the parameter
- can never be inlined

Further, I think toHash should also be removed from Object, because the current implementation is buggy wrt compacting GCs, and as Object has no inherent data except its possibly changing address, a correct default toHash can only return 0 (or another constant), making it useless (alternatively, we could acknowledge that a compacting GC for a systems language is practically impossible to do and stop calling the current implementation buggy).

Then, a templated version of AAs can check for existence of those two functions and implements itself accordingly. Worst case is obviously just a list/array of key/value pairs with linear scanning on lookup, but

- at least it works
- can be still useful/fast enough for maps with only a few keys

----

BTW, is there any particular reason for hash_t? I think it would be better to use uint and make it non-platform-dependent (the latter is bad, and the extra 32 bits don't help at all in any realistic case)

xs0

September 08, 2006

Re: Associative arrays in D and default comparators

Posted by Sean Kelly
in reply to Walter Bright

Sean Kelly

Posted in reply to Walter Bright

Walter Bright wrote:
> Sean Kelly wrote:
>> Walter Bright wrote:
>>> Can you give an example of a class that could not have a meaningful opCmp implementation that one would want to put into an AA?
>>
>> The one I've been wrestling with is a Thread class, as it has no data that is static and unique for the life of the object.  The thread id is only guaranteed to be unique while the thread is active, and the id isn't even assigned until Thread.start is called.  Threads simply have no features which inherently support ordering, but it makes sense to compare two Thread objects for equality simply by comparing object addresses.
> 
> How do you do a hash for it if there's no constant data?

Using address, currently :-p  But that obviously has to change.  What might make the most sense from a hash perspective would be something like this:

class C {
    hash_t hash;
    hash_t toHash() {
        if (hash != hash.init)
            return hash;
        hash = cast(hash_t) this;
        return hash;
    }
}

But that leaves opCmp to wrestle with.  An "object id" would be a reasonable universal solution, but I don't want to pay for the synchronized op on construction.  I suppose I'm simply having a hard time getting over the idea that an object's address is is not a unique identifier in GCed languages.

Sean

September 08, 2006

Re: Associative arrays in D and default comparators

Posted by Steve Horne
in reply to xs0

Steve Horne

Posted in reply to xs0

On Fri, 08 Sep 2006 14:26:46 +0200, xs0 <xs0@xs0.com> wrote:

>But if you hide the implementation via AA syntax sugar, those many programmers won't have to deal with templates at all?

To me, code generated by the compiler is really just a hidden template. The only reason a template wouldn't be able to do everything that a built-in feature can is because the language doesn't have the template handling features to do it. But D has excellent template support.

Syntax sugar? Well, yes, to a point. But too much can make you very very sick. And while sugar and salt look the same...

Sure, templates can cause bloat. Any language feature is bad, when its used badly. And C++ templates are cryptic. So what? D is a better C++! So long as your not pushing the limits for the fun of it, D templates are much cleaner. Too much to cope with? Take a look in a builders van. Are programmers really so much more stupid than builders? We really can't cope with a full set of tools?

OOP was considered a new, cryptic, inefficient and scarey concept once. And structured programming, too. In fact, the current template FUD is really just an echo from when the first high level languages came out. It's all been said before... "bloat, inefficiency, me scared, wah!!!"

And all the time, off people go, grabbing perl, python and even XSLT to generate template code! I mean - have you ever tried to read an XSLT code template? The obfuscated C guys were humiliated!

To me, D is like the salvation I've been waiting for - its template handling is good enough that maybe I'll never have to reach for Python (at least as a code generator) again. No disrespect to Python, but it's always easier to write each project in one language.


BUT - much as I like templates, I also like D's built-in associative arrays. And not just because of syntax sugar.

The thing about generating code using templates is that, like anything, the better the tools you have, the more efficient and maintainable the end result can be. You can use the right tools for the job. Since you're not wasting time abusing the wrong tools, you can do the job properly. So it won't all fall down the first time someone leans on it.

D already beats C++ easily, but a good solid AA available at compile time is an attractive idea. Better still, when those AA literals arrive.

Its not either-or at all. Two complimentary features :-)

-- 
Remove 'wants' and 'nospam' from e-mail.

September 08, 2006

Re: Associative arrays in D and default comparators

Posted by Chris Nicholson-Sauls
in reply to Sean Kelly

Chris Nicholson-Sauls

Posted in reply to Sean Kelly

Sean Kelly wrote:
> Walter Bright wrote:
> 
>>
>> Although the current AA implementation only uses opCmp, having both available means that an implementation could use both or either. Sometimes opEquals can be computed a lot more efficiently than opCmp.
>> The current implementation uses hashing with binary trees, but an implementation isn't required to do it that way.
> 
> 
> This is good to know.  I'd considered playing with the current implementation, but was concerned about altering spec-defined behavior.
> 
>> Hashing with binary trees has proven in practice to be very efficient. It beats even C++ templated hash tables. I'm sure with some careful coding one can create a faster one customized to a particular data type, but the current general purpose one is pretty darned fast and compact. It's the central algorithm in DMDScript, for example, and DMDScript still beats the other javascript implementations around the block.
> 
> 
> I agree that it's got fantastic performance--my concerns here are really more practical, as I have objects I want to use as keys that don't contain any data which could be used for sorting purposes.  But it sounds like there's enough wiggle room in the spec to find some reasonable middle ground.  Thanks!
> 
> 
> Sean

If I recall correctly, the old "standard" Object.opCmp simply compared addresses.  Maybe that could be written into a mixin somewhere to be used by classes with no inherent logical order of their own.

-- Chris Nicholson-Sauls

September 08, 2006

Re: Associative arrays in D and default comparators

Posted by Walter Bright
in reply to Sean Kelly

Walter Bright

Posted in reply to Sean Kelly

Sean Kelly wrote:
> Walter Bright wrote:
>> How do you do a hash for it if there's no constant data?
> 
> Using address, currently :-p  But that obviously has to change.  What might make the most sense from a hash perspective would be something like this:
> 
> class C {
>     hash_t hash;
>     hash_t toHash() {
>         if (hash != hash.init)
>             return hash;
>         hash = cast(hash_t) this;
>         return hash;
>     }
> }

That will fail if the object gets moved.

> But that leaves opCmp to wrestle with.  An "object id" would be a reasonable universal solution, but I don't want to pay for the synchronized op on construction.  I suppose I'm simply having a hard time getting over the idea that an object's address is is not a unique identifier in GCed languages.

An opCmp has the same problem that toHash does; fix it for either and it is fixed for both. I think you'll have to do a unique id for each in order to store them in an AA, or else wait until the thread id is assigned before storing it.

September 08, 2006

opEquals should default to opCmp (was: Re: Associative arrays in D and default comparators)

Posted by Kirk McDonald
in reply to Walter Bright

Kirk McDonald

Posted in reply to Walter Bright

Walter Bright wrote:
> Although the current AA implementation only uses opCmp, having both available means that an implementation could use both or either. Sometimes opEquals can be computed a lot more efficiently than opCmp.

This brings up something that has always annoyed me about opCmp/opEquals. While it is true that sometimes opEquals can be computed more efficiently than opCmp, this is not always true. Sometimes there is only one operation for comparison that makes sense.

I think that Object's opEquals should simply call opCmp.

class Object {
    // ...
    int opEquals(Object o) {
        return this.opCmp(o);
    }
    // ...
}

This way, a class can simply overload opCmp to overload all comparison operations. If it can more efficiently overload opEquals, then it is free to do so.

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org

September 08, 2006

Re: Associative arrays in D and default comparators

Posted by Ivan Senji
in reply to xs0

Ivan Senji

Posted in reply to xs0

xs0 wrote:
> Walter Bright wrote:
>> Ivan Senji wrote:
>>> Yes: here is a suggestion: remove opCmp from Object. I think the only reason it is there is that when AAs where first implemented templates weren't where they are now so there was no way to check if an object has opCmp. These days a template version of AAs would be much better, and it would (if I'm not mistaken) remove the need for opCmp to be in Object.
>>
>> While it'd be fun to offer a templated version of AAs, I feel the core capabilities should be available to the user without needing templates. This is because many programmers are not comfortable with them.
> 
> But if you hide the implementation via AA syntax sugar, those many programmers won't have to deal with templates at all?

Yes, everything would remain as it is now, except better.

> 
>> Can you give an example of a class that could not have a meaningful opCmp implementation that one would want to put into an AA?
> 
> Object :)

LOL

> 
> OK, that may be a bit Java-ish, but the simplest way to obtain a unique key/identifier is simply "new Object()". It works in HashMaps, is equal only to itself, and that's about everything you need in some cases.
> 
> ----
> 
> I think opCmp should be removed from Object, it causes nothing but grief:
> - it errors at runtime if not implemented, instead of at compile time
> - it requires you to cast the parameter
> - can never be inlined

Thanks xs0, you have put my exact thoughts into words :)

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation