Object.toString, toHash, opCmp, opEquals

The prototypes are:

```
string toString();
size_t toHash() @trusted nothrow;
int opCmp(Object o);
bool opEquals(Object o);
```

which long predated `const`. The trouble is, they should be:

```
string toString() const;
size_t toHash() const @trusted nothrow;
int opCmp(const Object o) const;
bool opEquals(const Object o) const;
```

Without the `const` annotations, the functions are not usable by `const` objects without doing an unsafe cast. This impairs anyone wanting to write const-correct code, and also impedes use of `@live` functions.

I recommend that everyone who has overloads of these functions, alter them to have the `const` signatures. This will future-proof them against any changes to Object's signatures.
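
As an illustration of the recommendation, here is a minimal sketch of a `const` override of `toString`. The `Celsius` class is hypothetical, and the sketch assumes the current non-`const` `Object.toString` prototype shown above: the call through a `const Celsius` reference is accepted, while the same call through a `const Object` reference is rejected.

```d
import std.conv : to;

// Hypothetical example class, not part of druntime or Phobos.
class Celsius
{
    int degrees;

    this(int degrees) { this.degrees = degrees; }

    // const override of the non-const Object.toString
    override string toString() const
    {
        return degrees.to!string ~ " C";
    }
}

void main()
{
    const Celsius c = new Celsius(21);
    assert(c.toString() == "21 C"); // fine: the override is const

    const Object o = c;
    // Through Object, only the non-const prototype is visible,
    // so the call does not compile on a const reference.
    static assert(!__traits(compiles, o.toString()));
}
```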

April 26, 2024

Re: Object.toString, toHash, opCmp, opEquals

Posted by Timon Gehr
in reply to Walter Bright

On 4/26/24 01:06, Walter Bright wrote:
> The prototypes are:
> 
> ```
> string toString();
> size_t toHash() @trusted nothrow;
> int opCmp(Object o);
> bool opEquals(Object o);
> ```
> ...

Beautiful. If I could change anything, I would remove `@trusted nothrow` from `toHash`. Or just delete all the functions.

> which long predated `const`. The trouble is, they should be:
> 
> ```
> string toString() const;
> size_t toHash() const @trusted nothrow;
> int opCmp(const Object o) const;
> bool opEquals(const Object o) const;
> ```
> ...

No, please.

> Without the `const` annotations, the functions are not usable by `const` objects without doing an unsafe cast. This impairs anyone wanting to write const-correct code,

"const correctness" does not work in D because const

a) provides actual guarantees
b) is transitive

It is fundamentally incompatible with many common patterns of object-oriented and other state abstraction. It is not even compatible with the range API. Uses of `const` are niche. `const` is nice when it does work, but it's not something you can impose on all code, particularly object-oriented code.

> and also impedes use of `@live` functions.
> ...

Perfect. I have no intention of using `@live` functions. I do not see their utility. It would be good to reuse the dataflow analysis you implemented for `@live` in some productive way though.

> I recommend that everyone who has overloads of these functions, alter them to have the `const` signatures. This will future-proof them against any changes to Object's signatures.

I will not do that, because if it does not outright break my code (e.g. because Phobos cannot support `const` ranges), it actually limits my options in the future in a way that is entirely unnecessary.

This is a non-starter. We need another solution.

April 25, 2024

Re: Object.toString, toHash, opCmp, opEquals

Posted by Jonathan M Davis
in reply to Walter Bright

On Thursday, April 25, 2024 5:06:27 PM MDT Walter Bright via Digitalmars-d wrote:
> The prototypes are:
>
> ```
> string toString();
> size_t toHash() @trusted nothrow;
> int opCmp(Object o);
> bool opEquals(Object o);
> ```
>
> which long predated `const`. The trouble is, they should be:
>
> ```
> string toString() const;
> size_t toHash() const @trusted nothrow;
> int opCmp(const Object o) const;
> bool opEquals(const Object o) const;
> ```
>
> Without the `const` annotations, the functions are not usable by `const` objects without doing an unsafe cast. This impairs anyone wanting to write const-correct code, and also impedes use of `@live` functions.
>
> I recommend that everyone who has overloads of these functions, alter them to have the `const` signatures. This will future-proof them against any changes to Object's signatures.

The problem with this is that D's const is not logical const, and some objects cannot work if these functions are fully const (e.g. because they're using a mutex or because they have to lazily calculate their state). Having _any_ attributes on these functions is a problem, because those attributes restrict what derived classes can do. Similarly, not having any attributes causes problems, because then they can't be used in code that requires those attributes.
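
To illustrate the lazy-calculation case, here is a minimal sketch. The `Document` class and its hash formula are made up for the example. The hash is computed on first use and cached, which requires writing to the object's fields, so this `toHash` cannot be `const` without casting.

```d
// Hypothetical class that caches its hash on first use.
class Document
{
    private string text;
    private size_t cachedHash;
    private bool hashReady;

    this(string text) { this.text = text; }

    // Non-const override: the first call writes the cache fields.
    override size_t toHash() @trusted nothrow
    {
        if (!hashReady)
        {
            size_t h = 0;
            foreach (char ch; text) // iterate code units, no decoding
                h = h * 31 + ch;
            cachedHash = h;
            hashReady = true;
        }
        return cachedHash;
    }
}

void main()
{
    auto doc = new Document("ab");
    assert(doc.toHash() == 'a' * 31 + 'b');
    assert(doc.toHash() == doc.toHash()); // second call reuses the cache

    const Document frozen = doc;
    // The mutating toHash is not callable on a const reference.
    static assert(!__traits(compiles, frozen.toHash()));
}
```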

The solution to this problem (as has been discussed plenty of times in the past) is to outright remove these functions from Object. The only reason that they need to be there is because of code that's written to use Object instead of using templates, and D has templates. The main blockers have been two things:

1. We haven't wanted to break existing code by removing these functions from Object. Editions hopefully give us a way to move past that problem.

2. Instead of being properly templated, some of the key druntime code (e.g. involving hash tables) has used Object. Some work has been done to fix various parts of druntime (like the hooks for arrays) so that they're templated, but the work has not been completed. If the various parts of druntime which require Object are fully fixed to be templated, then we don't need these functions on Object any longer. The code would just be instantiated with whatever the class type that's given is, and those functions can then have whatever attributes are appropriate for that particular class hierarchy without Object needing to have them any more than Object has a foobarWilly function, because some stray library needs that for its class hierarchy.

We currently have a partial solution in druntime in that the free function, opEquals, is templated so that if you try to use == on class references which are not Object, it will use the derived class' version of opEquals, thus allowing classes to define opEquals with whatever attributes are appropriate for that particular class hierarchy (as well as allowing them to make their opEquals take the type of that specific class instead of Object). The problem of course is that when you compare classes as Object, you get the Object version, but most code doesn't use Object directly, so in general, == works with whatever attributes we want, and if we get rid of opEquals from Object, those comparisons should still work. You then won't be able to use == on Object, but that's a pretty nonsensical comparison anyway. It's only ever made sense in code where you couldn't templatize it and therefore needed a base class to use (like Java does with its containers), and even then, for most code, it makes far more sense to use a base class from that particular project than to use Object, in which case, that base class can be given whatever functions or attributes are appropriate to that particular class hierarchy.
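
As a rough sketch of what this allows (the `Semver` class is hypothetical), a class can declare an `opEquals` that takes its own type and is `const`, without overriding `Object.opEquals`. The example assumes, as described above, that `==` on `Semver` references goes through druntime's templated free function and so picks up this version.

```d
// Hypothetical class with its own, non-overriding opEquals.
class Semver
{
    int major, minor;

    this(int major, int minor)
    {
        this.major = major;
        this.minor = minor;
    }

    // Takes Semver rather than Object, and is const.
    bool opEquals(const Semver rhs) const
    {
        return major == rhs.major && minor == rhs.minor;
    }
}

void main()
{
    auto a = new Semver(2, 1);
    auto b = new Semver(2, 1);
    const Semver c = new Semver(2, 2);

    assert(a == b); // compared as Semver, not as Object
    assert(a != c); // works with a const right-hand side
    assert(c != a); // and with a const left-hand side
}
```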

__cmp is similarly templated, though I'm not as familiar with how the lowerings work with regards to opCmp. But as long as all of the comparison operators lower to templated functions that call opEquals or opCmp on the class references with whatever type they have instead of with Object, we can work around Object's versions of those functions.

toHash and toString would typically be called more directly, but if the code that's calling them is templated rather than using Object, derived classes can currently declare versions of those functions with a different set of attributes (though since those functions don't have parameters, they're somewhat more restricted in which attributes can be used, since they can't overload on most attributes) - and with regards to const, they can overload the Object versions, meaning that they only have a problem if Object is being used.

Adding const - or any other attributes - to the functions on Object would be a step backwards rather than forwards and needlessly restrict code. Rather, we need to take advantage of templates and Editions and make it so that none of these functions are on Object at all, allowing each individual class hierarchy to define these functions in whatever manner makes sense for that code. Editions give us an opportunity here which we have not had previously, and we should grab it.

- Jonathan M Davis

April 26, 2024

Re: Object.toString, toHash, opCmp, opEquals

Posted by Richard (Rikki) Andrew Cattermole
in reply to Walter Bright

On 26/04/2024 11:06 AM, Walter Bright wrote:
> I recommend that everyone who has overloads of these functions, alter them to have the `const` signatures. This will future-proof them against any changes to Object's signatures.

We don't need to do this.

The solution that covers pretty much everyone's needs is custom root classes.

Attributes, monitor field, reference counting, -betterC, all handled if we just let people define their own root class that they explicitly inherit from.

Right now the language is far too coupled to druntime, and this is one area I want to see fixed. Too many issues crop up because it is too coupled.

April 25, 2024

Re: Object.toString, toHash, opCmp, opEquals

Posted by Walter Bright
in reply to Timon Gehr

On 4/25/2024 4:36 PM, Timon Gehr wrote:
>> Without the `const` annotations, the functions are not usable by `const` objects without doing an unsafe cast. This impairs anyone wanting to write const-correct code,
> 
> "const correctness" does not work in D because const
> 
> a) provides actual guarantees
> b) is transitive

It's not the C++ notion of const, sure. But the name still applies.

> It is fundamentally incompatible with many common patterns of object-oriented and other state abstraction. It is not even compatible with the range API. Uses of `const` are niche. `const` is nice when it does work, but it's not something you can impose on all code, particularly object-oriented code.

Why would anyone, for example, try to mutate a range when it is passed to one of these functions?

>> and also impedes use of `@live` functions.
>> ...
> 
> Perfect. I have no intention of using `@live` functions. I do not see their utility.

The utility is being able to write borrow-checker style code, so you can avoid things like double frees.

As I recall, it was you that pointed out that reference counting can never be safe if two mutable pointers to the same ref counted object (one to the object, the other to its interior) were passed to a function. (Freeing the first can leave the second interior pointer pointing to a deleted object.) The entire ref counting scheme capsized because of this.

>> I recommend that everyone who has overloads of these functions, alter them to have the `const` signatures. This will future-proof them against any changes to Object's signatures.
> 
> I will not do that, because if it does not outright break my code (e.g. because Phobos cannot support `const` ranges), it actually limits my options in the future in a way that is entirely unnecessary.

Why would anyone need toHash(), toString(), opEquals() or opCmp() to mutate their data? Wouldn't that be quite surprising behavior?

April 26, 2024

Re: Object.toString, toHash, opCmp, opEquals

Posted by Timon Gehr
in reply to Walter Bright

On 4/26/24 02:57, Walter Bright wrote:
> On 4/25/2024 4:36 PM, Timon Gehr wrote:
>>> Without the `const` annotations, the functions are not usable by `const` objects without doing an unsafe cast. This impairs anyone wanting to write const-correct code,
>>
>> "const correctness" does not work in D because const
>>
>> a) provides actual guarantees
>> b) is transitive
> 
> It's not the C++ notion of const, sure. But the name still applies.
> ...

Well, we could come up with a better name, one that actually reflects that there are some pitfalls.

>> It is fundamentally incompatible with many common patterns of object-oriented and other state abstraction. It is not even compatible with the range API. Uses of `const` are niche. `const` is nice when it does work, but it's not something you can impose on all code, particularly object-oriented code.
> 
> Why would anyone, for example, try to mutate a range when it is passed to one of these functions?
> ...

A range is useless unless it is mutable. The range interface is inherently mutable. To iterate a range, you have to call `popFront()` on it. There is no way to have a `const popFront()`.
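
A small sketch of this point, using `std.range.iota` as an arbitrary example range: the mutable range satisfies `isInputRange` and can be iterated, while a `const` copy of the same range does not qualify, because `popFront()` cannot be called on it.

```d
import std.range : iota, isInputRange;

void main()
{
    auto r = iota(1, 4);
    static assert(isInputRange!(typeof(r)));

    // A const copy of the same range no longer qualifies as a range.
    const cr = r;
    static assert(!isInputRange!(typeof(cr)));
    static assert(!__traits(compiles, cr.popFront()));

    // Iterating means mutating the range itself.
    int sum = 0;
    for (; !r.empty; r.popFront())
        sum += r.front;
    assert(sum == 6);
    assert(r.empty);
}
```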

> 
>>> and also impedes use of `@live` functions.
>>> ...
>>
>> Perfect. I have no intention of using `@live` functions. I do not see their utility.
> 
> The utility is being able to write borrow-checker style code, so you can avoid things like double frees.
> ...

`@live` does not enable this. Anyway, you are trying to impose nonsensical restrictions on garbage-collected code. I have yet to run into a double-free using GC allocation and I doubt `@live` would help me avoid that if it were a thing.

> As I recall, it was you that pointed out that reference counting can never be safe if two mutable pointers to the same ref counted object (one to the object, the other to its interior) were passed to a function. (Freeing the first can leave the second interior pointer pointing to a deleted object.) The entire ref counting scheme capsized because of this.
> ...

I provided the counterexample, but the unsound generalization is yours. (Technically, there would be ways to type check that code without banning mutation outright.)

>>> I recommend that everyone who has overloads of these functions, alter them to have the `const` signatures. This will future-proof them against any changes to Object's signatures.
>>
>> I will not do that, because if it does not outright break my code (e.g. because Phobos cannot support `const` ranges), it actually limits my options in the future in a way that is entirely unnecessary.
> 
> Why would anyone need toHash(), toString(), opEquals() or opCmp() to mutate their data? Wouldn't that be quite surprising behavior?
> 

As I keep pointing out, there is a difference between mutating abstract data and concrete memory locations. For instance, data types with amortized guarantees usually have to reorganize the internal data representation on each query. (Think e.g. splay trees.)

Anyway, let's for the sake of argument assume that I want to write functions that leave memory in exactly the state they encountered it in. Const will _still_ unduly restrict me because it is not fine-grained enough.

```d
import std.stdio, std.range, std.conv;

struct S{
    auto r=iota(1,2);
    string toString()const{ return text(r); }
}

void main(){
    S s;
    writeln(s);
}
```

Writes:
```
const(Result)(1, 2)
```

Sometimes there is not even a safe workaround to get a mutable version of a range, because of transitive `const`. A range can have indirections in its implementation.

This is just one example establishing that `const` is not expressive enough to say _ONLY_ "this will not mutate anything". It also spells: "This code can be a huge pain in the ass at any point in the future for dumb, incidental reasons."

I really do not want to deal with this. I'd much rather fork Phobos so it uses non-const alternatives to toHash and toString.

If you expect people to prove properties to an incomplete type system via annotations and to accept unnecessary restrictions, they have to get some value out of it. You also would not go: "Starting from tomorrow, you have to prove to me that you brush your teeth every day. I want video evidence." And then, when I refuse, you can't say: "Why would you not brush your teeth?" This is what this is.

I caution you to now not miss the forest for the trees and engage in a "tooth-brushing related" argument (e.g., proposing a different range design or something like that). This is an inherent issue. Even if you make the type system more expressive, the annotation overhead is still real, and often uneconomical.

I am perfectly fine with having some restricted system like Rust for people who want to do safe manual memory management. This would even be useful to me. But this has to be opt-in, based on data structures, and interoperate as seamlessly as possible with the full language.

One thing I absolutely agree on with Robert is that it should always be _possible_ to write simple @safe D code without any advanced type system shenanigans. I think any design that strays from that principle is bad. This proposed change absolutely torpedoes that.

April 26, 2024

Re: Object.toString, toHash, opCmp, opEquals

Posted by Richard (Rikki) Andrew Cattermole
in reply to Walter Bright

On 26/04/2024 12:57 PM, Walter Bright wrote:
> As I recall, it was you that pointed out that reference counting can never be safe if two mutable pointers to the same ref counted object (one to the object, the other to its interior) were passed to a function. (Freeing the first can leave the second interior pointer pointing to a deleted object.) The entire ref counting scheme capsized because of this.

This is the first time I have heard of this being a concern of yours.

Stuff like this is always solvable if we acknowledge (in other words write them all down) what the requirements are!

April 25, 2024

Re: Object.toString, toHash, opCmp, opEquals

Posted by Jonathan M Davis
in reply to Walter Bright

On Thursday, April 25, 2024 6:57:49 PM MDT Walter Bright via Digitalmars-d wrote:
> On 4/25/2024 4:36 PM, Timon Gehr wrote:
> >> Without the `const` annotations, the functions are not usable by `const` objects without doing an unsafe cast. This impairs anyone wanting to write const-correct code,
> >
> > "const correctness" does not work in D because const
> >
> > a) provides actual guarantees
> > b) is transitive
>
> It's not the C++ notion of const, sure. But the name still applies.

The name applies, but because D's const is transitive and can't be backdoored, requiring it poses a serious problem for certain categories of types. In C++, it's normal to slap const on stuff all over the place, because the type is logically const, and any type that needs to mutate any portion of its state which is not part of that logical constness (e.g. a mutex) is perfectly free to do so by using the mutable keyword or casting away const. In contrast, it violates the type system for any D code to work around const like that, so it becomes problematic to use const all over the place like you would in C++, and making code "const correct" like you would in C++ is typically bad practice in D. It's great to use D's const where you can, but it's simply too restrictive to require it in the general case.

> > It is fundamentally incompatible with many common patterns of object-oriented and other state abstraction. It is not even compatible with the range API. Uses of `const` are niche. `const` is nice when it does work, but it's not something you can impose on all code, particularly object-oriented code.
>
> Why would anyone, for example, try to mutate a range when it is passed to one of these functions?

If you can't mutate a range, you can't iterate through it. Your proposed DIP to be able to have a form of tail-const for ranges will help with that, but the fact still stands that some types will not work with const, because they need to mutate some portion of their state in order to function, even with functions that need to be logically const. If D's const were like C++'s const, this wouldn't be a problem, but the strong guarantees that D's const is supposed to provide make it completely incompatible with some code. As such, we really can't require it anywhere without causing problems. If you want to be able to require it, it needs to have backdoors; otherwise, a number of common coding idioms become impossible to use.

So, either we have backdoors that allow mutating const, and we can require const in places that need to be logically const, or we have const be strict about mutation and can't require that it be used. As things stand with D's const, that means that we can't require that it be used.

> >> I recommend that everyone who has overloads of these functions, alter them to have the `const` signatures. This will future-proof them against any changes to Object's signatures.
> >
> > I will not do that, because if it does not outright break my code (e.g. because Phobos cannot support `const` ranges), it actually limits my options in the future in a way that is entirely unnecessary.
>
> Why would anyone need toHash(), toString(), opEquals() or opCmp() to mutate
> their data? Wouldn't that be quite surprising behavior?

It would be surprising if the logical state of the type changed, but it wouldn't be at all surprising if some portion of the type which was not part of its logical state changed. A very simple case of this would be if the type contains a member variable which is shared and a mutex to protect access to that data (be it a mutex which is also a member variable or which is a member of the shared member variable). Any of those four functions would then need to lock that mutex in order to read the data so that they can do stuff like hash it or compare it. So, while the logical state wouldn't change, the object itself would be mutated in the process.
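
A simplified sketch of the mutex case follows. The `Counter` class is hypothetical, and it uses a plain `int` rather than a `shared` member to keep the example short. It assumes `core.sync.mutex.Mutex` offers `lock()` only for mutable and `shared` objects, so locking through a `const` reference is rejected.

```d
import core.sync.mutex : Mutex;
import std.conv : to;

// Hypothetical class whose data is protected by a member mutex.
class Counter
{
    private Mutex guard;
    private int value;

    this(int value)
    {
        guard = new Mutex;
        this.value = value;
    }

    // Non-const: locking changes the mutex, which is part of the object.
    override string toString()
    {
        guard.lock();
        scope (exit) guard.unlock();
        return value.to!string;
    }
}

void main()
{
    auto c = new Counter(7);
    assert(c.toString() == "7");

    // With transitive const, a const Counter would hold a const Mutex,
    // and a const Mutex cannot be locked.
    const Mutex frozen = new Mutex;
    static assert(!__traits(compiles, frozen.lock()));
}
```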

Similarly, if a type lazily initializes some portion of its state, and that initialization hasn't happened yet before one of those functions is called, then it's going to have to do that initialization as part of the call, which means mutating the object's state. Its logical state doesn't change, so for C++, this kind of thing would be a complete non-issue, but for D, because const doesn't allow any kind of mutation, such a type cannot have const functions.

And those are just two examples of cases where an object needs to be able to mutate some portion of its state in functions like opEquals, meaning that if we put const on opEquals, either such classes can no longer be written in D, or they're going to cast away const and mutate even if that does technically violate the type system's guarantees.

If you're just dealing with ints and pointers and arrays and the like, and you aren't dealing with user-defined types at all, then const generally doesn't cause many problems. But as soon as you're dealing with user-defined types, you start running into issues with const depending on what your code needs to do, and the more complex the code, the more likely it is that issues with const are going to pop up. The same goes with pretty much all of the attributes. They add restrictions which work in some cases but don't in many others.

So, for instance, it's usually bad practice to put const on the parameters for templated functions, since that means that whole categories of types won't work with that code, whereas if you don't use const, the caller can pass a const type, and it'll work just fine in that case so long as the type in question was designed to work with const. But the types that don't work with const will also work with that code, because the template doesn't have its parameter marked as const, and so the generated code won't use const.
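
A minimal sketch of that difference, with two hypothetical helpers: `countItems` leaves its parameter unqualified, and `countItemsConst` marks it `const`. Both accept a `const` array, but only the unqualified version accepts a range such as `iota`, since iterating the range requires mutating it.

```d
import std.range : iota;

// Unqualified parameter: works for const arrays and for ranges.
auto countItems(R)(R range)
{
    size_t n = 0;
    foreach (item; range)
        ++n;
    return n;
}

// const parameter: shuts out any type that must mutate to iterate.
auto countItemsConst(R)(const R range)
{
    size_t n = 0;
    foreach (item; range)
        ++n;
    return n;
}

void main()
{
    const int[] fixed = [1, 2, 3];

    assert(countItems(fixed) == 3);      // const caller data is fine
    assert(countItems(iota(0, 5)) == 5); // so is a range

    assert(countItemsConst(fixed) == 3); // arrays still work
    // A const range cannot call popFront, so this does not compile.
    static assert(!__traits(compiles, countItemsConst(iota(0, 5))));
}
```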

We have the same problem with member functions on classes, but since they're virtual, we can't templatize that code. However, we can templatize the code that uses those classes, making the use of Object completely unnecessary, and then each class can define functions like opEquals with whatever set of attributes makes sense for that class' hierarchy. Derived classes within that hierarchy will then be stuck with the decisions made for the base class, but programmers can choose what makes the most sense for that particular class hierarchy, whereas we cannot possibly make that decision for all classes and not screw over developers in the process, because it's not one size fits all.

In general, we need to be trying to support the various attributes (including const) with druntime and Phobos, but we should not be requiring them, because they are all too restrictive for that to make sense. And that includes const.

- Jonathan M Davis

April 25, 2024

Re: Object.toString, toHash, opCmp, opEquals

Posted by Walter Bright
in reply to Timon Gehr

On 4/25/2024 6:32 PM, Timon Gehr wrote:
> A range is useless unless it is mutable. The range interface is inherently mutable. To iterate a range, you have to call `popFront()` on it. There is no way to have a `const popFront()`.

I agree there's no reason to have a const popFront(). But opEquals() is inherently non-mutable. Let's posit a mutating opEquals() and:

```
o.opEquals(o);
```

Which one would opEquals() mutate: one, or both? And what would happen if it did?

>> The utility is being able to write borrow-checker style code, so you can avoid things like double frees.
>> ...
> 
> `@live` does not enable this.

```
auto p = q;
free(p);
free(q);
```

> Anyway, you are trying to impose nonsensical restrictions on garbage-collected code. I have yet to run into a double-free using GC allocation and I doubt `@live` would help me avoid that if it were a thing.

D doesn't distinguish between gc pointers and non-gc pointers. It has been proposed, but I have very extensive experience with multiple pointer types and it is a cure worse than the disease.

>> As I recall, it was you that pointed out that reference counting can never be safe if two mutable pointers to the same ref counted object (one to the object, the other to its interior) were passed to a function. (Freeing the first can leave the second interior pointer pointing to a deleted object.) The entire ref counting scheme capsized because of this.
> I provided the counterexample, but the unsound generalization is yours.

All it takes is one counterexample to capsize it.

> (Technically, there would be ways to type check that code without banning mutation outright.)

Neither Andrei nor I nor anyone else working on it could figure out a solution (other than disallowing all pointers to payload). The borrow checker does solve it, though.

>> Why would anyone need toHash(), toString(), opEquals() or opCmp() to mutate their data? Wouldn't that be quite surprising behavior?
>>
> 
> As I keep pointing out, there is a difference between mutating abstract data and concrete memory locations. For instance, data types with amortized guarantees usually have to reorganize the internal data representation on each query. (Think e.g. splay trees.)
> 
> Anyway, let's for the sake of argument assume that I want to write functions that leave memory in exactly the state they encountered it in. Const will _still_ unduly restrict me because it is not fine-grained enough.
> 
> ```d
> import std.stdio, std.range, std.conv;
> 
> struct S{
>      auto r=iota(1,2);
>      string toString()const{ return text(r); }

I agree that mutates the argument passed to toString(). That would consume the range. Calling toString() again would return an empty string.

> Sometimes there is not even a safe workaround to get a mutable version of a range, because of transitive `const`. A range can have indirections in its implementation.
> This is just one example establishing that `const` is not expressive enough to say _ONLY_ "this will not mutate anything". It also spells: "This code can be a huge pain in the ass at any point in the future for dumb, incidental reasons."
> 
> I really do not want to deal with this. I'd much rather fork Phobos so it uses non-const alternatives to toHash and toString.

I suppose it wouldn't help if I suggest:

```
writeln(text(r));
```

I only proposed the const toString() for Object.toString(), not for struct, where indeed you are free to have struct toString() do anything you want.

Class and struct are fundamentally different in that class is a universal hierarchy with a common root, and hence we must define what that common root is. Struct, on the other hand, is rootless, and hence the user can define it however he pleases.

I agree with you that Object shouldn't have had any members, and Andrei and I did discuss that, but since it had members, we couldn't really take them away. Note that COM classes also have a common root with one member QueryInterface().

> If you expect people to prove properties to an incomplete type system via annotations and to accept unnecessary restrictions, they have to get some value out of it. You also would not go: "Starting from tomorrow, you have to prove to me that you brush your teeth every day. I want video evidence." And then, when I refuse, you can't say: "Why would you not brush your teeth?" This is what this is.
> 
> I caution you to now not miss the forest for the trees and engage in a "tooth-brushing related" argument (e.g., proposing a different range design or something like that). This is an inherent issue. Even if you make the type system more expressive, the annotation overhead is still real, and often uneconomical.
> 
> I am perfectly fine with having some restricted system like Rust for people who want to do safe manual memory management. This would even be useful to me. But this has to be opt-in, based on data structures, and interoperate as seamlessly as possible with the full language.

I think I see your point of view. Mine is a little different. I have considerable experience with C. When I see:

```
int foo(T* p);
```

Is p an array? Is foo() going to mutate what it points to? Is foo() going to free() it? How would I know without reading the implementation? (The documentation is always incomplete, wrong, or missing.) Annotations give me confidence that I understand what it does. const/ref/scope here answer my questions, and the compiler backs it up.
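
A small sketch of annotated D signatures that answer those questions (the function names are made up for illustration). Note that `scope` is only fully enforced when compiling with `-preview=dip1000`; without that switch the example still compiles and runs, but the no-escape promise is not checked.

```d
// Takes a slice (so it is an array), never writes through it,
// and promises not to keep a reference to it.
int total(scope const(int)[] values) @safe pure nothrow @nogc
{
    int sum = 0;
    foreach (v; values)
        sum += v;
    return sum;
}

// `ref` says the caller's variable itself is updated.
void bump(ref int counter) @safe pure nothrow @nogc
{
    ++counter;
}

void main() @safe
{
    int[] data = [1, 2, 3];
    assert(total(data) == 6);

    // Writing through a const view is rejected by the compiler.
    const(int)[] view = data;
    static assert(!__traits(compiles, view[0] = 9));

    int hits = 41;
    bump(hits);
    assert(hits == 42);
}
```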

> One thing I absolutely agree on with Robert is that it should always be
> _possible_ to write simple @safe D code without any advanced type system
> shenanigans. I think any design that strays from that principle is bad. This
> proposed change absolutely torpedoes that.

I agree with Robert, too. I asked him to prepare a list of his proposals so I can see what can be done.

P.S. const class Objects are more or less unusable with the non-const toString, toHash, opCmp and opEquals.

P.P.S. all of D's annotations are subtractive. This means you can write code without annotations and it'll work. But safe, probably not.

P.P.P.S. I almost never write a multiple free bug these days. But that doesn't translate to "don't need double free protection", as I spent many years making that mistake and tracking them down. I even wrote my own malloc/free debugger to help. Eventually, I simply internalized what not to do. But that isn't a transferable skill. I can't even explain what I do.

Anyhow, thanks for the food for thought!

April 25, 2024

Re: Object.toString, toHash, opCmp, opEquals

Posted by Walter Bright
in reply to Richard (Rikki) Andrew Cattermole

On 4/25/2024 6:39 PM, Richard (Rikki) Andrew Cattermole wrote:
> This is the first time I have heard of this being a concern of yours.

It was a working group.

> Stuff like this is always solvable if we acknowledge (in other words write them all down) what the requirements are!

We had a requirement for memory safety. Without it, RC was more of a step sideways than forwards.
