Shortcomings of D-Style Fused Operator Overloading

Shortcomings of D-Style Fused Operator Overloading
Jul 07 Ahmet Sait
Jul 07 Richard (Rikki) Andrew Cattermole
Jul 07 Quirin Schroll
Jul 07 Richard (Rikki) Andrew Cattermole
Jul 07 IchorDev
Jul 07 Richard (Rikki) Andrew Cattermole
Jul 07 IchorDev
Jul 07 Ahmet Sait
Jul 07 Dennis
Jul 07 Quirin Schroll
Jul 07 Ahmet Sait
Jul 07 Timon Gehr
Jul 08 Dom DiSc
Jul 08 Ahmet Sait
Jul 08 Timon Gehr
Jul 08 Timon Gehr
Jul 07 deadalnix
Jul 07 Jonathan M Davis
Jul 09 monkyyy

July 07

Posted by Ahmet Sait

Permalink

Ahmet Sait

Permalink

For the past few weeks I've been working on a simple compiler for my grad project, and I was trying to make use of operator overloading in some of my types named Interval and Constant.

Interval supposed to be a type that supports all kinds of arithmetic operations with another Interval which would be used for implementing Value Range Propagation.

Now to the operator overloading part. As I was trying to implement the Interval type, I realized an interesting issue. The way operator overloading works in D is with a unified opCmp() method that returns -1, 0 or 1 depending on the inequality, and a < b gets lowered to a.opCmp(b) < 0. However, implementing inequality correctly for intervals requires us to know the exact operator used.

Let's define two intervals x and y:
{x ∈ ℝ │ a ≤ x ≤ b}
{y ∈ ℝ │ c ≤ y ≤ d}

And the inequality operators:

x < y = ⎧ 1  if b < c
        ⎨
        ⎩ 0  otherwise

x ≤ y = ⎧ 1  if b ≤ c
        ⎨
        ⎩ 0  otherwise

x > y = ⎧ 1  if a > d
        ⎨
        ⎩ 0  otherwise

x ≥ y = ⎧ 1  if a ≥ d
        ⎨
        ⎩ 0  otherwise

Exercise: Write an opCmp() method that implements the rules above for struct Interval { int lower, upper; }. (Hint: It's not possible.)

I didn't fully implement or used this Interval type I was cooking because I was running out of time for the project, so I opted for something simpler which is to just do constant folding, hence the Constant type. Constant is basically a tagged union like bool|integer|real, and it doesn't have the fuzzy ordering issues like in the Interval type above.

For Constant, I wanted to implement all unary operators, one of which is logical not !. The only way to do this in D is by writing a bool opCast(T:bool), except I don't want to return a bool! I want to return another Constant that represents the logical inverse of what the current Constant contains, and be able write this:

Constant c2 = !c1; // Lowered to !c1.opCast!bool

As you can guess, returning a Constant from opCast() instead of bool results in this error message:

Error: cannot implicitly convert expression c1.opCast() of type Constant to bool

Now you might be thinking, I could just define a bunch of regular functions such as logicalNot(), lowerOrEquals() etc. and you would be right (which is what I ended up doing anyway). I guess the point here is that the way D merges these operators to reduce potential bugs actually end up making certain things inexpressible naturally.

So what do you guys think? Would you say distinct operators are better?

July 07

Re: Shortcomings of D-Style Fused Operator Overloading

Posted by Richard (Rikki) Andrew Cattermole
in reply to Ahmet Sait

Permalink

Richard (Rikki) Andrew Cattermole

Posted in reply to Ahmet Sait

Permalink

This does beg the question, should we be supporting Unicode mathematical symbols for operators?

In theory this should be easy enough to do, its just some Unicode tables that the lexer just needs a bit of info on how to categorize after all.

Will need input from Walter if this is an acceptable path to look into further though.

July 07

Re: Shortcomings of D-Style Fused Operator Overloading

Posted by Quirin Schroll
in reply to Richard (Rikki) Andrew Cattermole

Permalink

Quirin Schroll

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

On Friday, 7 July 2023 at 09:38:37 UTC, Richard (Rikki) Andrew Cattermole wrote:

This does beg the question, should we be supporting Unicode mathematical symbols for operators?

No. If an editor, font, or terminal doesn’t support this, you’re in a bad spot. We had this discussion already. Anyone can use a OTF/TTF font that displays <= as a wider ≤ and call it a day if they want to.

July 07

Re: Shortcomings of D-Style Fused Operator Overloading

Posted by Richard (Rikki) Andrew Cattermole
in reply to Quirin Schroll

Permalink

Richard (Rikki) Andrew Cattermole

Posted in reply to Quirin Schroll

Permalink

On 07/07/2023 10:34 PM, Quirin Schroll wrote:
> On Friday, 7 July 2023 at 09:38:37 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> This does beg the question, should we be supporting Unicode mathematical symbols for operators?
> 
> No. If an editor, font, or terminal doesn’t support this, you’re in a bad spot. We had this discussion already. Anyone can use a OTF/TTF font that displays <= as a wider ≤ and call it a day if they want to.

We are not talking about basic comparison operators that can be represented in a glyph by a ligature.

http://www.unicode.org/charts/PDF/U2200.pdf

http://www.unicode.org/charts/PDF/U27C0.pdf

http://www.unicode.org/charts/PDF/U2A00.pdf

There is a lot of mathematical operators which have no equivalent operator overload.

July 07

Re: Shortcomings of D-Style Fused Operator Overloading

Posted by Dennis
in reply to Ahmet Sait

Permalink

Dennis

Posted in reply to Ahmet Sait

Permalink

On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:

So what do you guys think? Would you say distinct operators are better?

The thing is, D's operator overloading was designed to support library numeric types, such as half float, fixed point, complex numbers, vector3 etc. It's intentionally limited in order to prevent Domain Specific Languages embedded in D code directly. Templates / string mixins can still be used for that.

Now there are other mathematical objects that could almost be modeled as a D type with operator overloading, but then it only gets you 90% there before you hit a limitation like opCmp for your interval. I have a similar case where my symbolic number type could use an opCmp returning another typeof(this) instead of an int or float, but alas.

I don't think a proposal to make operator overloading a lot more powerful has much chance, but only making opCmp a bit more powerful might if you can demonstrate valid use cases for it.

July 07

Re: Shortcomings of D-Style Fused Operator Overloading

Posted by IchorDev
in reply to Richard (Rikki) Andrew Cattermole

Permalink

IchorDev

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

On Friday, 7 July 2023 at 10:39:40 UTC, Richard (Rikki) Andrew Cattermole wrote:

We are not talking about basic comparison operators that can be represented in a glyph by a ligature.

http://www.unicode.org/charts/PDF/U2200.pdf

http://www.unicode.org/charts/PDF/U27C0.pdf

http://www.unicode.org/charts/PDF/U2A00.pdf

There is a lot of mathematical operators which have no equivalent operator overload.

I hope that would never happen. I'm a programmer, I have no idea how to read mathematical operators. I don't want to be forced to decipher the code of others that uses them. It's already enough of a headache when I want to research a computer science topic and everything gets explained exclusively in complicated mathematical notation. I'm not a mathematician, I work exclusively with numbers of finite complexity.
Additionally, mathematical operators present a huge readability issue. People had enough trouble differentiating (e.g.) 5l and 51 that 5l was removed. Many mathematical operators look way too similar unless they're magnified. I think functions with shortened names are just fine...

July 07

Re: Shortcomings of D-Style Fused Operator Overloading

Posted by Ahmet Sait
in reply to Richard (Rikki) Andrew Cattermole

Permalink

Ahmet Sait

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

On Friday, 7 July 2023 at 10:39:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> [snip]

I didn't post this to discuss unicode characters so I would suggest people to stop derailing discussion with any of that. Thanks.

July 08

Re: Shortcomings of D-Style Fused Operator Overloading

Posted by Richard (Rikki) Andrew Cattermole
in reply to IchorDev

Permalink

Richard (Rikki) Andrew Cattermole

Posted in reply to IchorDev

Permalink

I do not know how to read them either. I doubt you'd see much code using them.

It would only be the people who are very familiar with their usage who would find any use of it. If for no other reason than everyone else do not have quick access to those characters already setup.

July 07

Re: Shortcomings of D-Style Fused Operator Overloading

Posted by IchorDev
in reply to Richard (Rikki) Andrew Cattermole

Permalink

IchorDev

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

On Friday, 7 July 2023 at 12:09:19 UTC, Richard (Rikki) Andrew Cattermole wrote:
> I do not know how to read them either. I doubt you'd see much code using them.
>
> It would only be the people who are very familiar with their usage who would find any use of it. If for no other reason than everyone else do not have quick access to those characters already setup.

They could always use mixins that replace those characters with function calls. What I worry about is: as a language feature, it might become common enough that I have to read through it in the code of others to understand what their code is doing.

July 07

Re: Shortcomings of D-Style Fused Operator Overloading

Posted by Quirin Schroll
in reply to Ahmet Sait

Permalink

Quirin Schroll

Posted in reply to Ahmet Sait

Permalink

TL;DR: Getting orderings right in a programming language is hard.

On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:

[…]
Exercise: Write an opCmp() method that implements the rules above for struct Interval { int lower, upper; }. (Hint: It's not possible.)

One can’t because D’s philosophy behind opCmp is a total ordering and that a <= b is the same as a < b || a == b (which both isn’t the case for your intervals).

The total ordering part can be stretched to a partial ordering by making opCmp return float and use float.nan to indicate that the elements are unrelated.

[…]

So what do you guys think? Would you say distinct operators are better?

Yes, 100%. Orderings cannot really be done using D’s approach with opXXX. Ideally, the programmer provides minimal details about the ordering and the compiler generates the rest so that ordering is consistent. As an example, C++ up until C++20 requires you to write both == and !=. In almost every circumstance, one would define != in terms of == completely mechanical. I see no reason why != should even be overloadable. Maybe there are examples where a != b is not the same as !(a == b), I’m open enough to accept that, even though I’m not creative enough to find something where this is reasonable. C++20 also added the <=> operator to allow for D’s approach because in a lot of cases, it serves the programmer well.

I agree with you that opCmp is simplistic design; partial orderings feel like a hack. I’ve thought about orderings a lot and I’m glad you’ve exposed me to this, because now I need to re-consider what the most general properties of ordering relations could be. However, I think that your ordering relations make little sense.

What properties should we minimally expect of ==, < and <=, >, and >= to be reasonable?

Rewrites
- a != b is the same as !(a == b), therefore != should not be user-definable, == suffices.
- a > b is the same as b < a, therefore > should not be user-definable, < suffices.
- a >= b is the same as b <= a, therefore >= should not be user-definable, <= suffices.
Individual properties
- <, <=, and == are transitive: E.g. for <: If a < b and b < c then a < c.
- < is totally irreflexive: a < a is always false.
- < is asymmetric: If a < b, then b < a is false.
- == is symmetrical: a == b is the same as b == a.
- <= is partially reflexive: For any a and b, if a <= b, then a <= a and b <= b.
- == is partially reflexive as a consequence from transitivity and symmetry: If a == b, by symmetry, also b == a, thus by transitivity, a == a and b == b.
Interactions
- a == b and a < b are mutually exclusive; in particular:
  - If a == b, then a < b is false. (This is a strengthening of irreflexivity.)
  - If a < b, then a != b.
  - Because == is symmetrical, the same for b < a.
- If a < b or a == b, then a <= b (definition of less-or-equal).
- If a <= b, then a < b or a == b (definition of less-or-equal).
- If a <= b and a >= b, then a == b (Anti-symmetry).

The last two or three are debatable; I have not put them in Rewrites intentionally. As presented, the intervals don’t adhere to the last two. It’s not even clear what == on the intervals would be. It looks like the intervals are supposed to represent any value in the bounds, which means that – unless the lower and bound is the same, i.e. the interval is a single number – I == I should actually fail.

In the axioms I presented, reflexivity is partial because == and <= on IEEE floats are not reflexive: float.nan != float.nan, but any value that is equal to any other value is equal to itself. And, of course, the axioms are supposed to be true for build-in types.

While partial in the sense that not all elements are included, IEEE floats are partially totally ordered:

We indeed have a <= b or b <= a for all a and b such that a == a and b == b.

IEEE floats are also anti-symmetric, and in general, == should be the partial equivalence relation induced by <=: a == b if and only if a <= b and a >= b.
For IEEE floats, positive and negative zero are equivalent, +0.0 == -0.0, but they are not equal (in bit pattern, i.e. +0.0 !is -0.0 and behavior: 1/+0.0 != 1/-0.0). As another example, the not-a-number values are behaviorally the same, they’re only technically distinct.

I’ve written an answer on StackOverflow to the question What is the difference between equality and equivalence? A TL;DR is: D’s is operator is (at least close to) mathematical equality, whereas the == operator is merely the default equivalence relation, which can be anything, and (think of x and y as ref parameters) &x == &y in D represents identity (any mutation of x is a mutation of y).

The C++ proposal P0515R0 is also a good read. It was rejected, but introduces relevant notions. Look at least at § 1.4.1 Guidance: What we teach.

Top | Forum index | About this forum

Forums