February 17, 2010
Steven Schveighoffer wrote:
> We're not working in Assembly here.  This is a high level language, designed to hide the complexities of the underlying processor.  The processor has no idea whether the data in its registers is signed or unsigned.  The high level language does.  Please use that knowledge to prevent stupid mistakes, or is that not one of the goals of the compiler?  I can't believe this is such a hard point to get across.

It's not that I don't understand your point. I do, I just don't agree with it. At this point, we are going in circles so I don't think there's much value in me reiterating my opinions on it, except to say that Andrei and I once spent a great deal of time trying to separate signed from unsigned using the type system. The problem was that expressions tend to legitimately mix up signed and unsigned types together. Trying to tease out the "correct" sign of the result and what the programmer might have intended turned out to be an inscrutable mess of complication that we finally concluded would never work. It's a seductive idea, it just doesn't work. That's why C, etc. allows for easy implicit conversions between signed and unsigned, and why it has a set of (indeed, arbitrary) rules for combining them. Even though arbitrary, at least they are understandable and consistent.
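
As a minimal sketch of what those mixing rules do in practice (assuming 32-bit int/uint; the signed operand is converted to unsigned before the comparison):

    import std.stdio;

    void main()
    {
        int  a = -1;
        uint b = 1;

        // a is implicitly converted to uint, so -1 becomes uint.max
        // and the comparison is performed unsigned
        writeln(a < b);          // prints false
        writeln(cast(uint) a);   // prints 4294967295
    }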

Back when ANSI C was first finalized, there was a raging debate for years about whether C should use value-preserving or sign-preserving integral promotion rules. There were passionate arguments on both sides, both of which claimed the territory of intuitiveness and obviousness.

The end result was both sides eventually realized there was no correct answer, and that an arbitrary decision was required. It was made (value preserving), and half of the compiler vendors changed their compilers to match, and the rancor was forgotten.
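
To illustrate what the two camps were arguing about, here's a minimal D sketch (D follows the same value-preserving promotions C settled on, so a ushort is promoted to int before arithmetic):

    import std.stdio;

    void main()
    {
        ushort us = 65_535;

        // value preserving: ushort promotes to int (which can represent
        // every ushort value), so the negation is an ordinary signed one
        auto x = -us;
        writeln(typeof(x).stringof, " ", x);   // prints: int -65535

        // a sign-preserving rule would have promoted to unsigned instead,
        // and the result would have been 4294901761
    }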

For example, let's take two indices into an array, i and j:

    size_t i,j;

size_t is, by convention, unsigned.

Now, to get the distance between two indices:

   auto delta = i - j;

By C convention, delta is unsigned. If i is >= j, which may be an invariant of my algorithm, all is well. If i < j, suddenly delta is a very large value (but it still works, because of wrap around). The point is, there is no correct rule for dealing with the types of i-j. This has consequences:

Now, if j happens instead to be a complicated loop invariant expression (e) in a loop,

    loop
	auto delta = i - (e);

we may instead opt to hoist it out of a loop:

    auto j = -(e);
    loop
          auto delta = i + j;

and suddenly the compiler spits out error messages? Why can I subtract an unsigned, but not negate one? Such rules are complicated and will seem arbitrary to the user.
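
To make the wraparound concrete, here is a minimal sketch of both forms (assuming a 64-bit target, where size_t is 64 bits wide):

    import std.stdio;

    void main()
    {
        size_t i = 3, j = 5;

        // i < j, so the subtraction wraps: delta is size_t, not -2
        auto delta = i - j;
        writeln(delta);      // prints 18446744073709551614

        // the hoisted form produces exactly the same bit pattern,
        // because -j is also computed modulo 2^64
        auto nj = -j;
        writeln(i + nj);     // prints 18446744073709551614
    }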


>>> The case I'm talking about is the equivalent to doing:
>>>  x = x / 0;
>>
>> Even mathematicians don't know what to do about divide by zero. But 2's complement arithmetic is well defined. So the situations are not comparable.
> 
> Sure they do, the result is infinity.  It's well defined.

I'm not a mathematician, but I believe it is not well defined, which one finds out when doing branch cuts. Per IEEE 754 (and required by D), floating point arithmetic divide by 0 resolves to infinity, but not all FPU hardware conforms to this spec. There is no similar convention for integer divide by 0. This is why the C standard leaves integer divide by 0 as undefined behavior.
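
For the floating point side of this, a minimal D sketch (the integer case is left commented out, since on typical hardware it raises a fault rather than producing a value):

    import std.stdio;

    void main()
    {
        double d = 1.0;
        writeln(d / 0.0);    // prints inf, per IEEE 754

        int i = 1, z = 0;
        // writeln(i / z);   // no defined result: typically a hardware
        //                   // exception (e.g. SIGFPE on x86)
    }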
February 17, 2010
Since everyone seemed to miss the point I was trying to make, I'll be more explicit.

My point was that it's all very well to say "you should know X" and you can even be totally right about that, but it doesn't mean behaviour based on X is necessarily intuitive or desirable.

Walter specifically said that "I don't feel that 2s-complement arithmetic is in any way an advanced programming topic".  I agree.  I was trying to point out that even very experienced people who would surely know what's going on in the hardware can get it wrong.

Still, my fault for posting an example.
February 17, 2010
Daniel Keep Wrote:

> 
> Since everyone seemed to miss the point I was trying to make, I'll be more explicit.
> 
> My point was that it's all very well to say "you should know X" and you can even be totally right about that, but it doesn't mean behaviour based on X is necessarily intuitive or desirable.
> 
> Walter specifically said that "I don't feel that 2s-complement arithmetic is in any way an advanced programming topic".  I agree.  I was trying to point out that even very experienced people who would surely know what's going on in the hardware can get it wrong.

I agree with this.  However, even though experienced programmers can write 2s complement math with bugs, in this particular case, there is *no* correct way to do it.  So it's impossible to get it right :)

Any opposing view would have to include what obtaining the unsigned negation of an unsigned value is useful for.  And literals don't count because they're easily expressed otherwise :)

-Steve
February 17, 2010
Steven Schveighoffer wrote:
> Daniel Keep Wrote:
> 
>> Since everyone seemed to miss the point I was trying to make, I'll be
>> more explicit.
>>
>> My point was that it's all very well to say "you should know X" and you
>> can even be totally right about that, but it doesn't mean behaviour
>> based on X is necessarily intuitive or desirable.
>>
>> Walter specifically said that "I don't feel that 2s-complement
>> arithmetic is in any way an advanced programming topic".  I agree.  I
>> was trying to point out that even very experienced people who would
>> surely know what's going on in the hardware can get it wrong.
> 
> I agree with this.  However, even though experienced programmers can write 2s complement math with bugs, in this particular case, there is *no* correct way to do it.  So it's impossible to get it right :)
> 
> Any opposing view would have to include what obtaining the unsigned negation of an unsigned value is useful for.  And literals don't count because they're easily expressed otherwise :)

x & -x
Nonzero if x has more than one bit set, i.e. is not a perfect power of 2. I use that often.
<rant>
My opinion: unsigned types get used *far* more often than they should. If this kind of behaviour confuses you, there's no way you should be using unsigned.
Really, the problem is that people abuse unsigned to mean 'this is a positive integer'. And of course, negation doesn't make sense in the context of the naturals. But that is NOT what unsigned is. Unsigned types are types with NO SIGN.
</rant>
February 17, 2010
Steven Schveighoffer Wrote:

> > Even mathematicians don't know what to do about divide by zero. But 2's complement arithmetic is well defined. So the situations are not comparable.
> 
> Sure they do, the result is infinity.  It's well defined.

This is a common misconception. Of course it depends on the definition you're working with, but the usual arithmetic on real numbers does not define division by zero. The operation just doesn't exist.

To get a bit more abstract, a so-called ring with unity (an algebraic abstraction of, among many other things, the reals) is a set of things, one of which is called "1", together with operations + and *. Division is defined only insofar as some elements 'a' may have an inverse 'b' such that a*b=b*a=1. There is no requirement that all elements have an inverse (that would be a "group"), and 0 in the reals in particular doesn't have one: 0*b=0 for every b, so 0*b can never equal 1. In fact, infinity is not a real number (it's not in the set of "things" we're considering), so it doesn't even make sense to say that the inverse of 0 is infinity.

http://en.wikipedia.org/wiki/Ring_theory

Sorry for the off-topic post; it just riles me to see these half-truths repeated again and again.
February 17, 2010
Walter Bright:
> For example, let's take two indices into an array, i and j:
>      size_t i,j;
> size_t is, by convention, unsigned.

From what I've seen so far unsigned integers are useful when:
- You need a bit array, for example to implement a bloom filter, a bitvector, a bit set, when you want to do SWAR, when you need bit arrays to deal with hardware, etc.
- When you really need the full range of numbers 0 .. 2^n, this happens but it's uncommon.

In most other situations using unsigned numbers is unsafe (mostly because of how other rules of the language interact with them) and it's better to use signed values. So array indices, like almost everything else, are better off signed. If you mix signed and unsigned arithmetic to index an array or to measure its length, you will often introduce bugs into the code (because the language seems unable to manage ranges of values in a tidy way). It seems integral numbers are one of the things Common Lisp gets right and C/D get wrong.
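
A minimal sketch of the kind of bug I mean (assuming a 64-bit target):

    import std.stdio;

    void main()
    {
        int[] a;                       // empty array

        // a.length is size_t (unsigned), so this is not -1:
        writeln(a.length - 1);         // prints 18446744073709551615

        // and a mixed signed/unsigned comparison hides the bug:
        int i = 0;
        writeln(i < a.length - 1);     // prints true, though the intent was 0 < -1
    }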

Bye,
bearophile
February 17, 2010
Don Wrote:

> Steven Schveighoffer wrote:
> > Any opposing view would have to include what obtaining the unsigned negation of an unsigned value is useful for.  And literals don't count because they're easily expressed otherwise :)
> 
> x & -x
> Nonzero if x has more than one bit set, i.e. is not a perfect power of
> 2. I use that often.

Fine, this would not be disallowed in my mind.  The abuse is not simply applying negation to an unsigned, it is then using the result as unsigned.

Incidentally, I always used the construct x & (x-1) to be zero if it's an exact power of 2.  It's not much different.
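
For what it's worth, a quick D sketch comparing the two idioms (the function names are just for illustration; both reject 0 explicitly):

    import std.stdio;

    // (x & -x) isolates the lowest set bit; it equals x only for powers of 2
    bool isPow2LowBit(uint x) { return x != 0 && (x & -x) == x; }

    // (x & (x - 1)) clears the lowest set bit; it is 0 only for powers of 2
    bool isPow2Clear(uint x)  { return x != 0 && (x & (x - 1)) == 0; }

    void main()
    {
        foreach (uint x; [1u, 2, 3, 4, 6, 8, 255, 256])
            writefln("%3s  %s  %s", x, isPow2LowBit(x), isPow2Clear(x));
    }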

> My opinion: unsigned types get used *far* more often than they should.
> If this kind of behaviour confuses you, there's no way you should be
> using unsigned.
> Really, the problem is that people abuse unsigned to mean 'this is a
> positive integer'. And of course, negation doesn't make sense in the
> context of the naturals. But that is NOT what unsigned is. Unsigned
> types are types with NO SIGN.

If unsigned types get used far more often than they should, then they shouldn't be all over the place in the standard language (i.e. size_t is used for everything size related).  You simply can't avoid using unsigned.  It's also useful for when you don't think you should ever receive inputs that are negative.

That being said, this one issue of applying negation and then using the result as an unsigned is not a very common usage, and is always an error.  I don't see the harm in disallowing it.

-Steve
February 17, 2010
Steven Schveighoffer wrote:
> Don Wrote:
> 
>> Steven Schveighoffer wrote:
>>> Any opposing view would have to include what obtaining the unsigned negation of an unsigned value is useful for.  And literals don't count because they're easily expressed otherwise :)
>> x & -x
>> Nonzero if x has more than one bit set, i.e. is not a perfect power of 2. I use that often.
> 
> Fine, this would not be disallowed in my mind.  The abuse is not simply applying negation to an unsigned, it is then using the result as unsigned.
> 
> Incidentally, I always used the construct x & (x-1) to be zero if it's an exact power of 2.  It's not much different.
> 
>> My opinion: unsigned types get used *far* more often than they should. If this kind of behaviour confuses you, there's no way you should be using unsigned.
>> Really, the problem is that people abuse unsigned to mean 'this is a positive integer'. And of course, negation doesn't make sense in the context of the naturals. But that is NOT what unsigned is. Unsigned types are types with NO SIGN.
> 
> If unsigned types get used far more often than they should, then they shouldn't be all over the place in the standard language (i.e. size_t is used for everything size related).

Yes, that's exactly my opinion. I think size_t being unsigned is the primary problem.

> You simply can't avoid using unsigned.

> It's also useful for when you don't think you should ever receive inputs that are negative.

That's like using double for currency. You deserve what you get.


February 17, 2010
Don:
> Steven Schveighoffer:
> > If unsigned types get used far more often than they should, then they shouldn't be all over the place in the standard language (i.e. size_t is used for everything size related).
> 
> Yes, that's exactly my opinion. I think size_t being unsigned is the primary problem.
> 
> > You simply can't avoid using unsigned.
> 
> > It's also useful for when you don't think you should ever receive inputs that are negative.
> 
> That's like using double for currency. You deserve what you get.

It seems you and Steven agree with what I have been saying for some months now (this is just the latest post I have written about it): http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=106381

By the way, I think in C# array indexes are signed: http://msdn.microsoft.com/en-us/library/system.collections.arraylist.item.aspx

Bye,
bearophile
February 18, 2010
On Tue, 16 Feb 2010 19:33:11 -0500, Walter Bright <newshound1@digitalmars.com> wrote:

> Steven Schveighoffer wrote:
>> We're not working in Assembly here.  This is a high level language, designed to hide the complexities of the underlying processor.  The processor has no idea whether the data in its registers is signed or unsigned.  The high level language does.  Please use that knowledge to prevent stupid mistakes, or is that not one of the goals of the compiler?  I can't believe this is such a hard point to get across.
>
> It's not that I don't understand your point. I do, I just don't agree with it. At this point, we are going in circles so I don't think there's much value in me reiterating my opinions on it, except to say that Andrei and I once spent a great deal of time trying to separate signed from unsigned using the type system. The problem was that expressions tend to legitimately mix up signed and unsigned types together. Trying to tease out the "correct" sign of the result and what the programmer might have intended turned out to be an inscrutable mess of complication that we finally concluded would never work. It's a seductive idea, it just doesn't work. That's why C, etc. allows for easy implicit conversions between signed and unsigned, and why it has a set of (indeed, arbitrary) rules for combining them. Even though arbitrary, at least they are understandable and consistent.

As long as you clarify that you understand I'm not talking about the entire field of 2s complement math, but only the small case of negating an unsigned and assigning it to an unsigned, I will drop the argument.  Because it is not clear from your arguments that you understand that.

>
> Back when ANSI C was first finalized, there was a raging debate for years about whether C should use value-preserving or sign-preserving integral promotion rules. There were passionate arguments on both sides, both of which claimed the territory of intuitiveness and obviousness.
>
> The end result was both sides eventually realized there was no correct answer, and that an arbitrary decision was required. It was made (value preserving), and half of the compiler vendors changed their compilers to match, and the rancor was forgotten.

D has made huge strides in creating designs that remove whole classes of errors from equivalent C code, just by making certain constructs illegal.  I don't think that citing the past failures of C is a good way to argue why D can't do it.

I'm thankful every time the compiler fails to compile code like

if(x == 5);     // note the stray ';' -- in C, doThis() would run unconditionally
   doThis();

>
> For example, let's take two indices into an array, i and j:
>
>      size_t i,j;
>
> size_t is, by convention, unsigned.
>
> Now, to get the distance between two indices:
>
>     auto delta = i - j;
>
> By C convention, delta is unsigned. If i is >= j, which may be an invariant of my algorithm, all is well. If i < j, suddenly delta is a very large value (but it still works, because of wrap around). The point is, there is no correct rule for dealing with the types of i-j. This has consequences:
>
> Now, if j happens instead to be a complicated loop invariant expression (e) in a loop,
>
>      loop
> 	auto delta = i - (e);
>
> we may instead opt to hoist it out of a loop:
>
>      auto j = -(e);
>      loop
>            auto delta = i + j;
>
> and suddenly the compiler spits out error messages? Why can I subtract an unsigned, but not negate one? Such rules are complicated and will seem arbitrary to the user.

First, it would work under my rules: j would be of type int.  Under my rules, negating an unsigned value yields the signed version of that type.  Negation is the only operator treated this way because the signed result is more accurate for 50% of the possible values (for the other 50% the signed and unsigned results are identical, so either is just as accurate).  For all other operators on unsigned types, the result should be unsigned, because there is no more accurate answer and most of the time you wish to remain in the same type.  To reiterate: negating a value that the type defines as positive implies the user wants a negative value.

Second, I think it's way more clear to do:

auto j = (e);
loop
    auto delta = i - j;

Just looking at this one line throws up red flags for me:

auto delta = i + j; // huh?  Isn't a delta a difference?

The only other rule I proposed is that assigning a negated unsigned to an unsigned (or passing it to a function that requires unsigned) would be illegal to prevent obvious mistakes.
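
Spelled out as code, the proposal looks something like this (hypothetical semantics under my rules, not what the compiler does today):

    void main()
    {
        uint u = 5;

        int  s = -u;    // proposed: ok, -u has type int with value -5
        uint v = -u;    // proposed: error, negated unsigned assigned to unsigned

        // today, both lines compile and v silently becomes 4294967291
    }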

>>>> The case I'm talking about is the equivalent to doing:
>>>>  x = x / 0;
>>>
>>> Even mathematicians don't know what to do about divide by zero. But 2's complement arithmetic is well defined. So the situations are not comparable.
>>  Sure they do, the result is infinity.  It's well defined.
>
> I'm not a mathematician, but I believe it is not well defined, which one finds out when doing branch cuts. Per IEEE 754 (and required by D), floating point arithmetic divide by 0 resolves to infinity, but not all FPU hardware conforms to this spec. There is no similar convention for integer divide by 0. This is why the C standard leaves integer divide by 0 as undefined behavior.

I'm not a mathematician either, but I remember in school learning about working with infinity, and varying degrees of infinity.  For example, if series 1 approaches infinity twice as fast as series 2, then you could say the limit of series 1 divided by series 2 is equal to 2.  It was some interesting stuff, and I think dealing with infinity is mostly theoretical.  Applying this to computer programming is somewhat specialized, but my point was just to make a comparison with something else that is almost always an error.  There are several other constructs I could have used; interestingly enough, most of them involve one specific literal, whereas negation of an unsigned results in an error for over 2 billion values.

-Steve