November 22, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1977





------- Comment #10 from schveiguy@yahoo.com  2008-11-22 12:39 -------
(In reply to comment #7)
> In general, we want to go with the simple rule of having an operation return the tightest type that won't engender overflow (which precludes making 8- and 16-bit values closed sets for addition).
>

Really?  I disagree that 16-bit addition causes frequent overflow.  I'm not sure in what contexts you are using it; can you give an example?

And 8-bit addition is very frequent for character transformations, e.g., converting to upper case:

c += 'A' - 'a';

Casting seems too strict a requirement in these types of situations.  I can't imagine that anyone has a positive experience with these warnings, most are just going to grumble, then insert the cast without thinking about it.

> The exception to that is int, which
> for a combination of practical reasons, "stays" int even if it could overflow,
> and also long, which cannot go any larger. Anecdotal evidence suggests that
> unintended overflows can be more annoying than having to insert the occasional
> cast.

I haven't seen such anecdotal evidence.  I don't think I've ever seen an unintended overflow from addition on 8- or 16-bit values in my code.
The one case where casting should be required is comparing signed to unsigned values, where the comparison flips the result from what it should be.  A classic case I've hit in C++ is comparing the size() of a vector to some subtraction of integers.  But this should be flagged because you are comparing signed to unsigned (and most good C++ compilers flag that).
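
For what it's worth, the same trap exists in D with an array's unsigned .length; a minimal sketch (not from any particular codebase):

int[] arr = [1, 2, 3];
int i = 0;
if (arr.length > i - 1)  // i - 1 is -1, silently converted to a huge size_t
{
   // never reached, even though "3 > -1" is what was meant
}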

Multiplication might be a different story.

> 
> We could relax this rule by having the compiler statically tracking possible ranges of values.

Why not just relax it to allow reassignment to the same type?  I don't think that's an uncommon usage, and I don't think it would cause rampant failures.
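
Concretely, the relaxation I have in mind is just this (a sketch of the rule, not current behavior):

short a = 100, b = 200;
// a = a + b;            // would be accepted under this relaxation: same type on both sides and the target
a = cast(short)(a + b);  // what the current rule requires instead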

> The random ints argument does not quite hold because one seldom adds fully random 32-bit values. "Most integers are small."

Most integers are small, including 16-bit integers.  If one uses a 16-bit integer, they are generally doing so because they know the domain of such values is small.

8-bit integers that one performs math on are generally characters, and the math is generally a transformation of them.  Most of the time, the domain of such values is known to be smaller than the full range of an 8-bit integer: ASCII characters, for example.


-- 

November 22, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1977





------- Comment #11 from andrei@metalanguage.com  2008-11-22 12:51 -------
(In reply to comment #9)
> (In reply to comment #8)
> > The plan is to have sensible bitwise operations preserve the size of their operands. Only arithmetic and shift will "spill" into larger types.
> > 
> 
> I hope you mean only *left* shift will spill into a larger type.

Correct. So let's recap a possible set of rules:

(a) Operations will yield the statically tightest type possible, but never smaller than the largest of the two operands.

(b) However, (a) will not cause automatic promotion from 32-bit to 64-bit,
i.e., unless at least one operand is 64-bit, the result will never be 64-bit.

(c) (Not yet implemented) If one operand's value is statically-known, further
tightening without a cast is possible, e.g.:

uint a = ...;
byte b = a & 1; // pass, no cast needed

(d) (Not yet implemented, open-ended) Even if operand values are not statically
known, their possible range is computed statically in a flow-insensitive manner
and used for validation, e.g.:

uint a = 4;
if (condition) a = 200;
// a is now in range [4, 200]
ubyte x = a & 200; // pass

====

The "but never smaller than the largest of the two operands" is meant to avoid surprises of the following sort:

uint a = ...;
auto b = a & 1;
// how do you mean b is ubyte?!?

However, notice that due to (c), explicitly asking for a byte does work.
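
Going back to (a) and (b), a quick sketch of how the proposed rules would play out:

ushort u = 40_000;
uint v = 100_000;
auto w = u + v;   // per (a) no narrower than the wider operand; per (b) not widened to ulong, so uint
int x = 1_000_000, y = 3_000;
auto z = x * y;   // per (b) stays int even though the product overflows; no silent jump to long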


-- 

November 22, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1977





------- Comment #12 from andrei@metalanguage.com  2008-11-22 13:22 -------
(In reply to comment #10)
> (In reply to comment #7)
> > In general, we want to go with the simple rule of having an operation return the tightest type that won't engender overflow (which precludes making 8- and 16-bit values closed sets for addition).
> >
> 
> Really?  I disagree that 16-bit addition causes frequent overflow.  I'm not sure in what contexts you are using it; can you give an example?

The most recent example that comes to mind is std.format.  Out of probably too much zeal, I store the width and precision as shorts.  There are several places in the code using them where I had to pay close attention to possible overflows.
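
To give a flavor (hypothetical values and names, not the actual std.format code), the kind of spot that needs attention looks like this:

short width = 30_000;      // hypothetical field width
short precision = 10_000;  // hypothetical precision
auto widened = width + precision;              // promoted to int: 40_000, fine
short total = cast(short)(width + precision);  // the cast the rules require, but it hides the wrap to -25,536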

> And 8-bit addition is very frequent for character transformations, e.g., converting to upper case:
> 
> c += 'A' - 'a';
> 
> Casting seems too strict a requirement in these types of situations.  I can't imagine that anyone has a positive experience with these warnings, most are just going to grumble, then insert the cast without thinking about it.

Notice that in the particular example you mention, the code does go through because it uses +=.
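
In other words (a sketch of the distinction, using char):

char c = 'q';
c += 'A' - 'a';                   // accepted: op-assign narrows back to char implicitly
// c = c + ('A' - 'a');           // rejected without a cast: the addition yields int
c = cast(char)(c + ('A' - 'a'));  // the explicit form plain assignment requires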

> > The exception to that is int, which
> > for a combination of practical reasons, "stays" int even if it could overflow,
> > and also long, which cannot go any larger. Anecdotal evidence suggests that
> > unintended overflows can be more annoying than having to insert the occasional
> > cast.
> 
> I haven't seen such anecdotal evidence.  I don't think I've ever seen an unintended overflow from addition on 8- or 16-bit values in my code.

This may mean that you are a great coder and that you and I frequent different circles.

> The one case where casting should be required is comparing signed to unsigned values, where the comparison flips the result from what it should be.  A classic case I've hit in C++ is comparing the size() of a vector to some subtraction of integers.  But this should be flagged because you are comparing signed to unsigned (and most good C++ compilers flag that).
> 
> Multiplication might be a different story.
> 
> > 
> > We could relax this rule by having the compiler statically tracking possible ranges of values.
> 
> Why not just relax it to allow reassignment to the same type?  I don't think that's an uncommon usage, and I don't think it would cause rampant failures.

Walter believes the same. I disagree.

> > The random ints argument does not quite hold because one seldom adds fully random 32-bit values. "Most integers are small."
> 
> Most integers are small, including 16-bit integers.  If one uses a 16-bit integer, they are generally doing so because they know the domain of such values is small.

Storage considerations may be at stake, though.  In fact, most of the uses of small integers I've seen in C and C++ come from a storage/format requirement, not a range requirement.  In a way, what I'm saying is a tautology: in C and C++ there is very little enforcement of ranges, which means there is very little incentive to express small ranges with small integers.

> 8-bit integers that one performs math on are generally characters, and the math is generally a transformation of them.  Most of the time, the domain of such values is known to be smaller than the full range of an 8-bit integer: ASCII characters, for example.

The compiler can't guess when such assumptions are legitimate.  In the character domain, we might be able to target our effort towards improving operations on char, wchar, and dchar.


-- 

November 24, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1977





------- Comment #13 from schveiguy@yahoo.com  2008-11-24 13:41 -------
(In reply to comment #12)
> (In reply to comment #10)
> > c += 'A' - 'a';
> > 
> > Casting seems too strict a requirement in these types of situations.  I can't imagine that anyone has a positive experience with these warnings, most are just going to grumble, then insert the cast without thinking about it.
> 
> Notice that in the particular example you mention, the code does go through because it uses +=.

Wow, that surprises me.

c = c + c should be equivalent to c += c;

So right there, either both should be invalid, or neither should.  Both have an equal chance of overflow.

In fact, using obj2asm, I found that both are essentially equivalent.  With optimization on, here are the two different code generations (minus all bookkeeping stuff):

char foo(char c)
{
   c += c; // version 1
   c = c + c; // version 2
   return c;
}

c += c;
<               mov     ECX,EAX
<               add     CL,CL
<               mov     AL,CL

c = c + c;
>               push    EAX
>               movzx   ECX,byte ptr -4[EBP]
>               mov     EAX,ECX
>               add     AL,AL

These are essentially the same, except in the second version the compiler optimizer didn't see the opportunity to get down to the same code as the first version (there's probably more optimization to be had even in the first version).  But the danger of overflow exists equally in both cases.  I'm not sure why one is treated as more dangerous than the other by the compiler.


-- 

November 24, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1977





------- Comment #14 from andrei@metalanguage.com  2008-11-24 14:47 -------
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #10)
> > > c += 'A' - 'a';
> > > 
> > > Casting seems too strict a requirement in these types of situations.  I can't imagine that anyone has a positive experience with these warnings, most are just going to grumble, then insert the cast without thinking about it.
> > 
> > Notice that in the particular example you mention, the code does go through because it uses +=.
> 
> Wow, that surprises me.
> 
> c = c + c should be equivalent to c += c;
> 
> So right there, either both should be invalid, or neither should.  Both have an equal chance of overflow.

It shouldn't be that surprising, particularly considering that Java and C# obey
the same rules. The correct equivalence is that c = c + c is really c =
cast(typeof(c))(c + c).

> In fact, using obj2asm, I found that both are essentially equivalent.

That's irrelevant. Operational semantics are different from typechecking.


-- 

November 24, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1977





------- Comment #15 from andrei@metalanguage.com  2008-11-24 14:48 -------
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > (In reply to comment #10)
> > > > c += 'A' - 'a';
> > > > 
> > > > Casting seems too strict a requirement in these types of situations.  I can't imagine that anyone has a positive experience with these warnings, most are just going to grumble, then insert the cast without thinking about it.
> > > 
> > > Notice that in the particular example you mention, the code does go through because it uses +=.
> > 
> > Wow, that surprises me.
> > 
> > c = c + c should be equivalent to c += c;
> > 
> > So right there, either both should be invalid, or neither should.  Both have an equal chance of overflow.
> 
> It shouldn't be that surprising, particularly considering that Java and C# obey
> the same rules. The correct equivalence is that c = c + c is really c =
> cast(typeof(c))(c + c).

I meant that c += c is equivalent to c = cast(typeof(c))(c + c).
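
Spelled out (a sketch of the equivalence as corrected):

byte c = 100;
c += c;                      // accepted; treated as the next line
c = cast(typeof(c))(c + c);  // op-assign inserts this narrowing implicitly
// c = c + c;                // plain assignment: an error without the cast, since c + c is int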


-- 

November 24, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1977





------- Comment #16 from schveiguy@yahoo.com  2008-11-24 16:55 -------
I searched around, and you are right that C# disallows assigning the result of byte + byte back to a byte without a cast, while it does allow +=.  The reason given was not to forbid reassignment to the same type for fear of overflow (as is obvious from allowing the += operation); the point is to prevent overflow where it is not expected.  For example:

int x = (byte)64 + (byte)64;

should result in x == 128, not x == -128.

And the enforcement is not in the compiler warning system; the enforcement is that they only define op codes for integer arithmetic, so the compiler promotes the bytes to integers, which results in an integer.

But C++ does not forbid it, at least with g++ (even with -Wall).

This is not to say that the choices C# made are correct, it's just that there is precedent in C# (couldn't find Java reference, but I'm sure it's the same).

Here is a possible solution that preserves the current safe behavior and relaxes the implicit casting rules enough that overflow is allowed to happen in the appropriate situations:

I think everyone agrees that the following:

byte b = 64;
int i = b + b;

should produce i == 128.

And most believe that:

byte b2 = b + b;

should produce b2 == -128 without error, and should be equivalent semantically to:

byte b2 = b;
b2 += b;

We don't want adding two bytes together to yield a byte in all cases, only in cases where the result is actually assigned to or used as a byte.

What if we defined several 'internal' types that were only used by the compiler?

pbyte -> byte promoted to an int (represented as an int internally)
pubyte -> ubyte promoted to an int
pshort -> short promoted to an int
pushort -> ushort promoted to an int
etc...

The 'promoted' types internally work just like int except in certain cases:

If you have (px or x) <op> (px or x), the resulting type is px

If you have (px or x) <op> (py or y), or (py or y) <op> (px or x), and the
rules of promotion allow x to be implicitly cast to y, the resulting type is
py.  Otherwise, the resulting type is int.

px is implicitly castable to x.

If the rules of promotion allow x to be implicitly cast to y, px is implicitly castable to y.  Otherwise, assigning px to y requires an explicit cast.

If calling a function foo with argument type px, where foo accepts type x, it is allowed.

If calling a function foo with argument type px, where foo accepts type y, and x is implicitly castable to y, it is allowed.  If x is not implicitly castable to y, it requires a cast.

If a variable is declared with 'auto' and the initializer is of type px, then the variable is declared as an int.

You can't declare any variables of type pbyte, etc.; the types don't actually have symbolic names and are used only internally by the compiler.

Now you have correct resolution of homogeneous operations, and no overflow of data where it is not desired.

examples:

byte b = 64;
b + b -> evaluates to pbyte(128)
b = b + b -> evaluates to b = pbyte(128), results in b == -128
int i = b + b -> evaluates to int i = pbyte(128), results in i == 128.
short s = b + b -> evaluates to short s = pbyte(128), results in s == 128.

short s = 64;
byte b = s + s; -> evaluates to byte b = pshort(128), requires a cast because
short does not fit into byte.

void foo(byte b);
void foo2(short s);

byte x;
short s;
foo(x + x); // allowed
foo2(x + x); // allowed
foo(s + s); // requires cast
foo2(s + s); // allowed

Does this cover the common cases?  Is there a reason why this can't be implemented?  Is there a reason why this *shouldn't* be implemented?
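
As an aside, the 'promoted' types pretty much have to live inside the compiler; a library emulation only gets partway there.  A rough sketch (hypothetical names) showing the limitation:

struct PByte
{
   int value;        // stored at int width, like pbyte above
   alias value this; // usable wherever an int is expected
}

PByte sum(byte a, byte b) { return PByte(a + b); }

void demo()
{
   byte x = 64;
   int i = sum(x, x);      // fine: PByte converts to int via alias this; i == 128
   // byte y = sum(x, x);  // still an error: a library type can't make the narrowing
   //                      // implicit, which is exactly the "px is implicitly castable
   //                      // to x" rule that needs compiler support
}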


-- 

November 25, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1977





------- Comment #17 from andrei@metalanguage.com  2008-11-24 18:51 -------
(In reply to comment #16)
> I searched around, and you are right that C# disallows assigning the result of byte + byte back to a byte without a cast, while it does allow +=.  The reason given was not to forbid reassignment to the same type for fear of overflow (as is obvious from allowing the += operation); the point is to prevent overflow where it is not expected.  For example:
> 
> int x = (byte)64 + (byte)64;
> 
> should result in x == 128, not x == -128.
> 
> And the enforcement is not in the compiler warning system; the enforcement is that they only define op codes for integer arithmetic, so the compiler promotes the bytes to integers, which results in an integer.

That's not quite accurate.  Again, it's one thing to pass typechecking and another to generate code.  Any desired rule could have been implemented with only int arithmetic and subsequent masking.
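
A sketch of the point: byte-sized results are easy to produce with int-only opcodes, so the instruction set doesn't dictate the typechecking rule.

byte a = 100, b = 100;
int wide = a + b;                // computed with int arithmetic: 200
byte narrowed = cast(byte) wide; // truncated back to 8 bits: -56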

> But C++ does not forbid it, at least with g++ (even with -Wall).

C++ operates in a similar way (values are conceptually promoted to int before arithmetic operations) but it's much more lax with narrowing conversions. That's why there is no problem with assigning the result of adding e.g. two shorts back to a short: the computation really yields an int, but C++ has no qualms about narrowing that into a short, regardless of the potential loss of data.

> This is not to say that the choices C# made are correct, it's just that there is precedent in C# (couldn't find Java reference, but I'm sure it's the same).
> 
> Here is a possible solution that allows current safe behavior and relaxes the implicit casting rules enough so that overflow is allowed to happen in the correct situations:
> 
> I think everyone agrees that the following:
> 
> byte b = 64;
> int i = b + b;
> 
> should produce i == 128.

I think there is agreement on that, too.

> And most believe that:
> 
> byte b2 = b + b;
> 
> should produce b2 == -128 without error, and should be equivalent semantically to:
> 
> byte b2 = b;
> b2 += b;
> 
> We don't want adding two bytes together to yield a byte in all cases, only in cases where the result is actually assigned to or used as a byte.

Well the "most" part doesn't quite pan out, and to me it looks like the argument fails here. For one thing, we need to eliminate people who accept Java and C#. They would believe that what their language does is the better thing to do. Also, C and C++ are getting that right by paying a very large cost - of allowing all narrowing integral conversions. I believe there is a reasonable level of agreement that automatic lossy conversions are not to be encouraged. This puts C and C++ behind Java and C# in terms of "getting it right".

> What if we defined several 'internal' types that were only used by the compiler?
> 
> pbyte -> byte promoted to an int (represented as an int internally)
> pubyte -> ubyte promoted to an int
> pshort -> short promoted to an int
> pushort -> ushort promoted to an int
> etc...
> 
> The 'promoted' types internally work just like int except in certain cases:
> 
> If you have (px or x) <op> (px or x), the resulting type is px
> 
> If you have (px or x) <op> (py or y), or (py or y) <op> (px or x), and the
> rules of promotion allow x to be implicitly cast to y, the resulting type is
> py.  Otherwise, the resulting type is int.
> 
> px is implicitly castable to x.
> 
> if the rules of promotion allow x to be implicitly cast to y, px is implicitly
> castable to y.
> otherwise, assigning px to y requires an explicit cast.
> 
> if calling a function foo with argument type px, where foo accepts type x, it is allowed.
> 
> If calling a function foo with argument type px, where foo accepts type y, and x is implicitly castable to y, it is allowed.  If x is not implicitly castable to y, it requires a cast.
> 
> if a variable is declared with 'auto', and the initializer is of type px, then the variable is declared as an int.
> 
> You can't declare any variables of type pbyte, etc, and the types actually don't have symbolic names, they are used internally by the compiler.
> 
> Now you have correct resolution of homogeneous operations, and no overflow of data where it is not desired.
> 
> examples:
> 
> byte b = 64;
> b + b -> evaluates to pbyte(128)
> b = b + b -> evaluates to b = pbyte(128), results in b == -128
> int i = b + b -> evaluates to int i = pbyte(128), results in i == 128.
> short s = b + b -> evaluates to short s = pbyte(128), results in s == 128.
> 
> short s = 64;
> byte b = s + s; -> evaluates to byte b = pshort(128), requires a cast because
> short does not fit into byte.
> 
> void foo(byte b);
> void foo2(short s);
> 
> byte x;
> short s;
> foo(x + x); // allowed
> foo2(x + x); // allowed
> foo(s + s); // requires cast
> foo2(s + s); // allowed
> 
> Does this cover the common cases?  Is there a reason why this can't be implemented?  Is there a reason why this *shouldn't* be implemented?

IMHO not enough rationale has been brought forth on why this *should* be implemented. It would make D implement an arcane set of rules for an odd, if any, benefit.

A better problem to spend energy on is the signed <-> unsigned morass. We've discussed that many times and could not come up with a reasonable solution. For now, D has borrowed the C rule "if any operand is unsigned then the result is unsigned" leading to the occasional puzzling results known from C and C++. Eliminating those fringe cases without losing compatibility with C and C++ is a tough challenge.
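
The classic illustration of the puzzling results that rule produces (a textbook example, not taken from this report):

uint u = 10;
int i = -20;
auto r = u + i;  // one operand is unsigned, so r is uint
// r holds 4294967286, not -10
assert(r == uint.max - 9);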


-- 

November 25, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1977





------- Comment #18 from schveiguy@yahoo.com  2008-11-25 09:02 -------
(In reply to comment #17)
> > And most believe that:
> > 
> > byte b2 = b + b;
> > 
> > should produce b2 == -128 without error, and should be equivalent semantically to:
> > 
> > byte b2 = b;
> > b2 += b;
> > 
> > We don't want adding two bytes together to yield a byte in all cases, only in cases where the result is actually assigned to or used as a byte.
> 
> Well the "most" part doesn't quite pan out, and to me it looks like the argument fails here. For one thing, we need to eliminate people who accept Java and C#. They would believe that what their language does is the better thing to do.

Just because people use a language doesn't mean they agree with every decision.  Searching for this issue on C# blogs and message boards, I found that the overwhelming majority prefers no error to the oversafe current implementation.  The defenders of the current rules invariably use the case of adding two bytes together and assigning the result to an integer, their argument being that if adding two bytes yielded a byte, the integer would receive a truncated value.  If we eliminate that case from contention, as my solution does, I think you'd be hard pressed to find anyone who thinks the loss-of-data errors are still needed in cases such as the one that spawned this discussion.

> Also, C and C++ are getting that right by paying a very large cost - of allowing all narrowing integral conversions. I believe there is a reasonable level of agreement that automatic lossy conversions are not to be encouraged. This puts C and C++ behind Java and C# in terms of "getting it right".

I agree: general narrowing conversions should be errors.  It's just the case where arithmetic has artificially promoted the result where we disagree.

> 
> > What if we defined several 'internal' types that were only used by the compiler?
> > 
> > pbyte -> byte promoted to an int (represented as an int internally)
> > pubyte -> ubyte promoted to an int
> > pshort -> short promoted to an int
> > pushort -> ushort promoted to an int
> > etc...
> 
> IMHO not enough rationale has been brought forth on why this *should* be implemented. It would make D implement an arcane set of rules for an odd, if any, benefit.

Probably, it isn't that critical to the success of D that this be implemented. If I had to choose something to look at, this probably wouldn't be it.  This is just one of those little things that seems unnecessary and annoying more than it is blocking.  It shows up seldom enough that it probably isn't worth the trouble to fix.  But I have put my solution forth, and as far as I can tell, you didn't find anything wrong with it, and that's about all I can do.

> A better problem to spend energy on is the signed <-> unsigned morass. We've discussed that many times and could not come up with a reasonable solution. For now, D has borrowed the C rule "if any operand is unsigned then the result is unsigned" leading to the occasional puzzling results known from C and C++. Eliminating those fringe cases without losing compatibility with C and C++ is a tough challenge.

Indeed.  Without promoting to a larger type, I think you are forced to take this course of action.  When adding an int to a uint, who wants it to wrap around to a negative value?  I can't think of a better solution.


-- 

November 25, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1977





------- Comment #19 from andrei@metalanguage.com  2008-11-25 09:17 -------
(In reply to comment #18)
> > A better problem to spend energy on is the signed <-> unsigned morass. We've discussed that many times and could not come up with a reasonable solution. For now, D has borrowed the C rule "if any operand is unsigned then the result is unsigned" leading to the occasional puzzling results known from C and C++. Eliminating those fringe cases without losing compatibility with C and C++ is a tough challenge.
> 
> Indeed.  Without promoting to a larger type, I think you are forced to take this course of action.  When adding an int to a uint, who wants it to wrap around to a negative value?  I can't think of a better solution.
> 

You just did, in fact.  Your idea of defining some internal types is very similar to one of the promising solutions we've been exploring for resolving the signedness of arithmetic operations.

I will in fact stop here and paste the rest of my message to the main newsgroup because it's of general interest and segues away from this bug report.


--