Thread overview
Integer overflow and underflow semantics
May 05, 2012
Era Scarecrow
May 05, 2012
Era Scarecrow
May 05, 2012
Manu
May 06, 2012
Mehrdad
May 06, 2012
Era Scarecrow
May 05, 2012
bearophile
May 18, 2012
Paul D. Anderson
May 18, 2012
Paul D. Anderson
May 21, 2012
renoX
May 07, 2012
Don Clugston
May 09, 2012
renoX
May 18, 2012
akaz
May 18, 2012
akaz
May 18, 2012
Andrew Wiley
May 05, 2012
Hi,

I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max?

This is relevant in particular for GDC and LDC since they target a lot of weird architectures.

-- 
- Alex
May 05, 2012
On Saturday, 5 May 2012 at 04:57:48 UTC, Alex Rønne Petersen wrote:
> Hi,
>
> I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max?
>
> This is relevant in particular for GDC and LDC since they target a lot of weird architectures.

 Any system that implements a carry flag likely works exactly like that. The carry flag is set or cleared after a math operation, letting you extend your integers to any width you want, as on, say, the 6502.

 Might look something like this (been a while, but you should get the idea):

 clc             ;clear carry before the first word
loop_here:
 mov ax, [si]    ;load next word of the source
 adc [di], ax    ;add with carry into the destination
 inc si
 inc si          ;advance both pointers by one word (2 bytes)
 inc di
 inc di
 dec cx          ;dec sets the zero flag, so no cmp needed
 jnz loop_here   ;(inc/dec leave the carry flag alone)

 ;if carry after this point then...
 jc overflow_warning

May 05, 2012
On 05-05-2012 07:32, Era Scarecrow wrote:
> On Saturday, 5 May 2012 at 04:57:48 UTC, Alex Rønne Petersen wrote:
>> Hi,
>>
>> I don't think the language really makes it clear whether overflows and
>> underflows are well-defined. Do we guarantee that for any integral
>> type T, T.max + 1 == T.min and T.min - 1 == T.max?
>>
>> This is relevant in particular for GDC and LDC since they target a lot
>> of weird architectures.
>
> Any system that implements a carry flag likely works exactly like that.
> The carry flag is set or cleared after a math operation, letting you
> extend your integers to any width you want, as on, say, the 6502.
>
> Might look something like this (been a while, but you should get the idea):
>
> clc             ;clear carry before the first word
> loop_here:
> mov ax, [si]    ;load next word of the source
> adc [di], ax    ;add with carry into the destination
> inc si
> inc si          ;advance both pointers by one word (2 bytes)
> inc di
> inc di
> dec cx          ;dec sets the zero flag, so no cmp needed
> jnz loop_here   ;(inc/dec leave the carry flag alone)
>
> ;if carry after this point then...
> jc overflow_warning
>

Right, but the question was whether the language guarantees what I described. C and C++ don't, and IMO, we should strive to fix that.

-- 
- Alex
May 05, 2012
On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen wrote:

> Right, but the question was whether the language guarantees what I described. C and C++ don't, and IMO, we should strive to fix that.

 I can't see why it wouldn't, unless the compiler adds in checks and changes its behavior, or the assembler does its own quirky magic. The resulting bit patterns are pretty much fixed; it's just a matter of how we interpret them.

 I know that before, when I needed to detect overflow and simulate the carry flag, I ended up using a larger integer type. (This was for a custom multiple-precision unsigned integer type.)



 Of course, now that I think about it, if anything ends up using MMX registers, the rules do change a little. MMX saturating operations don't wrap on overflow, if I remember right; they just clamp to the max/min, since they're intended more for multimedia. If the compiler uses those, I can't tell you what would happen.
May 05, 2012
On 05-05-2012 10:23, Era Scarecrow wrote:
> On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen wrote:
>
>> Right, but the question was whether the language guarantees what I
>> described. C and C++ don't, and IMO, we should strive to fix that.
>
> I can't see why it wouldn't, unless the compiler adds in checks and
> changes its behavior, or the assembler does its own quirky magic. The
> resulting bit patterns are pretty much fixed; it's just a matter of how
> we interpret them.

It all depends. GCC (and thus GDC) can target very exotic architectures where any assumptions may not, for whatever reason, hold true. This is a language design issue more than it's a "how does architecture X or compiler Y work" issue.

An interesting problem with undefined behavior for integer overflow and underflow in C/C++ is that optimizers are basically free to do anything with regards to them, and often do whatever is more convenient for them.

>
> I know that before, when I needed to detect overflow and simulate the
> carry flag, I ended up using a larger integer type. (This was for a
> custom multiple-precision unsigned integer type.)
>
>
>
> Of course, now that I think about it, if anything ends up using MMX
> registers, the rules do change a little. MMX saturating operations
> don't wrap on overflow, if I remember right; they just clamp to the
> max/min, since they're intended more for multimedia. If the compiler
> uses those, I can't tell you what would happen.

Right, but it's not so much about what *would* happen as much as it is about what *should* happen. ;)

-- 
- Alex
May 05, 2012
On 5 May 2012 11:42, Alex Rønne Petersen <xtzgzorex@gmail.com> wrote:

> On 05-05-2012 10:23, Era Scarecrow wrote:
>
>> On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen wrote:
>>
>> Right, but the question was whether the language guarantees what I
>> described. C and C++ don't, and IMO, we should strive to fix that.
>>
>>
>> I can't see why it wouldn't, unless the compiler adds in checks and changes its behavior, or the assembler does its own quirky magic. The resulting bit patterns are pretty much fixed; it's just a matter of how we interpret them.
>>
>
> It all depends. GCC (and thus GDC) can target very exotic architectures where any assumptions may not, for whatever reason, hold true. This is a language design issue more than it's a "how does architecture X or compiler Y work" issue.
>
> An interesting problem with undefined behavior for integer overflow and underflow in C/C++ is that optimizers are basically free to do anything with regards to them, and often do whatever is more convenient for them.


With regard to code-gen on such colourful architectures, would stating a
defined behaviour for overflow/underflow affect the common case where an
over/underflow did not occur?
Short of an architecture that causes a hardware exception on
over/underflow, I suspect that it would interfere with the common case
(additional code generated around every add/sub/etc. to handle the
overflow behaviour), and on such an architecture every single integer
operation would become inefficient.

I believe this is why C doesn't define the behaviour, because C is still effectively a macro language, and shouldn't produce 'unexpected' inefficient code. ('unexpected' from the perspective of the architecture you're targeting)

I would personally rather see it remain undefined like C, but with convention approved by common/conventional architectures where cross platform porting is likely to occur.


May 05, 2012
On 05-05-2012 12:25, Manu wrote:
> On 5 May 2012 11:42, Alex Rønne Petersen <xtzgzorex@gmail.com> wrote:
>
>     On 05-05-2012 10:23, Era Scarecrow wrote:
>
>         On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen wrote:
>
>             Right, but the question was whether the language guarantees
>             what I
>             described. C and C++ don't, and IMO, we should strive to fix
>             that.
>
>
>         I can't see why it wouldn't, unless the compiler adds in checks
>         and changes its behavior, or the assembler does its own quirky
>         magic. The resulting bit patterns are pretty much fixed; it's
>         just a matter of how we interpret them.
>
>
>     It all depends. GCC (and thus GDC) can target very exotic
>     architectures where any assumptions may not, for whatever reason,
>     hold true. This is a language design issue more than it's a "how
>     does architecture X or compiler Y work" issue.
>
>     An interesting problem with undefined behavior for integer overflow
>     and underflow in C/C++ is that optimizers are basically free to do
>     anything with regards to them, and often do whatever is more
>     convenient for them.
>
>
> With regard to code-gen on such colourful architectures, would stating a
> defined behaviour for overflow/underflow affect the common case where an
> over/underflow did not occur?
> Short of an architecture that causes a hardware exception on
> over/underflow, I suspect that it would interfere with the common case
> (additional code generated around every add/sub/etc. to handle the
> overflow behaviour), and on such an architecture every single integer
> operation would become inefficient.

I don't know of a single architecture (out of the ones on dlang.org/version.html and many others) that doesn't signal overflow/underflow somehow (or obey the rules I described in the OP).

>
> I believe this is why C doesn't define the behaviour, because C is still
> effectively a macro language, and shouldn't produce 'unexpected'
> inefficient code. ('unexpected' from the perspective of the architecture
> you're targeting)

Right, but C was designed many, many years ago. :)

>
> I would personally rather see it remain undefined like C, but with
> convention approved by common/conventional architectures where cross
> platform porting is likely to occur.

Do a grep for "\.min" and "\.max" in Phobos and you'll notice a lot of places actually depend on the current x86 behavior (wrapping on overflow and underflow). I don't think we can afford to make the same mistake C and C++ did.

While I did say that this is a language design issue, it's also a matter of pragmatism - does any real world architecture that D could possibly target actually *not* obey the aforementioned rules? I don't know of any that doesn't.

-- 
- Alex
May 05, 2012
Alex Rønne Petersen:

> I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max?
>
> This is relevant in particular for GDC and LDC since they target a lot of weird architectures.

In a good systems language I'd like to see something better than what's present in C#. So I'd like the language to offer the programmer a choice of 3 or 4 different semantics for integral operations:

1) A single standard wrap-on-overflow semantics, as in Java;
2) A wrap-on-overflow semantics that adapts to whatever is fastest on the CPU, as in C;
3) Standard wrapping for unsigned values, and run-time errors when a signed value overflows (or goes out of its range);
4) Run-time errors whenever any signed or unsigned value overflows (or goes out of its range), as in Ada.

Bye,
bearophile
May 05, 2012
On 05-05-2012 13:06, bearophile wrote:
> Alex Rønne Petersen:
>
>> I don't think the language really makes it clear whether overflows and
>> underflows are well-defined. Do we guarantee that for any integral
>> type T, T.max + 1 == T.min and T.min - 1 == T.max?
>>
>> This is relevant in particular for GDC and LDC since they target a lot
>> of weird architectures.
>
> In a good systems language I'd like to see something better than what's
> present in C#. So I'd like the language to offer the programmer a
> choice of 3 or 4 different semantics for integral operations:
>
> 1) A single standard wrap-on-overflow semantics, as in Java;
> 2) A wrap-on-overflow semantics that adapts to whatever is fastest on
> the CPU, as in C;
> 3) Standard wrapping for unsigned values, and run-time errors when a
> signed value overflows (or goes out of its range);
> 4) Run-time errors whenever any signed or unsigned value overflows (or
> goes out of its range), as in Ada.
>
> Bye,
> bearophile

Good point. The wrapping rule described in my OP should be the default IMO. Then perhaps the other modes could be activated with an @attribute of sorts?

-- 
- Alex
May 06, 2012
On 05-05-2012 06:57, Alex Rønne Petersen wrote:
> Hi,
>
> I don't think the language really makes it clear whether overflows and
> underflows are well-defined. Do we guarantee that for any integral type
> T, T.max + 1 == T.min and T.min - 1 == T.max?
>
> This is relevant in particular for GDC and LDC since they target a lot
> of weird architectures.
>

Can anyone give a definitive answer to this or at least confirm that it is an open issue?

-- 
- Alex