November 26, 2008
On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Also consider:
>
> auto delta = a1.length - a2.length;
>
> What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.
>

Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.

Putting an assert(a1.length > a2.length) might help, but the check will be unavailable unless code is compiled with asserts enabled.

A better solution would be to write code as follows:

auto delta = unsigned(a1.length - a2.length); // returns an unsigned value, throws on overflow (i.e., "2 - 4")
auto delta = signed(a1.length - a2.length); // returns result as a signed value. Throws on overflow (i.e., "int.min - 1")
auto delta = a1.length - a2.length; // won't compile

// this one is also handy:
auto newLength = checked(a1.length - 1); // preserves type of a1.length, be it int or uint, throws on overflow

I have previously shown an implementation of unsigned/signed:

import std.stdio;

int signed(lazy int dg)
{
    auto result = dg();
    asm {
       jo overflow; // jump if the overflow flag was set by evaluating dg()
    }
    return result;

    overflow:
    throw new Exception("Integer overflow occurred");
}

int main()
{
   int t = int.max;
   try
   {
       int s = signed(t + 1);
       writefln("Result is %d", s);
   }
   catch(Exception e)
   {
       writefln("Whoops! %s", e.toString());
   }
   return 0;
}

But Andrei has correctly pointed out that it has a problem - it may throw without a reason:
int i = int.max + 1; // sets an overflow flag
auto result = expectSigned(1); // raises an exception

Overflow flag may also be cleared in a complex expression:
auto result = expectUnsigned(1 + (uint.max + 1)); // first add will overflow and second one clears the flag -> no exception as a result

A possible solution is to make the compiler aware of this construct and disallow passing none (case 2) or more than one operation (case 1) to the method.
November 26, 2008
Don wrote:
> Michel Fortin wrote:
>> On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
>>
>>> Also consider:
>>>
>>> auto delta = a1.length - a2.length;
>>>
>>> What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.
>>
>> In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise.
>>
>>     auto diff = cast(int)a1.length - cast(int)a2.length;
> 
> Actually, there's no solution.

There is. We need to find the block of marble it's in and then chip the extra marble off it.

> Imagine a 32 bit system, where one object can be greater than 2GB in size (not possible in Windows AFAIK, but theoretically possible).

It is possible in Windows if you change some I-forgot-which parameter in boot.ini.

> Then if a1 is 3GB, delta cannot be stored in an int. If a2 is 3GB, it requires an int for storage, since result is less than 0.
> 
> ==> I think length has to be an int. It's less bad than uint.

I'm not sure how the conclusion follows from the premises, but consider this. If someone deals with large arrays, they do have the possibility of doing things like:

if (a1.length >= a2.length) {
    size_t delta = a1.length - a2.length;
    ... use delta ...
} else {
    size_t rDelta = a2.length - a1.length;
    ... use rDelta ...
}

I'm not saying it's better than sliced bread, but it is a solution. And it is correct on all systems. And cooperates with the typechecker by adding flow information to which typecheckers are usually oblivious. And types are out in the clear. And it's the programmer, not the compiler, who decides the signedness.
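
As an aside, the magnitude-only case of that pattern could be packaged in a tiny helper; the name is hypothetical and this is shown only for illustration:

size_t absDiff(size_t a, size_t b)
{
    // the branch, not the compiler, decides which operand is larger,
    // so the unsigned subtraction can never wrap
    return a >= b ? a - b : b - a;
}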

In contrast, using ints for array lengths beyond 2GB is a nightmare. I'm not saying it's a frequent thing though, but since you woke up the sleeping dog, I'm just barking :o).

>> Perhaps we could add a "sign" property to uint and an "unsign" property to int that'd give you the signed or unsigned corresponding value and which could do range checking at runtime (enabled by a compiler flag).
>>
>>     auto diff = a1.length.sign - a2.length.sign;
>>
>> And for the general problem of "uint - uint" giving a result below uint.min, as I said in my other post, that could be handled by a runtime check (enabled by a compiler flag) just like array bound checking.
> 
> That's not bad.

Well let's look closer at this. Consider a system in which the current rules are in vigor, plus the overflow check for uint.

auto i = arr.length - offset1 + offset2;

Although the context makes it clear that offset1 < offset2 and therefore i is within range and won't overflow, the poor code generator has no choice but to insert checks throughout. Even though the entire expression is always correct, it will dynamically fail on the way to its correct form.

Contrast with the proposed system in which the expression will not compile. It will indeed require the user to somewhat redundantly insert guides for operations, but during compilation, not through runtime failure.

>>>
>>> Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue.
>>
>> Well, by making implicit convertions between uint and int illegal, we're solving the larger issue. Just not in a seemless manner.
> 
> We are of one mind. I think that constants are the root cause of the problem.

Well I strongly disagree. (I assume you mean "literals", not "constants".) I see constants as just a small part of the signedness mess. Moreover, I consider that in fact creating symbolic names with "auto" compounds the problem, and this belief runs straight against yours that it's about literals. No, IMHO it's about espousing and then propagating wrong beliefs through auto!

Maybe if you walked me through your reasoning on why literals bear a significant importance I could get convinced. As far as my code is concerned, I tend to loosely go along the lines of the old adage "the only literals in a program should be 0, 1, and -1". True, the adage doesn't say how many of these three may reasonably occur, but at the end of the day I'm confused about this alleged importance of literals.


Andrei
November 26, 2008
Denis Koroskin wrote:
> On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> 
>> Also consider:
>>
>> auto delta = a1.length - a2.length;
>>
>> What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.
>>
> 
> Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.

But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent.

Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary.
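
For example, something along these lines (only a sketch, with an illustrative helper name; it relies on std.conv.to rejecting values that do not fit in the target type):

import std.conv;

// Throws when the unsigned result doesn't fit in an int -- in particular
// when a1.length < a2.length and the subtraction wraps around.
int lengthDelta(int[] a1, int[] a2)
{
    return to!int(a1.length - a2.length);
}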

By this I'm essentially replying to Don's message in the bugs newsgroup: nobody puts a gun to your head to cast.

> Putting an assert(a1.length > a2.length) might help, but the check will be unavailable unless code is compiled with asserts enabled.

Put an enforce(a1.length > a2.length) then.

> A better solution would be to write code as follows:
> 
> auto delta = unsigned(a1.length - a2.length); // returns an unsigned value, throws on overflow (i.e., "2 - 4")
> auto delta = signed(a1.length - a2.length); // returns result as a signed value. Throws on overflow (i.e., "int.min - 1")
> auto delta = a1.length - a2.length; // won't compile

Amazingly this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 are available to all code.

One fear of mine is the reaction of throwing hands up in the air: "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign, ready to be converted to their counterparts of decided sign.
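
To give a rough idea of the intent (a library struct can't fully stand in for a built-in type, so this is only an illustration):

struct bits32
{
    private uint payload;                  // raw storage, carries no sign

    // deciding the sign is an explicit act:
    uint toUnsigned() const { return payload; }
    int  toSigned()   const { return cast(int) payload; }
}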

> // this one is also handy:
> auto newLength = checked(a1.length - 1); // preserves type of a1.length, be it int or uint, throws on overflow

This could be rather tricky. How can overflow be checked? By inspecting the status bits in the processor only; at the language/typesystem level there's little to do.

> I have previously shown an implementation of unsigned/signed:
> 
> import std.stdio;
> 
> int signed(lazy int dg)
> {
>     auto result = dg();
>     asm {
>        jo overflow;
>     }
>     return result;
> 
>     overflow:
>     throw new Exception("Integer overflow occurred");
> }
> 
> int main()
> {
>    int t = int.max;
>    try
>    {
>        int s = signed(t + 1);
>        writefln("Result is %d", s);
>    }
>    catch(Exception e)
>    {
>        writefln("Whoops! %s", e.toString());
>    }
>    return 0;
> }

Ah, there we go! Thanks for pasting this code.

> But Andrei has correctly pointed out that it has a problem - it may throw without a reason:
> int i = int.max + 1; // sets an overflow flag
> auto result = expectSigned(1); // raises an exception
> 
> Overflow flag may also be cleared in a complex expression:
> auto result = expectUnsigned(1 + (uint.max + 1)); // first add will overflow and second one clears the flag -> no exception as a result
> 
> A possible solution is to make the compiler aware of this construct and disallow passing none (case 2) or more than one operation (case 1) to the method.

Can't you clear the overflow flag prior to invoking the operation?

I'll also mention that making it a delegate reduces appeal quite a bit; expressions under the check tend to be simple which makes the relative overhead huge.


Andrei
November 26, 2008
bearophile wrote:
> Andrei Alexandrescu:
>> The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
> 
> That can be solved by making array.length signed.

I agree. I proposed this some time ago. For example, even though C# has unsigned types, the length of a list, array, etc., is always int. In this way, they prevented the bugs and problems everyone mentions here.
November 26, 2008
Don wrote:
> 
> Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')

This inspired me to think about where I use uint and I realized that I don't.  I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.


Sean
November 26, 2008
Andrei Alexandrescu wrote:
> 
> Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.

What /is/ the appropriate type here?  For example:

    uint a = uint.max;
    uint b = 0;
    uint c = uint.max - 1;

    int  x = a - b; // wrong, should be uint
    uint y = c - a; // wrong, should be int

I don't see any way to reliably produce a "safe" result at the language level.
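
(For illustration only: widening to a 64-bit signed type represents both results exactly, though it merely pushes the same question up one size.)

    long wide1 = cast(long) a - cast(long) b; // 4294967295, too big for an int
    long wide2 = cast(long) c - cast(long) a; // -1, not representable as a uint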


Sean
November 26, 2008
On Wed, 26 Nov 2008 21:45:30 +0300, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Denis Koroskin wrote:
>> On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> Also consider:
>>>
>>> auto delta = a1.length - a2.length;
>>>
>>> What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.
>>>
>>  Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.
>
> But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent.
>
> Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary.
>
> By this I'm essentially replying to Don's message in the bugs newsgroup: nobody puts a gun to your head to cast.
>
>> Putting an assert(a1.length > a2.length) might help, but the check will be unavailable unless code is compiled with asserts enabled.
>
> Put an enforce(a1.length > a2.length) then.
>

Right, it is better. The problem is, you don't want to put checks like "a1.length > a2.length" into your code (I don't, at least). All you want is to be sure that "auto result = a1.length - a2.length" is positive. You *then* have to work backwards from "a1.length - a2.length >= 0" to arrive at the check. Moreover, why evaluate both a1.length and a2.length twice? And you have to update all such checks every time you change your code.

>> A better solution would be to write code as follows:
>>  auto delta = unsigned(a1.length - a2.length); // returns an unsigned value, throws on overflow (i.e., "2 - 4")
>> auto delta = signed(a1.length - a2.length); // returns result as a signed value. Throws on overflow (i.e., "int.min - 1")
>> auto delta = a1.length - a2.length; // won't compile
>
> Amazingly this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 are available to all code.
>
> One fear of mine is the reaction of throwing hands up in the air: "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign, ready to be converted to their counterparts of decided sign.
>
>> // this one is also handy:
>> auto newLength = checked(a1.length - 1); // preserves type of a1.length, be it int or uint, throws on overflow
>
> This could be rather tricky. How can overflow be checked? By inspecting the status bits in the processor only; at the language/typesystem level there's little to do.

It is an implementation detail. The expression can be calculated with higher bit precision and the result compared against the needed range.
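
A minimal sketch of that idea for 32-bit (or smaller) operands, with purely illustrative names:

import std.traits : isIntegral;

T checkedSub(T)(T a, T b) if (isIntegral!T && T.sizeof <= 4)
{
    long wide = cast(long) a - cast(long) b;   // exact: a 64-bit result cannot wrap here
    if (wide < T.min || wide > T.max)
        throw new Exception("Integer overflow");
    return cast(T) wide;
}

// checkedSub(2u, 4u) throws; checkedSub(4u, 2u) returns 2u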

>
>> I have previously shown an implementation of unsigned/signed:
>>  import std.stdio;
>>  int signed(lazy int dg)
>> {
>>     auto result = dg();
>>     asm {
>>        jo overflow;
>>     }
>>     return result;
>>      overflow:
>>     throw new Exception("Integer overflow occurred");
>> }
>>  int main()
>> {
>>    int t = int.max;
>>    try
>>    {
>>        int s = signed(t + 1);
>>        writefln("Result is %d", s);
>>    }
>>    catch(Exception e)
>>    {
>>        writefln("Whoops! %s", e.toString());
>>    }
>>    return 0;
>> }
>
> Ah, there we go! Thanks for pasting this code.
>
>> But Andrei has correctly pointed out that it has a problem - it may throw without a reason:
>> int i = int.max + 1; // sets an overflow flag
>> auto result = expectSigned(1); // raises an exception
>>  Overflow flag may also be cleared in a complex expression:
>> auto result = expectUnsigned(1 + (uint.max + 1)); // first add will overflow and second one clears the flag -> no exception as a result
>>  A possible solution is to make the compiler aware of this construct and disallow passing none (case 2) or more than one operation (case 1) to the method.
>
> Can't you clear the overflow flag prior to invoking the operation?
>

No need for this; it adds one more instruction for no gain, since the flag is automatically set/reset by any add/sub/mul operation. It would only save you from the "auto result = signed(1)" error, which is why I said that case should be disallowed in the first place.

> I'll also mention that making it a delegate reduces appeal quite a bit; expressions under the check tend to be simple which makes the relative overhead huge.
>

Such simple instructions are usually inlined, aren't they?
November 26, 2008
Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:

> Don wrote:
>> My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int.
> 
> Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable.
> 
> Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
> 
>> Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')
> 
> I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.

I'm totally with Don here.  In math, natural numbers are a subset of integers.  But uint is not a subset of int.  If it were, most of the problems would vanish.  So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.
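
As a rough sketch of what such a "natural number" library type might look like (names and details are only illustrative):

struct Natural
{
    private int value;                     // a true subset of int: always >= 0

    this(int v)
    {
        assert(v >= 0, "a natural number cannot be negative");
        value = v;
    }

    int asInt() const { return value; }    // every Natural fits in an int

    Natural sub(Natural rhs) const
    {
        return Natural(value - rhs.value); // asserts if the result would be negative
    }
}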
November 26, 2008
On 2008-11-26 13:30:30 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Well let's look closer at this. Consider a system in which the current rules are in vigor, plus the overflow check for uint.
> 
> auto i = arr.length - offset1 + offset2;
> 
> Although the context makes it clear that offset1 < offset2 and therefore i is within range and won't overflow, the poor code generator has no choice but to insert checks throughout. Even though the entire expression is always correct, it will dynamically fail on the way to its correct form.

That's because you're relying on a specific behaviour for overflows, and that changes with range checking. True, in some cases wraparound is desirable. But in this specific case I'd say it'd be better to just add parentheses at the right place, or change the order of the arguments to avoid overflow. Avoiding overflows is a good practice in general. The only reason it doesn't bite here is that you're limited to additions and subtractions.
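
For instance, in the expression above, merely reordering the operands keeps every intermediate value in range (assuming, as usual, that the offsets are valid positions, so arr.length + offset2 itself cannot overflow):

auto i = arr.length + offset2 - offset1;   // same value, no intermediate wraparound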

If you dislike the compiler checking for overflows, just tell it not to check. That's why we need a compiler switch. Perhaps it'd be good to have a pragma to disable those checks for specific pieces of code too.


> Contrast with the proposed system in which the expression will not compile. It will indeed require the user to somewhat redundantly insert guides for operations, but during compilation, not through runtime failure.

If you're just adding a special rule to prevent the result of subtractions of unsigned values from being put into auto variables, I'm not terribly against that. I'm just unconvinced of its usefulness.


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

November 26, 2008
"Michel Fortin" <michel.fortin@michelf.com> wrote in message news:ggjpn4$1v0m$1@digitalmars.com...
> On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
>
>> (3) u - u
>
> Just a note here, because it seems to me you're confusing two issues with that "u - u" thing. The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero.
>
> If you want to attack that problem, I think it should be done in a coherent manner with other out-of-range issues. Going below uint.min for a uint or below int.min for an int should be handled the same way. Personally, I'd just add a compiler switch for runtime range checking (just as for array bound checking).
>

I'd love to see D get the ability to turn on/off runtime range checking, but doing nothing more than a program-wide (or module-wide if compiling one-at-a-time) compiler switch is way too large-grained and blunt. I would want to also see C#'s:

checked(expr)
unchecked(expr)
checked { code }
unchecked { code }

> Treating the result u - u as __intuint is dangerous: uint.max - 1U gives you a value which int cannot hold, but you'd allow it to convert implicitly and without warning to int? I don't like it.
>
> -- 
> Michel Fortin
> michel.fortin@michelf.com
> http://michelf.com/
>