November 25, 2008 Re: Treating the abusive unsigned syndrome
Posted in reply to Steven Schveighoffer

Steven Schveighoffer wrote:
> "Andrei Alexandrescu" wrote
>> I remembered a couple more details. The names bits8, bits16, bits32, and bits64 were a possible choice for undecided-sign integrals. Walter and I liked that quite a lot. Walter also suggested that we make those actual full types accessible to programmers. We were both concerned that they'd add to the already large panoply of integral types in D. Dropping bits8 and bits16 would reduce bloat at the cost of consistency.
>>
>> So we're contemplating:
>>
>> (a) Add bits8, bits16, bits32, bits64 public types.
>> (b) Add bits32, bits64 public types.
>> (c) Add bits8, bits16, bits32, bits64 compiler-internal types.
>> (d) Add bits32, bits64 compiler-internal types.
>>
>> Make your pick or add more choices!
>
> One other thing to contemplate:
>
> What happens if you add a bits32 to a bits64, long, or ulong value? This needs to be illegal since you don't know whether to sign-extend or not. Or you could reinterpret the expression to promote the original types to 64 bits first?

Good point. There's no (or not much) arithmetic mixing bits32 and a 64-bit integral, because it's unclear whether extending the bits32 operand should extend the sign bit or not.

> This makes the version with 8 and 16 bit types less attractive.
>
> Another alternative is to select the bits type based on the entire expression. Of course, you'd have to disallow them as public types. And you'd want to do some special optimizations. You could represent it conceptually as calculating for all the bits types until the one that is decided is used, and then the compiler can optimize out the unused ones, which would at least keep it context-free.
>
> -Steve

That's the intent of defining arithmetic on sign-ambiguous values: the type information propagates through a complex expression. I haven't heard of typechecking on entire expression patterns, and I think it would be a rather unclean technique (it means either that there are values whose type you can't tell, or that a given value has a context-dependent type).

Andrei
November 25, 2008 Re: Treating the abusive unsigned syndrome
Posted in reply to Andrei Alexandrescu

On Tue, 25 Nov 2008 11:06:32 -0600, Andrei Alexandrescu wrote:
> I remembered a couple more details. The names bits8, bits16, bits32, and bits64 were a possible choice for undecided-sign integrals. Walter and I liked that quite a lot. Walter also suggested that we make those actual full types accessible to programmers. We were both concerned that they'd add to the already large panoply of integral types in D. Dropping bits8 and bits16 would reduce bloat at the cost of consistency.
>
> So we're contemplating:
>
> (a) Add bits8, bits16, bits32, bits64 public types.
> (b) Add bits32, bits64 public types.
> (c) Add bits8, bits16, bits32, bits64 compiler-internal types.
> (d) Add bits32, bits64 compiler-internal types.
>
> Make your pick or add more choices!
I'll add more. :)
The problem with signed/unsigned types is that neither int nor uint is a subtype of the other. They're essentially incompatible. Therefore a possible solution is:
1. Disallow implicit signed <=> unsigned conversion.
2. For those willing to port large C/C++ codebases introduce a compiler compatibility switch which would add global operators mimicking the C behavior:
uint opAdd(int, uint)
uint opAdd(uint, int)
ulong opAdd(long, ulong)
etc.
This way you can even implement compatibility levels: only C-style additions, or additions with multiplications, or complete compatibility including the original signed/unsigned comparison behavior.
November 25, 2008 Re: Treating the abusive unsigned syndrome
Posted in reply to Andrei Alexandrescu | I'm of the opinion that we should make mixed-sign operations a compile-time error. I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code.
IMHO, it's a mistake to have implicit casts that lose information.
Want to hear a funny/sad, but somewhat related story? I was chasing down a segfault recently at work. I hunted and hunted, and finally found out that the pointer returned from malloc() was bad. I figured that I was overwriting the heap, right? So I added tracing and debugging everywhere...no luck.
I finally, in desperation, included <stdlib.h> in the source file (there was a warning about malloc() not being prototyped)...and the segfaults vanished!!!
The problem was that the xlc compiler, when it doesn't have the prototype for a function, assumes that it returns int...but int is 32 bits. Moreover, the compiler was happily implicitly casting that int to a pointer...which was 64 bits.
The compiler was silently cropping the top 32 bits off my pointers.
And it all was a "feature" to make programming "easier."
Russ
Andrei Alexandrescu wrote:
> D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.
>
> A classic problem with C and C++ integer arithmetic is that any operation involving at least one unsigned integral automatically receives an unsigned type, regardless of how silly that actually is, semantically. About the only advantage of this rule is that it's simple. IMHO it has only disadvantages from then on.
>
> The following operations suffer from the "abusive unsigned syndrome" (u is an unsigned integral, i is a signed integral):
>
> (1) u + i, i + u
> (2) u - i, i - u
> (3) u - u
> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C requires that these all return unsigned, ouch)
> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
> (6) -u
>
> Logic operations &, |, and ^ also yield unsigned, but such cases are less abusive because at least the operation wasn't arithmetic in the first place. Comparing for equality is also quite a conundrum - should minus two billion compare equal to 2_294_967_296? I'll ignore these for now and focus on (1) - (6).
>
> So far we haven't found a solid solution to this problem that at the same time allows "good" code pass through, weeds out "bad" code, and is compatible with C and C++. The closest I got was to have the compiler define the following internal types:
>
> __intuint
> __longulong
>
> I've called them "dual-signed integers" in the past, but let's try the shorter "undecided sign". Each of these is a subtype of both the signed and the unsigned integral in its name, e.g. __intuint is a subtype of both int and uint. (Originally I thought of defining __byteubyte and __shortushort as well but dropped them in the interest of simplicity.)
>
> The sign-ambiguous operations (1) - (6) yield __intuint if no operand size was larger than 32 bits, and __longulong otherwise. Undecided sign types define their own operations. Let x and y be values of undecided sign. Then x + y, x - y, and -x also return a sign-ambiguous integral (the size is that of the largest operand). However, the other operators do not work on sign-ambiguous integrals, e.g. x / y would not compile because you must decide what sign x and y should have prior to invoking the operation. (Rationale: multiplication/division work differently depending on the signedness of their operands).
>
> User code cannot define a symbol of sign-ambiguous type, e.g.
>
> auto a = u + i;
>
> would not compile. However, given that __intuint is a subtype of both int and uint, it can be freely converted to either whenever there's no ambiguity:
>
> int a = u + i; // fine
> uint b = u + i; // fine
>
> The advantage of this scheme is that it weeds out many (most? all?) surprises and oddities caused by the abusive unsigned rule of C and C++. The disadvantage is that it is more complex and may surprise the novice in its own way by refusing to compile code that looks legit.
>
> At the moment, we're in limbo regarding the decision to go forward with this. Walter, like many good long-time C programmers, knows the abusive unsigned rule so well that he's not hurt by it and consequently has little incentive to see it as a problem. I have had to teach C and C++ to young students coming from Java introductory courses and have a more up-to-date perspective on the dangers. My strong belief is that we need to address this mess somehow, which type inference will only make more painful (in the hands of a beginner, auto can be a quite dangerous tool for propagating wrong beliefs). I also know seasoned programmers who had no idea that -u compiles, and that it also oddly returns an unsigned type.
>
> Your opinions, comments, and suggestions for improvements would as always be welcome.
>
>
> Andrei
November 25, 2008 Re: Treating the abusive unsigned syndrome
Posted in reply to Russell Lewis

(You may want to check your system's date, unless of course you traveled in time.)

Russell Lewis wrote:
> I'm of the opinion that we should make mixed-sign operations a compile-time error. I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code.

The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions generally don't solve the core issue.

> IMHO, it's a mistake to have implicit casts that lose information.

Hear, hear.

> Want to hear a funny/sad, but somewhat related story? I was chasing down a segfault recently at work. I hunted and hunted, and finally found out that the pointer returned from malloc() was bad. I figured that I was overwriting the heap, right? So I added tracing and debugging everywhere...no luck.
>
> I finally, in desperation, included <stdlib.h> in the source file (there was a warning about malloc() not being prototyped)...and the segfaults vanished!!!
>
> The problem was that the xlc compiler, when it doesn't have the prototype for a function, assumes that it returns int...but int is 32 bits. Moreover, the compiler was happily implicitly casting that int to a pointer...which was 64 bits.
>
> The compiler was silently cropping the top 32 bits off my pointers.
>
> And it all was a "feature" to make programming "easier."

Good story for reminding ourselves of the advantages of type safety!

Andrei
November 25, 2008 Re: Treating the abusive unsigned syndrome
Posted in reply to Sergey Gromov

Sergey Gromov wrote:
> Tue, 25 Nov 2008 11:06:32 -0600, Andrei Alexandrescu wrote:
>> I remembered a couple more details. The names bits8, bits16, bits32, and bits64 were a possible choice for undecided-sign integrals. Walter and I liked that quite a lot. Walter also suggested that we make those actual full types accessible to programmers. We were both concerned that they'd add to the already large panoply of integral types in D. Dropping bits8 and bits16 would reduce bloat at the cost of consistency.
>>
>> So we're contemplating:
>>
>> (a) Add bits8, bits16, bits32, bits64 public types.
>> (b) Add bits32, bits64 public types.
>> (c) Add bits8, bits16, bits32, bits64 compiler-internal types.
>> (d) Add bits32, bits64 compiler-internal types.
>>
>> Make your pick or add more choices!
>
> I'll add more. :)
>
> The problem with signed/unsigned types is that neither int nor uint is a subtype of the other. They're essentially incompatible. Therefore a possible solution is:
>
> 1. Disallow implicit signed <=> unsigned conversion.

I forgot to mention that that's implied in the bitsNN approach too.

> 2. For those willing to port large C/C++ codebases, introduce a compiler compatibility switch which would add global operators mimicking the C behavior:
>
> uint opAdd(int, uint)
> uint opAdd(uint, int)
> ulong opAdd(long, ulong)
> etc.

Having semantics depend so heavily and confusingly on a compiler switch is extremely dangerous. Note that quite a lot of code will compile, with different semantics, with or without the switch.

> This way you can even implement compatibility levels: only C-style additions, or additions with multiplications, or complete compatibility including the original signed/unsigned comparison behavior.

I don't think we can pursue such a path.

Andrei
November 25, 2008 Re: Treating the abusive unsigned syndrome
Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu:
> The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
That can be solved by making array.length signed.
Can you list a few other annoying situations?
Bye,
bearophile
November 25, 2008 Re: Treating the abusive unsigned syndrome
Posted in reply to Andrei Alexandrescu

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> At the moment, we're in limbo regarding the decision to go forward with this. Walter, like many good long-time C programmers, knows the abusive unsigned rule so well he's not hurt by it and consequently has little incentive to see it as a problem. I have had to teach C and C++ to young students coming from Java introductory courses and have a more up-to-date perspective on the dangers.

I'll address your actual suggestion separately, but personally, I always build C/C++ code at the max warning level and treat warnings as errors. This typically catches all signed-unsigned interactions and requires me to add a cast for the build to succeed. The advantage is that if I see a cast in my code, then I know the statement is deliberate rather than accidental. I would wholeheartedly support such an approach in D as well, though I can see how this may not be terribly appealing to some experienced C/C++ programmers.

Sean
November 25, 2008 Re: Treating the abusive unsigned syndrome
Posted in reply to Andrei Alexandrescu

On Tue, 25 Nov 2008 15:49:23 -0600, Andrei Alexandrescu wrote:
> Sergey Gromov wrote:
>> 2. For those willing to port large C/C++ codebases introduce a compiler compatibility switch which would add global operators mimicking the C behavior:
>>
>> uint opAdd(int, uint)
>> uint opAdd(uint, int)
>> ulong opAdd(long, ulong)
>> etc.
>
> Having semantics depend so heavily and confusingly on a compiler switch is extremely dangerous. Note that actually quite a lot of code will compile, with different semantics, with or without the switch.
One of us must be missing something. There are no 'different semantics' in my proposal: the code either compiles and behaves exactly as in C, or it does not compile at all. The compiler switch controls how much code compiles or fails, not its semantics.
November 25, 2008 Re: Treating the abusive unsigned syndrome
Posted in reply to Andrei Alexandrescu

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> (You may want to check your system's date, unless of course you traveled in time.)
> Russell Lewis wrote:
>> I'm of the opinion that we should make mixed-sign operations a compile-time error. I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code.
> The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.

Perhaps not, but the fact that constants are signed integers has been mentioned as a problem before. Would making these polysemous values help at all? That seems to be what your proposal is effectively trying to do anyway.

Sean
November 25, 2008 Re: Treating the abusive unsigned syndrome
Posted in reply to Sergey Gromov

Sergey Gromov wrote:
> Tue, 25 Nov 2008 15:49:23 -0600, Andrei Alexandrescu wrote:
>
>> Sergey Gromov wrote:
>>> 2. For those willing to port large C/C++ codebases introduce a compiler
>>> compatibility switch which would add global operators mimicking the C
>>> behavior:
>>>
>>> uint opAdd(int, uint)
>>> uint opAdd(uint, int)
>>> ulong opAdd(long, ulong)
>>> etc.
>> Having semantics depend so heavily and confusingly on a compiler switch is extremely dangerous. Note that actually quite a lot of code will compile, with different semantics, with or without the switch.
>
> One of us should be missing something. There was no 'different
> semantics' in my proposal. The code either compiles and behaves exactly
> like in C or does not compile at all. The amount of code which compiles
> or fails depends on a compiler switch, not semantics.
Sorry, I misunderstood.
Andrei
Copyright © 1999-2021 by the D Language Foundation