November 27, 2008
Andrei Alexandrescu wrote:
> Don wrote:
>> Andrei Alexandrescu wrote:
>>> One fear of mine is the reaction of throwing one's hands in the air: "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign, ready to be converted to their counterparts of decided sign.
>>
>> Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas:
>>
>> (A) You think that it is an approximation to a natural number, ie, a 'positive int'.
>> (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation.
>>
>> Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
> 
> In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
> 
>> If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc.
>>
>> Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast.
>> Non-negative literals and manifest constants are naturals.
>>
>> The rules are:
>> 1. Anything involving unsigned is unsigned, (same as C).
>> 2. Else if it contains an integer, it is an integer.
>> 3. (Now we know all quantities are natural):
>> If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs].
>> 4. Else it is a natural.
>>
>>
>> The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program.
>>
>> [Just before posting I've discovered that other people have posted some similar ideas].
> 
> That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table.
> 
> 
> Andrei

Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer.
But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results.
Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!".

Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned.
I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
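
To make rule 0 concrete, here is a minimal sketch of the kind of int/uint mix-up it would reject. The code shows today's behaviour only; the 'natural' type itself is hypothetical and not shown:

   void main()
   {
       uint u = 2;
       int  i = -3;
       auto r = u + i;                       // accepted today; typed uint
       static assert(is(typeof(r) == uint));
       assert(r == uint.max);                // -1 wrapped around -- the usual trap
   }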
November 27, 2008
Don wrote:
> Andrei Alexandrescu wrote:
>> Don wrote:
>>> Andrei Alexandrescu wrote:
>>>> One fear of mine is the reaction of throwing one's hands in the air: "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign, ready to be converted to their counterparts of decided sign.
>>>
>>> Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas:
>>>
>>> (A) You think that it is an approximation to a natural number, ie, a 'positive int'.
>>> (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation.
>>>
>>> Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
>>
>> In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
>>
>>> If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc.
>>>
>>> Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast.
>>> Non-negative literals and manifest constants are naturals.
>>>
>>> The rules are:
>>> 1. Anything involving unsigned is unsigned, (same as C).
>>> 2. Else if it contains an integer, it is an integer.
>>> 3. (Now we know all quantities are natural):
>>> If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs].
>>> 4. Else it is a natural.
>>>
>>>
>>> The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program.
>>>
>>> [Just before posting I've discovered that other people have posted some similar ideas].
>>
>> That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table.
>>
>>
>> Andrei
> 
> Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer.
> But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results.
> Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!".
> 
> Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned.
> I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.

I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations.

I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages.

One compromise solution Walter and I discussed in the past is to sever only one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.

(a) There are fewer situations in which a small, reasonable number implicitly becomes a large, weird number.

(b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.

(c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
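
To make (a)-(c) concrete, here is a small sketch of current behaviour, annotated with what the proposed compromise would change (illustrative only, nothing here is implemented):

   void main()
   {
       uint u = 10;
       int  i = -1;

       uint a = i;       // compiles today; severing int -> uint would make this an error
       int  b = u;       // uint -> int stays implicit, as in C
       auto c = u - 1;   // per (b), still uint, for C compatibility
       static assert(is(typeof(c) == uint));

       byte x = 3;
       auto y = x / i;   // today typed int; under (c) it would be typed byte
       static assert(is(typeof(y) == int));
   }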

What do you think?


Andrei
November 27, 2008
Denis Koroskin wrote:
> On 27.11.08 at 03:46, Sean Kelly wrote:
> 
>> Andrei Alexandrescu wrote:
>>> Sean Kelly wrote:
>>>> Don wrote:
>>>>>
>>>>> Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')
>>>>
>>>> This inspired me to think about where I use uint and I realized that I don't.  I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.
>>>  For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.
>>
>> To be fair, I generally use unsigned numbers for values that are logically always positive.  These just tend to be sizes and counts in my code.
>>
>>> I wonder how often these super-large arrays do occur on 32-bit systems. I do have programs that try to allocate as large a contiguous matrix as possible, but never sat down and tested whether a >2GB chunk was allocated on the Linux cluster I work on. I'm quite annoyed by this >2GB issue because it's a very practical and very rare issue in a weird contrast with a very principled issue (modeling natural numbers).
>>
>> Yeah, I have no idea how common they are, though my guess would be that they are rather uncommon.  As a library programmer, I simply must assume that they are in use, which is why I use size_t as a matter of course.
> 
> If they can be more than 2GB, why can't they be more than 4GB? It is dangerous to assume that they won't; that's why uint is dangerous. You trade safety for one additional bit of information, and that is wrong.

Bigger than 4GB on a 32-bit system?  Files perhaps, but I'm talking about memory ranges here.

> Soon enough we won't use uints the same way we don't use ushorts (I should have asked first whether anyone uses ushort these days, but there is so little gain in using ushort as opposed to short or int that I consider it impractical). The 64-bit era will give us 64-bit pointers and 64-bit counters. Do you think you will prefer ulong over long for an additional bit? You really shouldn't.

long vs. ulong for sizes is less of an issue, because we're a long way away from running up against the limitations of a 63-bit size value.  The point of size_t to me, however, is that it scales automatically, so if I write array operations using size_t then I can be sure they will work on both 32-bit and 64-bit systems.
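
For instance, a small sketch of that habit -- the same indexing code is correct whether size_t is 32 or 64 bits wide:

   void fill(int[] a)
   {
       for (size_t i = 0; i < a.length; ++i)
           a[i] = cast(int) i;
   }

   void main()
   {
       auto a = new int[5];
       fill(a);
       assert(a[4] == 4);
   }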

I do like Don's point about unsigned really meaning "unsigned," however, rather than "positive."  I clearly use unsigned numbers for both, even if I flag the "positive" uses via a type alias such as size_t.  In C/C++ I rely on compiler warnings to trap the sort of mistakes we're talking about here, but I'd love a more logically sound solution if one could be found.


Sean
November 27, 2008
Andrei Alexandrescu wrote:
> Don wrote:
>> Andrei Alexandrescu wrote:
>>> Don wrote:
>>>> Andrei Alexandrescu wrote:
>>>>> One fear of mine is the reaction of throwing one's hands in the air: "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign, ready to be converted to their counterparts of decided sign.
>>>>
>>>> Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas:
>>>>
>>>> (A) You think that it is an approximation to a natural number, ie, a 'positive int'.
>>>> (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation.
>>>>
>>>> Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
>>>
>>> In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
>>>
>>>> If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc.
>>>>
>>>> Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast.
>>>> Non-negative literals and manifest constants are naturals.
>>>>
>>>> The rules are:
>>>> 1. Anything involving unsigned is unsigned, (same as C).
>>>> 2. Else if it contains an integer, it is an integer.
>>>> 3. (Now we know all quantities are natural):
>>>> If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs].
>>>> 4. Else it is a natural.
>>>>
>>>>
>>>> The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program.
>>>>
>>>> [Just before posting I've discovered that other people have posted some similar ideas].
>>>
>>> That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table.
>>>
>>>
>>> Andrei
>>
>> Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer.
>> But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results.
>> Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!".
>>
>> Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned.
>> I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
> 
> I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations.
> 
> I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages.
> 
> One compromise solution Walter and I discussed in the past is to sever only one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.
> 
> (a) There are fewer situations in which a small, reasonable number implicitly becomes a large, weird number.
> 
> (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.
> 
> (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
> 

So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?!

The opposite sounds more natural to me.

> What do you think?
> 
> 
> Andrei
November 27, 2008
KennyTM~ wrote:
> Andrei Alexandrescu wrote:
>> Don wrote:
>>> Andrei Alexandrescu wrote:
>>>> Don wrote:
>>>>> Andrei Alexandrescu wrote:
>>>>>> One fear of mine is the reaction of throwing one's hands in the air: "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign, ready to be converted to their counterparts of decided sign.
>>>>>
>>>>> Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas:
>>>>>
>>>>> (A) You think that it is an approximation to a natural number, ie, a 'positive int'.
>>>>> (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation.
>>>>>
>>>>> Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
>>>>
>>>> In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
>>>>
>>>>> If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc.
>>>>>
>>>>> Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast.
>>>>> Non-negative literals and manifest constants are naturals.
>>>>>
>>>>> The rules are:
>>>>> 1. Anything involving unsigned is unsigned, (same as C).
>>>>> 2. Else if it contains an integer, it is an integer.
>>>>> 3. (Now we know all quantities are natural):
>>>>> If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs].
>>>>> 4. Else it is a natural.
>>>>>
>>>>>
>>>>> The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program.
>>>>>
>>>>> [Just before posting I've discovered that other people have posted some similar ideas].
>>>>
>>>> That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table.
>>>>
>>>>
>>>> Andrei
>>>
>>> Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer.
>>> But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results.
>>> Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!".
>>>
>>> Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned.
>>> I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
>>
>> I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations.
>>
>> I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages.
>>
>> One compromise solution Walter and I discussed in the past is to sever only one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.
>>
>> (a) There are fewer situations in which a small, reasonable number implicitly becomes a large, weird number.
>>
>> (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.
>>
>> (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
>>
> 
> So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?!
> 
> The opposite sounds more natural to me.
> 

Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)

>> What do you think?
>>
>>
>> Andrei
November 27, 2008
KennyTM~ wrote:
> KennyTM~ wrote:
>> Andrei Alexandrescu wrote:
>>> Don wrote:
>>>> Andrei Alexandrescu wrote:
>>>>> Don wrote:
>>>>>> Andrei Alexandrescu wrote:
>>>>>>> One fear of mine is the reaction of throwing one's hands in the air: "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign, ready to be converted to their counterparts of decided sign.
>>>>>>
>>>>>> Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas:
>>>>>>
>>>>>> (A) You think that it is an approximation to a natural number, ie, a 'positive int'.
>>>>>> (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation.
>>>>>>
>>>>>> Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
>>>>>
>>>>> In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
>>>>>
>>>>>> If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc.
>>>>>>
>>>>>> Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast.
>>>>>> Non-negative literals and manifest constants are naturals.
>>>>>>
>>>>>> The rules are:
>>>>>> 1. Anything involving unsigned is unsigned, (same as C).
>>>>>> 2. Else if it contains an integer, it is an integer.
>>>>>> 3. (Now we know all quantities are natural):
>>>>>> If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs].
>>>>>> 4. Else it is a natural.
>>>>>>
>>>>>>
>>>>>> The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program.
>>>>>>
>>>>>> [Just before posting I've discovered that other people have posted some similar ideas].
>>>>>
>>>>> That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table.
>>>>>
>>>>>
>>>>> Andrei
>>>>
>>>> Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer.
>>>> But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results.
>>>> Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!".
>>>>
>>>> Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned.
>>>> I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
>>>
>>> I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations.
>>>
>>> I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages.
>>>
>>> One compromise solution Walter and I discussed in the past is to sever only one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.
>>>
>>> (a) There are fewer situations in which a small, reasonable number implicitly becomes a large, weird number.
>>>
>>> (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.
>>>
>>> (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
>>>
>>
>> So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?!
>>
>> The opposite sounds more natural to me.
>>
> 
> Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)

The tightest type possible depends on the operation. In that doctrine, long * int yields a long (given the demise of cent). Walter thinks such rules are too complicated, but I'm a big fan of operation-dependent typing. I see no good reason for requiring that int * long have the same type as int / long. They are different operations with different semantics and corner cases and whatnot, so the resulting static type may as well be different.

By the way, under the tightest type doctrine, uint & ubyte is typed as ubyte. Interesting that one, huh :o).
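
A quick illustration of why that typing is sound -- the result of & can never exceed the smaller operand's range (the snippet shows today's C-style typing for contrast):

   void main()
   {
       uint  u = 0xDEADBEEF;
       ubyte b = 0x0F;
       auto  r = u & b;                     // today: typed uint via C-style promotion
       static assert(is(typeof(r) == uint));
       assert(r <= ubyte.max);              // always holds, hence the proposed ubyte typing
   }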


Andrei
November 27, 2008
On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:

> D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.

Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results.

I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.


> (1) u + i, i + u
> (2) u - i, i - u
> (3) u - u
> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C
> requires that these all return unsigned, ouch)
> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
> (6) -u

Note that "(3) u - u" and "(6) -u" seem to be really a use of (4), namely
"(-1 * u)".

I am assuming that there is no difference between 'unsigned' and 'positive', insomuch as I am not treating 'unsigned' as 'sign unknown/irrelevant'.

It seems to me that the issue then is not so much one of sign but of size.
An unsigned value needs an extra bit to hold the sign information, thus a
32-bit unsigned value needs a minimum of 33 bits to be converted to a signed
equivalent.
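
In code, the point is simply that any 32-bit unsigned value fits losslessly in the next larger signed type, but not in a 32-bit signed one (illustrative sketch):

   void main()
   {
       uint u = uint.max;
       long s = u;                  // lossless: plenty of signed range available
       assert(s == 4_294_967_295L);
       // int t = u;                // compiles today, but reinterprets the bits
       //                           // as -1 -- exactly the information loss at issue
   }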

For the expressions (1) - (4) above, I would have the compiler compute a signed
type for the result. Then if the target of the result is a signed type AND
larger than the 'unsigned' portion used, the compiler would not have
to complain. In every other case the compiler should complain because of
the potential for information loss. To avoid the complaint, the coder would
need to either change the result type, change the input types, or add a
'message' to the compiler that in effect says "I know what I'm doing, ok?" - I
suggest a cast would suffice.

In those cases where the target type is not explicitly coded, such as using 'auto' or as a temporary value in an expression, the compiler should assume a signed type that is 'one step' larger than the 'unsigned' element in the expression.

e.g.
   auto x = int * uint; ==> 'x' is long.

If this causes code to be incompatible to C/C++, then it implies that the C/C++ code was poor (i.e. potential information loss) in the first place and deserves to be fixed up.

The scenario (5) above should also include equality comparisons, and should cause the compiler to issue a message AND generate code like ...

   if (u < i)   ====> if ( i < 0 ? false : u < cast(typeof(u))i)
   if (u <= i)  ====> if ( i < 0 ? false : u <= cast(typeof(u))i)
   if (u == i)  ====> if ( i < 0 ? false : u == cast(typeof(u))i)
   if (u >= i)  ====> if ( i < 0 ? true  : u >= cast(typeof(u))i)
   if (u > i)   ====> if ( i < 0 ? true  : u > cast(typeof(u))i)

The coder should be able to avoid the message and the suboptimal generated code by adding a cast ...

  if (u < cast(typeof(u))i)

I am also assuming that the syntax 'cast(unsigned-type)signed-type' tells the compiler to assume that the bits in the signed value already represent a valid unsigned value, and therefore the compiler should not generate code to 'transform' the signed-value bits to form an unsigned value.
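
A minimal example of that reading of the cast -- the bit pattern is simply re-labelled, with no run-time transformation:

   void main()
   {
       int  i = -1;
       uint u = cast(uint) i;   // same 32 bits, now read as 4294967295
       assert(u == uint.max);
   }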


To summarize,
(1) Perpetuating poor quality C/C++ code should not be encouraged.
(2) The compiler should help the coder be aware of potential information
loss.
(3) The coder should have mechanisms to override the compiler's concerns.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
November 27, 2008
Derek Parnell wrote:
> On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:
> 
>> D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.
> 
> Interesting ... but I don't think that this should be the principle
> employed. If code is 'naughty' in C/C++ then D should not also produce the
> same results.
> 
> I would propose that a better principle to be used would be that the
> compiler will not allow loss or distortion of information without the
> coder/reader being made aware of it.

These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to behave naughtily as well. It should just not compile.

>> (1) u + i, i + u
>> (2) u - i, i - u
>> (3) u - u
>> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C requires that these all return unsigned, ouch)
>> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
>> (6) -u
> 
> Note that "(3) u - u" and "(6) -u" seem to be really a use of (4), namely
> "(-1 * u)".

Correct.

> I am assuming that there is no difference between 'unsigned' and 'positive',
> insomuch as I am not treating 'unsigned' as 'sign unknown/irrelevant'.
> 
> It seems to me that the issue then is not so much one of sign but of size.
> An unsigned value needs an extra bit to hold the sign information, thus a
> 32-bit unsigned value needs a minimum of 33 bits to be converted to a signed
> equivalent.
>  For the expressions (1) - (4) above, I would have the compiler compute a
> signed type for the result. Then if the target of the result is a signed type
> AND larger than the 'unsigned' portion used, the compiler would not have
> to complain. In every other case the compiler should complain because of
> the potential for information loss. To avoid the complaint, the coder would
> need to either change the result type, change the input types, or add a
> 'message' to the compiler that in effect says "I know what I'm doing, ok?" -
> I suggest a cast would suffice.
> 
> In those cases where the target type is not explicitly coded, such as using
> 'auto' or as a temporary value in an expression, the compiler should assume
> a signed type that is 'one step' larger than the 'unsigned' element in the
> expression.
> 
> e.g.
>    auto x = int * uint; ==> 'x' is long.

I don't think this will fly with Walter.

> If this causes code to be incompatible to C/C++, then it implies that the
> C/C++ code was poor (i.e. potential information loss) in the first place
> and deserves to be fixed up.

I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.

> The scenario (5) above should also include equality comparisons, and
> should cause the compiler to issue a message AND generate code like ...
> 
>    if (u < i)   ====> if ( i < 0 ? false : u < cast(typeof(u))i)
>    if (u <= i)  ====> if ( i < 0 ? false : u <= cast(typeof(u))i)
>    if (u == i)  ====> if ( i < 0 ? false : u == cast(typeof(u))i)
>    if (u >= i)  ====> if ( i < 0 ? true  : u >= cast(typeof(u))i)
>    if (u > i)   ====> if ( i < 0 ? true  : u > cast(typeof(u))i)
> 
> The coder should be able to avoid the message and the suboptimal generated
> code by adding a cast ...
> 
>   if (u < cast(typeof(u))i)

Yah, comparisons need to be looked at too.
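
For reference, a small sketch of the current behaviour that makes that rewriting necessary -- the int operand is converted to uint before comparing, so a negative value compares as a huge one:

   void main()
   {
       uint u = 1;
       int  i = -1;
       assert(u < i);   // true today: i is reinterpreted as 4294967295
   }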


Andrei
November 27, 2008
On Thu, 27 Nov 2008 16:23:12 -0600, Andrei Alexandrescu wrote:

> Derek Parnell wrote:
>> On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:
>> 
>>> D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.
>> 
>> Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results.
>> 
>> I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
> 
> These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to behave naughtily as well. It should just not compile.

I think we are saying the same thing. If the C code compiles AND if it has the potential to lose information then the D compiler should not compile it *if* the coder has not given explicit permission to the compiler to do so.

>> In those cases where the target type is not explicitly coded, such as using 'auto' or as a temporary value in an expression, the compiler should assume a signed type that is 'one step' larger than the 'unsigned' element in the expression.
>> 
>> e.g.
>>    auto x = int * uint; ==> 'x' is long.
> 
> I don't think this will fly with Walter.

And that there is our single point of failure.

>> If this causes code to be incompatible to C/C++, then it implies that the C/C++ code was poor (i.e. potential information loss) in the first place and deserves to be fixed up.
> 
> I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.

Of course. *If* the compiler can determine that the result will not lose information when being used, then it is fine. However, that is not always going to be the case.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
November 28, 2008
Derek Parnell wrote:
> On Thu, 27 Nov 2008 16:23:12 -0600, Andrei Alexandrescu wrote:
> 
>> Derek Parnell wrote:
>>> On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:
>>>
>>>> D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.
>>> Interesting ... but I don't think that this should be the principle
>>> employed. If code is 'naughty' in C/C++ then D should not also produce the
>>> same results.
>>>
>>> I would propose that a better principle to be used would be that the
>>> compiler will not allow loss or distortion of information without the
>>> coder/reader being made aware of it.
>> These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to behave naughtily as well. It should just not compile.
> 
> I think we are saying the same thing. If the C code compiles AND if it has
> the potential to lose information then the D compiler should not compile it
> *if* the coder has not given explicit permission to the compiler to do so.

Oh, sorry. Yes, absolutely!

>>> In those cases where the target type is not explicitly coded, such as using
>>> 'auto' or as a temporary value in an expression, the compiler should assume
>>> a signed type that is 'one step' larger than the 'unsigned' element in the
>>> expression.
>>>
>>> e.g.
>>>    auto x = int * uint; ==> 'x' is long.
>> I don't think this will fly with Walter.
> 
> And that there is our single point of failure. 
> 
>>> If this causes code to be incompatible to C/C++, then it implies that the
>>> C/C++ code was poor (i.e. potential information loss) in the first place
>>> and deserves to be fixed up.
>> I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.
> 
> Of course. *If* the compiler can determine that the result will not lose
> information when being used, then it is fine. However, that is not always
> going to be the case.

Well, here are two objectives at odds with each other. One is the systems-y, low-level aspect: on 32-bit systems there is a 32-bit multiplication operation that the 32-bit D primitive ought to map to naturally. I think there is some good reason to expect that. Then there's also the argument you're making - and with which I agree - that 32-bit multiplication really yields a 64-bit value, so the type of the result should be long.

But if we really start down that path, infinite-precision integrals are the only solution. Because when you multiply two longs, you'd need something even longer and so on.
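
In code, the escalation looks like this -- the exact 32-bit product needs 64 bits, and an exact 64-bit product would in turn need 128 (illustrative sketch):

   void main()
   {
       int a = 100_000;
       int b = 100_000;
       long exact   = cast(long) a * b;   // widen first: 10_000_000_000
       int  wrapped = a * b;              // 32-bit wrap-around: 1_410_065_408
       assert(exact == 10_000_000_000L);
       assert(wrapped != exact);
   }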

Anyhow, the ultimate reality is: we won't be able to satisfy every objective we have. We'll need to strike a good compromise.


Andrei