November 28, 2008
Andrei Alexandrescu wrote:
> KennyTM~ wrote:
>> KennyTM~ wrote:
>>> Andrei Alexandrescu wrote:
>>>> Don wrote:
>>>>> Andrei Alexandrescu wrote:
>>>>>> Don wrote:
>>>>>>> Andrei Alexandrescu wrote:
>>>>>>>> One fear of mine is the reaction of throwing hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.
>>>>>>>
>>>>>>> Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas:
>>>>>>>
>>>>>>> (A) You think that it is an approximation to a natural number, ie, a 'positive int'.
>>>>>>> (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation.
>>>>>>>
>>>>>>> Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
>>>>>>
>>>>>> In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
>>>>>>
>>>>>>> If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc.
>>>>>>>
>>>>>>> Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast.
>>>>>>> Non-negative literals and manifest constants are naturals.
>>>>>>>
>>>>>>> The rules are:
>>>>>>> 1. Anything involving unsigned is unsigned, (same as C).
>>>>>>> 2. Else if it contains an integer, it is an integer.
>>>>>>> 3. (Now we know all quantities are natural):
>>>>>>> If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs].
>>>>>>> 4. Else it is a natural.
>>>>>>>
>>>>>>>
>>>>>>> The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program.
>>>>>>>
>>>>>>> [Just before posting I've discovered that other people have posted some similar ideas].
>>>>>>
>>>>>> That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table.
>>>>>>
>>>>>>
>>>>>> Andrei
>>>>>
>>>>> Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer.
>>>>> But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results.
>>>>> Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!".
>>>>>
>>>>> Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned.
>>>>> I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
>>>>
>>>> I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations.
>>>>
>>>> I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages.
>>>>
>>>> One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned and unsigned -> signed is implicit). Let's see where that takes us.
>>>>
>>>> (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number.
>>>>
>>>> (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.
>>>>
>>>> (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
>>>>
>>>
>>> So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?!
>>>
>>> The opposite sounds more natural to me.
>>>
>>
>> Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)
> 
> The tightest type possible depends on the operation. In that doctrine, long * int yields a long (given the demise of cent). Walter thinks such rules are too complicated, but I'm a big fan of operation-dependent typing. I see no good reason for requiring int * long to have the same type as int / long. They are different operations with different semantics and corner cases and whatnot, so the resulting static type may as well be different.
> 
> By the way, under the tightest type doctrine, uint & ubyte is typed as ubyte. Interesting that one, huh :o).
> 
> 
> Andrei

I just remembered a problem with simplemindedly going with the tightest type. Consider:

uint a = ...;
ubyte b = ...;
auto c = a & b;
c <<= 16;
...

The programmer may reasonably expect that the bitwise operation yields an unsigned integer because it involved one. However, the zealous compiler cleverly notices the operation really never yields something larger than a ubyte, and therefore returns that "tightest" type, thus making c a ubyte. Subsequent uses of c will be surprising to the programmer who thought c had 32 bits.

It looks like polysemy is the only solution here: return a polysemous value with principal type uint and possible type ubyte. That way, c will be typed as uint. But at the same time, continuing the example:

ubyte d = a & b;

will go through without a cast. That's pretty cool!
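
For contrast, under today's rules the same expression is simply typed uint, so narrowing it back to a ubyte needs an explicit cast. A minimal, compilable sketch of the current behaviour (not of the proposal):

    import std.stdio;

    void main()
    {
        uint a = 0x1234_5678;
        ubyte b = 0xF0;

        auto c = a & b;                // today: typed uint, not ubyte
        static assert(is(typeof(c) == uint));
        c <<= 16;                      // keeps all 32 bits, as the programmer expects

        ubyte d = cast(ubyte)(a & b);  // today: the narrowing must be spelled out
        writefln("c = %08x, d = %02x", c, d);
    }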

One question I had is: say polysemy will be at work for integral arithmetic. Should we provide means in the language for user-defined polysemous functions? Or is it ok to leave it as compiler magic that saves redundant casts?


Andrei
November 28, 2008
Some of the purposes of a good arithmetic system are:
- To give the system programmer freedom, essentially to use all the speed and flexibility of the CPU instructions.
- To allow fast-running code; this means having ways to specify 32- or 64-bit operations concisely.
- To allow programs that aren't bug-prone, both with compile-time safety and, where that isn't enough, with run-time checks (array bounds, arithmetic overflow among non-long types, etc.).
- To allow more flexibility, such as that coming from certain usages of multi-precision integers.
- Good Common Lisp implementations are supposed to allow both fast code (fixnums) and safe/multi-precision integers (and even untagged fixnums).


Andrei Alexandrescu:
> But if we really start down that path, infinite-precision integrals are the only solution. Because when you multiply two longs, you'd need something even longer and so on.

Well, having built-in multi-precision integer values isn't bad. You then need ways to specify where you want the compiler to use fixed-length numbers, for more efficiency.
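
As a quick illustration of the quoted point about multiplying two longs: today's fixed-width arithmetic just wraps, so the mathematically exact product needs a wider (or arbitrary-precision) type. A minimal sketch, not tied to any proposal:

    import std.stdio;

    void main()
    {
        long a = long.max;
        long b = 2;
        long c = a * b;   // silently wraps modulo 2^64
        writeln(c);       // prints -2, not 2 * long.max
    }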

Bye,
bearophile
November 28, 2008
Andrei Alexandrescu wrote:
> KennyTM~ wrote:
>> KennyTM~ wrote:
>>> Andrei Alexandrescu wrote:
>>>> Don wrote:
>>>>> Andrei Alexandrescu wrote:
>>>>>> Don wrote:
>>>>>>> Andrei Alexandrescu wrote:
>>>>>>>> One fear of mine is the reaction of throwing hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.
>>>>>>>
>>>>>>> Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas:
>>>>>>>
>>>>>>> (A) You think that it is an approximation to a natural number, ie, a 'positive int'.
>>>>>>> (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation.
>>>>>>>
>>>>>>> Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
>>>>>>
>>>>>> In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
>>>>>>
>>>>>>> If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc.
>>>>>>>
>>>>>>> Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast.
>>>>>>> Non-negative literals and manifest constants are naturals.
>>>>>>>
>>>>>>> The rules are:
>>>>>>> 1. Anything involving unsigned is unsigned, (same as C).
>>>>>>> 2. Else if it contains an integer, it is an integer.
>>>>>>> 3. (Now we know all quantities are natural):
>>>>>>> If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs].
>>>>>>> 4. Else it is a natural.
>>>>>>>
>>>>>>>
>>>>>>> The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program.
>>>>>>>
>>>>>>> [Just before posting I've discovered that other people have posted some similar ideas].
>>>>>>
>>>>>> That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table.
>>>>>>
>>>>>>
>>>>>> Andrei
>>>>>
>>>>> Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer.
>>>>> But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results.
>>>>> Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!".
>>>>>
>>>>> Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned.
>>>>> I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
>>>>
>>>> I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations.
>>>>
>>>> I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages.
>>>>
>>>> One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned and unsigned -> signed is implicit). Let's see where that takes us.
>>>>
>>>> (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number.
>>>>
>>>> (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.

The problem with that is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous.

uint.max - 10 is a uint.

It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max.

uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_.
But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically.
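
A small check of that modular reading, using today's types:

    import std.stdio;

    void main()
    {
        uint u1 = 3;
        uint u2 = 5;

        uint r = u1 - u2;            // 4294967294: the exact result modulo 2^32
        int  s = cast(int)(u1 - u2); // -2: the same bits reinterpreted as signed
        writeln(r, " ", s);
    }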

I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal.
Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'.

I take your point about not wanting to give up the full 32 bits of address space. The problem is that if you have an object x which is >2GB, and a small object y, then x.length - y.length will erroneously be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. I think that would preclude the combination:

length is uint
byte[].length can exceed 2GB, and code is correct when it does
uint - uint is an int (or even, can implicitly convert to int)

As far as I can tell, at least one of these has to go.
November 28, 2008
On 2008-11-27 22:34:50 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> One question I had is: say polysemy will be at work for integral arithmetic. Should we provide means in the language for user-defined polysemous functions? Or is it ok to leave it as compiler magic that saves redundant casts?

I think that'd be a must. Otherwise how would you define your own arithmetical types so they work like the built-in ones?

	struct ArbitraryPrecisionInt { ... }

	ArbitraryPrecisionInt a = ...;
	uint b = ...;
	auto c = a & b;
	c <<= 16;
	...

Shouldn't c be of type ArbitraryPrecisionInt? And shouldn't the following work too?

	uint d = a & b;

That said, how can a function return a polysemous value at all? Should the function return a special kind of struct with a sample of every supported type? That'd be utterly inefficient. Should it return a custom-made struct with the ability to implicitly cast itself to other types? That would make the polysemous value propagatable through auto, and it would probably be less efficient too.

The only way I can see this working correctly is with function overloading on return type, with a way to specify the default function (for when the return type is not specified, such as with auto). In the case above, you'd need something like this:

	struct ArbitraryPrecisionInt {
		default ArbitraryPrecisionInt opAnd(uint i);
		uint opAnd(uint i);
	}


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

November 28, 2008
(I lost track of quotes, so I yanked them all beyond Don's message.)

Don wrote:
> The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous.
> 
> uint.max - 10 is a uint.
> 
> It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max.
> 
> uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_.
> But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically.

Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.

> I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal.
> Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'.

I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)

> I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which is  >2GB, and a small object y, then  x.length - y.length will erroneously be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. I think that would preclude the combination:
> 
> length is uint
> byte[].length can exceed 2GB, and code is correct when it does
> uint - uint is an int (or even, can implicitly convert to int)
> 
> As far as I can tell, at least one of these has to go.

Well, none has to go in the latest design:

(a) One unsigned makes everything unsigned

(b) unsigned -> signed is allowed

(c) signed -> unsigned is disallowed

Of course the latest design has imperfections, but it precludes none of the three things you mention.
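
A sketch of how the three rules read in code. The comments note where the proposal departs from the current compiler, which still accepts the signed -> unsigned direction:

    void main()
    {
        int  i = -1;
        uint u = 10;

        auto m = u + i;   // (a) one unsigned operand makes the expression uint
        static assert(is(typeof(m) == uint));

        int  s = u;       // (b) unsigned -> signed: stays implicit
        uint t = i;       // (c) signed -> unsigned: accepted today, but would
                          //     need an explicit cast under the proposed rules
    }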


Andrei
November 28, 2008
Andrei Alexandrescu wrote:
> (I lost track of quotes, so I yanked them all beyond Don's message.)
> 
> Don wrote:
>> The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous.
>>
>> uint.max - 10 is a uint.
>>
>> It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max.
>>
>> uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_.
>> But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically.
> 
> Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
> 
>> I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal.
>> Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'.
> 
> I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
> 
>> I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which is  >2GB, and a small object y, then  x.length - y.length will erroneously be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. I think that would preclude the combination:
>>
>> length is uint
>> byte[].length can exceed 2GB, and code is correct when it does
>> uint - uint is an int (or even, can implicitly convert to int)
>>
>> As far as I can tell, at least one of these has to go.
> 
> Well none has to go in the latest design:
> 
> (a) One unsigned makes everything unsigned
> 
> (b) unsigned -> signed is allowed
> 
> (c) signed -> unsigned is disallowed
> 
> Of course the latest design has imperfections, but it precludes none of the three things you mention.

It's close, but how can code such as:

if (x.length - y.length < 100) ...

be correct in the presence of length > 2GB?

since
(a) x.length  = uint.max, y.length = 1
(b) x.length = 4, y.length = 2
both produce the same binary result (0xFFFF_FFFE = -2)

Any subtraction of two lengths has a possible range of
 -int.max .. uint.max
which is quite problematic (and the root cause of the problems, I guess).
And unfortunately I think code is riddled with subtraction of lengths.
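
The aliasing is easy to check with plain uints standing in for the two lengths (taking case (b) in the small-minus-large direction, i.e. 2 - 4):

    import std.stdio;

    void main()
    {
        uint xa = uint.max, ya = 1;  // a huge length minus a small one
        uint xb = 2,        yb = 4;  // a small length minus a larger one

        writefln("%08X", xa - ya);   // FFFFFFFE
        writefln("%08X", xb - yb);   // FFFFFFFE -- identical bits, so a test
                                     // against 100 cannot tell the cases apart
    }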


November 28, 2008
Don wrote:
> Andrei Alexandrescu wrote:
>> (I lost track of quotes, so I yanked them all beyond Don's message.)
>>
>> Don wrote:
>>> The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous.
>>>
>>> uint.max - 10 is a uint.
>>>
>>> It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max.
>>>
>>> uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_.
>>> But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically.
>>
>> Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
>>
>>> I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal.
>>> Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'.
>>
>> I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
>>
>>> I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which is  >2GB, and a small object y, then  x.length - y.length will erroneously be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. I think that would preclude the combination:
>>>
>>> length is uint
>>> byte[].length can exceed 2GB, and code is correct when it does
>>> uint - uint is an int (or even, can implicitly convert to int)
>>>
>>> As far as I can tell, at least one of these has to go.
>>
>> Well none has to go in the latest design:
>>
>> (a) One unsigned makes everything unsigned
>>
>> (b) unsigned -> signed is allowed
>>
>> (c) signed -> unsigned is disallowed
>>
>> Of course the latest design has imperfections, but it precludes none of the three things you mention.
> 
> It's close, but how can code such as:
> 
> if (x.length - y.length < 100) ...
> 
> be correct in the presence of length > 2GB?
> 
> since
> (a) x.length  = uint.max, y.length = 1
> (b) x.length = 4, y.length = 2
> both produce the same binary result (0xFFFF_FFFE = -2)

(You mean x.length = 2, y.length = 4 in the second case.)

> Any subtraction of two lengths has a possible range of
>  -int.max .. uint.max
> which is quite problematic (and the root cause of the problems, I guess).
> And unfortunately I think code is riddled with subtraction of lengths.

Code may be riddled with subtraction of lengths, but it seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems.

I agree the solution has problems. Following this thread, which in turn follows my sleepless nights poring over the subject, I'm glad to have reached a design that is better than what we currently have. I think that disallowing the signed -> unsigned conversion will be a net improvement.


Andrei
November 28, 2008
Andrei Alexandrescu wrote:
> Don wrote:
>> Andrei Alexandrescu wrote:
>>> (I lost track of quotes, so I yanked them all beyond Don's message.)
>>>
>>> Don wrote:
>>>> The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous.
>>>>
>>>> uint.max - 10 is a uint.
>>>>
>>>> It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max.
>>>>
>>>> uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_.
>>>> But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically.
>>>
>>> Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
>>>
>>>> I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal.
>>>> Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'.
>>>
>>> I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
>>>
>>>> I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which is  >2GB, and a small object y, then  x.length - y.length will erroneously be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. I think that would preclude the combination:
>>>>
>>>> length is uint
>>>> byte[].length can exceed 2GB, and code is correct when it does
>>>> uint - uint is an int (or even, can implicitly convert to int)
>>>>
>>>> As far as I can tell, at least one of these has to go.
>>>
>>> Well none has to go in the latest design:
>>>
>>> (a) One unsigned makes everything unsigned
>>>
>>> (b) unsigned -> signed is allowed
>>>
>>> (c) signed -> unsigned is disallowed
>>>
>>> Of course the latest design has imperfections, but it precludes none of the three things you mention.
>>
>> It's close, but how can code such as:
>>
>> if (x.length - y.length < 100) ...
>>
>> be correct in the presence of length > 2GB?
>>
>> since
>> (a) x.length  = uint.max, y.length = 1
>> (b) x.length = 4, y.length = 2
>> both produce the same binary result (0xFFFF_FFFE = -2)
> 
> (You mean x.length = 2, y.length = 4 in the second case.)

Yes.

> 
>> Any subtraction of two lengths has a possible range of
>>  -int.max .. uint.max
>> which is quite problematic (and the root cause of the problems, I guess).
>> And unfortunately I think code is riddled with subtraction of lengths.
> 
> Code may be riddled with subtraction of lengths, but seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems.

Yes. I think much existing code would fail with sizes over 2GB, though. But it's not any worse.

> 
> I agree the solution has problems. Following this thread that in turn follows my sleepless nights poring over the subject, I'm glad to reach a design that is better than what we currently have. I think that disallowing the signed -> unsigned conversions will be a net improvement.

I agree. And dealing with compile-time constants will improve things even more.
November 28, 2008
Don wrote:
> 
> length is uint
> byte[].length can exceed 2GB, and code is correct when it does
> uint - uint is an int (or even, can implicitly convert to int)
> 
> As far as I can tell, at least one of these has to go.

This is why I never understood ptrdiff_t in C.  Having to choose between a signed value and narrower range vs. unsigned and sufficient range just stinks.
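
The same trade-off is visible in D's own aliases: size_t is unsigned and covers the full range, while ptrdiff_t is signed and gives up half of it. A trivial check:

    import std.stdio;

    void main()
    {
        writeln("size_t.max    = ", size_t.max);
        writeln("ptrdiff_t.max = ", ptrdiff_t.max);
    }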


Sean
November 28, 2008
On Fri, 28 Nov 2008 17:09:25 +0100, Don wrote:

> 
> It's close, but how can code such as:
> 
> if (x.length - y.length < 100) ...
> 
> be correct in the presence of length > 2GB?

It could be transformed by the compiler into something more like ...

  if ((x.length <= y.length) || ((x.length - y.length) < 100)) ...


-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell