November 26, 2008
Sean Kelly wrote:
> Andrei Alexandrescu wrote:
>>
>> Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
> 
> What /is/ the appropriate type here?  For example:
> 
>     uint a = uint.max;
>     uint b = 0;
>     uint c = uint.max - 1;
> 
>     int  x = a - b; // wrong, should be uint
>     uint y = c - a; // wrong, should be int
> 
> I don't see any way to reliably produce a "safe" result at the language level.

There are several schools of thought (for the lack of a better phrase):

1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed.

2. The Practical Mathematician: we want unsigned to approximate natural numbers and natural numbers aren't closed for subtraction but closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.

3. The C Veteran: Everything should be allowed. And when unsigned is within a mile, the type is unsigned. I'll take care of the rest.

4. The Assembly Programmer: Use whatever type you want. The assembly language operation for subtraction is the same.

5. The Dynamic Language Fan: Allow whatever and check it dynamically.

6. The Static Typing Nut: Use some scheme to magically weed out 73.56% mistakes and disallow only 14.95% valid uses.

Your example is in fact perfect. It shows how the result of a subtraction has ultimately its fate decided by case-by-case use, not picked properly by a rule. The example perfectly underlines the advantage of my scheme: the decision of how to type u1 - u2 is left to the only entity able to account: the user of the operation. Of course there remains the question, should all that be implicit or should the user employ more syntax to specify what they want? I don't know.


Andrei
November 26, 2008
Sergey Gromov wrote:
> Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:
> 
>> Don wrote:
>>> My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int.
>> Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable.
>>
>> Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
>>
>>> Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')
>> I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
> 
> I'm totally with Don here.  In math, natural numbers are a subset if
> integers.  But uint is not a subset of int.  If it were, most of the
> problems would vanish.  So it's probably feasible to ban uint from
> SafeD, implement natural numbers by some other means, and leave uint for
> low-level wizardry.

That's also a possibility - consider unsigned types just "bags of bits" and disallow most arithmetic for them. They could actually be eliminated entirely from the core language because they can be implemented as a library. I'm not sure how that would feel like.

I guess length would return an int in that case?

Andrei
November 26, 2008
Sean Kelly wrote:
> Don wrote:
>>
>> Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')
> 
> This inspired me to think about where I use uint and I realized that I don't.  I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.
> 
> 
> Sean

For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.

I wonder how often these super-large arrays do occur on 32-bit systems. I do have programs that try to allocate as large a contiguous matrix as possible, but never sat down and tested whether a >2GB chunk was allocated on the Linux cluster I work on. I'm quite annoyed by this >2GB issue because it's a very practical and very rare issue in a weird contrast with a very principled issue (modeling natural numbers).


Andrei
November 26, 2008
Sergey Gromov wrote:
> So it's probably feasible to ban uint from
> SafeD, implement natural numbers by some other means, and leave uint for
> low-level wizardry.

SafeD is about memory safety, i.e. no corrupted memory. Dealing with integer overflows falls outside its agenda.
November 26, 2008
Andrei Alexandrescu wrote:
> Sean Kelly wrote:
>> Andrei Alexandrescu wrote:
>>>
>>> Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
>>
>> What /is/ the appropriate type here?  For example:
>>
>>     uint a = uint.max;
>>     uint b = 0;
>>     uint c = uint.max - 1;
>>
>>     int  x = a - b; // wrong, should be uint
>>     uint y = c - a; // wrong, should be int
>>
>> I don't see any way to reliably produce a "safe" result at the language level.
> 
> There are several schools of thought (for the lack of a better phrase):
> 
> 1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed.
> 
> 2. The Practical Mathematician: we want unsigned to approximate natural numbers and natural numbers aren't closed for subtraction but closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.

How about 1.5, the Somewhat Practical but Still Purist Mathematician? He (that would be me) would like integral types called nint and nlong (the "n" standing for "natural"), which can hold numbers in the range (0, int.max) and (0, long.max), respectively. Such types would have to be stored as int/long, but the sign bit should be ignored/zero in all calculations. Hence any nint/nlong would be implicitly castable to int/long. Is this a possibility?

As you say, natural numbers aren't closed under subtraction, so subtractions involving nint/nlong would have to yield an int/long result. In fact, if n1 and n2 are nints, one would be certain that n1-n2 never goes out of the range of an int.

Thing is, whenever I use one of the unsigned types, it is because I need to make sure I'm working with nonnegative numbers, not because I need to work outside the ranges of the signed integral types. Other people obviously have other needs, though, so I'm not saying "let's toss uint and ulong out the window".

-Lars
November 26, 2008
Lars Kyllingstad wrote:
> Andrei Alexandrescu wrote:
>> Sean Kelly wrote:
>>> Andrei Alexandrescu wrote:
>>>>
>>>> Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
>>>
>>> What /is/ the appropriate type here?  For example:
>>>
>>>     uint a = uint.max;
>>>     uint b = 0;
>>>     uint c = uint.max - 1;
>>>
>>>     int  x = a - b; // wrong, should be uint
>>>     uint y = c - a; // wrong, should be int
>>>
>>> I don't see any way to reliably produce a "safe" result at the language level.
>>
>> There are several schools of thought (for the lack of a better phrase):
>>
>> 1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed.
>>
>> 2. The Practical Mathematician: we want unsigned to approximate natural numbers and natural numbers aren't closed for subtraction but closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.
> 
> How about 1.5, the Somewhat Practical but Still Purist Mathematician? He (that would be me) would like integral types called nint and nlong (the "n" standing for "natural"), which can hold numbers in the range (0, int.max) and (0, long.max), respectively. Such types would have to be stored as int/long, but the sign bit should be ignored/zero in all calculations. Hence any nint/nlong would be implicitly castable to int/long. Is this a possibility?
> 
> As you say, natural numbers aren't closed under subtraction, so subtractions involving nint/nlong would have to yield an int/long result. In fact, if n1 and n2 are nints, one would be certain that n1-n2 never goes out of the range of an int.
> 
> Thing is, whenever I use one of the unsigned types, it is because I need to make sure I'm working with nonnegative numbers, not because I need to work outside the ranges of the signed integral types. Other people obviously have other needs, though, so I'm not saying "let's toss uint and ulong out the window".
> 
> -Lars

Another point: nint would also be implicitly castable to uint and so on, so making these types the standard choice of unsigned integers in Phobos shouldn't cause too much breakage.

-Lars
November 26, 2008
Wed, 26 Nov 2008 15:57:55 -0600, Andrei Alexandrescu wrote:

> Sergey Gromov wrote:
>> Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:
>> 
>>> Don wrote:
>>>> My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int.
>>> Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable.
>>>
>>> Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
>>>
>>>> Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')
>>> I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
>> 
>> I'm totally with Don here.  In math, natural numbers are a subset if integers.  But uint is not a subset of int.  If it were, most of the problems would vanish.  So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.
> 
> That's also a possibility - consider unsigned types just "bags of bits" and disallow most arithmetic for them. They could actually be eliminated entirely from the core language because they can be implemented as a library. I'm not sure how that would feel like.
> 
> I guess length would return an int in that case?

I guess so.  Actually, simply disallowing signed<=>unsigned cast and making length signed would force most people to abandon unsigned types. And moving unsgned types documentation in a separate chapter would warn newcomers about their special status.  Not a lot of changes on the compiler side, mostly throwing stuff away.
November 26, 2008
Andrei Alexandrescu wrote:
> bearophile wrote:
>> Nick Sabalausky:
>>> Oh, right. For some stupid reason I was forgetting that the param would always be an array and therefore be eligible for the existing array property syntax (and that .length always returns a uint).
>>
>> From the len() code I have posted you can see there are other places where you want to use len(), in particular to count the number of items that a lazy generator (opApply for now) yields.
>>
>> Bye,
>> bearophile
> 
> I'm rather weary of a short and suggestive name that embodies a linear operation. I recall there was a discussion about that a while ago in this newsgroup. I'd rather call it linearLength or something that suggests it's a best-effort function that may take O(n).

My personal rules of optimization:
 - I don't know what's slow.
 - I don't know what's called often enough to be worth speeding up.
 - Most of the time, my data sets are small.

If getting the length of an array were a linear operation, that wouldn't much affect any of my code. Most of my arrays are probably no larger than twenty elements, and I don't often need to get their lengths.

If I need to change data structures for better performance, I'd like to be able to replace them (or switch to generators) without undo effort. Things like changing function names according to the algorithmic complexity of the implementation just hurts.

> Andrei
November 27, 2008
Andrei Alexandrescu:
> That's also a possibility - consider unsigned types just "bags of bits" and disallow most arithmetic for them. They could actually be eliminated entirely from the core language because they can be implemented as a library. I'm not sure how that would feel like.
> 
> I guess length would return an int in that case?

I don't know what the solution is, but I am very happy to see that in this newsgroup there are people willing to reconsider such basic things, to try to improve the language.
Most ideas turn out to be wrong, but if you aren't bold enough to consider them, there will no improvements :-)

In my programs I use use unsigned integers and unsigned longs as:
- bitfields, a single size_t, for example to represent a small set of items.
- bitarrays, in an array of size_t, to represent a larger set, to have array of bit flags, etc.
- to pack small variables into a uint, size_t, etc, for example use the first 5 bits to represent a, the following 2 bits to represent b, etc. In such situation I have never pack such variables into a signed int.
- when I need very large integer values, but this has to be done with care, because they can't be converted back to ints.
- I'd also like to use unsigned ints to denote that for example a function takes a nonnegative argument. I used to do this in Delphi, but I have seen it's too much unsafe in D, so now in D I prefer to use ints and then inside the function test for a negative argument and throw an exception (generally I don't use an assert for this but in the most speed critical situations).
- I use unsigned bytes in some situations, now and then. I don't use signed bytes anymore, I used to use them for 8 bit digital audio, but not anymore. Now 16 bit signed audio is the norm (a short) or even 24 bit (I have created a slow 24 bit value time ago).
- Probably there are few other situations, for example I think I've used an ushort once, but not many of them.

Bye,
bearophile
November 27, 2008
I'm not really sure what I think about all this. I try to always insert assertions before operations like this, which makes me think the nicest solution would be if the compiler errors out if it detects a problematic expression that is unchecked...

uint diff(uint begin, uint end)
{
	return end - begin; // error
}


uint diff(uint begin, uint end)
{
	assert(begin <= end);
	return end - begin; // ok because of the assert
}


I'm not going to get into how this would be implemented in the compiler,  but it sure would be sweet :)