November 26, 2008
Don wrote:
> Andrei Alexandrescu wrote:
>> D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.
>>
>> A classic problem with C and C++ integer arithmetic is that any operation involving at least one unsigned integral automatically receives an unsigned type, regardless of how silly that actually is, semantically. About the only advantage of this rule is that it's simple. IMHO it only has disadvantages from then on.
>>
>> The following operations suffer from the "abusive unsigned syndrome" (u is an unsigned integral, i is a signed integral):
>>
>> (1) u + i, i + u
>> (2) u - i, i - u
>> (3) u - u
>> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C requires that these all return unsigned, ouch)
>> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
>> (6) -u
> 
> I think that most of these problems are caused by C enforcing a foolish consistency between literals and variables.
> The idea that literals like '0' and '1' are of type int is absurd, and has caused a torrent of problems. '0' is just '0'.
> 
> uint a = 1;
> does NOT contain an 'implicit conversion from int to uint', any more than there are implicit conversions from naturals to integers in mathematics. So I really like the polysemous types idea.

Yah, polysemy will take care of the constants. It's also rather easy to implement for them.

> For example, when is it reasonable to use -u?
> It's useful with literals like
> uint a = -1u; which is equivalent to uint a = 0xFFFF_FFFF.
> Anywhere else, it's probably a bug.

Maybe not even for constants, as all uses of -u can easily be converted into ~u + 1. I'd gladly agree to disallow -u entirely.

> My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int.

Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable.

Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is, for example, no progress in typing u1 - u2 appropriately.

> Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')

I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.

> Interestingly, none of these problems exist in assembly language programming, where every arithmetic instruction affects the overflow flag (for signed operations) as well as the carry flag (for unsigned).

They do exist. You need to use imul/idiv vs. mul/div depending on the signedness of your operands.


Andrei
November 26, 2008
On 2008-11-25 16:39:05 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Russell Lewis wrote:
>> I'm of the opinion that we should make mixed-sign operations a compile-time error.  I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code.
> 
> The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.

Then the problem is that integer literals are of a specific type. Just make them polysemous and the problem is solved.

I'm with Russell on this one. To me, a literal value (123, -8, 0) is not an int, not even a constant: it's just a number which doesn't imply any type at all until you place it into a variable (or a constant, or an enum, etc.).

And if you're afraid the word polysemous will scare people, don't say the word and call it an "integer literal". Polysemy in this case is just a mechanism used by the compiler to make the value work as expected with all integral types. All you really need is a type implicitly castable to everything capable of holding the numerical value (much like your __intuint).

I'd make "auto x = 1" create a signed integer variable for the sake of simplicity.

And all this would also make "uint x = -1" illegal... but then you can easily use "uint x = uint.max" if you want to enable all the bits. It's easier than in C: you don't have to include the right header and remember the name of a constant.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

November 26, 2008
On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> (3) u - u

Just a note here, because it seems to me you're confusing two issues with that "u - u" thing. The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero.

If you want to attack that problem, I think it should be done in a coherent manner with other out-of-range issues. Going below uint.min for a uint or below int.min for an int should be handled the same way. Personally, I'd just add a compiler switch for runtime range checking (just as for array bound checking).

Treating the result u - u as __intuint is dangerous: uint.max - 1U gives you a value which int cannot hold, but you'd allow it to convert implicitly and without warning to int? I don't like it.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

November 26, 2008
Michel Fortin wrote:
> On 2008-11-25 16:39:05 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
>> Russell Lewis wrote:
>>> I'm of the opinion that we should make mixed-sign operations a compile-time error.  I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code.
>>
>> The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
> 
> Then the problem is that integer literals are of a specific type. Just make them polysemous and the problem is solved.

Well, that at best takes care of _some_ operations involving constants, but for example does not quite take care of array.length - 1.

I am now sorry I gave the silly example of array.length + 1. Many people latched onto it and thought that solving that solves the whole problem. That's not quite the case.

Also consider:

auto delta = a1.length - a2.length;

What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.

> I'm with Russell on this one. To me, a literal value (123, -8, 0) is not an int, not even a constant: it's just a number which doesn't imply any type at all until you place it into a variable (or a constant, or an enum, etc.).
>
> And if you're afraid the word polysemous will scare people, don't say the word and call it an "integer literal". Polysemy in this case is just a mechanism used by the compiler to make the value work as expected with all integral types. All you really need is a type implicitly castable to everything capable of holding the numerical value (much like your __intuint).
> 
> I'd make "auto x = 1" create a signed integer variable for the sake of simplicity.

That can be formalized by having polysemous types have a "lemma", a default type.

> And all this would also make "uint x = -1" illegal... but then you can easily use "uint x = uint.max" if you want to enable all the bits. It's easier than in C: you don't have to include the right header and remember the name of a constant.

Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue.


Andrei
November 26, 2008
Michel Fortin wrote:
> On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
>> (3) u - u
> 
> Just a note here, because it seems to me you're confusing two issues with that "u - u" thing. The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero.

It's also a problem of signedness, considering that int can hold the difference of two small unsigned integrals. So if the result is unsigned there may be overflow (I abusively call it "underflow"), but if the result is an int that overflow may be avoided, or a different overflow may occur.

> If you want to attack that problem, I think it should be done in a coherent manner with other out-of-range issues. Going below uint.min for a uint or below int.min for an int should be handled the same way. Personally, I'd just add a compiler switch for runtime range checking (just as for array bound checking).
> 
> Treating the result u - u as __intuint is dangerous: uint.max - 1U gives you a value which int cannot hold, but you'd allow it to convert implicitly and without warning to int? I don't like it.

I understand. It's what I have so far, so I'm looking forward to better ideas. Resorting to runtime checks is always a possibility but I'd like to focus on the static checking aspect for now.


Andrei
November 26, 2008
bearophile Wrote:

> From the len() code I have posted you can see there are other places where you want to use len(), in particular to count the number of items that a lazy generator (opApply for now) yields.
>
hmm...

import std.stdio, std.algorithm;

void main()
{
    bool pred(int x) { return x > 2; }
    // fold over the array, counting the elements that satisfy pred
    auto counter = (int count, int x) { return pred(x) ? count + 1 : count; };
    int[] a = [0, 1, 2, 3, 4];
    auto lazylen = reduce!(counter)(0, a);
    writeln(lazylen); // 2
}
November 26, 2008
On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Well that at best takes care of _some_ operations involving constants, but for example does not quite take care of array.length - 1.

How does it not solve the problem? array.length is of type uint, 1 is polysemous (byte, ubyte, short, ushort, int, uint, long, ulong). Only "uint - uint" is acceptable, and its result is "uint".


> Also consider:
> 
> auto delta = a1.length - a2.length;
> 
> What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.

In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise.

	auto diff = cast(int)a1.length - cast(int)a2.length;

By casting explicitly, you indicate in the code that if a1.length or a2.length contain numbers which are too big to be represented as int, you'll get garbage. In this case, it'd be pretty surprising to get that problem. In other cases it may not be so clear-cut.

Perhaps we could add a "sign" property to uint and an "unsign" property to int that'd give you the signed or unsigned corresponding value and which could do range checking at runtime (enabled by a compiler flag).

	auto diff = a1.length.sign - a2.length.sign;

And for the general problem of "uint - uint" giving a result below uint.min, as I said in my other post, that could be handled by a runtime check (enabled by a compiler flag) just like array bound checking.

One last thing. I think that in general it's a much better habit to change the type to signed prior to doing the subtraction. It may be harmless in the case of a subtraction, but as you said when starting the thread, it isn't for others (multiply, divide, modulo). I think the scheme above promotes this good habit by making it easier to change the type at the operands rather than at the result.


>> I'd make "auto x = 1" create a signed integer variable for the sake of simplicity.
> 
> That can be formalized by having polysemous types have a "lemma", a default type.

That's indeed what I'm suggesting.


>> And all this would also make "uint x = -1" illegal... but then you can easily use "uint x = uint.max" if you want to enable all the bits. It's easier than in C: you don't have to include the right header and remember the name of a constant.
> 
> Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue.

Well, by making implicit conversions between uint and int illegal, we're solving the larger issue. Just not in a seamless manner.


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

November 26, 2008
Andrei Alexandrescu:
> I'm rather wary of a short and suggestive name that embodies a linear operation. I recall there was a discussion about that a while ago in this newsgroup. I'd rather call it linearLength or something that suggests it's a best-effort function that may take O(n).

I remember parts of that discussion, and I like your general rule, and I agree that generally it's better to give the programmer a hint of the complexity of a specific operation, for example a method of a user defined class, etc.

But len() is supposed to be used very often, so it's better to keep it short, because if you don't have an IDE it's not nice to type linearLength() once every two lines of code.

Being used so often also implies that you remember how it works, so you are supposed to be able to remember it can be O(n) on lazy iterators.

So in this specific case I think it's acceptable to break your general rule, for practical reasons.

Bye,
bearophile
November 26, 2008
bearophile wrote:
> Andrei Alexandrescu:
>> I'm rather wary of a short and suggestive name that embodies a linear operation. I recall there was a discussion about that a while ago in this newsgroup. I'd rather call it linearLength or something that suggests it's a best-effort function that may take O(n).
> 
> I remember parts of that discussion, and I like your general rule, and I agree that generally it's better to give the programmer a hint of the complexity of a specific operation, for example a method of a user defined class, etc.
> 
> But len() is supposed to be used very often, so it's better to keep it short, because if you don't have an IDE it's not nice to type linearLength() once every two lines of code.
> 
> Being used so often also implies that you remember how it works, so you are supposed to be able to remember it can be O(n) on lazy iterators.
> 
> So in this specific case I think it's acceptable to break your general rule, for practical reasons.
> 
> Bye,
> bearophile

If it's used often it shouldn't have linear complexity :o).

Andrei
November 26, 2008
Michel Fortin wrote:
> On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
>> Also consider:
>>
>> auto delta = a1.length - a2.length;
>>
>> What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.
> 
> In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise.
> 
>     auto diff = cast(int)a1.length - cast(int)a2.length;

Actually, there's no solution.
Imagine a 32-bit system where one object can be greater than 2GB in size (not possible on Windows AFAIK, but theoretically possible). Then if a1 is 3GB, delta cannot be stored in an int. If a2 is 3GB, the result is less than zero, so it requires a signed int for storage.

==> I think length has to be an int. It's less bad than uint.


> Perhaps we could add a "sign" property to uint and an "unsign" property to int that'd give you the signed or unsigned corresponding value and which could do range checking at runtime (enabled by a compiler flag).
> 
>     auto diff = a1.length.sign - a2.length.sign;
> 
> And for the general problem of "uint - uint" giving a result below uint.min, as I said in my other post, that could be handled by a runtime check (enabled by a compiler flag) just like array bound checking.

That's not bad.
>>
>> Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue.
> 
> Well, by making implicit convertions between uint and int illegal, we're solving the larger issue. Just not in a seemless manner.

We are of one mind. I think that constants are the root cause of the problem.