November 22, 2014
On Saturday, 22 November 2014 at 11:12:06 UTC, Marc Schütz wrote:
> I'd say that when two values are to be subtracted (signed or unsigned), and there's no knowledge about which one is larger, it's more useful to get a signed difference. This should be correct in most cases, because I believe it is more likely that the two values are close to each other. It only becomes a problem when they're on opposite sides of the value range.

Not being able to decrement unsigned types would be a disaster. Think of unsigned integers as an enumeration: you should be able to take both the predecessor and the successor of a value.

This is also in line with how you formalize natural numbers in math:

0 == zero
1 == successor(zero)
2 == successor(successor(zero))

This is basically a unary representation of natural numbers, and it allows both addition and subtraction. Unsigned int should be considered a binary representation of the same thing, capped at the maximum value.
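To illustrate the capped view in D (a minimal sketch; `succ` and `pred` are hypothetical helpers, not library functions):

    // successor and predecessor on uint treated as capped naturals:
    // succ is undefined at uint.max, pred is undefined at 0
    uint succ(uint n) { assert(n < uint.max); return n + 1; }
    uint pred(uint n) { assert(n > 0);        return n - 1; }

    unittest
    {
        assert(succ(succ(0)) == 2);    // 2 == successor(successor(zero))
        assert(pred(succ(41)) == 41);  // successor/predecessor pairs cancel
    }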

Bearophile gave a sensible solution a long time ago: make type coercion explicit and add a weaker coercion operator. That operator should prevent senseless coercions, but allow system-level coercion across signedness. Problem fixed.
November 22, 2014
On 20/11/2014 08:02, Walter Bright wrote:
> On 11/19/2014 5:03 PM, H. S. Teoh via Digitalmars-d wrote:
>> If this kind of unsafe mixing wasn't allowed, or required explicit casts
>> (to signify "yes I know what I'm doing and I'm prepared to face the
>> consequences"), I suspect that bearophile would be much happier about
>> this issue. ;-)
>
> Explicit casts are worse than the problem - they can easily cause bugs.

I recently explained to you that explicit casts are easily avoided using `import std.conv: signed, unsigned;`.
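For reference, a minimal sketch of what that looks like (std.conv's signed/unsigned pick the same-width signed/unsigned type for you):

    import std.conv : signed, unsigned;

    void main()
    {
        uint u = 42;
        int  i = -1;

        auto s = u.signed;    // int, no cast() at the call site
        auto n = i.unsigned;  // uint

        assert(s == 42);
        assert(n == uint.max);
    }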

D compilers badly need a way to detect bug-prone sign mixing. It is no exaggeration to say D is worse than C compilers in this regard. Usually we discuss how to compete with modern languages; here we are not even keeping up with C.

It's disappointing that this issue was pre-approved last year, yet now neither you nor even Andrei seems particularly cognizant of the need to resolve it. If you belittle the problem, you discourage others from trying to solve it.
November 22, 2014
Am Fri, 21 Nov 2014 17:50:11 -0800
schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:

> I agree, though "foreach (i; length.iota.retro)" is no slouch either! -- Andrei

Yes, no, well, it feels like too much science for a loop with
a decrementing index instead of an incrementing one, no matter
how few parentheses are used. It is not the place where I would
want to introduce functional programming to someone who has
never seen D code before.
That said, I'd also be uncertain whether compilers transparently
convert this to the equivalent of a reverse loop.
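For comparison, a minimal sketch of both spellings (assuming `length` is just a size_t):

    import std.range : iota, retro;

    void main()
    {
        size_t length = 5;

        // the functional version from the quote: 4, 3, 2, 1, 0
        foreach (i; length.iota.retro) { /* use i */ }

        // the built-in reverse loop with a decrementing index: 4, 3, 2, 1, 0
        foreach_reverse (i; 0 .. length) { /* use i */ }
    }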

-- 
Marco

November 22, 2014
Am Fri, 21 Nov 2014 18:07:15 -0800
schrieb Walter Bright <newshound2@digitalmars.com>:

> On 11/21/2014 1:33 PM, Marco Leise wrote:
> > As someone using unsigned types all the time, all I need to keep in mind are two rules:
> >
> > 1) Overflow:
> >
> >    uint number;
> >
> >    number = 10 * number + ch - '0';
> >
> >    It is handled with:
> >    if (number > uint.max / 10)
> >    if (number > uint.max - (ch - '0'))
> 
> You're better off with core.checkedint here, as not only is it correct, but (eventually) it will enable the compiler to optimize it to simply checking the carry flag.

Ah right, I keep forgetting that module. By the way, are calls to core.checkedint inlined? Since they are in another module and not templated I'd think not.
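For reference, a minimal sketch of the quoted digit loop on top of core.checkedint (the surrounding parse function is hypothetical):

    import core.checkedint : addu, mulu;

    uint parseDigits(const(char)[] s)
    {
        uint number = 0;
        bool overflow = false;  // sticky: stays true once any step wraps
        foreach (ch; s)
        {
            number = mulu(number, 10, overflow);
            number = addu(number, cast(uint)(ch - '0'), overflow);
        }
        assert(!overflow);
        return number;
    }

    unittest
    {
        assert(parseDigits("4294967295") == uint.max);
    }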

-- 
Marco

November 23, 2014
On 11/22/2014 1:07 PM, Marco Leise wrote:
> Am Fri, 21 Nov 2014 18:07:15 -0800
>> You're better off with core.checkedint here, as not only is it correct, but
>> (eventually) it will enable the compiler to optimize it to simply checking the
>> carry flag.
>
> Ah right, I keep forgetting that module.

It is brand new (!) so you're forgiven.

> By the way, are calls
> to core.checkedint inlined? Since they are in another module
> and not templated I'd think not.

Yes, since the import leaves the source for it in there.

November 23, 2014
On Friday, 21 November 2014 at 16:12:19 UTC, Don wrote:
> It is not uint.max. It is natint.max. And yes, that's an overflow condition.
>
> Exactly the same as when you do int.max + int.max.

This depends on how you look at it. From a formal perspective, assume zero as the base, plus a predecessor function P and a successor function S.

Then you have 0u - 1u + 2u ==> SSP0.

Then you do a normalization where you cancel out successor/predecessor pairs, and you get the result S0 ==> 1u. On the other hand, if you end up with P0, the result should be bottom (an error).

In a binary representation you need to collect the carry over N terms, so you need an extra accumulator, which you can get by extending the precision by ~log2(N) bits. Then mask the most significant bits to check for over/underflow.

Advanced for a compiler, but possible.
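A minimal sketch of the accumulator idea in D (hypothetical checkedSum helper; it simply widens to 64 bits, which gives ~31 bits of headroom rather than exactly log2(N) extra bits):

    // evaluates sum(signs[i] * terms[i]) in a wider accumulator and checks
    // once at the end that the result is a representable natural number
    uint checkedSum(const(int)[] signs, const(uint)[] terms)
    {
        assert(signs.length == terms.length);
        long acc = 0;  // extra headroom: intermediate values may dip below 0
        foreach (i, t; terms)
            acc += signs[i] * cast(long) t;
        assert(acc >= 0 && acc <= uint.max);  // ending on "P0" or overflowing is an error
        return cast(uint) acc;
    }

    unittest
    {
        // 0u - 1u + 2u normalizes to 1u instead of wrapping
        assert(checkedSum([1, -1, 1], [0u, 1u, 2u]) == 1u);
    }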

> The type that I think would be useful, would be a number in the range 0..int.max.
> It has no risk of underflow.

Yep, from a correctness perspective length should be an integer with a >= 0 constraint. Ada also acknowledges this by having its unsigned integers be 31 bits, like you suggest. And now that most CPUs are 64-bit, a 63-bit integer would be the right choice for array length.

> unsigned types are not a subset of mathematical integers.
>
> They do not just have a restricted range. They have different semantics.
>
>
> The question of what happens when a range is exceeded, is a different question.

In principle there is really no difference between signed and unsigned, since you only have an offset. But in practical programming, 64-bit signed (and thus 63 bits of unsigned range) is enough for most situations, with the advantage that you have the same bit representation with only one interpretation.

What the semantics are depends on how you define the operators, right? So you can have both modular and non-modular arithmetic on the same type by providing more operators. This is, after all, how the hardware does it.

Contrary to what is claimed by others in this thread, the typical hardware ALU does not default to modular arithmetic; it preserves resolution:

32bit + 32bit ==> 33bit result
32bit * 32bit ==> 64bit result

Modular arithmetic is an artifact of the language, not the hardware.
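To see the difference in D (a minimal sketch: widening before the operation keeps the full-resolution result, while the uint-typed operation is where the modular wrap happens):

    void main()
    {
        uint a = 0xFFFF_FFFFu;
        uint b = 0xFFFF_FFFFu;

        ulong sum  = cast(ulong) a + b;   // 0x1_FFFF_FFFE: the 33-bit sum
        ulong prod = cast(ulong) a * b;   // 0xFFFF_FFFE_0000_0001: the 64-bit product
        uint  wrap = a * b;               // 1: the language-level modular result

        assert(sum  == 0x1_FFFF_FFFEUL);
        assert(prod == 0xFFFF_FFFE_0000_0001UL);
        assert(wrap == 1);
    }
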
November 24, 2014
On Friday, 21 November 2014 at 20:17:12 UTC, Walter Bright wrote:
> On 11/21/2014 7:36 AM, Don wrote:
>> On Friday, 21 November 2014 at 04:53:38 UTC, Walter Bright wrote:
>>> 0 crossing bugs tend to show up much sooner, and often immediately.
>>
>>
>> You're missing the point here. The problem is that people are using 'uint' as if
>> it were a positive integer type.
>>
>> Suppose  D had a type 'natint', which could hold natural numbers in the range
>> 0..uint.max.  Sounds like 'uint', right? People make the mistake of thinking
>> that is what uint is. But it is not.
>>
>> How would natint behave, in the type system?
>>
>> typeof (natint - natint)  ==  int     NOT natint  !!!
>>
>> This would of course overflow if the result is too big to fit in an int. But the
>> type would be correct.  1 - 2 == -1.
>>
>> But
>>
>> typeof (uint - uint ) == uint.
>>
>> The bit pattern is identical to the other case. But the type is wrong.
>>
>> It is for this reason that uint is not appropriate as a model for positive
>> integers. Having warnings about mixing int and uint operations in relational
>> operators is a bit misleading, because mixing signed and unsigned is not usually
>> the real problem. Instead, those warnings are a symptom of a type system mistake.
>>
>> You are quite right in saying that with a signed length, overflows can still
>> occur. But, those are in principle detectable. The compiler could add runtime
>> overflow checks for them, for example. But the situation for unsigned is not
>> fixable, because it is a problem with the type system.
>>
>>
>> By making .length unsigned, we are telling people that if .length is
>> used in a subtraction expression, the type will be wrong.
>>
>> It is the incorrect use of the type system that is the underlying problem.
>
> I believe I do understand the problem. As a practical matter, overflow checks are not going to be added for performance reasons.

The performance overhead would be practically zero. All we would need to do is restrict array slices such that the length cannot exceed ssize_t.max.

That limit can only be reached when the element type has a size of 1, and only when slicing a pointer, concatenating, or allocating memory.

Making this restriction would have been unreasonable in the 8- and 16-bit days, but D doesn't support those. For 32 bits, this is an extreme corner case. For 64 bits, this condition never happens at all.

In exchange, 99% of uses of unsigned would disappear from D code, and with it, a whole category of bugs.


> Also, in principle, uint-uint can generate a runtime check for underflow (i.e. the carry flag).

No, it cannot. The compiler does not have enough information to know whether the value is intended to be a positive integer or a wrapping unsigned one. That information is lost from the type system.

E.g. in C, wrapping of an unsigned type is not an error; it is perfectly defined behaviour. With signed types, it's undefined behaviour.


To make this clear: I am not proposing that size_t should be changed.
I am proposing that .length return a signed type which, for array slices, is guaranteed never to be negative.
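A minimal sketch of how the signed view could look as a plain library helper today (hypothetical `slength`, not the proposal itself):

    // signed view of a slice's length; for any element size >= 2 the length
    // already fits in ptrdiff_t, and the proposal would guarantee it for size 1 too
    ptrdiff_t slength(T)(const(T)[] arr)
    {
        return cast(ptrdiff_t) arr.length;
    }

    unittest
    {
        int[] a = [1, 2, 3];
        assert(a.slength - 4 == -1);  // a signed result, unlike a.length - 4
    }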




November 24, 2014
On Friday, 21 November 2014 at 17:23:51 UTC, Marco Leise wrote:
> Am Thu, 20 Nov 2014 08:18:23 +0000
> schrieb "Don" <x@nospam.com>:
>
>> It's particularly challenging in D because of the widespread use of 'auto':
>> 
>> auto x = foo();
>> auto y = bar();
>> auto z = baz();
>> 
>> if (x - y > z) { ... }
>> 
>> 
>> This might be a bug, if one of these functions returns an unsigned type.  Good luck finding that. Note that if all functions return unsigned, there isn't even any signed-unsigned mismatch.
>
> With those function names I cannot write code.
>
> ℕ x = length();
> ℕ y = index();
> ℕ z = requiredRange();
>
> if (x - y > z) { ... }
>
> Ah, now we're getting somewhere. Yes the code is obviously
> correct. You need to be aware of the value ranges of your
> variables and write subtractions in a way that the result can
> only be >= 0. If you realize that you cannot guarantee that
> for some case, you just found a logic bug. An invalid program
> state that you need to assert/if-else/throw.

Yup. And that is not captured in the type system.

>
> I don't get why so many APIs return ints. Must be to support
> Java or something where proper unsigned types aren't available.

???? D and C do not have suitable types either.

unsigned !=  ℕ.

In D, 1u - 2u > 0u evaluates to true. This is defined behaviour, not an overflow.
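Concretely (a minimal sketch):

    void main()
    {
        uint x = 1u - 2u;        // wraps: defined behaviour, not an error
        assert(x == uint.max);
        assert(1u - 2u > 0u);    // holds, which is exactly the surprise
    }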


November 24, 2014
On Friday, 21 November 2014 at 08:46:20 UTC, Walter Bright wrote:
> On 11/21/2014 12:10 AM, bearophile wrote:
>> Walter Bright:
>>
>>> All you're doing is trading 0 crossing for 0x7FFFFFFF crossing issues, and
>>> pretending the problems have gone away.
>>
>> I'm not pretending anything. I am asking which of the two solutions leads to
>> fewer problems/bugs in practical programming. So far I've seen the unsigned
>> solution, and I've seen that it's highly bug-prone.
>
> I'm suggesting that having a bug and detecting the bug are two different things. The 0-crossing bug is easier to detect, but that doesn't mean that shifting the problem to 0x7FFFFFFF-crossing bugs makes the bug count any lower.
>
>
>>> BTW, granted the 0x7FFFFFFF problems exhibit the bugs less often, but
>>> paradoxically this can make the bug worse, because then it only gets found
>>> much, much later in supposedly tested & robust code.
>>
>> Is this true? Do you have some examples of buggy code?
>
> http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

Changing signed to unsigned in that example does NOT fix the bug.
It just means it fails with length = 2^^31 instead of length = 2^^30.

uint a = 0x8000_0000u;
uint b = 0x8000_0002u;
assert((a + b) / 2 == 1);  // wraps; the true average is 0x8000_0001
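For reference, the usual midpoint forms that avoid the wrap entirely, regardless of signedness (a minimal sketch):

    void main()
    {
        uint low  = 0x8000_0000u;
        uint high = 0x8000_0002u;

        uint mid1 = low + (high - low) / 2;             // 0x8000_0001
        uint mid2 = (low & high) + ((low ^ high) >> 1); // same, no wide intermediate

        assert(mid1 == 0x8000_0001u);
        assert(mid2 == 0x8000_0001u);
    }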

But actually I don't understand that article.
The arrays are int, not char. Since the length in bytes fits into 32 bits, the largest possible byte count is 2^^32-1. Therefore, for an int array with 4-byte elements, the largest possible length is 2^^30-1.

So I think the article is wrong. I don't think there is a bug in the code.




November 24, 2014
On 11/24/14 2:20 AM, Don wrote:
> I am proposing that .length return a signed type which, for array
> slices, is guaranteed never to be negative.

Assuming you do make the case that this change is an improvement, do you believe it's worth the breakage it would create? -- Andrei