Apparently unsigned types really are necessary (page 2)

On 22.01.2012 13:49, Jonathan M Davis wrote:
> On Sunday, January 22, 2012 13:40:08 Marco Leise wrote:
>> I heard that in the past, but in my own experience using unsigned data
>> types, it did not cause any more bugs. OTOH, textual output is more
>> correct and I find code easier to understand, if it is using the correct
>> 'class' of integers. But this "a lot of programmers who don't particularly
>> like using unsigned types" must come from somewhere. Except for existing
>> bugs in the form of silent under-/overflows that do not appear alarming in
>> a debugger due to their signedness, I've yet to see a convincing example
>> of real world code, that I would write this way and is flawed due to the
>> use of uint instead of int. Or is this like spaces vs. tabs? 'Cause I'm
>> also a tab user.
>
> Down with tabs! ;)
>
> One issue with unsigned integers right off the bat is for loops.
>
> for(size_t i = a.length; i > 0; --i) {}
>
> is not going to work.

That'll work just fine, you probably meant '>=' ;)

On Sun, 22 Jan 2012 13:49:37 +0100, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> One issue with unsigned integers right off the bat is for loops.
> for(size_t i = a.length; i > 0; --i) {}
> is not going to work.

What's not working with this? Besides, a neat idiom for reverse array indexing:

for (size_t i = a.length; i--; )
    writeln(a[i]);
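To make the loop variants in this exchange concrete, here is a small sketch in D. The helper names `visitDownTo1` and `visitIdiom` are made up for illustration; they only record which values of `i` the loop body sees.

```d
import std.stdio;

// Loop from the quoted post: i runs from a.length down to 1,
// so the matching element would be a[i - 1], not a[i].
size_t[] visitDownTo1(const int[] a)
{
    size_t[] seen;
    for (size_t i = a.length; i > 0; --i)
        seen ~= i;
    return seen;
}

// The "i--" idiom: the condition tests the old value of i and then
// decrements, so the body sees a.length - 1 down to 0.
size_t[] visitIdiom(const int[] a)
{
    size_t[] seen;
    for (size_t i = a.length; i--; )
        seen ~= i;
    return seen;
}

void main()
{
    int[] a = [10, 20, 30];
    writeln(visitDownTo1(a)); // [3, 2, 1]
    writeln(visitIdiom(a));   // [2, 1, 0]

    // A condition of "i >= 0" can never become false for size_t:
    // decrementing 0 wraps around to size_t.max.
    size_t zero = 0;
    writeln(zero - 1 == size_t.max); // true
}
```

The `i > 0` form terminates as written; it is the `i >= 0` form that loops forever with an unsigned counter.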

On 22.01.2012, 13:49, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> Down with tabs! ;)
>
> One issue with unsigned integers right off the bat is for loops.
>
> for(size_t i = a.length; i > 0; --i) {}
> is not going to work.

That is C style. In D you would write:

foreach_reverse(i; 0 .. a.length) {}

which is safe and corrects the two bugs in your code.

> Another potentially nasty situation is subtraction. It
> can do fun things when you subtract one unsigned type from another if you're
> not careful (since if the result is negative and is then assigned to an
> unsigned integer...). There are probably others, but that's what comes to mind
> immediately. In general, it comes down to issues with them rolling over and
> becoming incredibly large values when they go below 0.

I'm always careful when subtracting unsigned ints for the simple reason that the code working on them would be incorrect if results were negative. One example is subtracting two TickDurations. You always know which one is the lower. The same goes for offsets into files. When you copy the block between two locations you cannot exchange start and end. Imagine we had checked integers now, a proposal that doesn't seem far-fetched. Would they scream in pain if I wrote "checked_ulong duration = start_time - end_time"? Yes.

> Sure, unsigned types can be useful, and if you're careful with them, you can
> be fine, but there are definitely cases where they cause trouble. Hence, why
> many programmers argue for not using them unless you actually need them.
>
> - Jonathan M Davis

I guess my mental model of integers has grown on the idea that an unsigned integer matches the addressable memory of my computer, and thus it is the natural choice there for array lengths and whatever is limited only by available RAM and comes in positive counts; whereas I 'waste' half the range and have only 'half' a match with signed types. I will put this under "tabs vs. spaces". :)
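As a rough illustration of the two ideas in this post, the sketch below shows the `foreach_reverse` form and one way to get a "screaming" subtraction. It assumes a druntime that ships `core.checkedint` and uses its `subu` function as a stand-in for the hypothetical `checked_ulong`; the helper names `reverseIndices` and `elapsed` are invented for the example.

```d
import core.checkedint : subu;
import std.stdio;

// Indices visited by the foreach_reverse form: a.length - 1 down to 0.
size_t[] reverseIndices(const int[] a)
{
    size_t[] seen;
    foreach_reverse (i; 0 .. a.length)
        seen ~= i;
    return seen;
}

// Unsigned subtraction that reports wraparound instead of hiding it.
// `out` resets wrapped to false on entry; subu only ever sets it to true.
ulong elapsed(ulong end, ulong start, out bool wrapped)
{
    return subu(end, start, wrapped);
}

void main()
{
    writeln(reverseIndices([10, 20, 30])); // [2, 1, 0]

    bool wrapped;
    auto ok = elapsed(150, 100, wrapped);
    writeln(ok, " ", wrapped); // 50 false

    // Operands swapped, as in "start_time - end_time":
    auto bad = elapsed(100, 150, wrapped);
    writeln(bad == ulong.max - 49, " ", wrapped); // true true
}
```

Whether the caller then throws, asserts, or logs on `wrapped` is a design choice left open here.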

On 01/22/2012 04:49 AM, Jonathan M Davis wrote:
> Another potentially nasty situation is subtraction. It
> can do fun things when you subtract one unsigned type from another if you're
> not careful (since if the result is negative and is then assigned to an
> unsigned integer...).

No need to assign the result explicitly either. Additionally, the subtraction is sometimes implicit.

When the expression has an unsigned in it, the temporary result is unsigned by the language rules since C:

import std.stdio;

int foo()
{
    return -2;
}

uint bar()
{
    return 1;
}

void main()
{
    writeln(foo() + bar());
}

The program above prints 4294967295.

It may make perfect sense for bar() to return an unsigned type (like arrays' .length property), but every time I decide on an unsigned type, I think about the potentially-unintended implicit conversion to unsigned that may bite the users of bar().

Ali

On 22.01.2012, 18:00, Ali Çehreli <acehreli@yahoo.com> wrote:
> On 01/22/2012 04:49 AM, Jonathan M Davis wrote:
>
> > Another potentially nasty situation is subtraction. It
> > can do fun things when you subtract one unsigned type from another if you're
> > not careful (since if the result is negative and is then assigned to an
> > unsigned integer...).
>
> No need to assign the result explicitly either. Additionally, the subtraction is sometimes implicit.
>
> When the expression has an unsigned in it, the temporary result is unsigned by the language rules since C:
>
> import std.stdio;
>
> int foo()
> {
>     return -2;
> }
>
> uint bar()
> {
>     return 1;
> }
>
> void main()
> {
>     writeln(foo() + bar());
> }
>
> The program above prints 4294967295.
>
> It may make perfect sense for bar() to return an unsigned type (like arrays' .length property), but every time I decide on an unsigned type, I think about the potentially-unintended implicit conversion to unsigned that may bite the users of bar().
>
> Ali

That's a valid point, if the order "foo() + bar()" makes sense and a negative value is expected. After all, foo() and bar() must be related somehow, otherwise you wouldn't add them. In this case, if the expected result is a signed int, I would make bar() return a signed int as well and not treat that case like array.length!
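A minimal sketch of the two ways out discussed here, reusing `foo()` and `bar()` from the quoted program. `barSigned()` is a hypothetical signed variant added only for comparison.

```d
import std.stdio;

int foo() { return -2; }
uint bar() { return 1; }
int barSigned() { return 1; } // hypothetical: same value, signed return type

void main()
{
    // int + uint: the int operand is converted to uint, so the sum wraps.
    auto mixed = foo() + bar();
    static assert(is(typeof(mixed) == uint));
    writeln(mixed); // 4294967295

    // Both operands signed: ordinary signed arithmetic.
    auto signedSum = foo() + barSigned();
    static assert(is(typeof(signedSum) == int));
    writeln(signedSum); // -1

    // If bar() has to stay unsigned, the caller can cast explicitly.
    writeln(foo() + cast(int) bar()); // -1
}
```

The cast is only safe when the unsigned value is known to fit in an `int`, which is the same analysis either side of the argument has to do somewhere.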

On 1/22/2012 9:44 AM, equinox@atw.hu wrote:
> I noticed I cannot use typedef any longer in D2.
> Why did it go?

typedef turned out to have many difficult issues about when it was a distinct type and when it wasn't.

On 1/22/2012 4:40 AM, Marco Leise wrote:
> Or is
> this like spaces vs. tabs? 'Cause I'm also a tab user.

I struggled with that for years. Not with my own code, the tabs worked fine. The trouble was when collaborating with other people, who insisted on using tab stop settings that were the evil spawn of satan. Hence, collaborated code was always a mess. Like newklear combat toe to toe with the roosskies, the only way to win is to not play.

January 22, 2012

Re: Apparently unsigned types really are necessary

Posted by bcs
in reply to Marco Leise

On 01/22/2012 01:31 AM, Marco Leise wrote:
> On 22.01.2012, 08:23, bcs <bcs@example.com> wrote:
>
>> On 01/21/2012 10:05 PM, Walter Bright wrote:
>>> http://news.ycombinator.com/item?id=3495283
>>>
>>> and getting rid of unsigned types is not the solution to signed/unsigned
>>> issues.
>>
>> A quote from that link:
>>
>> "There are many use cases for data types that behave like pure bit
>> strings with no concept of sign."
>>
>> Why not recast the concept of unsigned integers as "bit vectors (that
>> happen to implement arithmetic)"? I've seen several sources claim that
>> uint (and friends) should never be used unless you are using it for
>> low level bit tricks and the like.
>
> Those are heretics.
>
>> Rename them bits{8,16,32,64} and make the current names aliases.
>
> So everyone uses int, and we get messages like: "This program currently
> uses -1404024 bytes of RAM". I have strong feelings against using signed
> types for variables that are ever going to only hold positive numbers,
> especially when it comes to sizes and lengths.

OK, I'll grant that there is an (*extremely* limited) number of cases where you actually need the full range of an unsigned integer type. I'm not suggesting that the actual semantics of the type be modified, and it would still be usable for exactly that sort of case. My suggestion is that the naming be modified to avoid suggesting that the *primary* use for the type is for non-negative numbers.

To support that position, if you really expect to encounter and thus need to correctly handle numbers between 2^31 and 2^32 (or 63/64, etc.) then you already need to be doing careful analyses to avoid bugs from overflow. At that point, you are already considering low level details and using a "bit vector" type as a number is not much more complicated. The added bonus is that the mismatch between the name and what it's used for is a big red flag saying "be careful or this is likely to cause bugs".

Getting people to think of it that way is likely to prevent more bugs than it causes.
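For illustration only, the naming idea can be approximated in user code with aliases. The names `bits8` through `bits64` come from the proposal and are not part of D or Phobos; note that the proposal goes the other way round (the current names would become the aliases), which user code cannot do. `asSigned` is a hypothetical helper showing what happens to a value between 2^31 and 2^32 once it is read as signed.

```d
import std.stdio;

// Hypothetical names from the proposal; plain aliases, no new semantics.
alias bits8  = ubyte;
alias bits16 = ushort;
alias bits32 = uint;
alias bits64 = ulong;

// Reinterpret a 32-bit pattern as a signed number.
int asSigned(bits32 v)
{
    return cast(int) v;
}

void main()
{
    static assert(is(bits32 == uint));

    bits32 size = 3_000_000_000u; // between 2^31 and 2^32
    writeln(size);                // 3000000000
    writeln(asSigned(size));      // -1294967296
}
```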
