Thread overview
[Issue 3843] New: Signed lengths (and other built-in values)
February 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3843

           Summary: Signed lengths (and other built-in values)
           Product: D
           Version: 2.040
          Platform: All
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2010-02-23 16:58:38 PST ---
This is Java code coming from Wikipedia: http://en.wikipedia.org/wiki/Selection_Sort#Code


void selectionSort(int[] a)
{
    for (int i = 0; i < a.length - 1; i++)
    {
        int min = i;
        for (int j = i + 1; j < a.length; j++)
        {
            if (a[j] < a[min])
            {
                min = j;
            }
        }
        if (i != min)
        {
            int swap = a[i];
            a[i] = a[min];
            a[min] = swap;
        }
    }
}


You can translate it in correct Python in a short time (I have added a postcondition too, plus doctests):

def selection_sort(arr):
    """
    >>> items = []
    >>> selection_sort(items)
    >>> items
    []
    >>> items = [5, 3, 1]
    >>> selection_sort(items)
    >>> items
    [1, 3, 5]
    """
    for i in xrange(0, len(arr)-1):
        min = i
        for j in xrange(i + 1, len(arr)):
            if arr[j] < arr[min]:
                min = j
        if i != min:
            arr[i], arr[min] = arr[min], arr[i]

    if __debug__:
        # postcondition that it's not increasing
        for i, el in enumerate(arr[:-1]):
            assert el <= arr[i + 1], "'arr' not correctly sorted."

if __name__ == "__main__":
    import doctest
    doctest.testmod()
    print "Doctests done.\n"


This is a possible translation in D2 with the postcondition:

import std.stdio: writeln;

void selectionSort(T)(T[] arr)
out {
    // assert that it's not increasing
    foreach (i, el; arr[0 .. $-1])
        assert (el <= arr[i + 1],
                "selectionSort: 'arr' not correctly sorted.");
}
body {
    for (int i = 0; i < arr.length - 1; i++) {
        T min = i;
        for (int j = i + 1; j < arr.length; j++)
            if (arr[j] < arr[min])
                min = j;
        if (i != min)  {
            T aux = arr[i];
            arr[i] = arr[min];
            arr[min] = aux;
        }
    }
}

void main() {
    int[] items;
    writeln(items);
    selectionSort(items);
    writeln(items);
}


But unlike the Python and Java code, that D2 code contains two bugs (or more), that Python, Java and C# programmers may need some time to spot.

The first bug is in this line:
for (int i = 0; i < arr.length - 1; i++) {

This bug is caused by the combination of the following factors:
- arr.length is unsigned, so 0-1 is not -1, it's not less than 0.
- Currently with DMD there's no way to ask the code to perform integral
overflow tests at runtime, so it can't catch it and give a nice error message
at runtime.
- The compiler isn't even telling me that i < arr.length performs a comparison
between a signed and unsigned number, that can hide a bug (I think GCC can
raise a warning there).

I can show other examples of D code that contain similar bugs caused by mixing signed and unsigned values.

Unsigned integers are useful when:
- You need a bit array, for example to implement a bloom filter, a bitvector, a
bit set, when you want to do SWAR, when you need bit arrays to deal with
hardware, etc.
- When you really need the full range of numbers 0 .. 2^n, this happens but
it's uncommon for the numbers of 32 or more bits (it's more common for numbers
less than 32 bits long).

In most other situations using unsigned numbers is unsafe (because D can silently cast signed values to unsigned ones, and because it doesn't perform run time overflow tests) and it's better to use signed values. So for example array indices and lengths are (I think) better to be signed (ptrdiff_t), as almost everything else.

If you mix signed and unsigned arithmetic to index an array or to measure its length you will often introduce bugs in the code.

Note 1: C# has unsigned numbers, but I think its array lengths are signed.

Note 2: on 64 bit operating systems arrays can be longer than 2^31, and even
now an int is not able to store the full range of a 32 bit ptrdiff_t. So code
like:
int len = arr.length;
Is unsafe now and will be even more unsafe on 64 bit systems where arrays can
and will be longer. It's better to invent ways to avoid such kind bugs as much
as possible.

Note 3: The second bug in that D2 selectionSort is in this line:
foreach (i, el; arr[0 .. $-1])
If arr is empty, that produces an out of bound error in nonrelease mode (a
runtime error is better than nothing).
In Python a slice of an empty array is empty (this is not a design overlook or
mistake, and it's something Python designers wanted).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
January 09, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3843


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei@metalanguage.com
         AssignedTo|nobody@puremagic.com        |andrei@metalanguage.com


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 06, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3843


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |INVALID


--- Comment #1 from Andrei Alexandrescu <andrei@metalanguage.com> 2011-06-06 06:27:39 PDT ---
At this point it's virtually impossible to turn lengths to signed types.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 06, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3843



--- Comment #2 from bearophile_hugs@eml.cc 2011-06-06 10:17:33 PDT ---
(In reply to comment #1)
> At this point it's virtually impossible to turn lengths to signed types.

I understand.

A note: in Bugzilla I have some issues open regarding little D improvements. If they don't happen in something like another year, it will probably be quite harder to perform those changes, even if they are appreciated. I am open for any question about those issues.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------