December 24, 2010
Another post about Phobos usability. Again the relative code is a rosettacode Task:

>Assume we have a collection of numbers, and want to find the one with the largest minimal prime factor (that is, the one that contains relatively large factors). To speed up the search, the factorization should be done in parallel using separate threads or processes, to take advantage of multi-core CPUs.<

The original D2 implementation (probably by feepinCreature), with little changes: http://rosettacode.org/wiki/Parallel_calculations#D

This D2 version is not bad, but there are some ways to improve it.

------------------------------------

1) Finding the max or min of an iterable is a really common need. Probably I need to do it every 50-100 lines of code or less. This is how it is found in that program:

reduce!q{max(a, b)}(0L, minFactors));

I strongly suggest the max() and min() to find the max of an iterable by themselves. And to optionally accept a mapping function (as schwartzSort). See:
http://d.puremagic.com/issues/show_bug.cgi?id=4705

With that the line of code becomes more readable and shorter, and surely bug-free:
max(minFactors);

Similar considerations suggest the creation of a sum() function: http://d.puremagic.com/issues/show_bug.cgi?id=4725

A sum() is not just a reduce, because it is allowed to use an optimization. See issue 4725.

------------------------------------

2) Clojure language shows how nice is a pmap(), parallel map.

If necessary two different pmap() may be created, one where the single mapping operation is slow enough, and one where it is very cheap. For this Task I use the pmap() for costly mapping function.

Using pmap() the D2 program may be shortened to (this loses the timings, but they are not required):

import std.stdio, std.math, std.algorithm, std.parallel;

pure ulong lowestFactor(immutable ulong n) {
    if (n % 2 == 0)
        return 2;
    else {
        immutable ulong limit = cast(ulong)sqrt(n) + 1;
        for (ulong i = 3; i < limit; i += 2)
            if (n % i == 0)
                return i;
    }
    return n;
}

void main() {
    auto numbers = [2UL^^61-1, 2UL^^61-1, 2UL^^61-1, 112272537195293,
                    115284584522153, 115280098190773, 115797840077099,
                    112582718962171, 112272537095293, 1099726829285419];

    writefln("Largest min. factor is %s.",
             max(pmap(&lowestFactor, numbers)));
}


pmap is able to see that the given mapping function is pure. Here lowestFactor() is strongly pure, so pmap() is free to ignore the execution order of the single lowestFactor functions.

An alternative syntax for pmap is similar to the map:
pmap!(&lowest_factor)(numbers)

------------------------------------

3) For debugging I may want to print the array of lowest factors:
writeln(pmap(&lowest_factor, numbers));

Or even just:
writeln(map!(&lowest_factor)(numbers));

writeln() is able to print a lazy iterable, but it prints it just as an array. This is bad, because it's important to give cues to the person that reads to the printout regarding the types of the types printed.

A simple way to tell apart lazy sequences from arrays is to use a semicolon instead a comma to tell apart items (lists are sometimes written using a semicolon in other functional languages, so this is not a new thing):

[0; 1; 2; 3; 4]

See:
http://d.puremagic.com/issues/show_bug.cgi?id=3813

Empty dynamic arrays, empty static arrays, empty associative arrays and empty lazy iterables need some kind of output when they are printed, I suggest a simple "[]" at least. With no output it's too much easy to confuse "no output" with "empty array" with "missing writeln" with "not run code path", etc. (In bugzilla there is a bug report about this, not written by me, I don't remember its number).

Bye,
bearophile