[Issue 4725] std.algorithm.sum() (page 2)

May 15, 2011

[Issue 4725] std.algorithm.sum()

Posted by David Simcha
in reply to bearophile_hugs@eml.cc

Permalink

David Simcha

Posted in reply to bearophile_hugs@eml.cc

Permalink

http://d.puremagic.com/issues/show_bug.cgi?id=4725


David Simcha <dsimcha@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dsimcha@yahoo.com


--- Comment #9 from David Simcha <dsimcha@yahoo.com> 2011-05-15 09:33:10 PDT ---
I'm going to have to agree with Bearophile on this one.  Using multiple summation variables is both faster (because it breaks some dependency chains and uses instruction level parallelism better) and more accurate (because you effectively have more precision).  Here's a test program:

import std.stdio, std.datetime, std.random, std.range, std.traits;

// Adapted from dstats.summary.sum().
auto ilpSum(T)(T data) if(isRandomAccessRange!T) {
    enum nILP = 8;  // Empirically optimal on Athlon 64 X2.
    Unqual!(ElementType!T)[nILP] sum = 0;

    size_t i = 0;
    if(data.length > 2 * nILP) {

        for(; i + nILP < data.length; i += nILP) {
            foreach(j; 0..nILP) {
                sum[j] += data[i + j];
            }
        }

        foreach(j; 1..nILP) {
            sum[0] += sum[j];
        }
    }

    for(; i < data.length; i++) {
        sum[0] += data[i];
    }

    return sum[0];
}

auto naiveSum(T)(T data) if(isRandomAccessRange!T) {
    Unqual!(ElementType!T) ret = 0;
    foreach(elem; data) {
        ret += elem;
    }

    return ret;
}

void main() {
    auto nums = new double[10_000_000];
    foreach(ref x; nums) x = uniform(0.0, 1.0);

    auto sw = StopWatch(AutoStart.yes);
    auto s = naiveSum(nums);
    immutable naiveTime = sw.peek.msecs;

    // Printing out s to make sure the compiler doesn't optimize away the
    // whole function.
    writefln("naive:  Sum = %s, %s milliseconds.", s, naiveTime);

    sw.reset();
    s = ilpSum(nums);
    immutable ilpTime = sw.peek.msecs;
    writefln("ilp:  Sum = %s, %s milliseconds.", s, ilpTime);
}

Results:

naive:  Sum = 4.9988e+06, 51 milliseconds.
ilp:  Sum = 4.9988e+06, 33 milliseconds.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4725 --- Comment #10 from Andrei Alexandrescu <andrei@metalanguage.com> 2011-05-15 11:36:24 PDT --- No need to teach or convince me of the virtues of ILP. We've long discussed in the newsgroup how associative operations can be significantly accelerated. I was simply replying to the implementation shown, which is a duplication of reduce. Ideally we'd capture more operation than summation in e.g. assoc_reduce. Then sum would become a simple alias. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4725 --- Comment #11 from David Simcha <dsimcha@yahoo.com> 2011-05-15 11:53:59 PDT --- (In reply to comment #10) > No need to teach or convince me of the virtues of ILP. We've long discussed in the newsgroup how associative operations can be significantly accelerated. I was simply replying to the implementation shown, which is a duplication of reduce. Ideally we'd capture more operation than summation in e.g. assoc_reduce. Then sum would become a simple alias. Great idea! When we figure hot how we want to do this, I should probably incorporate it into std.parallelism, too, since std.parallelism already assumes associativity anyhow. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4725 --- Comment #12 from Andrei Alexandrescu <andrei@metalanguage.com> 2011-05-15 12:05:38 PDT --- Probably we need two more reduce primitives, both of which assume associativity. One uses ILP and simple loop unrolling whereas the other uses full-blown threads. There are definitely needs for each. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4725 --- Comment #13 from David Simcha <dsimcha@yahoo.com> 2011-05-15 12:15:45 PDT --- (In reply to comment #12) > Probably we need two more reduce primitives, both of which assume associativity. One uses ILP and simple loop unrolling whereas the other uses full-blown threads. There are definitely needs for each. Right. We already have the full blown threads one in std.parallelism. What I'm saying is that, if you're using threads, you're already assuming associativity. Therefore, you may as well use ILP and loop unrolling, too. std.parallelism currently doesn't do this and once we work out the details and create an assocReduce in std.algorithm, these techniques should be ported to std.parallelism.reduce, too. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4725 --- Comment #14 from bearophile_hugs@eml.cc 2011-11-24 02:39:05 PST --- In Fortran 90+ c'e' una sum(): http://www.nsc.liu.se/~boein/f77to90/a5.html#section14 I think sum() needs a specialization for things like float[4], int[4], double[4], ecc. If a float[4] is represented in a SSE register then sum(float[4]) needs just 2 sums. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4725 --- Comment #15 from bearophile_hugs@eml.cc 2011-11-25 01:04:35 PST --- A good signature for sum(): sum(sequence[, dim=0[, start]]) In Fortran's sum() 'dim' is an optional value that defaults to 0 (well to the first index of an array, that is 0 in D), it specifies what dimension to perfom the sum to. On default it acts like the Python sum(). In Fortran you often have arrays of vectors (or arrays of tuples in D), and Fortran supports vector operations similar to D ones, so: VA = [[1,2,3], [4,5,6]] sum(VA) ==> [5, 7, 9] sum(VA, 0) ==> [5, 7, 9] sum(VA, 1) ==> [6, 15] sum(VA, x) ==> Error if x > 1 Example usage, points is an array of double[3] that represent 3D points: totalDistance = sqrt(sum(points[] ^^ 2, dim=0)); -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4725 --- Comment #16 from bearophile_hugs@eml.cc 2012-04-17 14:14:24 PDT --- See also Issue 7934 for an extra improvement. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4725 --- Comment #17 from Andrei Alexandrescu <andrei@erdani.com> 2013-03-12 22:14:45 PDT --- https://github.com/D-Programming-Language/phobos/pull/1205 -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4725 --- Comment #18 from bearophile_hugs@eml.cc 2013-03-13 11:46:47 PDT --- (In reply to comment #17) > https://github.com/D-Programming-Language/phobos/pull/1205 In the patch this the code used to sum in the general case: + Result seed = 0; + return reduce!"a + b"(seed, r); Beside the idea in Issue 7934 , I suggest to add a "static if" that tests if the input is an array (or a random access range) and in such case instead of reduce uses a loop like this: for (int i = 2; i < stop; i += 2) { total += array[i]; total2 += array[i + 1]; } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

Forums