topN using a heap (page 8)

On 09/23/2016 11:40 AM, Stefan Koch wrote: > On Friday, 23 September 2016 at 15:30:20 UTC, Andrei Alexandrescu wrote: >> Work is blocked by https://issues.dlang.org/show_bug.cgi?id=16528, >> which is quite a head-scratcher. Any ideas for a workaround? Thanks! >> -- Andrei > > annotate the templates. Nagonna work for generic functions -- Andrei

On Friday, 23 September 2016 at 16:09:18 UTC, Andrei Alexandrescu wrote: > > BTW, as I commented in https://issues.dlang.org/show_bug.cgi?id=16517, using the new topN implementation instead of sort to compute the median over google's 1-grams is over 11x faster using dmd. > That's a very nice result. Both the absolute numbers and that all three sets achieve similar performance, as they different distribution characteristics. --Jon

September 24, 2016

Re: topN using a heap

Posted by Andrei Alexandrescu
in reply to Jon Degenhardt

Permalink

Andrei Alexandrescu

Posted in reply to Jon Degenhardt

Permalink

On 09/23/2016 04:45 PM, Jon Degenhardt wrote:
> That's a very nice result. Both the absolute numbers and that all three
> sets achieve similar performance, as they different distribution
> characteristics.

Got curious so I tried to patch my ldc installation (0.17.1 (DMD v2.068.2, LLVM 3.8.0)), but sadly that didn't work out because sorting.d uses the new syntax in various places.

Probably the best baseline is the equivalent C++ code using the mature implementation of nth_element, see http://paste.ofcode.org/ieZPdchjzTXbDcG3LvsYBP (also pasted at the end of this message). Here's what I got: http://paste.ofcode.org/6feyBLRiJtieHdZ3bmGaUW, see also text below.

The topN code produced with dmd is faster in virtually all cases (has parity for -f 4, which I suspect is a degenerate case with most elements equal, which exercises only a small part of the algorithm). For C++ I used -O3 -DNDEBUG, any other flags I should use?

As an aside, I enjoy poking fun at the stunning inefficiency of iostreams in my public talks; they delivered again :o).


Andrei

=========================== shell
$ g++ -O3 -DNDEBUG -omedian_by_sort_cpp issue16517.cpp
$ g++ -O3 -DNDEBUG -DTOPN -omedian_by_topn_cpp issue16517.cpp
$ dmd -O -inline -release -ofmedian_by_sort -boundscheck=off issue16517.d
$ dmd -O -inline -release -version=topn -ofmedian_by_topn -boundscheck=off issue16517.d
$ cut -f 2  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort_cpp
median (via sort): 1972; lines: 10512769; total time (ms): 3419.11; sort time (ms): 314.086
$ cut -f 2  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort
median (via sort): 1972; lines: 10512769; total time (ms): 1462; sort time (ms): 399
$ cut -f 2  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn_cpp
median (via nth_element): 1972; lines: 10512769; total time (ms): 3255.41; nth_element time (ms): 40.676
$ cut -f 2  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn
median (via topN): 1972; lines: 10512769; total time (ms): 1211; topN time (ms): 32
$ cut -f 3  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort_cpp
median (via sort): 3; lines: 10512769; total time (ms): 2949.11; sort time (ms): 297.814
$ cut -f 3  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort
median (via sort): 3; lines: 10512769; total time (ms): 1351; sort time (ms): 382
$ cut -f 3  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn_cpp
median (via nth_element): 3; lines: 10512769; total time (ms): 2316.75; nth_element time (ms): 65.041
$ cut -f 3  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn
median (via topN): 3; lines: 10512769; total time (ms): 973; topN time (ms): 59
$ cut -f 4  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort_cpp
median (via sort): 2; lines: 10512769; total time (ms): 2614.47; sort time (ms): 351.285
$ cut -f 4  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort
median (via sort): 2; lines: 10512769; total time (ms): 1269; sort time (ms): 361
$ cut -f 4  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn_cpp
median (via nth_element): 2; lines: 10512769; total time (ms): 2443.17; nth_element time (ms): 76.211
$ cut -f 4  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn
median (via topN): 2; lines: 10512769; total time (ms): 998; topN time (ms): 77
===========================

=========================== issue16517.cpp
#include <iostream>
#include <cstdio>
#include <ctime>
#include <vector>
#include <algorithm>

using namespace std;

int main()
{
    clock_t swTotal, swSort;
    swTotal = clock();
    vector<double> values;
    double num;
    while (cin >> num) {
        values.push_back(num);
    }

    size_t medianIndex = values.size() / 2;
    swSort = clock();
    #ifdef TOPN
        const char* method = "nth_element";
        nth_element(values.begin(), values.begin() + medianIndex, values.end());
    #else
        const char* method = "sort";
        sort(values.begin(), values.end());
    #endif
    clock_t done = clock();

    printf("median (via %s): %g; lines: %zu; total time (ms): %g; %s time (ms): %g\n",
             method, values[medianIndex], values.size(),
             (done - swTotal) * 1000. / CLOCKS_PER_SEC,
             method, (done - swSort) * 1000. / CLOCKS_PER_SEC);
}
===========================

=========================== issue16517.d
import std.stdio;
import std.array : appender;
import std.algorithm : sort, topN;
import std.conv;
import std.range;
import std.datetime: StopWatch;

void main()
{
    StopWatch swTotal, swSort;
    swTotal.start;
    Appender!(double[]) values;
    foreach (line; stdin.byLine)
        values.put(line.to!double);

    size_t medianIndex = values.data.length/2;
    swSort.start;
    version(topn)
    {
        topN(values.data, medianIndex);
        auto method = "topN";
    }
    else {
        sort(values.data);
        auto method = "sort";
    }
    swTotal.stop;
    swSort.stop;

    writefln("median (via %s): %g; lines: %d; total time (ms): %d; %s time (ms): %d",
             method, values.data[medianIndex], values.data.length, swTotal.peek.msecs,
             method, swSort.peek.msecs);
}
===========================

On Saturday, 24 September 2016 at 18:22:51 UTC, Andrei Alexandrescu wrote: > Got curious so I tried to patch my ldc installation (0.17.1 (DMD v2.068.2, LLVM 3.8.0)), but sadly that didn't work out because sorting.d uses the new syntax in various places. > > Probably the best baseline is the equivalent C++ code using the mature implementation of nth_element, see http://paste.ofcode.org/ieZPdchjzTXbDcG3LvsYBP (also pasted at the end of this message). Here's what I got: http://paste.ofcode.org/6feyBLRiJtieHdZ3bmGaUW, see also text below. > > The topN code produced with dmd is faster in virtually all cases (has parity for -f 4, which I suspect is a degenerate case with most elements equal, which exercises only a small part of the algorithm). For C++ I used -O3 -DNDEBUG, any other flags I should use? > > [snip] > I added the topN pull request to a local LDC build (current LDC master). Also built the C++ nth_element and sort program you listed. (GCC 4.9.3, compiler line: g++-mp-4.9 -O3 -static-libgcc -static-libstdc++ -std=c++11). Did 7 runs for each variant on the three fields in ngram file. Median times are below. Median sort times (ms): Field 2 Field 3 Field 4 DMD 351 348 326 LDC 273 261 245 C++ 281 260 246 Median topN / nth_element times (ms): Field 2 Field 3 Field 4 LDC 21 44 57 C++ 41 60 71 Looking very good indeed!

On 9/25/16 4:19 AM, Jon Degenhardt wrote: > On Saturday, 24 September 2016 at 18:22:51 UTC, Andrei Alexandrescu wrote: >> Got curious so I tried to patch my ldc installation (0.17.1 (DMD >> v2.068.2, LLVM 3.8.0)), but sadly that didn't work out because >> sorting.d uses the new syntax in various places. >> >> Probably the best baseline is the equivalent C++ code using the mature >> implementation of nth_element, see >> http://paste.ofcode.org/ieZPdchjzTXbDcG3LvsYBP (also pasted at the end >> of this message). Here's what I got: >> http://paste.ofcode.org/6feyBLRiJtieHdZ3bmGaUW, see also text below. >> >> The topN code produced with dmd is faster in virtually all cases (has >> parity for -f 4, which I suspect is a degenerate case with most >> elements equal, which exercises only a small part of the algorithm). >> For C++ I used -O3 -DNDEBUG, any other flags I should use? >> >> [snip] >> > I added the topN pull request to a local LDC build (current LDC master). > Also built the C++ nth_element and sort program you listed. (GCC 4.9.3, > compiler line: g++-mp-4.9 -O3 -static-libgcc -static-libstdc++ -std=c++11). > > Did 7 runs for each variant on the three fields in ngram file. Median > times are below. > > Median sort times (ms): > > Field 2 Field 3 Field 4 > DMD 351 348 326 > LDC 273 261 245 > C++ 281 260 246 > > Median topN / nth_element times (ms): > > Field 2 Field 3 Field 4 > LDC 21 44 57 > C++ 41 60 71 > > Looking very good indeed! Thanks for this work! -- Andrei

Forums