D vs Haskell

June 22, 2013

Re: D vs Haskell

Posted by Juan Manuel Cabo
in reply to Szymon Gatner

On 06/22/2013 03:25 PM, Szymon Gatner wrote:
> Word counting problem in D and Haskell:
> 
> http://leonardo-m.livejournal.com/109201.html

I thought that the D time could be improved further with small changes.

Testing against the Complete Works of William Shakespeare (5.3 MiB
plaintext): http://www.gutenberg.org/ebooks/100
and using "dmd -O -inline -noboundscheck -release" on both
the author's latest version and my version using "byWord",
I got these times (minimum of 10 runs):

    before: 2781 ms

    after:   805 ms


Here is the code, with a "byWord" range using std.ascii.isAlpha:


    import std.stdio, std.conv, std.file, std.string,
           std.algorithm, std.range, std.traits, std.ascii;


    auto hashCounter(R)(R items) if (isForwardRange!R) {
        size_t[ForeachType!R] result;
        foreach (x; items)
            result[x]++;
        return result.byKey.zip(result.byValue);
    }

    void main(string[] args) {
        //Slow:
        //      args[1]
        //      .readText
        //      .toLower
        //      .tr("A-Za-z", "\n", "cs")
        //      .split

        //Faster:
        args[1]
        .readText
        .byWord
        .map!toLower()
        .array
        .hashCounter
        .array
        .sort!"-a[1] < -b[1]"()
        .take(args[2].to!uint)
        .map!q{ text(a[0], " ", a[1]) }
        .join("\n")
        .writeln;
    }

    /** Range that extracts words from a string. Words are
        strings composed only of chars accepted by std.ascii.isAlpha() */
    struct byWord {
        string s;
        size_t pos;
        string word;

        this(string s) {
            this.s = s;
            popFront();
        }

        @property bool empty() const {
            return s.length == 0;
        }

        @property string front() {
            return word;
        }

        void popFront() {
            if (pos == s.length) {
                //Mark the range as empty, only after popFront fails:
                s = null;
                return;
            }

            while (pos < s.length && !std.ascii.isAlpha(s[pos])) {
                ++pos;
            }
            auto start = pos;
            while (pos < s.length && std.ascii.isAlpha(s[pos])) {
                ++pos;
            }

            if (start == s.length) {
                //No more words. Range empty:
                s = null;
            } else {
                word = s[start .. pos];
            }
        }
    }

    unittest {
        assert([] == array(byWord("")));
        assert([] == array(byWord("!@#$")));
        assert(["a", "b"] == array(byWord("a b")));
        assert(["a", "b", "c"] == array(byWord("a b c")));
    }


--jm
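
The post does not show how the "minimum of 10 runs" figures were collected. A minimal sketch of one way to do it inside D, assuming a current Phobos where StopWatch lives in std.datetime.stopwatch (in 2013 it was part of std.datetime). The helper name minTime is hypothetical and is not taken from the thread:

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Hypothetical helper (not from the thread): calls `work` `runs` times
/// and returns the shortest wall-clock time of any single call.
Duration minTime(void delegate() work, size_t runs = 10) {
    auto best = Duration.max;
    foreach (_; 0 .. runs) {
        auto sw = StopWatch(AutoStart.yes); // starts timing immediately
        work();
        sw.stop();
        if (sw.peek < best)
            best = sw.peek;
    }
    return best;
}

unittest {
    size_t calls;
    auto fastest = minTime({ ++calls; }, 3);
    assert(calls == 3);               // the workload ran once per run
    assert(fastest >= Duration.zero); // durations are never negative
}
```

Taking the minimum rather than the mean discards runs slowed down by other activity on the machine, which is presumably why the post reports it.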

Posted by Joseph Rushton Wakeling
in reply to Juan Manuel Cabo

On 06/22/2013 10:11 PM, Juan Manuel Cabo wrote:
> Testing against Complete Works of William Shakespeare (5.3 MiB
> plaintext): http://www.gutenberg.org/ebooks/100
> and using "dmd -O -inline -noboundscheck -release" on both
> the last version of the author, and my version using "byWord"
> I got these times (minimum of 10 runs):
>
>     before: 2781 ms
>
>     after:   805 ms

What about with GDC or LDC?

Posted by Juan Manuel Cabo
in reply to Joseph Rushton Wakeling

On 06/22/2013 06:29 PM, Joseph Rushton Wakeling wrote:
> On 06/22/2013 10:11 PM, Juan Manuel Cabo wrote:
>> Testing against Complete Works of William Shakespeare (5.3 MiB
>> plaintext): http://www.gutenberg.org/ebooks/100
>> and using "dmd -O -inline -noboundscheck -release" on both
>> the last version of the author, and my version using "byWord"
>> I got these times (minimum of 10 runs):
>>
>>     before: 2781 ms
>>
>>     after:   805 ms
>
> What about with GDC or LDC?

Right, the author of the article used ldc. I'm always used to dmd.

Keep in mind that he hasn't released the 9MB text that he used, but
instead pointed to that 5.3MB Shakespeare file.

This time I used LDC 0.11.0, with:

    ldc2 -O -release -disable-boundscheck

Times (minimum of 10 runs):

    before: 1617 ms

    after:   554 ms

Conclusion: using "byWord" was 3.4 times faster with DMD
and 2.9 times faster with LDC.

--jm

Posted by Joseph Rushton Wakeling
in reply to Juan Manuel Cabo

On Saturday, 22 June 2013 at 21:45:48 UTC, Juan Manuel Cabo wrote:
> Right, the author of the article used ldc. I'm always used
> to dmd.

You know that you can use ldmd2 to invoke LDC using the same flags as DMD?

> This time I used LDC 0.11.0, with:
>
>     ldc2 -O -release -disable-boundscheck
>
> Times (minimum of 10 runs):
>
>     before: 1617 ms
>
>     after:   554 ms
>
> Conclusion: using "byWord" was 3.4 times faster with DMD
> and 2.9 times faster with LDC.

Any chance of GDC results? You can again use gdmd for a DMD-like interface.

Posted by Juan Manuel Cabo
in reply to Joseph Rushton Wakeling

On 06/23/2013 05:42 AM, Joseph Rushton Wakeling wrote:
> On Saturday, 22 June 2013 at 21:45:48 UTC, Juan Manuel Cabo wrote:
>> Right, the author of the article used ldc. I'm always used to dmd.
>
> You know that you can use ldmd2 to invoke LDC using the same flags as DMD?
>
>> This time I used LDC 0.11.0, with:
>>
>>     ldc2 -O -release -disable-boundscheck
>>
>> Times (minimum of 10 runs):
>>
>>     before: 1617 ms
>>
>>     after:   554 ms
>>
>> Conclusion: using "byWord" was 3.4 times faster with DMD and 2.9 times faster with LDC.
>
> Any chance of GDC results? You can again use gdmd for DMD-like interface.

The computer where I tested has Kubuntu 12.04, which is too old
even for gcc-4.7, and I cannot compile GDC there. The gdc version
that comes in the repository fails to compile the program, even after
some rearranging so that it doesn't use UFCS.

Anyway, using my "byWord" range to iterate the string will surely
be faster with GDC too, because tr and split can be a little slow,
especially since the string is big (5MB).

I tested a little more and found that with a handcoded function
instead of tr, and splitter instead of split, it's almost as fast as
the byWord version. The handcoded function assumes ASCII and
preallocates the resulting string (with result.length = source.length).

--jm
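
The handcoded function itself was not posted. A minimal sketch of what such a replacement for toLower plus tr could look like, assuming ASCII input as described and preallocating the output to the length of the source. The name lettersOrNewlines is hypothetical, and the no-argument splitter used here is the whitespace-splitting overload from current Phobos, which may differ from what was actually tested:

```d
import std.algorithm : equal, splitter;
import std.ascii : isAlpha, toLower;
import std.exception : assumeUnique;

/// Hypothetical helper (not the code from the thread): lowercases ASCII
/// letters and replaces every other byte with '\n', in a single pass.
string lettersOrNewlines(string source) {
    char[] result;
    result.length = source.length; // preallocate once, as the post describes
    foreach (i, char c; source)    // iterate code units, not decoded code points
        result[i] = isAlpha(c) ? cast(char) toLower(c) : '\n';
    return assumeUnique(result);   // no other reference to the buffer exists
}

unittest {
    // splitter is lazy and merges runs of whitespace, so no empty words appear.
    auto words = splitter(lettersOrNewlines("To be, or NOT to be!"));
    assert(equal(words, ["to", "be", "or", "not", "to", "be"]));
}
```

Because splitter is lazy, the words are produced as slices of the cleaned string on demand instead of being collected into an array first, which is one plausible reason it comes close to the byWord version.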
