Thread overview | ||||||||
---|---|---|---|---|---|---|---|---|
|
June 22, 2013 D vs Haskell | ||||
---|---|---|---|---|
| ||||
Word counting problem in D and Haskell: http://leonardo-m.livejournal.com/109201.html |
June 22, 2013 Re: D vs Haskell | ||||
---|---|---|---|---|
| ||||
Posted in reply to Szymon Gatner | On 06/22/2013 03:25 PM, Szymon Gatner wrote: > Word counting problem in D and Haskell: > > http://leonardo-m.livejournal.com/109201.html I thought that the D time could be improved further with little changes. Testing against Complete Works of William Shakespeare (5.3 MiB plaintext): http://www.gutenberg.org/ebooks/100 and using "dmd -O -inline -noboundscheck -release" on both the last version of the author, and my version using "byWord" I got these times (minimum of 10 runs): before: 2781 ms after: 805 ms Here is the code, with a "byWord" range using std.ascii.toAlpha: import std.stdio, std.conv, std.file, std.string, std.algorithm, std.range, std.traits, std.ascii; auto hashCounter(R)(R items) if (isForwardRange!R) { size_t[ForeachType!R] result; foreach (x; items) result[x]++; return result.byKey.zip(result.byValue); } void main(string[] args) { //Slow: // args[1] // .readText // .toLower // .tr("A-Za-z", "\n", "cs") // .split //Faster: args[1] .readText .byWord .map!toLower() .array .hashCounter .array .sort!"-a[1] < -b[1]"() .take(args[2].to!uint) .map!q{ text(a[0], " ", a[1]) } .join("\n") .writeln; } /** Range that extracts words from a string. Words are strings composed only of chars accepted by std.ascii.toAlpha() */ struct byWord { string s; size_t pos; string word; this(string s) { this.s = s; popFront(); } @property bool empty() const { return s.length == 0; } @property string front() { return word; } void popFront() { if (pos == s.length) { //Mark the range as empty, only after popFront fails: s = null; return; } while (pos < s.length && !std.ascii.isAlpha(s[pos])) { ++pos; } auto start = pos; while (pos < s.length && std.ascii.isAlpha(s[pos])) { ++pos; } if (start == s.length) { //No more words. Range empty: s = null; } else { word = s[start .. pos]; } } } unittest { assert([] == array(byWord(""))); assert([] == array(byWord("!@#$"))); assert(["a", "b"] == array(byWord("a b"))); assert(["a", "b", "c"] == array(byWord("a b c"))); } --jm |
June 22, 2013 Re: D vs Haskell | ||||
---|---|---|---|---|
| ||||
Posted in reply to Juan Manuel Cabo | On 06/22/2013 10:11 PM, Juan Manuel Cabo wrote:
> Testing against Complete Works of William Shakespeare (5.3 MiB
> plaintext): http://www.gutenberg.org/ebooks/100
> and using "dmd -O -inline -noboundscheck -release" on both
> the last version of the author, and my version using "byWord"
> I got these times (minimum of 10 runs):
>
> before: 2781 ms
>
> after: 805 ms
What about with GDC or LDC?
|
June 22, 2013 Re: D vs Haskell | ||||
---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | On 06/22/2013 06:29 PM, Joseph Rushton Wakeling wrote:
> On 06/22/2013 10:11 PM, Juan Manuel Cabo wrote:
>> Testing against Complete Works of William Shakespeare (5.3 MiB
>> plaintext): http://www.gutenberg.org/ebooks/100
>> and using "dmd -O -inline -noboundscheck -release" on both
>> the last version of the author, and my version using "byWord"
>> I got these times (minimum of 10 runs):
>>
>> before: 2781 ms
>>
>> after: 805 ms
>
> What about with GDC or LDC?
>
Right, the author of the article used ldc. I'm always used
to dmd.
Keep in mind that he hasn't released the 9MB text that he
used, but instead pointed to that 5.3MB Shakespeare file.
This time I used LDC 0.11.0, with:
ldc2 -O -release -disable-boundscheck
Times (minimum of 10 runs):
before: 1617 ms
after: 554 ms
Conclusion: using "byWords" was 3.4 times faster with DMD and 2.9 times faster with LDC.
--jm
|
June 23, 2013 Re: D vs Haskell | ||||
---|---|---|---|---|
| ||||
Posted in reply to Juan Manuel Cabo | On Saturday, 22 June 2013 at 21:45:48 UTC, Juan Manuel Cabo wrote: > Right, the author of the article used ldc. I'm always used > to dmd. You know that you can use ldmd2 to invoke LDC using the same flags as DMD? > This time I used LDC 0.11.0, with: > > ldc2 -O -release -disable-boundscheck > > Times (minimum of 10 runs): > > before: 1617 ms > > after: 554 ms > > Conclusion: using "byWords" was 3.4 times faster with DMD > and 2.9 times faster with LDC. Any chance of GDC results? You can again use gdmd for DMD-like interface. |
June 23, 2013 Re: D vs Haskell | ||||
---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | On 06/23/2013 05:42 AM, Joseph Rushton Wakeling wrote:
> On Saturday, 22 June 2013 at 21:45:48 UTC, Juan Manuel Cabo wrote:
>> Right, the author of the article used ldc. I'm always used to dmd.
>
> You know that you can use ldmd2 to invoke LDC using the same flags as DMD?
>
>> This time I used LDC 0.11.0, with:
>>
>> ldc2 -O -release -disable-boundscheck
>>
>> Times (minimum of 10 runs):
>>
>> before: 1617 ms
>>
>> after: 554 ms
>>
>> Conclusion: using "byWords" was 3.4 times faster with DMD and 2.9 times faster with LDC.
>
>
> Any chance of GDC results? You can again use gdmd for DMD-like interface.
The computer where I tested has Kubuntu 12.04, which is
too old even for gcc-4.7, and I cannot compile GDC there.
The gdc version which comes in the repository fails
to compile the program, even after some rearranging so
that it doesn't use UFCS.
Anyway, using my "byWord" range to iterate the string
will surely be faster with GDC too, because tr and split can
be a little slow, especially since the string is big (5MB).
I tested a little more and found that using a handcoded function instead of tr, and using splitter instead of split, it's almost as fast as the byWord version. The handcoded function assumes ascii and preallocates the resulting string (with result.length = source.length).
--jm
|
Copyright © 1999-2021 by the D Language Foundation