wc in D: 712 Characters Without a Single Branch

Jan 28, 2020

Mike Parker

Jan 28, 2020

Steven Schveighoffer

Jan 28, 2020

OlaOst

Jan 28, 2020

Paul Backus

Jan 28, 2020

Petar Kirov [ZombineDev]

Jan 28, 2020

Petar Kirov [ZombineDev]

Jan 28, 2020

Petar Kirov [ZombineDev]

Jan 28, 2020

sarn

Jan 28, 2020

Steven Schveighoffer

Robert Schadek was inspired by a post he saw on Hacker News a while back showing an implementation of wc in Haskell totaling 80 lines. He decided he could do better in D. So he did. This post on the D blog shows what he came up with and also provides a brief introduction to ranges. The blog: https://dlang.org/blog/2020/01/28/wc-in-d-712-characters-without-a-single-branch/ Reddit: https://www.reddit.com/r/programming/comments/ev5w91/wc_in_d_712_characters_without_a_single_branch/ It's also on Hacker News, but please don't post a direct link to it if you find it there. https://news.ycombinator.com/

January 28, 2020

Re: wc in D: 712 Characters Without a Single Branch

Posted by Steven Schveighoffer
in reply to Mike Parker

Permalink

Steven Schveighoffer

Posted in reply to Mike Parker

Permalink

On 1/28/20 9:01 AM, Mike Parker wrote:
> Robert Schadek was inspired by a post he saw on Hacker News a while back showing an implementation of wc in Haskell totaling 80 lines. He decided he could do better in D. So he did. This post on the D blog shows what he came up with and also provides a brief introduction to ranges.
> 
> 
> The blog:
> https://dlang.org/blog/2020/01/28/wc-in-d-712-characters-without-a-single-branch/ 
> 
> 
> Reddit:
> https://www.reddit.com/r/programming/comments/ev5w91/wc_in_d_712_characters_without_a_single_branch/ 
> 
> 
> It's also on Hacker News, but please don't post a direct link to it if you find it there.
> 
> https://news.ycombinator.com/

Nice! I bet you get better performance with a tweak: you are currently iterating each line 2x (3x if you count searching for the line ending).

If you did both at once, the toLine function would be longer, and probably have some branching in it. But I bet you get close to double performance for that.

You could also combine all 3 checks into one (if you are checking for spaces, you might as well check for newlines). But newline searching is pretty fast for byLine. iopipe could probably do a bit better ;)

In any case, I totally get what you are saying: 15 minutes to write a program that works, is easily understandable, but has performance issues for unusual input is pretty cool.

-Steve

On Tuesday, 28 January 2020 at 14:01:35 UTC, Mike Parker wrote: > Robert Schadek was inspired by a post he saw on Hacker News a while back showing an implementation of wc in Haskell totaling 80 lines. He decided he could do better in D. So he did. This post on the D blog shows what he came up with and also provides a brief introduction to ranges. > > > The blog: > https://dlang.org/blog/2020/01/28/wc-in-d-712-characters-without-a-single-branch/ > > Reddit: > https://www.reddit.com/r/programming/comments/ev5w91/wc_in_d_712_characters_without_a_single_branch/ > > It's also on Hacker News, but please don't post a direct link to it if you find it there. > > https://news.ycombinator.com/ Golfed down to 9 lines: import std; void main(string[] args) { args[1].File.byLine(Yes.keepTerminator) .map!(l => [l.byCodePoint.walkLength, l.splitter.walkLength]) .fold!((o, l) => [o[0]+1, o[1]+l[1], o[2]+l[0]])([0uL, 0, 0]) .writefln!"%(%u %) %s"(args[1]); } Sacrificed most of the readability of course, but I think it is functionally equivalent.

On Tuesday, 28 January 2020 at 14:01:35 UTC, Mike Parker wrote: > Robert Schadek was inspired by a post he saw on Hacker News a while back showing an implementation of wc in Haskell totaling 80 lines. He decided he could do better in D. So he did. This post on the D blog shows what he came up with and also provides a brief introduction to ranges. > > > The blog: > https://dlang.org/blog/2020/01/28/wc-in-d-712-characters-without-a-single-branch/ > unittest { > foreach(it; Iota(1,10).Filter(&isEven)) { > writeln(it); > } > } Today I learned you can use UFCS with struct literals/constructors.

On Tuesday, 28 January 2020 at 14:01:35 UTC, Mike Parker wrote: > [snip] BTW, while playing with a solution of my own [0] I noticed that both mine and Robert's version return different results for the following input [1]: > expected: '\u0003\u0000\u0000\u00005èÆÕL]\u0012|Î¾\u001a7«\u00052\u0011(ÐY\n<\u0010\u0000\u0000\u0000\u0000\u0000\u0000e!ßd/ñõì\f:z¦Î¦±ç·÷Í¢Ëß\u00076*\bñùC1ÉUÀé2\u001aÓB' }, To reproduce: $ curl -fsSL https://github.com/ethjs/ethjs-util/raw/e9aede668177b6d1ea62d741ba1c19402bc337b3/src/tests/test.index.js | sed '350q;d' > input $ ./robert input 1 4 190 input $ wc -lwm input 1 3 190 input [0]: > import std.algorithm : count, splitter; > import std.stdio : File, writefln; > import std.typecons : Yes; > > void main(string[] args) { > size_t lines, words, bytes; > foreach (line; args[1].File.byLine(Yes.keepTerminator)) { > lines++; > bytes += line.count; > words += line.splitter.count; > } > writefln!"%u %u %u %s"(lines, words, bytes, args[1]); > } [1]: https://github.com/ethjs/ethjs-util/blob/e9aede668177b6d1ea62d741ba1c19402bc337b3/src/tests/test.index.js#L350

On Tuesday, 28 January 2020 at 21:40:40 UTC, Petar Kirov [ZombineDev] wrote: > [snip] > >> import std.algorithm : count, splitter; >> import std.stdio : File, writefln; >> import std.typecons : Yes; >> >> void main(string[] args) { >> size_t lines, words, bytes; >> foreach (line; args[1].File.byLine(Yes.keepTerminator)) { >> lines++; >> bytes += line.count; >> words += line.splitter.count; >> } >> writefln!"%u %u %u %s"(lines, words, bytes, args[1]); >> } > > [1]: https://github.com/ethjs/ethjs-util/blob/e9aede668177b6d1ea62d741ba1c19402bc337b3/src/tests/test.index.js#L350 s/bytes/chars/

On Tuesday, 28 January 2020 at 14:01:35 UTC, Mike Parker wrote: > Robert Schadek was inspired by a post he saw on Hacker News a while back showing an implementation of wc in Haskell totaling 80 lines. I enjoyed the article overall, but I think this part lets it down a bit: > Is the Haskell wc faster? For big files, absolutely, but then it is using threads. For small files, GNU’s coreutils still beats the competition. At this stage my version is very likely IO bound, and it’s fast enough anyway. Admit it, "my version is very likely IO bound" is hand-wavey. The top comment on HN right now is pointing out that it doesn't make sense. It would be a better tech article if it stuck to the facts by either cutting out the hand-wavey bit (and just saying "it's fast enough anyway") or doing a test. (Quick and dirty way: running the code on a large file and seeing if it maxes out a CPU using the "top" command. More elegant way: using tools like these https://github.com/sysstat/sysstat/) Remember: plenty of us on the forums are happy to help make a D article more interesting.

On 1/28/20 5:11 PM, sarn wrote: > Admit it, "my version is very likely IO bound" is hand-wavey. The top comment on HN right now is pointing out that it doesn't make sense. I don't think it's i/o bound. Doing the split/walk-length and then walk-length again is likely making it 2x slower. Phobos' byLine is not the fastest you can get, but it's not a slouch either. It also tries to use as many libc tricks as it can. I would say the key point is that it's fast enough for most purposes, no matter what the actual performance problem is. -Steve

On Tuesday, 28 January 2020 at 21:40:40 UTC, Petar Kirov [ZombineDev] wrote: > > BTW, while playing with a solution of my own [0] I noticed that both mine and Robert's version return different [... snip] I found the culprit - iswspace. For more info see: https://www.mail-archive.com/bug-coreutils@gnu.org/msg30803.html

Forums