Thread overview
December 24, 2013 [Issue 11810] New: std.stdio.byLine/readln performance is very bad
https://d.puremagic.com/issues/show_bug.cgi?id=11810

Summary: std.stdio.byLine/readln performance is very bad
Product: D
Version: D2
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Phobos
AssignedTo: nobody@puremagic.com
ReportedBy: peter.alexander.au@gmail.com

--- Comment #0 from Peter Alexander <peter.alexander.au@gmail.com> 2013-12-24 04:34:29 PST ---
std.stdio.readln (and hence byLine) uses repeated calls to fgetc() to find the newline characters. This is a very inefficient way to read files (lots of per-byte overhead).

I have a version of byLine that reads the file in 4 KB chunks and then does the newline search. It is 6 times faster than byLine on my machine on a 10 MB file (OS X 10.8.5, x64 2 GHz MacBook).

--
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
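A minimal sketch of the chunked approach described in the report, in D. This is not Peter Alexander's code or the byLineFast attachment; `eachLineChunked` is a hypothetical helper name, and the 4096-byte default chunk size simply mirrors the 4 KB figure above. It reads whole blocks with `File.rawRead`, searches each block for '\n', and carries any partial line over to the next block, so the per-byte work is a scan over a buffer rather than one C library call per byte.

```d
import std.algorithm.searching : countUntil;
import std.stdio : File;

/// Hypothetical helper (not part of Phobos): calls `sink` once per line of `f`,
/// without the trailing '\n'. Like byLine, the slice passed to `sink` is reused,
/// so callers must `.idup` it if they want to keep it.
void eachLineChunked(File f, scope void delegate(const(char)[]) sink,
                     size_t chunkSize = 4096)
{
    auto buf = new ubyte[chunkSize];
    ubyte[] pending; // partial line carried over from earlier chunks

    for (;;)
    {
        ubyte[] got = f.rawRead(buf); // slice of buf actually filled; empty at EOF
        if (got.length == 0)
            break;

        while (got.length)
        {
            immutable nl = got.countUntil(cast(ubyte) '\n');
            if (nl < 0)
            {
                pending ~= got; // no newline yet: keep the tail for the next chunk
                break;
            }
            if (pending.length)
            {
                // The line started in an earlier chunk: finish it and emit it.
                pending ~= got[0 .. nl];
                sink(cast(const(char)[]) pending);
                pending.length = 0;
                pending.assumeSafeAppend(); // reuse the carry-over buffer
            }
            else
            {
                // The whole line is inside this chunk: emit a slice, no copy.
                sink(cast(const(char)[]) got[0 .. nl]);
            }
            got = got[nl + 1 .. $];
        }
    }

    if (pending.length) // the last line had no trailing newline
        sink(cast(const(char)[]) pending);
}
```

The carry-over buffer is only touched when a line straddles a chunk boundary; lines that fit inside one chunk are handed to the caller as slices of the read buffer. This sketch splits on '\n' only and does not handle other terminators.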
December 24, 2013 [Issue 11810] std.stdio.byLine/readln performance is very bad
Posted in reply to Peter Alexander

https://d.puremagic.com/issues/show_bug.cgi?id=11810

Dejan Lekic <dejan.lekic@gmail.com> changed:

What | Removed | Added
CC   |         | dejan.lekic@gmail.com

--- Comment #1 from Dejan Lekic <dejan.lekic@gmail.com> 2013-12-24 04:42:03 PST ---
It has been discussed on IRC hundreds of times and we all agreed that if developer wants performance (s)he would read page-size chunks. That is why we have byChunk(size_t) in std.stdio, I believe. :)
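For comparison, a task that never needs to reassemble lines is straightforward with `File.byChunk`. The sketch below (the name `countNewlines` is hypothetical, and 4096 is just an assumed default) counts '\n' bytes block by block. Because no line has to be stitched together across chunk boundaries, it needs no carry-over state; that state is the extra work a byLine built on byChunk would have to do.

```d
import std.algorithm.searching : count;
import std.stdio : File;

/// Hypothetical helper: counts '\n' bytes in `f`, reading `chunkSize` bytes at a time.
size_t countNewlines(File f, size_t chunkSize = 4096)
{
    size_t total;
    // byChunk reuses one ubyte[] buffer; each `chunk` is valid until the next iteration.
    foreach (ubyte[] chunk; f.byChunk(chunkSize))
        total += chunk.count(cast(ubyte) '\n');
    return total;
}
```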
December 24, 2013 [Issue 11810] std.stdio.byLine/readln performance is very bad
Posted in reply to Peter Alexander

https://d.puremagic.com/issues/show_bug.cgi?id=11810

--- Comment #2 from Peter Alexander <peter.alexander.au@gmail.com> 2013-12-24 05:27:14 PST ---
(In reply to comment #1)
> It has been discussed on IRC hundreds of times and we all agreed that if
> developer wants performance (s)he would read page-size chunks. That is why we
> have byChunk(size_t) in std.stdio, I believe. :)

OK, but:

1. It's non-trivial to implement byLine on top of byChunk.
2. Why would you want byLine to be slow?

I'm not seeing the advantage of keeping byLine as it is. Fixing it doesn't change the API and has no downsides other than requiring a bit of extra memory for the buffer.
December 24, 2013 [Issue 11810] std.stdio.byLine/readln performance is very bad
Posted in reply to Peter Alexander

https://d.puremagic.com/issues/show_bug.cgi?id=11810

bearophile_hugs@eml.cc changed:

What | Removed | Added
CC   |         | bearophile_hugs@eml.cc

--- Comment #3 from bearophile_hugs@eml.cc 2013-12-24 10:08:13 PST ---
(In reply to comment #1)
> It has been discussed on IRC hundreds of times and we all agreed that if
> developer wants performance (s)he would read page-size chunks. That is why we
> have byChunk(size_t) in std.stdio, I believe. :)

This is not acceptable. byLine is a very commonly used function (far more than byChunk in script-like D programs) and it should be sufficiently fast.
December 24, 2013 [Issue 11810] std.stdio.byLine/readln performance is very bad
Posted in reply to Peter Alexander

https://d.puremagic.com/issues/show_bug.cgi?id=11810

Artem Tarasov <lomereiter@gmail.com> changed:

What | Removed | Added
CC   |         | lomereiter@gmail.com

--- Comment #4 from Artem Tarasov <lomereiter@gmail.com> 2013-12-24 10:47:16 PST ---
+1

There's also this implementation:
http://permalink.gmane.org/gmane.comp.lang.d.general/117750
December 24, 2013 [Issue 11810] std.stdio.byLine/readln performance is very bad
Posted in reply to Peter Alexander

https://d.puremagic.com/issues/show_bug.cgi?id=11810

--- Comment #5 from bearophile_hugs@eml.cc 2013-12-24 13:11:31 PST ---
Created an attachment (id=1305)
byLineFast with small changes
December 26, 2013 [Issue 11810] std.stdio.byLine/readln performance is very bad
Posted in reply to Peter Alexander

https://d.puremagic.com/issues/show_bug.cgi?id=11810

--- Comment #6 from hsteoh@quickfur.ath.cx 2013-12-26 14:23:27 PST ---
The whole point of byLine is to be a convenient API for user code to read lines from a file. It should not be constrained to using fgetc() just because we can't predict line length in advance. It should be built on top of a buffering mechanism (maybe byChunk) so that it offers good performance for a very commonly used operation.

I highly recommend that Peter Alexander submit the improved byLine implementation to Phobos.
February 28, 2014 [Issue 11810] std.stdio.byLine/readln performance is very bad
Posted in reply to Peter Alexander

https://d.puremagic.com/issues/show_bug.cgi?id=11810

Nick Treleaven <ntrel-public@yahoo.co.uk> changed:

What | Removed | Added
CC   |         | ntrel-public@yahoo.co.uk

--- Comment #7 from Nick Treleaven <ntrel-public@yahoo.co.uk> 2014-02-28 08:55:17 PST ---
(In reply to comment #5)
> Created an attachment (id=1305) [details]
> byLineFast with small changes

BTW I'm working on porting this to std.stdio.byLine; I'll submit a PR when finished.
Copyright © 1999-2021 by the D Language Foundation