Thread overview | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
|
February 02, 2010 [Issue 3763] New: std.stdio.readlnImpl absurdly inefficient | ||||
---|---|---|---|---|
| ||||
http://d.puremagic.com/issues/show_bug.cgi?id=3763 Summary: std.stdio.readlnImpl absurdly inefficient Product: D Version: 2.040 Platform: Other OS/Version: Windows Status: NEW Keywords: performance Severity: normal Priority: P2 Component: Phobos AssignedTo: nobody@puremagic.com ReportedBy: dsimcha@yahoo.com --- Comment #0 from David Simcha <dsimcha@yahoo.com> 2010-02-01 16:18:57 PST --- Apparently stdio.readln() copies the data it reads a ridiculous number of times and performs a ridiculous amount of heap allocations. The following program uses **gigabytes** of RAM. This issue came to my attention while reading in a huge file in the presence of large associative arrays that contained lots of false pointers. import std.stdio, core.memory; void main() { // Write 512 kilobytes out to a file on a single line. auto writeHandle = File("foo.txt", "wb"); auto contents = new char[512 * 1024]; contents[] = 'a'; writeHandle.writeln(contents); writeHandle.close; contents = null; GC.collect(); // Read it back with the GC disabled. GC.disable; auto readHandle = File("foo.txt"); auto firstLine = readHandle.readln(); stderr.writeln("Check task manager for memory usage, then press enter."); stdin.readln(); } Under a sane allocation scheme (geometric growth of the buffer), this program would only use about 1 MB of RAM plus overhead for stack space, etc., even with the GC disabled. (Derivation: 512 * 1024 == 2^19. 2^0 + 2^1 + ... + 2^19 == 2^20 - 1.) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
February 02, 2010 [Issue 3763] std.stdio.readlnImpl absurdly inefficient | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3763 Andrei Alexandrescu <andrei@metalanguage.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |andrei@metalanguage.com --- Comment #1 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-02-01 17:04:56 PST --- Since you are on the dev team, could you please look into this? I take it it must be the array appends that are the culprit. If you're testing on Windows, I think the stuff is under the DIGITAL_MARS_STDIO version. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
February 02, 2010 [Issue 3763] std.stdio.readlnImpl absurdly inefficient | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3763 --- Comment #2 from David Simcha <dsimcha@yahoo.com> 2010-02-01 20:37:40 PST --- Well, the problem is pretty clear. It's in std.stdio.readlnImpl(). In the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block, we basically do the following in pseudocode: buf = readNext64Bytes(); buf ~= readRest(); // Recurse. We're effectively prepending to our result in 64-byte increments. Therefore, buf is reallocated once for every 64 bytes once we hit unbuffered I/O. Also note that the use of O(N) stack space causes stack overflows for very long lines (around 700 KB+). The problem is that, from looking at the rest of the code, all the other routines for different OS's and I/O libs are implemented the obvious way, using plain old array appending, which makes me believe that this one is different for a (unknown to me and probably relatively obscure) reason. Why was the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block coded in such an odd way in the first place? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
February 02, 2010 [Issue 3763] std.stdio.readlnImpl absurdly inefficient | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3763 --- Comment #3 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-02-01 22:30:44 PST --- (In reply to comment #2) > Well, the problem is pretty clear. It's in std.stdio.readlnImpl(). In the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block, we basically do the following in pseudocode: > > buf = readNext64Bytes(); > buf ~= readRest(); // Recurse. That's Walter's code :o). > We're effectively prepending to our result in 64-byte increments. Ouch. > Therefore, > buf is reallocated once for every 64 bytes once we hit unbuffered I/O. Also > note that the use of O(N) stack space causes stack overflows for very long > lines (around 700 KB+). > > The problem is that, from looking at the rest of the code, all the other routines for different OS's and I/O libs are implemented the obvious way, using plain old array appending, which makes me believe that this one is different for a (unknown to me and probably relatively obscure) reason. Why was the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block coded in such an odd way in the first place? I recall Walter wrote that code, and he wrote it in a hurry. We were under deadline pressure. I personally find that code extremely bulky and difficult to follow, and would like to see it simplified. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
February 04, 2010 [Issue 3763] std.stdio.readlnImpl absurdly inefficient and overflows stack | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3763 --- Comment #4 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-02-03 17:37:42 PST --- Love the summary, hate the code. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
February 21, 2010 [Issue 3763] std.stdio.readlnImpl absurdly inefficient and overflows stack | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3763 David Simcha <dsimcha@yahoo.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED --- Comment #5 from David Simcha <dsimcha@yahoo.com> 2010-02-21 13:58:38 PST --- Changeset 1429. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
March 09, 2010 [Issue 3763] std.stdio.readlnImpl absurdly inefficient and overflows stack | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | http://d.puremagic.com/issues/show_bug.cgi?id=3763 Walter Bright <bugzilla@digitalmars.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bugzilla@digitalmars.com --- Comment #6 from Walter Bright <bugzilla@digitalmars.com> 2010-03-08 22:28:15 PST --- Fixed dmd 2.041 -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
Copyright © 1999-2021 by the D Language Foundation