Thread overview
[Issue 3763] New: std.stdio.readlnImpl absurdly inefficient
Feb 02, 2010
David Simcha
Feb 02, 2010
David Simcha
[Issue 3763] std.stdio.readlnImpl absurdly inefficient and overflows stack
Feb 21, 2010
David Simcha
Mar 09, 2010
Walter Bright
February 02, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763

           Summary: std.stdio.readlnImpl absurdly inefficient
           Product: D
           Version: 2.040
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Keywords: performance
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: dsimcha@yahoo.com


--- Comment #0 from David Simcha <dsimcha@yahoo.com> 2010-02-01 16:18:57 PST ---
Apparently stdio.readln() copies the data it reads a ridiculous number of times and performs a ridiculous amount of heap allocations.  The following program uses **gigabytes** of RAM.  This issue came to my attention while reading in a huge file in the presence of large associative arrays that contained lots of false pointers.

import std.stdio, core.memory;

void main() {
    // Write 512 kilobytes out to a file on a single line.
    auto writeHandle = File("foo.txt", "wb");
    auto contents = new char[512 * 1024];
    contents[] = 'a';
    writeHandle.writeln(contents);
    writeHandle.close;
    contents = null;
    GC.collect();

    // Read it back with the GC disabled.
    GC.disable;
    auto readHandle = File("foo.txt");
    auto firstLine = readHandle.readln();

    stderr.writeln("Check task manager for memory usage, then press enter.");
    stdin.readln();
}

Under a sane allocation scheme (geometric growth of the buffer), this program would only use about 1 MB of RAM plus overhead for stack space, etc., even with the GC disabled.  (Derivation:  512 * 1024 == 2^19.  2^0 + 2^1 + ... + 2^19 == 2^20 - 1.)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 02, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei@metalanguage.com


--- Comment #1 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-02-01 17:04:56 PST ---
Since you are on the dev team, could you please look into this? I take it it must be the array appends that are the culprit. If you're testing on Windows, I think the stuff is under the DIGITAL_MARS_STDIO version.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 02, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763



--- Comment #2 from David Simcha <dsimcha@yahoo.com> 2010-02-01 20:37:40 PST ---
Well, the problem is pretty clear.  It's in std.stdio.readlnImpl().  In the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block, we basically do the following in pseudocode:

buf = readNext64Bytes();
buf ~= readRest();  // Recurse.

We're effectively prepending to our result in 64-byte increments.  Therefore, buf is reallocated once for every 64 bytes once we hit unbuffered I/O.  Also note that the use of O(N) stack space causes stack overflows for very long lines (around 700 KB+).

The problem is that, from looking at the rest of the code, all the other routines for different OS's and I/O libs are implemented the obvious way, using plain old array appending, which makes me believe that this one is different for a (unknown to me and probably relatively obscure) reason.  Why was the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block coded in such an odd way in the first place?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 02, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763



--- Comment #3 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-02-01 22:30:44 PST ---
(In reply to comment #2)
> Well, the problem is pretty clear.  It's in std.stdio.readlnImpl().  In the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block, we basically do the following in pseudocode:
> 
> buf = readNext64Bytes();
> buf ~= readRest();  // Recurse.

That's Walter's code :o).

> We're effectively prepending to our result in 64-byte increments.

Ouch.

>  Therefore,
> buf is reallocated once for every 64 bytes once we hit unbuffered I/O.  Also
> note that the use of O(N) stack space causes stack overflows for very long
> lines (around 700 KB+).
> 
> The problem is that, from looking at the rest of the code, all the other routines for different OS's and I/O libs are implemented the obvious way, using plain old array appending, which makes me believe that this one is different for a (unknown to me and probably relatively obscure) reason.  Why was the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block coded in such an odd way in the first place?

I recall Walter wrote that code, and he wrote it in a hurry. We were under deadline pressure. I personally find that code extremely bulky and difficult to follow, and would like to see it simplified.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 04, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763



--- Comment #4 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-02-03 17:37:42 PST ---
Love the summary, hate the code.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 21, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763


David Simcha <dsimcha@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


--- Comment #5 from David Simcha <dsimcha@yahoo.com> 2010-02-21 13:58:38 PST ---
Changeset 1429.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 09, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763


Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla@digitalmars.com


--- Comment #6 from Walter Bright <bugzilla@digitalmars.com> 2010-03-08 22:28:15 PST ---
Fixed dmd 2.041

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------