Thread overview
[Issue 8219] New: File.writeln is slow
Sep 11, 2012
Denis Shelomovskij
June 11, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8219

           Summary: File.writeln is slow
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2012-06-11 05:20:08 PDT ---
A little benchmark to show the low performance of the
std.stdio.File().writeln() with numbers. The programs accept N from the command
line, and just save a textual file (opened as binary file) that contains
numbers from 0 to N-1.

The write1 program is the basic one that uses File.writeln(). The successive programs are for comparison.

The write2 program is equally basic, but it uses C functions, using the same D compiler and compilation arguments, and it's almost 12 times faster.

The write3 program is a basic Java program, it's more than 17 times faster than write1.

The write4 is a test to write a very fast D program, it's more than 30 times faster than write2 (it's not the fastest possible program because the DMD compiler is currently not able to perform the small dividend optimization. GCC and Clang are able to replace the small dividend with some faster operations).


DMD 2.060alpha, Windows, -O -release -inline
Java 1.7.0_03-b05

Timings, N = 2_000_000, best of 3, seconds:
  write1: 9.67
  write2: 0.83
  write3: 0.56 (Java)
  write4: 0.32

- - - - - - - - - - - - - - - - - - - - - - - - -

// write1
import std.stdio, std.conv;

void main(string[] args) {
    auto f = File("numbers.txt", "wb");
    foreach (i; 0 .. args[1].to!int())
        f.writeln(i);
    f.close();
}

- - - - - - - - - - - - - - - - - - - - - - - - -

// write2
import core.stdc.stdio, std.conv;

void main(string[] args) {
    auto f = fopen("numbers.txt", "wb");
    foreach (i; 0 .. args[1].to!int())
        fprintf(f, "%d\n", i);
    fclose(f);
}

- - - - - - - - - - - - - - - - - - - - - - - - -

// write3
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

class write3 {
    public static void main(String args[]) {
        if (args.length != 1)
            System.exit(-1);
        int count = Integer.parseInt(args[0]);

        try {
            BufferedWriter bw = new BufferedWriter(new
FileWriter("numbers.txt"));

            for (int i = 0; i < count; i++) {
                bw.write(i);
                bw.write("\n");
            }

            bw.close();
        } catch (IOException e) {
            System.exit(-2);
        }
    }
}

- - - - - - - - - - - - - - - - - - - - - - - - -

// write4
import core.stdc.stdio, std.conv, std.typetuple;

void saveNumbers(uint buf_size)(FILE* f, in uint n, in uint start) nothrow {
    __gshared static char[buf_size] buffer = void;
    __gshared static immutable string digits = "0123456789";
    char* ptr = buffer.ptr;

    foreach (k; start .. start + n) {
        uint v = k;
        if (v >= 0 && v < 10) {
            *ptr = digits[v];
            ptr++;
        } else {
            if (v < 0) {
                v = -v;
                *ptr = '-';
                ptr++;
            }

            char* start_ptr = ptr;

            foreach (_; TypeTuple!(0,1,2,3,4,5,6,7,8,9)) {
                // DMD doesn't know how to perform a fast division
                // or modulus when the quotient is a small integer
                *ptr = (v % 10) + '0';
                v /= 10;
                ptr++;
                if (!v) break;
            }

            immutable size_t i = ptr - start_ptr;
            for (size_t j = 0; j < i / 2; ++j) {
                immutable aux = start_ptr[j];
                start_ptr[j] = start_ptr[i - j - 1];
                start_ptr[i - j - 1] = aux;
            }
        }
        *ptr = '\n';
        ptr++;
    }

    *ptr = '\0';
    fputs(buffer.ptr, f);
}

void main(in string[] args) {
    auto f = fopen("numbers.txt", "wb");
    if (args.length != 2 || f == null) return;
    immutable uint N = to!uint(args[1]);

    enum uint max_int_len = 12;
    enum uint buf_size = 1_000_000;
    enum uint M = buf_size / max_int_len;

    uint count;
    foreach (_0; 0 .. N / M) {
        saveNumbers!buf_size(f, M, count);
        count += M;
    }
    saveNumbers!buf_size(f, N % M, count);

    fclose(f);
}

- - - - - - - - - - - - - - - - - - - - - - - - -

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 11, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8219


Denis Shelomovskij <verylonglogin.reg@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |verylonglogin.reg@gmail.com


--- Comment #1 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-09-11 20:35:29 MSD ---
Looks like Marco Righele was the first with this issue: http://forum.dlang.org/thread/mailman.257.1281550251.13841.digitalmars-d@puremagic.com

Fix added to https://github.com/D-Programming-Language/phobos/pull/778

From commit message:
There is absolutely no need to explicitly flush a stream on new line because it
is expected only for console output which is line buffered or unbuffered by
default.

P.S.
I have no idea how such performance killer was added but with a fix of this
issue we will finally get non-“considered out-dated” file IO suitable for
simple programs with text output.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
October 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8219



--- Comment #2 from github-bugzilla@puremagic.com 2012-10-04 09:51:18 PDT ---
Commit pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/e281d8deaa8e6adbc42ee5f6dd69d303a7d67b58 Fix Issue 8219 - File.writeln is slow

There is absolutely no need to explicitly flush a stream on new line because it is expected only for console output which is line buffered or unbuffered by default.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
October 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8219



--- Comment #3 from bearophile_hugs@eml.cc 2012-10-04 10:54:38 PDT ---
Now the runtime for the write1 D benchmark is 1.12 seconds, it's a significant improvement.

When N = 10_000_000 the D version takes about 5.61 seconds, while the Java program takes (including the start of the JVM) about 2.33 seconds, so it's about 2.4 times faster.

Do you want me/suggest me to close this issue?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
October 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8219


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei@metalanguage.com


--- Comment #4 from Andrei Alexandrescu <andrei@metalanguage.com> 2012-10-04 11:08:53 PDT ---
Let's keep this open, we should achieve parity with the best out there. Thanks bearophile and Denis for your work on this.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
October 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8219



--- Comment #5 from Andrei Alexandrescu <andrei@metalanguage.com> 2012-10-04 14:44:55 PDT ---
bearophile, I committed https://github.com/D-Programming-Language/phobos/commit/28e1c98d69a0258992068f4410bdd62c46f5474a could you please compare again against everything?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
October 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8219



--- Comment #6 from bearophile_hugs@eml.cc 2012-10-04 15:50:29 PDT ---
(In reply to comment #5)

> could you please compare again against everything?

I have re-compiled druntime and Phobos (but not dmd, because in the meantime it has not changed), and my run-time for the write1 with N=2_000_000 is about the same. The run-time with N=10_000_000 seems a little lower, 5.55 seconds (but changes with similar magnitude are sometimes just noise, so it's not significant).

Those timings are just one data point (32 bit Windows, Core 2). If we you want to optimize textual output, then probably more data points are needed, with a Core i7, on 64 bits, and on a Linux. And probably a profiling is useful.

-------------------

The optimization stages of GCC and Clang contain an algorithm that turn small dividend (division and modulus) operations in a faster combination of simpler operations. This is handy in all kind of code, and it's also quite useful in code like:

auto c = cast(uint) (v % 10);
v /= 10;

Some info:
http://www.hackersdelight.org/divcMore.pdf

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
October 06, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8219



--- Comment #7 from bearophile_hugs@eml.cc 2012-10-06 06:09:35 PDT ---
(In reply to comment #5)
> could you please compare again against everything?

Andrei, unfortunately I think there is a bug in your code, this prints "1-":


import std.stdio: writeln;
void main() {
    writeln(-1);
}


I think Phobos needs more unittests :-(

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
October 07, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8219



--- Comment #8 from Andrei Alexandrescu <andrei@metalanguage.com> 2012-10-06 19:52:20 PDT ---
Thanks, Brad fixed it (and added a unittest). I'm not that bright.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------