June 09, 2015
https://issues.dlang.org/show_bug.cgi?id=14256

Andrei Alexandrescu <andrei@erdani.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unspecified                 |D2

--
June 25, 2017
https://issues.dlang.org/show_bug.cgi?id=14256

Vladimir Panteleev <dlang-bugzilla@thecybershadow.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |dlang-bugzilla@thecybershad
                   |                            |ow.net
         Resolution|---                         |WORKSFORME

--- Comment #10 from Vladimir Panteleev <dlang-bugzilla@thecybershadow.net> ---
From a cursory benchmark, the D programs perform as well as or better than the
Python version for me, except for the splitLines version.

Note that splitLines also splits by Unicode line delimiters, not just \r and
\n, which is why it's slower. Replacing splitLines(s) with split(s, '\n') makes
the program much faster.

Please reopen if you think the problem persists.
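
To illustrate the splitLines versus split difference described above, here is a minimal sketch. The helper names linesUnicode and linesNewlineOnly are hypothetical and exist only for this example; std.string.splitLines and std.array.split are the Phobos functions named in the comment.

```d
import std.array : split;
import std.string : splitLines;

/// Splits on every line terminator splitLines recognizes, including
/// Unicode ones such as U+2028 (hypothetical helper name).
string[] linesUnicode(string s)
{
    return s.splitLines;
}

/// Splits on '\n' only, which does less work per character
/// (hypothetical helper name).
string[] linesNewlineOnly(string s)
{
    return s.split('\n');
}
```

The two are not drop-in replacements: split(s, '\n') leaves any '\r' attached to the line, does not break on Unicode separators, and yields a trailing empty element when the input ends with a newline.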

--
June 25, 2017
https://issues.dlang.org/show_bug.cgi?id=14256

Ivan Kazmenko <gassa@mail.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |---

--- Comment #11 from Ivan Kazmenko <gassa@mail.ru> ---
I've just checked with a more recent version (dmd 2.074.1), and the picture is
the same as two years ago here (Win64, file is on SSD), so reopened.

In the readln program, I allocated a buffer of 1100000 bytes (the posted version had 10000, and so failed on test 3).

The current table:

Entry                                       test1     test2     test3
Number of lines                           1000000     10000       100
Length of each line                           100     10000   1000000
------------------------------------------------------------------------
Python 2.7.5 x32:                            0.68      0.44      0.36
Python 2.7.10 x64:                           0.55      0.36      0.33
DMD 2.074.1 byLine -m32:                     0.27      0.73      1.05
DMD 2.074.1 byLine -m64:                     1.45      1.31      1.43
DMD 2.074.1 readln -m32:                     0.25      0.63      1.00
DMD 2.074.1 readln -m64:                     1.55      1.54      1.46
DMD 2.074.1 read+splitLines -m32:            0.35      0.39      0.31
DMD 2.074.1 read+splitLines -m64:            0.41      0.31      0.32

The times of 1 second or above are clearly problematic.

In Python, string storage is low-level, but the number of lines affects the Pythonic part, so test1 is slower.

In D -m32, the byLine and readln versions are slower when the length of lines grows, possibly due to reallocation when constructing a string.  I'd say 3x slower than Python on large strings feels like too much.

In D -m64, the byLine and readln versions still take 1.3+ seconds on all tests, more than 2x slower than Python, which is sad.

As earlier, the read+splitLines version is consistently fast on all tests in both -m32 and -m64 (0.41 seconds at most, and the fastest D version everywhere except test1 with -m32), so speed is definitely possible, just not as out-of-the-box as the other two versions.

Ivan Kazmenko.
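
For readers without the earlier attachments, the three approaches compared in the table can be sketched roughly as follows. This is not the benchmarked code: the function names and the line-counting workload are assumptions made for illustration, and only File.byLine, File.readln, std.file.read and std.string.splitLines come from the thread.

```d
import std.file : read;
import std.stdio : File;
import std.string : splitLines;

/// "byLine" variant: iterate over the file line by line.
size_t countByLine(string path)
{
    size_t count;
    foreach (line; File(path).byLine)
        ++count;
    return count;
}

/// "readln" variant: read into a caller-supplied buffer.
/// The default size mirrors the 1100000 bytes mentioned in comment #11;
/// File.readln enlarges the buffer if a line does not fit.
size_t countReadln(string path, size_t initialSize = 1_100_000)
{
    auto file = File(path);
    auto buffer = new char[](initialSize);
    size_t count;
    while (file.readln(buffer) != 0)
        ++count;
    return count;
}

/// "read+splitLines" variant: load the whole file, then split in memory.
size_t countReadSplitLines(string path)
{
    auto text = cast(string) read(path);
    return text.splitLines.length;
}
```

The third variant trades memory for speed, since it holds the entire file in memory at once, which may matter for inputs much larger than the roughly 100 MB test files.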

--
June 25, 2017
https://issues.dlang.org/show_bug.cgi?id=14256

--- Comment #12 from Vladimir Panteleev <dlang-bugzilla@thecybershadow.net> ---
Can you please test with -m32mscoff, in addition to -m32 and -m64?

I predict that the numbers are going to be very close to -m64. If so, then that indicates that we are limited by Microsoft's C runtime, in which case there is nothing that can be done short of rewriting std.stdio to not use C I/O.

--
June 25, 2017
https://issues.dlang.org/show_bug.cgi?id=14256

--- Comment #13 from Vladimir Panteleev <dlang-bugzilla@thecybershadow.net> ---
(In reply to Ivan Kazmenko from comment #11)
> The current table:

Also, if you have a script that generates this table, posting it here would be helpful.
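
No such script appears in this part of the thread. As a rough sketch of what one could look like, the helpers below generate a test file with the line counts and lengths from the table and time an arbitrary reader. The names writeTestFile and secondsFor are hypothetical, and the code assumes a Phobos release that ships std.datetime.stopwatch, which is newer than the 2.074.1 used in comment #11.

```d
import std.array : replicate;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : File;

/// Writes lineCount lines of lineLength 'x' characters each, terminated
/// by '\n' (e.g. 1_000_000 x 100 for test1). Hypothetical helper.
void writeTestFile(string path, size_t lineCount, size_t lineLength)
{
    auto file = File(path, "wb");
    string line = replicate("x", lineLength) ~ "\n";
    foreach (i; 0 .. lineCount)
        file.rawWrite(line);
}

/// Runs the delegate once and returns the elapsed wall-clock seconds.
/// Hypothetical helper; a real benchmark would repeat and average runs.
double secondsFor(void delegate() run)
{
    auto watch = StopWatch(AutoStart.yes);
    run();
    return watch.peek.total!"usecs" / 1e6;
}
```

A single timed run includes file-cache effects, so a fuller script would likely warm the cache first and report the best of several runs per compiler flag (-m32, -m32mscoff, -m64).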

--
June 28, 2017
https://issues.dlang.org/show_bug.cgi?id=14256

Jon Degenhardt <jrdemail2000-dlang@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jrdemail2000-dlang@yahoo.co
                   |                            |m

--- Comment #14 from Jon Degenhardt <jrdemail2000-dlang@yahoo.com> ---
I've benchmarked File.byLine on OS X and Linux, and it is quite fast on these platforms. I have not tested Windows, but have seen reports indicating it is quite slow there. I also know that performance on OS X was poor prior to 2.068, when it was dramatically improved.

The improvement in 2.068 was via PR #3089 (https://github.com/dlang/phobos/pull/3089). This changed File.byLine to use getdelim() on platforms supporting it, including OS X and most Linux versions. It's not clear if a similar change was made for Windows.

This can be seen in part in the source file (https://github.com/dlang/phobos/blob/master/std/stdio.d) by searching for HAS_GETDELIM and NO_GETDELIM. Most platforms are listed as one or the other; Windows does not appear to be included and may still use a slow implementation.

--
June 28, 2017
https://issues.dlang.org/show_bug.cgi?id=14256

--- Comment #15 from Vladimir Panteleev <dlang-bugzilla@thecybershadow.net> ---
(In reply to Jon Degenhardt from comment #14)
> This can be seen in part in the source file (https://github.com/dlang/phobos/blob/master/std/stdio.d) by searching for HAS_GETDELIM and NO_GETDELIM. Most platforms are listed as one or the other; Windows does not appear to be included and may still use a slow implementation.

The DigitalMars and Microsoft C runtime versions have their own implementation of readlnImpl targeting those runtimes.

--
December 17, 2022
https://issues.dlang.org/show_bug.cgi?id=14256

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P3

--