July 25, 2015 - Read text file fast, how?
Hi!

I am trying to port a program I have written earlier to D. My previous versions are in C++ and Python. I was hoping that a D version would be similar in speed to the C++ version, rather than similar to the Python version. But currently it isn't.

Part of the problem may be that I haven't learned the idiomatic way to do things in D. One such thing is perhaps: how do I read large text files in an efficient manner in D?

Currently I have created a little test program that does the same job as the UNIX command "wc -lc", i.e. counting the number of lines and characters in a file. The timings I get in different languages are:

D: 15s
C++: 1.1s
Python: 3.7s
Perl: 2.9s

The central loop in my D program looks like:

    foreach (line; f.byLine) {
        nlines += 1;
        nchars += line.length + 1;
    }

I have also tried another variant with this inner loop:

    char[] line;
    while (f.readln(line)) {
        nlines += 1;
        nchars += line.length;
    }

but in both cases this D program is much slower than any of the others in C++/Python/Perl. I don't understand what can cause this dramatic difference compared to C++, and the factor of 4 compared to Python.

My D programs are built with DMD 2.067.1 on OS X Yosemite, using the flags "-O -release".

Is there something I can do to make the program run faster, and still be "idiomatic D"?

(I append the whole program for reference.)

Regards,
/Johan Holmberg

=======================================
import std.stdio;

void main(string[] argv)
{
    foreach (fname; argv[1..$])
    {
        auto f = File(fname);
        // ulong rather than int, so the counts cannot overflow on files of 2 GiB or more
        ulong nlines = 0;
        ulong nchars = 0;
        // byLine strips the '\n' terminator, hence the + 1
        // (this assumes every line, including the last, ends in '\n')
        foreach (line; f.byLine)
        {
            nlines += 1;
            nchars += line.length + 1;
        }
        writeln(nlines, "\t", nchars, "\t", fname);
    }
}
=======================================
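Not suggested in the thread, but worth noting: when only the counts are needed, a program can skip line splitting altogether. The sketch below is one such approach. It assumes that "characters" means bytes (as `wc -c` reports) and that a 64 KiB buffer is a reasonable size; the `Counts` struct and `countLinesAndBytes` function are hypothetical names, and no timings are claimed for it. It reads the file with Phobos's `File.byChunk` and counts `'\n'` bytes with `std.algorithm.searching.count`.

```d
// Sketch: count lines and bytes like `wc -lc` by scanning raw chunks
// instead of splitting the file into lines.
import std.algorithm.searching : count;
import std.stdio : File, writeln;

/// Line and byte totals for one file (hypothetical helper type).
struct Counts
{
    ulong lines;
    ulong bytes;
}

/// Reads `f` in fixed-size chunks (64 KiB by default, an arbitrary choice)
/// and counts '\n' bytes; like `wc -l`, a final line without '\n' is not counted.
Counts countLinesAndBytes(File f, size_t bufSize = 64 * 1024)
{
    Counts c;
    // byChunk reuses a single buffer, so there is no per-line allocation or copying.
    foreach (ubyte[] chunk; f.byChunk(bufSize))
    {
        c.lines += chunk.count(cast(ubyte) '\n');
        c.bytes += chunk.length;
    }
    return c;
}

void main(string[] argv)
{
    foreach (fname; argv[1 .. $])
    {
        auto c = countLinesAndBytes(File(fname, "rb"));
        writeln(c.lines, "\t", c.bytes, "\t", fname);
    }
}
```

Unlike the `byLine` loop in the original program, which adds 1 per line, this does not overcount when the last line lacks a trailing newline.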
July 25, 2015 - Re: Read text file fast, how?
Posted in reply to Johan Holmberg
On 7/25/15 8:19 AM, Johan Holmberg via Digitalmars-d wrote:
> [...]
> Currently I have created a little test-program that does the same job as
> the UNIX-command "wc -lc", i.e. counting the number of lines and
> characters in a file. The timings I get in different languages are:
>
> D: 15s
> C++: 1.1s
> Python: 3.7s
> Perl: 2.9s

I think this harkens back to the problem discussed here:

http://stackoverflow.com/questions/28922323/improving-line-wise-i-o-operations-in-d/29153508

As I discuss there, the performance bug has been fixed for 2.068. With your code:

$ time wc -l <(repeat 1000000 echo hello)
1000000 /dev/fd/11
wc -l <(repeat 1000000 echo hello)  0.11s user 2.35s system 54% cpu 4.529 total
$ time ./test.d <(repeat 1000000 echo hello)
1000000 6000000 /dev/fd/11
./test.d <(repeat 1000000 echo hello)  0.73s user 1.76s system 64% cpu 3.870 total

The compilation was flag-free (no -O -inline -release etc.).

Andrei
July 25, 2015 - Re: Read text file fast, how?
Posted in reply to Andrei Alexandrescu
On Sat, Jul 25, 2015 at 7:14 PM, Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> I think this harkens back to the problem discussed here:
>
>
> http://stackoverflow.com/questions/28922323/improving-line-wise-i-o-operations-in-d/29153508
>
> As I discuss there, the performance bug has been fixed for 2.068. With your code:
> [...]
Thanks, my question seems like a carbon copy of the Stack Overflow article :) Somehow I had missed it when googling.
I downloaded a DMD 2.068 beta and re-ran the test with my input file: the D program now takes 1.6s (a 10x improvement).
/johan
July 25, 2015 - Re: Read text file fast, how?
Posted in reply to Johan Holmberg
On 7/25/15 1:53 PM, Johan Holmberg via Digitalmars-d wrote:
> Thanks, my question seems like a carbon copy of the Stack Overflow
> article :) Somehow I had missed it when googling.
>
> I downloaded a DMD 2.068 beta and re-ran the test with my input file: the D
> program now takes 1.6s (a 10x improvement).
Great, though it still seems to be behind the C++ version, which is a bummer. -- Andrei
July 25, 2015 - Re: Read text file fast, how?
Posted in reply to Andrei Alexandrescu
On Saturday, 25 July 2015 at 20:12:26 UTC, Andrei Alexandrescu wrote:
> Great, though it still seems to be behind the C++ version, which is a bummer. -- Andrei
Do you happen to have a link to the source where you fixed it?
I feel like contributing some reading effort today.
July 25, 2015 - Re: Read text file fast, how?
Posted in reply to Brandon Ragland
On Saturday, 25 July 2015 at 22:40:55 UTC, Brandon Ragland wrote:
> [...]
> Do you happen to have a link to the source where you fixed it?
>
> I feel like contributing some reading effort today.

https://github.com/D-Programming-Language/phobos/pull/3089
July 26, 2015 - Re: Read text file fast, how?
Posted in reply to Andrei Alexandrescu
On Sat, Jul 25, 2015 at 10:12 PM, Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> Great, though it still seems to be behind the C++ version, which is a bummer. -- Andrei
>
>
My C++ program was actually doing C-style IO via <stdio.h>. I didn't think about the C/C++ distinction when reporting the earlier numbers.
If I switch to full C++ style (<fstream> + <string> + the C++ version of getline()), the C++ solution is even slower than Python: 5.2s. I think it is the C++ libraries that come with Clang on OS X Yosemite that are slow.
This prompted me to re-run the tests on a Linux machine (Ubuntu 14.04), still with the same input file, a text file with 7M lines and a total size of 466MB:
C++ with <stdio.h>-style IO:  0.40s
C++ with <fstream>-style IO:  0.31s
D (DMD 2.067):                1.75s
D (DMD 2.068 beta 2):         0.69s
Perl:                         1.49s
Python:                       1.86s
So on Ubuntu, the C++ <fstream> version was clearly the fastest, and the improvement in the DMD 2.068 beta over 2.067 is "only" a factor of 2.5.
/johan
July 26, 2015 - Re: Read text file fast, how?
Posted in reply to Johan Holmberg
On 7/26/15 10:35 AM, Johan Holmberg via Digitalmars-d wrote:
> [...]
> C++ with <stdio.h>-style IO:  0.40s
> C++ with <fstream>-style IO:  0.31s
> D (DMD 2.067):                1.75s
> D (DMD 2.068 beta 2):         0.69s
> Perl:                         1.49s
> Python:                       1.86s
>
> So on Ubuntu, the C++ <fstream> version was clearly the fastest, and the
> improvement in the DMD 2.068 beta over 2.067 is "only" a factor of 2.5.
I think we should investigate this and bring performance to par. Anyone interested? -- Andrei
July 26, 2015 - Re: Read text file fast, how?
Posted in reply to Johan Holmberg
On Sunday, 26 July 2015 at 14:36:09 UTC, Johan Holmberg wrote:
> On Sat, Jul 25, 2015 at 10:12 PM, Andrei Alexandrescu via Digitalmars-d < digitalmars-d@puremagic.com> wrote:
>
>>[...]
> My C++ program was actually doing C-style IO via <stdio.h>. I didn't think about the distinction C/C++ when reporting the earlier numbers.
>
> [...]
It would be interesting to see numbers for the <stdio.h> code in D, since it should be easy to translate and would help rule out compiler vs. library issues.
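The translation suggested above might look roughly like this. Johan's <stdio.h> code was not posted, so this assumes a plain `fgets` loop with a 64 KiB buffer; `Counts` and `countWithCStdio` are hypothetical names. It calls the C library directly through druntime's `core.stdc.stdio` bindings, so comparing its time with the C++ <stdio.h> build should help separate compiler effects from Phobos library effects.

```d
// Sketch: the line count done with C stdio calls only, through druntime's
// bindings, so that Phobos's byLine is not involved.
import core.stdc.stdio : FILE, fclose, fgets, fopen;
import core.stdc.string : strlen;
import std.stdio : stderr, writeln;
import std.string : toStringz;

/// Line and byte totals for one file (hypothetical helper type).
struct Counts
{
    ulong lines;
    ulong bytes;
}

/// Assumed fgets-based loop; the original <stdio.h> code was not posted.
/// Assumes text input: strlen stops at an embedded NUL byte.
Counts countWithCStdio(FILE* fp)
{
    Counts c;
    char[64 * 1024] buf = void;
    // fgets stops after '\n' or when the buffer is full, so a very long line can
    // arrive in several pieces; only a piece ending in '\n' completes a line.
    while (fgets(buf.ptr, cast(int) buf.length, fp) !is null)
    {
        immutable n = strlen(buf.ptr);
        c.bytes += n;
        if (n > 0 && buf[n - 1] == '\n')
            c.lines += 1;
    }
    return c;
}

int main(string[] argv)
{
    foreach (fname; argv[1 .. $])
    {
        FILE* fp = fopen(fname.toStringz, "rb");
        if (fp is null)
        {
            stderr.writeln("cannot open ", fname);
            return 1;
        }
        auto c = countWithCStdio(fp);
        fclose(fp);
        writeln(c.lines, "\t", c.bytes, "\t", fname);
    }
    return 0;
}
```

Like `wc -l`, this counts only newline-terminated lines, so its output should match the chunk-based approach on the same file.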
July 26, 2015 - Re: Read text file fast, how?
Posted in reply to Andrei Alexandrescu
On Sunday, 26 July 2015 at 15:36:29 UTC, Andrei Alexandrescu wrote:
> On 7/26/15 10:35 AM, Johan Holmberg via Digitalmars-d wrote:
>> [...]
>
> I think we should investigate this and bring performance to par. Anyone interested? -- Andrei

Here's the link to the libstdc++ fstream source for GNU/Linux (Ubuntu/Debian):
https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.0/fstream-source.html

Not too sure who's familiar with it, but it uses basic_streambuf underneath.
Copyright © 1999-2021 by the D Language Foundation