March 22, 2007 Re: stdio performance in tango, stdlib, and perl | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | Vladimir Panteleev wrote:
> On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For Email) <SeeWebsiteForEmail@erdani.org> wrote:
>
>> Essentially it's about information. The naive loop:
>>
>> while (readln(line)) {
>>   write(line);
>> }
>>
>> is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like:
>>
>> while (readln(line)) {
>>   writeln(line);
>> }
>
> I'd just like to say that the chosen naming convention seems a bit unintuitive to me out of the following reasons:
>
> 1) it seems odd that what you read with readln(), you need to write with write() and not writeln().
I suppose it is a little, but I think that's more an issue with text IO in general; for instance, even *if* readln discarded the line ending, readln and writeln wouldn't be symmetric anyway! If you expect them to be, then you're in for a nasty surprise :P
> 2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's ReadLn doesn't preserve line endings.
Well, that's Pascal/Delphi/etc., not D.
> 3) in my personal experience (of a number of smaller and larger console applications), it's much more often that I need to work with the contents of lines (without line endings), rather than with. If you need to copy data while preserving line endings, I would recommend using binary buffers for files - and I've no idea why would you use standard input/output for binary data anyway.
That's a valid point; I rarely need the line endings, that said, see [1] :)
> 4) it's much easier to add a line ending than to remove it.
Actually, it's not. Removing a line ending is as simple as slicing the string. *Adding* a line ending could involve a heap allocation, at least a full copy. What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?
> Based on the above reasons, I would like to suggest to let readln() chop line endings, and perhaps have another function (getline?) which keeps them.
[1] There have been a few times I've needed the line-ending, and it's a major pain when your IO library simply refuses to give it to you. It should be that the call gives you the whole line *including* line-endings, but since stripping the line of its ending is so common there should be either another function to do that, or a nice shortcut to get it done. Maybe we need readln and readlt for "read line and trim"...
</2c>
-- Daniel
--
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}
http://xkcd.com/
v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
|
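A minimal sketch of the "information" argument, assuming current Phobos (`std.string.splitLines`, `std.array.join`) rather than the 2007 library under discussion. The helper names `copyKeeping` and `copyChopping` are made up for illustration; they work on an in-memory string instead of stdin so the difference can be checked directly.

```d
import std.array : join;
import std.string : splitLines;
import std.typecons : Yes;

/// Copies text line by line, keeping each line's terminator
/// (the analogue of readln + write).
string copyKeeping(string input)
{
    return input.splitLines(Yes.keepTerminator).join();
}

/// Copies text line by line, dropping the terminator and appending "\n"
/// (the analogue of a chopping readln + writeln).
string copyChopping(string input)
{
    string output;
    foreach (line; input.splitLines())
        output ~= line ~ "\n";
    return output;
}

unittest
{
    // First line ends in CR/LF, last line has no terminator before EOF.
    string input = "first\r\nsecond";
    assert(copyKeeping(input) == input);              // exact copy
    assert(copyChopping(input) == "first\nsecond\n"); // endings rewritten
}
```

The chopping version cannot reproduce the input because the original terminators, and the absence of one on the last line, are discarded before the output is written.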
March 22, 2007 Re: stdio performance in tango, stdlib, and perl | ||||
---|---|---|---|---|
| ||||
Posted in reply to torhu | torhu wrote:
> Unless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows. And stdin, stdout, stderr is by default in ascii (not binary) mode.
But I don't think this is the case in Tango, so Cout(line)("\n") should also be
changed for the benchmarks.
|
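For the text-mode versus binary-mode point, here is a small sketch using Phobos `std.stdio.File` (an assumption; the thread is about Tango's `Cout` and the C runtime). It writes through a stream opened with `"wb"` and reads the raw bytes back, so no `'\n'` to `"\r\n"` translation can occur. The function name and the scratch file name are hypothetical.

```d
import std.file : read, remove, tempDir;
import std.path : buildPath;
import std.stdio : File;

/// Writes `text` through a stream opened in binary mode ("wb") and
/// returns the bytes that actually ended up in the file.
ubyte[] bytesWrittenInBinaryMode(string text)
{
    // Hypothetical scratch file, used only for this illustration.
    auto path = buildPath(tempDir(), "newline_demo.tmp");
    scope (exit) remove(path);

    auto f = File(path, "wb"); // binary mode: bytes are written unchanged
    f.write(text);
    f.close();

    return cast(ubyte[]) read(path);
}

unittest
{
    auto bytes = bytesWrittenInBinaryMode("a\n");
    // Two bytes in, two bytes out, regardless of platform.
    assert(bytes.length == 2 && bytes[0] == 'a' && bytes[1] == '\n');
}
```

Opening the same file with `"w"` on Windows would go through the C runtime's text mode, which is where the translation described above happens.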
March 22, 2007 Re: stdio performance in tango, stdlib, and perl | ||||
---|---|---|---|---|
| ||||
Posted in reply to Deewiant | Deewiant wrote:
> torhu wrote:
>
>>Unless a file is opened in binary mode, '\n' will be translated into
>>'\r\n' on Windows. And stdin, stdout, stderr is by default in ascii
>>(not binary) mode.
>
>
> But I don't think this is the case in Tango, so Cout(line)("\n") should also be
> changed for the benchmarks.
At the behest of andrei, Cin line-parsing now has an option to include the incoming line-terminator. That makes the "\n" somewhat redundant?
|
March 22, 2007 Re: stdio performance in tango, stdlib, and perl | ||||
---|---|---|---|---|
| ||||
Posted in reply to James Dennett | James Dennett wrote:
> I'm at a loss to understand why you would write what you
> did. It seems to be a straw man, but maybe there was
> something else to it -- frustration that people assume
> that D must be slower than C++?
Maybe it is a bit of frustration on my part. I often run into people who, when faced with benchmarks showing that conventional D runs code faster than conventional C++, tell me in various ways that it can't be true. I must have:
1) written bad C++ code
2) lied
3) used a sabotaged C++ compiler
4) written some magic optimization that only works on that carefully crafted benchmark
So, I have some justification in saying what I did about the conventional wisdom of C++. I also know that the top tier of experienced C++ programmers are well aware such conventional wisdom is not true.
I have a lot of experience in making C++ code run fast. It doesn't come easy, it takes a lot of work back and forth with a profiler. It usually involves going around the C++ runtime library. That experience has certainly strongly influenced the design of D. I don't wish to have to write custom I/O just to get good I/O performance. I don't wish to keep doing all the clever string hacks trying to make 0 terminated strings fast.
I want the natural, straightforward D code to be (at least close to) the best performing way to implement an algorithm.
|
March 22, 2007 Re: stdio performance in tango, stdlib, and perl | ||||
---|---|---|---|---|
| ||||
Posted in reply to Daniel Keep | On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists@gmail.com> wrote:
>> 4) it's much easier to add a line ending than to remove it.
>
> Actually, it's not. Removing a line ending is as simple as slicing the string. *Adding* a line ending could involve a heap allocation, at least a full copy.
I was actually talking about the complexity of the source, not the efficiency of the generated code.
When readln gives you the line with a line ending, you have three cases:
1) a CR/LF line ending (Windows)
2) LF line ending (Unix)
3) no line ending at all (EOF)
You'd need to account for every of these when removing the line endings - and write this code every time you're writing an app which just needs the contents of lines from standard input - which, as you have agreed, is quite common.
> What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?
IMHO, most tools which work with standard input don't really need to know if the last line has a line break at the end :)
--
Best regards,
 Vladimir mailto:thecybershadow@gmail.com
|
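To make both sides of this exchange concrete, the three cases (CR/LF, LF, no ending) can be handled by hand roughly as follows. This is only a sketch; `stripLineEnding` is a hypothetical helper, not a library function. It shows the source-level cost Vladimir describes, and also that the removal itself is pure slicing, as Daniel says.

```d
/// Returns `line` without its trailing line ending, by slicing only.
/// Nothing is copied or allocated.
string stripLineEnding(string line)
{
    if (line.length >= 2 && line[$ - 2 .. $] == "\r\n")
        return line[0 .. $ - 2]; // 1) CR/LF (Windows)
    if (line.length >= 1 && line[$ - 1] == '\n')
        return line[0 .. $ - 1]; // 2) LF (Unix)
    return line;                 // 3) no line ending (EOF)
}

unittest
{
    assert(stripLineEnding("text\r\n") == "text");
    assert(stripLineEnding("text\n") == "text");
    assert(stripLineEnding("text") == "text");

    // The result is a slice of the original buffer, not a copy.
    string s = "text\n";
    assert(stripLineEnding(s).ptr is s.ptr);
}
```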
March 22, 2007 Re: stdio performance in tango, stdlib, and perl | ||||
---|---|---|---|---|
| ||||
Posted in reply to kris | kris wrote:
> Deewiant wrote:
>> torhu wrote:
>>
>>> Unless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows. And stdin, stdout, stderr is by default in ascii (not binary) mode.
>>
>>
>> But I don't think this is the case in Tango, so Cout(line)("\n") should also be
>> changed for the benchmarks.
>
> At the behest of andrei, Cin line-parsing now has an option to include the incoming line-terminator. That makes the "\n" somewhat redundant?
Only if you've got the latest SVN revision of Tango. If not, use tango.io.FileConst.NewlineString (side note: for easier access, perhaps Print.Eol should be public and assigned to this) in place of "\n".
|
March 22, 2007 Re: stdio performance in tango, stdlib, and perl | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | Vladimir Panteleev wrote:
> On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists@gmail.com> wrote:
>
>>> 4) it's much easier to add a line ending than to remove it.
>> Actually, it's not. Removing a line ending is as simple as slicing the string. *Adding* a line ending could involve a heap allocation, at least a full copy.
>
> I was actually talking about the complexity of the source, not the efficiency of the generated code.
> When readln gives you the line with a line ending, you have three cases:
> 1) a CR/LF line ending (Windows)
> 2) LF line ending (Unix)
> 3) no line ending at all (EOF)
>
> You'd need to account for every of these when removing the line endings - and write this code every time you're writing an app which just needs the contents of lines from standard input - which, as you have agreed, is quite common.

    import std.string;
    auto line = readln().chomp();

:)
>> What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?
>
> IMHO, most tools which work with standard input don't really need to know if the last line has a line break at the end :)
>
--
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}
http://xkcd.com/
v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
|
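A short check of what the `chomp()` one-liner does with each case, assuming the behaviour of `std.string.chomp` in current Phobos (with no delimiter argument it removes at most one trailing line terminator). `lineContents` is a hypothetical wrapper that takes already-read lines, so the example does not depend on stdin.

```d
import std.algorithm : map;
import std.array : array;
import std.string : chomp;

/// Strips one trailing line ending from each raw line.
string[] lineContents(string[] rawLines)
{
    return rawLines.map!(l => l.chomp()).array;
}

unittest
{
    // CR/LF, LF and "no ending at EOF" all reduce to the bare content.
    assert(lineContents(["a\r\n", "b\n", "c"]) == ["a", "b", "c"]);

    // Only a single terminator is removed per call.
    assert("text\n\n".chomp() == "text\n");
}
```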
March 22, 2007 Re: stdio performance in tango, stdlib, and perl | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | Vladimir Panteleev wrote:
> When readln gives you the line with a line ending, you have three cases:
> 1) a CR/LF line ending (Windows)
> 2) LF line ending (Unix)
> 3) no line ending at all (EOF)
Actually it is even four:
4) CR line ending (Mac)
But that's just for files coming from the old Mac OS (9),
normally Mac OS X uses Unix linefeeds for line endings...
--anders
|
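With the lone CR added, the four cases can be told apart with a few comparisons. The sketch below is illustrative only; the `LineEnding` enum and `classify` function are invented names, not part of Phobos or Tango.

```d
/// The four cases discussed above.
enum LineEnding { none, lf, crlf, cr }

/// Reports which terminator, if any, ends `line`.
LineEnding classify(string line)
{
    // CR/LF must be tested first, otherwise it would be seen as a bare LF.
    if (line.length >= 2 && line[$ - 2 .. $] == "\r\n")
        return LineEnding.crlf;
    if (line.length >= 1 && line[$ - 1] == '\n')
        return LineEnding.lf;
    if (line.length >= 1 && line[$ - 1] == '\r')
        return LineEnding.cr;
    return LineEnding.none;
}

unittest
{
    assert(classify("text\r\n") == LineEnding.crlf); // Windows
    assert(classify("text\n") == LineEnding.lf);     // Unix, Mac OS X
    assert(classify("text\r") == LineEnding.cr);     // classic Mac OS
    assert(classify("text") == LineEnding.none);     // EOF without ending
}
```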
March 22, 2007 Re: stdio performance in tango, stdlib, and perl | ||||
---|---|---|---|---|
| ||||
Posted in reply to Anders F Björklund | Anders F Björklund wrote:
> Vladimir Panteleev wrote:
>
>> When readln gives you the line with a line ending, you have three cases:
>> 1) a CR/LF line ending (Windows)
>> 2) LF line ending (Unix)
>> 3) no line ending at all (EOF)
>
> Actually it is even four:
> 4) CR line ending (Mac)
>
> But that's just for files coming from the old Mac OS (9),
> normally Mac OS X uses Unix linefeeds for line endings...
I have some of these too. Legacy applications aren't the majority, but they work, and for me that's enough.
Ciao
|
March 22, 2007 OT: ptime [WAS: Re: stdio performance in tango, stdlib, and perl] | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu (See Website For Email) | Andrei Alexandrescu (See Website For Email) wrote:
>
> I passed a 31 MB text file (containing a dictionary that I'm using in my research) through each of the programs above. The output was set to /dev/null. I've ran the same program multiple times before the actual test, so everything is cached and the process becomes computationally-bound. Here are the results summed for 10 consecutive runs (averaged over 5 epochs):
>
> 13.9s Tango
> 6.6s Perl
> 5.0s std.stdio
For what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here:
http://www.invisibleduck.org/~sean/tmp/ptime.zip
It's a quick and dirty implementation, but works for how I typically use it.
|
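The measurement scheme in the quoted text (time summed over 10 consecutive runs, averaged over 5 epochs) could be reproduced in-process along these lines. This is a sketch under assumptions: it uses `std.datetime.stopwatch` from current Phobos, it times a delegate rather than an external program, and `meanEpochTime` is a hypothetical name. It is neither the ptime tool linked above nor the script used for the quoted numbers.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Each epoch is the summed wall-clock time of `runs` consecutive calls
/// to `work`; the result is the mean of those sums over `epochs` epochs.
Duration meanEpochTime(void delegate() work, int runs = 10, int epochs = 5)
{
    Duration total = Duration.zero;
    foreach (epoch; 0 .. epochs)
    {
        auto sw = StopWatch(AutoStart.yes);
        foreach (run; 0 .. runs)
            work();
        total += sw.peek(); // elapsed time for this epoch
    }
    return total / epochs;
}

unittest
{
    int calls;
    auto mean = meanEpochTime(() { calls++; });
    assert(calls == 50);           // 10 runs x 5 epochs
    assert(mean >= Duration.zero);
}
```

As in the quoted setup, a few warm-up runs before measuring keep cold-cache effects out of the numbers.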