stdio performance in tango, stdlib, and perl (page 6) - D Programming Language Discussion Forum

March 22, 2007

Re: stdio performance in tango, stdlib, and perl

Posted by Daniel Keep
in reply to Vladimir Panteleev

Vladimir Panteleev wrote:
> On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For Email) <SeeWebsiteForEmail@erdani.org> wrote:
> 
>> Essentially it's about information. The naive loop:
>>
>> while (readln(line)) {
>>    write(line);
>> }
>>
>> is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like:
>>
>> while (readln(line)) {
>>    writeln(line);
>> }
> 
> I'd just like to say that the chosen naming convention seems a bit unintuitive to me, for the following reasons:
> 
> 1) it seems odd that what you read with readln(), you need to write with write() and not writeln().

I suppose it is a little, but I think that's more an issue with text IO in general; for instance, even *if* readln discarded the line ending, readln and writeln wouldn't be symmetric anyway!  If you expect them to be, then you're in for a nasty surprise :P

> 2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's ReadLn doesn't preserve line endings.

Well, that's Pascal/Delphi/etc., not D.

> 3) in my personal experience (of a number of smaller and larger console applications), it's much more often that I need to work with the contents of lines (without line endings), rather than with. If you need to copy data while preserving line endings, I would recommend using binary buffers for files - and I've no idea why would you use standard input/output for binary data anyway.

That's a valid point; I rarely need the line endings. That said, see [1] :)

> 4) it's much easier to add a line ending than to remove it.

Actually, it's not.  Removing a line ending is as simple as slicing the string.  *Adding* a line ending could involve a heap allocation, at least a full copy.

What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?
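
A minimal sketch of that difference, written against present-day D rather than the 2007 libraries. `stripEol` is a hypothetical helper made up for illustration, not a Phobos or Tango function. Stripping returns a slice into the same memory, while putting a terminator back produces a newly allocated array:

```d
// Hypothetical helper (an assumption, not a library function):
// returns a slice of `line` without one trailing "\r\n", "\n", or "\r".
const(char)[] stripEol(const(char)[] line)
{
    if (line.length && line[$ - 1] == '\n')
        line = line[0 .. $ - 1];
    if (line.length && line[$ - 1] == '\r')
        line = line[0 .. $ - 1];
    return line;
}

void main()
{
    string withEol = "hello\r\n";

    // Removing the ending is just a slice: same pointer, shorter length.
    auto stripped = stripEol(withEol);
    assert(stripped == "hello");
    assert(stripped.ptr is withEol.ptr);

    // Adding an ending concatenates, which allocates a new array.
    auto appended = stripped ~ "\n";
    assert(appended == "hello\n");
    assert(appended.ptr !is withEol.ptr);

    // A last line with no terminator before EOF comes back unchanged.
    assert(stripEol("tail") == "tail");
}
```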

> Based on the above reasons, I would like to suggest to let readln() chop line endings, and perhaps have another function (getline?) which keeps them.

[1]

There have been a few times I've needed the line-ending, and it's a major pain when your IO library simply refuses to give it to you.  It should be that the call gives you the whole line *including* line-endings, but since stripping the line of its ending is so common there should be either another function to do that, or a nice shortcut to get it done.

Maybe we need readln and readlt for "read line and trim"...

</2c>

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

March 22, 2007

Re: stdio performance in tango, stdlib, and perl

Posted by Deewiant
in reply to torhu

torhu wrote:
> Unless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows.  And stdin, stdout, and stderr are by default in ascii (not binary) mode.

But I don't think this is the case in Tango, so Cout(line)("\n") should also be
changed for the benchmarks.

March 22, 2007

Re: stdio performance in tango, stdlib, and perl

Posted by kris
in reply to Deewiant

Deewiant wrote:
> torhu wrote:
> 
>>Unless a file is opened in binary mode, '\n' will be translated into
>>'\r\n' on Windows.  And stdin, stdout, and stderr are by default in ascii
>>(not binary) mode.
> 
> 
> But I don't think this is the case in Tango, so Cout(line)("\n") should also be
> changed for the benchmarks.

At the behest of Andrei, Cin line-parsing now has an option to include the incoming line-terminator. That makes the "\n" somewhat redundant?

March 22, 2007

Re: stdio performance in tango, stdlib, and perl

Posted by Walter Bright
in reply to James Dennett

James Dennett wrote:
> I'm at a loss to understand why you would write what you
> did.  It seems to be a straw man, but maybe there was
> something else to it -- frustration that people assume
> that D must be slower than C++?

Maybe it is a bit of frustration on my part. I often run into people who, when faced with benchmarks showing that conventional D runs code faster than conventional C++, tell me in various ways that it can't be true. I must have:

1) written bad C++ code
2) lied
3) used a sabotaged C++ compiler
4) written some magic optimization that only works on that carefully crafted benchmark

So, I have some justification in saying what I did about the conventional wisdom of C++. I also know that the top tier of experienced C++ programmers are well aware such conventional wisdom is not true.

I have a lot of experience in making C++ code run fast. It doesn't come easy; it takes a lot of work back and forth with a profiler. It usually involves going around the C++ runtime library. That experience has certainly strongly influenced the design of D. I don't wish to have to write custom I/O just to get good I/O performance. I don't wish to keep doing all the clever string hacks trying to make 0-terminated strings fast.

I want the natural, straightforward D code to be (at least close to) the best performing way to implement an algorithm.

March 22, 2007

Re: stdio performance in tango, stdlib, and perl

Posted by Vladimir Panteleev
in reply to Daniel Keep

On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists@gmail.com> wrote:

>> 4) it's much easier to add a line ending than to remove it.
>
> Actually, it's not.  Removing a line ending is as simple as slicing the string.  *Adding* a line ending could involve a heap allocation, at least a full copy.

I was actually talking about the complexity of the source, not the efficiency of the generated code.
When readln gives you the line with a line ending, you have three cases:
1) a CR/LF line ending (Windows)
2) LF line ending (Unix)
3) no line ending at all (EOF)

You'd need to account for every one of these when removing the line endings - and write this code every time you're writing an app which just needs the contents of lines from standard input - which, as you have agreed, is quite common.

> What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?

IMHO, most tools which work with standard input don't really need to know if the last line has a line break at the end :)

-- 
Best regards,
  Vladimir                          mailto:thecybershadow@gmail.com

March 22, 2007

Re: stdio performance in tango, stdlib, and perl

Posted by Deewiant
in reply to kris

kris wrote:
> Deewiant wrote:
>> torhu wrote:
>>
>>> Unless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows.  And stdin, stdout, and stderr are by default in ascii (not binary) mode.
>>
>>
>> But I don't think this is the case in Tango, so Cout(line)("\n")
>> should also be
>> changed for the benchmarks.
> 
> At the behest of Andrei, Cin line-parsing now has an option to include the incoming line-terminator. That makes the "\n" somewhat redundant?

Only if you've got the latest SVN revision of Tango. If not, use tango.io.FileConst.NewlineString (side note: for easier access, perhaps Print.Eol should be public and assigned to this) in place of "\n".

March 22, 2007

Re: stdio performance in tango, stdlib, and perl

Posted by Daniel Keep
in reply to Vladimir Panteleev

Vladimir Panteleev wrote:
> On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists@gmail.com> wrote:
> 
>>> 4) it's much easier to add a line ending than to remove it.
>> Actually, it's not.  Removing a line ending is as simple as slicing the string.  *Adding* a line ending could involve a heap allocation, at least a full copy.
> 
> I was actually talking about the complexity of the source, not the efficiency of the generated code.
> When readln gives you the line with a line ending, you have three cases:
> 1) a CR/LF line ending (Windows)
> 2) LF line ending (Unix)
> 3) no line ending at all (EOF)
> 
> You'd need to account for every one of these when removing the line endings - and write this code every time you're writing an app which just needs the contents of lines from standard input - which, as you have agreed, is quite common.

import std.stdio;   // readln
import std.string;  // chomp

// chomp drops one trailing line terminator, if there is one
auto line = readln().chomp();

:)
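
For reference, a small sketch of how `std.string.chomp` treats the cases listed above, assuming a current Phobos (the 2007 library may have behaved differently). Called without a delimiter argument, it removes at most one trailing line terminator and leaves an unterminated line untouched:

```d
import std.string : chomp;

void main()
{
    assert("abc\r\n".chomp() == "abc");   // CR/LF (Windows)
    assert("abc\n".chomp() == "abc");     // LF (Unix)
    assert("abc".chomp() == "abc");       // no line ending at all (EOF)

    // Only one terminator is removed, so a following blank line survives.
    assert("abc\n\n".chomp() == "abc\n");
}
```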

>> What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?
> 
> IMHO, most tools which work with standard input don't really need to know if the last line has a line break at the end :)
>

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

March 22, 2007

Re: stdio performance in tango, stdlib, and perl

Posted by Anders F Björklund
in reply to Vladimir Panteleev

Vladimir Panteleev wrote:

> When readln gives you the line with a line ending, you have three cases:
> 1) a CR/LF line ending (Windows)
> 2) LF line ending (Unix)
> 3) no line ending at all (EOF)

Actually it is even four:
4) CR line ending (Mac)

But that's just for files coming from the old Mac OS (9),
normally Mac OS X uses Unix linefeeds for line endings...

--anders
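
If a tool does need to know which of these four cases it was handed, the check is short once the terminator is kept on the line. A sketch, where `Eol` and `classifyEol` are hypothetical names made up for illustration, and the line is assumed to have been split correctly already (a reader that splits only on '\n' would not break a CR-only file into lines):

```d
// Hypothetical names, not part of Phobos or Tango.
enum Eol { none, lf, crlf, cr }

// Reports which terminator, if any, ends `line`.
Eol classifyEol(const(char)[] line)
{
    if (line.length >= 2 && line[$ - 2 .. $] == "\r\n")
        return Eol.crlf;                  // Windows
    if (line.length && line[$ - 1] == '\n')
        return Eol.lf;                    // Unix, Mac OS X
    if (line.length && line[$ - 1] == '\r')
        return Eol.cr;                    // old Mac OS
    return Eol.none;                      // last line, no terminator before EOF
}

void main()
{
    assert(classifyEol("a\r\n") == Eol.crlf);
    assert(classifyEol("a\n") == Eol.lf);
    assert(classifyEol("a\r") == Eol.cr);
    assert(classifyEol("a") == Eol.none);
    assert(classifyEol("") == Eol.none);
}
```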

March 22, 2007

Re: stdio performance in tango, stdlib, and perl

Posted by Roberto Mariottini
in reply to Anders F Björklund

Anders F Björklund wrote:
> Vladimir Panteleev wrote:
> 
>> When readln gives you the line with a line ending, you have three cases:
>> 1) a CR/LF line ending (Windows)
>> 2) LF line ending (Unix)
>> 3) no line ending at all (EOF)
> 
> Actually it is even four:
> 4) CR line ending (Mac)
> 
> But that's just for files coming from the old Mac OS (9),
> normally Mac OS X uses Unix linefeeds for line endings...

I have some of these too. Legacy applications aren't the majority, but they work, and for me that's what matters.

Ciao

March 22, 2007

OT: ptime [WAS: Re: stdio performance in tango, stdlib, and perl]

Posted by Sean Kelly
in reply to Andrei Alexandrescu (See Website For Email)

Andrei Alexandrescu (See Website For Email) wrote:
> 
> I passed a 31 MB text file (containing a dictionary that I'm using in my research) through each of the programs above. The output was set to /dev/null. I've run the same program multiple times before the actual test, so everything is cached and the process becomes computationally-bound. Here are the results summed for 10 consecutive runs (averaged over 5 epochs):
> 
> 13.9s        Tango
> 6.6s        Perl
> 5.0s        std.stdio

For what it's worth, I created a Win32 version of the Unix 'time' command recently.  Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a quick and dirty implementation, but works for how I typically use it.

Copyright © 1999-2021 by the D Language Foundation