March 22, 2007

Vladimir Panteleev wrote:
> On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For Email) <SeeWebsiteForEmail@erdani.org> wrote:
> 
>> Essentially it's about information. The naive loop:
>>
>> while (readln(line)) {
>>    write(line);
>> }
>>
>> is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like:
>>
>> while (readln(line)) {
>>    writeln(line);
>> }
> 
> I'd just like to say that the chosen naming convention seems a bit unintuitive to me, for the following reasons:
> 
> 1) it seems odd that what you read with readln(), you need to write with write() and not writeln().

I suppose it is a little, but I think that's more an issue with text IO in general; for instance, even *if* readln discarded the line ending, readln and writeln wouldn't be symmetric anyway!  If you expect them to be, then you're in for a nasty surprise :P

> 2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's ReadLn doesn't preserve line endings.

Well, that's Pascal/Delphi/etc., not D.

> 3) in my personal experience (of a number of smaller and larger console applications), I much more often need to work with the contents of lines (without line endings) than with the endings included. If you need to copy data while preserving line endings, I would recommend using binary buffers for files - and I've no idea why you would use standard input/output for binary data anyway.

That's a valid point; I rarely need the line endings. That said, see [1] :)

> 4) it's much easier to add a line ending than to remove it.

Actually, it's not.  Removing a line ending is as simple as slicing the string.  *Adding* a line ending could involve a heap allocation, at least a full copy.

What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?
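
As a rough sketch of the efficiency argument above: in D, removing a terminator can be done by slicing, which aliases the original buffer, while concatenation builds a new array. The helper name `withoutEnding` is hypothetical (it is not part of Phobos or Tango), and the example assumes a current Phobos `std.stdio`.

```d
import std.stdio : writeln;

/// Hypothetical helper: drops one trailing "\n" or "\r\n".
/// The result is a slice of the input, so nothing is copied.
const(char)[] withoutEnding(const(char)[] line)
{
    if (line.length > 0 && line[$ - 1] == '\n')
    {
        line = line[0 .. $ - 1];
        if (line.length > 0 && line[$ - 1] == '\r')
            line = line[0 .. $ - 1];
    }
    return line;
}

void main()
{
    const(char)[] raw = "last line, no newline";
    auto content = withoutEnding(raw);

    // Same start address: the slice aliases the original buffer.
    writeln(content.ptr is raw.ptr);

    // Equal lengths mean there was no terminator to remove (the EOF case).
    writeln(content.length == raw.length);

    // Concatenation creates a new array, so the original data is copied.
    auto extended = raw ~ "\n";
    writeln(extended.ptr !is raw.ptr);
}
```

Comparing the lengths before and after is also one way to tell whether the line had a terminator at all, which is only possible when the reader hands the terminator over in the first place.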

> Based on the above reasons, I would like to suggest to let readln() chop line endings, and perhaps have another function (getline?) which keeps them.

[1]

There have been a few times I've needed the line-ending, and it's a major pain when your IO library simply refuses to give it to you.  It should be that the call gives you the whole line *including* line-endings, but since stripping the line of its ending is so common there should be either another function to do that, or a nice shortcut to get it done.

Maybe we need readln and readlt for "read line and trim"...

</2c>

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
March 22, 2007
torhu wrote:
> Unless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows.  And stdin, stdout, stderr is by default in ascii (not binary) mode.

But I don't think this is the case in Tango, so Cout(line)("\n") should also be
changed for the benchmarks.
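
For comparison, a minimal sketch of the text-mode versus binary-mode distinction torhu describes, using Phobos `std.stdio` rather than Tango. The helper name and the scratch file name are hypothetical. In binary mode the bytes reach the file untouched on every platform; with a C runtime that translates newlines in text mode, opening the same file with "w" on Windows would be expected to produce one extra byte per '\n'.

```d
import std.file : read, remove;
import std.stdio : File;

/// Hypothetical helper: writes `text` in binary mode and returns
/// the number of bytes that ended up on disk.
size_t bytesInBinaryMode(string text, string path = "newline_demo.tmp")
{
    {
        // The "b" flag asks the C runtime not to translate '\n'.
        auto f = File(path, "wb");
        f.write(text);
    } // f goes out of scope here, which closes the file

    scope (exit) remove(path); // clean up the scratch file
    return read(path).length;
}

void main()
{
    // Five characters in, five bytes out: no newline translation.
    assert(bytesInBinaryMode("line\n") == 5);
}
```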
March 22, 2007
Deewiant wrote:
> torhu wrote:
> 
>>Unless a file is opened in binary mode, '\n' will be translated into
>>'\r\n' on Windows.  And stdin, stdout, stderr is by default in ascii
>>(not binary) mode.
> 
> 
> But I don't think this is the case in Tango, so Cout(line)("\n") should also be
> changed for the benchmarks.

At the behest of Andrei, Cin line-parsing now has an option to include the incoming line-terminator. That makes the "\n" somewhat redundant?
March 22, 2007
James Dennett wrote:
> I'm at a loss to understand why you would write what you
> did.  It seems to be a straw man, but maybe there was
> something else to it -- frustration that people assume
> that D must be slower than C++?

Maybe it is a bit of frustration on my part. I often run into people who, when faced with benchmarks showing that conventional D runs code faster than conventional C++, tell me in various ways that it can't be true. I must have:

1) written bad C++ code
2) lied
3) used a sabotaged C++ compiler
4) written some magic optimization that only works on that carefully crafted benchmark

So, I have some justification in saying what I did about the conventional wisdom of C++. I also know that the top tier of experienced C++ programmers are well aware such conventional wisdom is not true.

I have a lot of experience in making C++ code run fast. It doesn't come easy; it takes a lot of work back and forth with a profiler. It usually involves going around the C++ runtime library. That experience has certainly strongly influenced the design of D. I don't wish to have to write custom I/O just to get good I/O performance. I don't wish to keep doing all the clever string hacks trying to make 0 terminated strings fast.

I want the natural, straightforward D code to be (at least close to) the best performing way to implement an algorithm.
March 22, 2007
On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists@gmail.com> wrote:

>> 4) it's much easier to add a line ending than to remove it.
>
> Actually, it's not.  Removing a line ending is as simple as slicing the string.  *Adding* a line ending could involve a heap allocation, at least a full copy.

I was actually talking about the complexity of the source, not the efficiency of the generated code.
When readln gives you the line with a line ending, you have three cases:
1) a CR/LF line ending (Windows)
2) LF line ending (Unix)
3) no line ending at all (EOF)

You'd need to account for each of these when removing the line endings - and write this code every time you're writing an app which just needs the contents of lines from standard input - which, as you have agreed, is quite common.

> What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?

IMHO, most tools which work with standard input don't really need to know if the last line has a line break at the end :)

-- 
Best regards,
  Vladimir                          mailto:thecybershadow@gmail.com
March 22, 2007
kris wrote:
> Deewiant wrote:
>> torhu wrote:
>>
>>> Unless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows.  And stdin, stdout, stderr is by default in ascii (not binary) mode.
>>
>>
>> But I don't think this is the case in Tango, so Cout(line)("\n")
>> should also be
>> changed for the benchmarks.
> 
> At the behest of Andrei, Cin line-parsing now has an option to include the incoming line-terminator. That makes the "\n" somewhat redundant?

Only if you've got the latest SVN revision of Tango. If not, use tango.io.FileConst.NewlineString (side note: for easier access, perhaps Print.Eol should be public and assigned to this) in place of "\n".
March 22, 2007

Vladimir Panteleev wrote:
> On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists@gmail.com> wrote:
> 
>>> 4) it's much easier to add a line ending than to remove it.
>> Actually, it's not.  Removing a line ending is as simple as slicing the string.  *Adding* a line ending could involve a heap allocation, at least a full copy.
> 
> I was actually talking about the complexity of the source, not the efficiency of the generated code.
> When readln gives you the line with a line ending, you have three cases:
> 1) a CR/LF line ending (Windows)
> 2) LF line ending (Unix)
> 3) no line ending at all (EOF)
> 
> You'd need to account for each of these when removing the line endings - and write this code every time you're writing an app which just needs the contents of lines from standard input - which, as you have agreed, is quite common.

import std.stdio;
import std.string;

auto line = readln().chomp();

:)
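
A slightly fuller sketch of the same idea, assuming a current Phobos where `std.string.chomp` called without a delimiter removes at most one trailing terminator (the 2007 library may have differed in details):

```d
import std.stdio : stdin, writeln;
import std.string : chomp;

void main()
{
    // The cases listed above are all covered by the same call.
    assert(chomp("text\r\n") == "text"); // CR/LF
    assert(chomp("text\n") == "text");   // LF
    assert(chomp("text") == "text");     // no terminator (last line before EOF)

    // Typical use: read lines with their endings, strip them when not needed.
    // A line read successfully always has length > 0, so an empty result
    // signals end of input.
    string line;
    while ((line = stdin.readln()).length > 0)
        writeln(line.chomp());
}
```

Note that this loop normalizes the output: every line is written with the platform's newline, including a final line that had none in the input.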

>> What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?
> 
> IMHO, most tools which work with standard input don't really need to know if the last line has a line break at the end :)
>

March 22, 2007
Vladimir Panteleev wrote:

> When readln gives you the line with a line ending, you have three cases:
> 1) a CR/LF line ending (Windows)
> 2) LF line ending (Unix)
> 3) no line ending at all (EOF)

Actually it is even four:
4) CR line ending (Mac)

But that's just for files coming from the old Mac OS (9),
normally Mac OS X uses Unix linefeeds for line endings...
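
To make the four cases concrete, here is a small sketch that classifies the terminator a line carries. The enum and function names are hypothetical, and the example assumes the reader has already split the input so that a lone CR can appear at the end of a line; a reader that splits only on '\n' would never produce that case.

```d
import std.stdio : writeln;

/// Hypothetical classification of the terminator a line carries.
enum LineEnding { none, lf, crlf, cr }

LineEnding endingOf(const(char)[] line)
{
    // Check the two-character ending first, so "\r\n" is not seen as plain LF.
    if (line.length >= 2 && line[$ - 2 .. $] == "\r\n")
        return LineEnding.crlf;
    if (line.length >= 1 && line[$ - 1] == '\n')
        return LineEnding.lf;
    if (line.length >= 1 && line[$ - 1] == '\r')
        return LineEnding.cr;
    return LineEnding.none;
}

void main()
{
    foreach (sample; ["dos\r\n", "unix\n", "classic mac\r", "eof"])
        writeln(endingOf(sample));
}
```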

--anders
March 22, 2007
Anders F Björklund wrote:
> Vladimir Panteleev wrote:
> 
>> When readln gives you the line with a line ending, you have three cases:
>> 1) a CR/LF line ending (Windows)
>> 2) LF line ending (Unix)
>> 3) no line ending at all (EOF)
> 
> Actually it is even four:
> 4) CR line ending (Mac)
> 
> But that's just for files coming from the old Mac OS (9),
> normally Mac OS X uses Unix linefeeds for line endings...

I have some of these too. Legacy applications are not the majority, but they work, and for me that's enough.

Ciao
March 22, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> 
> I passed a 31 MB text file (containing a dictionary that I'm using in my research) through each of the programs above. The output was set to /dev/null. I've run the same program multiple times before the actual test, so everything is cached and the process becomes computationally bound. Here are the results summed for 10 consecutive runs (averaged over 5 epochs):
> 
> 13.9s        Tango
> 6.6s        Perl
> 5.0s        std.stdio

For what it's worth, I created a Win32 version of the Unix 'time' command recently.  Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a quick and dirty implementation, but works for how I typically use it.