Thread overview
[Issue 8084] New: std.stdio.ByLine is not true input range
May 11, 2012
Kenji Hara
May 12, 2012
timon.gehr@gmx.ch
May 12, 2012
Brad Roberts
May 12, 2012
Kenji Hara
May 12, 2012
Jonathan M Davis
May 13, 2012
Kenji Hara
May 11, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8084

           Summary: std.stdio.ByLine is not true input range
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: k.hara.pg@gmail.com


--- Comment #0 from Kenji Hara <k.hara.pg@gmail.com> 2012-05-11 10:55:07 PDT ---
version(A) and version(B) should output same result, but A is completely
broken.

import std.array;
import std.stdio;
void main()
{
    string fname = __FILE__;
    version(A)
    {
        auto lines = File(fname).byLine().array();
        foreach (ln; lines)
            writeln(ln);
    }
    version(B)
    {
        foreach (ln; File(fname).byLine())
            writeln(ln);
    }
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 11, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8084


Steven Schveighoffer <schveiguy@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |schveiguy@yahoo.com
         Resolution|                            |WONTFIX


--- Comment #1 from Steven Schveighoffer <schveiguy@yahoo.com> 2012-05-11 12:10:30 PDT ---
In order for byLine to be efficient, it must reuse the buffer it's reading.  In order for the above code to be equivlent, it must dup every line, which is hugely inefficient if you aren't going to use them beyond the scope of the foreach statement.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 12, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8084


timon.gehr@gmx.ch changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |timon.gehr@gmx.ch


--- Comment #2 from timon.gehr@gmx.ch 2012-05-12 07:11:54 PDT ---
This is true, but byLine is not a true InputRange, therefore it is questionable whether or not it should expose an InputRange interface. int opApply(scope int delegate(scope string ln)) would be more descriptive.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 12, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8084


Brad Roberts <braddr@puremagic.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |braddr@puremagic.com


--- Comment #3 from Brad Roberts <braddr@puremagic.com> 2012-05-12 09:49:25 PDT ---
What part of the definition of an InputRange does it violate?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 12, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8084



--- Comment #4 from Kenji Hara <k.hara.pg@gmail.com> 2012-05-12 09:55:49 PDT ---
(In reply to comment #1)
> In order for byLine to be efficient, it must reuse the buffer it's reading.  In order for the above code to be equivlent, it must dup every line, which is hugely inefficient if you aren't going to use them beyond the scope of the foreach statement.

OK. ByLine.front returns the slice of internal line buffer, so it specifies 'temporary' data. We should get dup in each iteration.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 12, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8084


Jonathan M Davis <jmdavisProg@gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg@gmx.com


--- Comment #5 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-05-12 11:38:24 PDT ---
> OK. ByLine.front returns the slice of internal line buffer, so it specifies
'temporary' data.

So? It's returning a reference type, so it could change on you. It _is_ potentially problematic in some cases, but I don't see how that makes it so that it isn't an input range

> We should get dup in each iteration.

It's specifically _avoiding_ that for efficiency purposes.

Now, maybe having ByLine's front return the same array with changes values rather than a new array makes it so that it doesn't work with a lot of range-based functions, effectively making it useless as a range, even if it _does_ follow the API. If that's the case, maybe we should make it use opApply for the loop case (having it use the current non-duping behavior) and have front do a dup (and if front as pure, that result could even be implicitly converted to an immutable result, avoiding having to dup it twice to get an immutable copy).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8084



--- Comment #6 from Steven Schveighoffer <schveiguy@yahoo.com> 2012-05-12 19:38:27 PDT ---
ByLine is a true input range, it has empty, popFront and front.

If it was not a true input range, foreach would not work on it.

I also think isInputRange would pass.

The major issue I see with ByLine is that its default type of front is char[]. It really should be const(char)[].  And in fact, I don't think char[] should be possible.

Perhaps, this bug can be redone to address that.

(In reply to comment #4)
> OK. ByLine.front returns the slice of internal line buffer, so it specifies 'temporary' data. We should get dup in each iteration.

If you mean ByLine should dup for each iteration, that is extremely wasteful. What if you only need the data during the loop and not afterward?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8084



--- Comment #7 from Kenji Hara <k.hara.pg@gmail.com> 2012-05-12 19:50:30 PDT ---
(In reply to comment #5)
(In reply to comment #6)
> (In reply to comment #4)
> > OK. ByLine.front returns the slice of internal line buffer, so it specifies 'temporary' data. We should get dup in each iteration.

I mean we should dup the front of ByLine in *user code* if it is used after whole iteration, not ByLine should return dup-ed front.

And I can agree that ByLine is *true* input range because it has empty, front, and popFront. Therefore, the summary was wrong.

Now I think it is correct that this issue was marked as 'resolved wontfix'.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------