Thread overview
File.byLine ought to return dups?
Jun 04, 2010
Graham Fawcett
Jun 04, 2010
Graham Fawcett
Jun 04, 2010
Adam Ruppe
June 04, 2010
Hi folks,

I expected the following program to print the lines of a file in reverse:

    // bad.d
    import std.stdio;
    import std.range;

    void main() {
      auto f = File("bad.d");
      foreach(char[] line; retro(array(f.byLine())))
        writeln(line);
    }

However, this produces very unusual output: fragments of the same lines are printed repeatedly. I suspect it's because the byLine() 'generator' is not dup'ing the arrays it reads from the file.

This works as expected:

    // good.d
    import std.stdio;
    import std.range;

    Retro!(char[][]) retroLines(File f) {
     char[][] lines;
      foreach(line; f.byLine())
        lines ~= line.dup;         // note the .dup
      return retro(lines);
     }

    void main() {
      auto f = File("good.d");
      foreach(line; retroLines(f))
        writeln(line);
    }

If you remove the '.dup', then this behaves badly as well.

So is this a bug in File.byLine, or am I just using it badly? :)

Thanks,
Graham
June 04, 2010
On Fri, 04 Jun 2010 15:27:03 -0400, Graham Fawcett <fawcett@uwindsor.ca> wrote:

> Hi folks,
>
> I expected the following program to print the lines of a file in
> reverse:
>
>     // bad.d
>     import std.stdio;
>     import std.range;
>    void main() {
>       auto f = File("bad.d");
>       foreach(char[] line; retro(array(f.byLine())))
>         writeln(line);
>     }
>
> However, this produces very unusual output: fragments of the same
> lines are printed repeatedly. I suspect it's because the byLine()
> 'generator' is not dup'ing the arrays it reads from the file.
>
> This works as expected:
>
>     // good.d
>     import std.stdio;
>     import std.range;
>    Retro!(char[][]) retroLines(File f) {
>      char[][] lines;
>       foreach(line; f.byLine())
>         lines ~= line.dup;         // note the .dup
>       return retro(lines);
>      }
>    void main() {
>       auto f = File("good.d");
>       foreach(line; retroLines(f))
>         writeln(line);
>     }
>
> If you remove the '.dup', then this behaves badly as well.
>
> So is this a bug in File.byLine, or am I just using it badly? :)

The latter.  File is re-using the buffer for each line, so you are seeing the data get overwritten.  This is for performance reasons.  Not everyone wants incur heap allocations for every line of a file ;)  As you showed, it's possible to get the desired behavior if you need it.  The reverse would be impossible.

Now, that being said, a nice addition would be to create a duper range that allows you to do one expression:

foreach(char[] line; retro(array(duper(f.byLine()))))

-Steve
June 04, 2010
Looking at the code, it appears to be by design, though not explictly documented - it probably should be.

Something to note is if you specifically ask for a string in the foreach, it will refuse to compile - the compiler knows it is a mutable char[] too.
June 04, 2010
Sorry for the double-post to .announce -- I had deleted my .announce post, but obviously not thoroughly enough. I'll follow up on the list.

Graham


On Fri, 04 Jun 2010 15:41:43 -0400, Steven Schveighoffer wrote:

> On Fri, 04 Jun 2010 15:27:03 -0400, Graham Fawcett <fawcett@uwindsor.ca> wrote:
> 
>> Hi folks,
>>
>> I expected the following program to print the lines of a file in reverse:
>>
>>     // bad.d
>>     import std.stdio;
>>     import std.range;
>>    void main() {
>>       auto f = File("bad.d");
>>       foreach(char[] line; retro(array(f.byLine())))
>>         writeln(line);
>>     }
>>
>> However, this produces very unusual output: fragments of the same lines are printed repeatedly. I suspect it's because the byLine() 'generator' is not dup'ing the arrays it reads from the file.
>>
>> This works as expected:
>>
>>     // good.d
>>     import std.stdio;
>>     import std.range;
>>    Retro!(char[][]) retroLines(File f) {
>>      char[][] lines;
>>       foreach(line; f.byLine())
>>         lines ~= line.dup;         // note the .dup
>>       return retro(lines);
>>      }
>>    void main() {
>>       auto f = File("good.d");
>>       foreach(line; retroLines(f))
>>         writeln(line);
>>     }
>>
>> If you remove the '.dup', then this behaves badly as well.
>>
>> So is this a bug in File.byLine, or am I just using it badly? :)
> 
> The latter.  File is re-using the buffer for each line, so you are seeing the data get overwritten.  This is for performance reasons.  Not everyone wants incur heap allocations for every line of a file ;)  As you showed, it's possible to get the desired behavior if you need it. The reverse would be impossible.
> 
> Now, that being said, a nice addition would be to create a duper range that allows you to do one expression:
> 
> foreach(char[] line; retro(array(duper(f.byLine()))))
> 
> -Steve