File.byLine should return dups?

Jun 04, 2010

Graham Fawcett

Jun 04, 2010

Steven Schveighoffer

Jun 04, 2010

Jun 04, 2010

Jun 04, 2010

Jun 06, 2010

Jun 04, 2010

Hi folks, I expected this program to print the lines of a file in reverse order: // bad.d import std.stdio; import std.range; void main() { auto f = File("bad.d"); foreach(char[] line; retro(array(f.byLine()))) writeln(line); } ...but instead it prints a jumble of line fragments. I suspect that it's because byLine() is not dup'ing the arrays it's reading from the file? This version works: // good.d import std.stdio; import std.range; Retro!(char[][]) retroLines(File f) { char[][] lines; foreach(line; f.byLine()) lines ~= line.dup; // note the .dup! return retro(lines); } void main() { auto f = File("good.d"); foreach(line; retroLines(f)) writeln(line); } ..but if you remove the '.dup' then it prints a similar mess. So, is there a bug in byLine(), or am I just using it wrong? Best, Graham

On Fri, 04 Jun 2010 15:35:06 -0400, Graham Fawcett <fawcett@uwindsor.ca> wrote: > Hi folks, > > I expected this program to print the lines of a file in reverse order: > > // bad.d > import std.stdio; > import std.range; > void main() { > auto f = File("bad.d"); > foreach(char[] line; retro(array(f.byLine()))) > writeln(line); > } > > ...but instead it prints a jumble of line fragments. I suspect that > it's because byLine() is not dup'ing the arrays it's reading from the > file? > > This version works: > > // good.d > import std.stdio; > import std.range; > Retro!(char[][]) retroLines(File f) { > char[][] lines; > foreach(line; f.byLine()) > lines ~= line.dup; // note the .dup! > return retro(lines); > } > void main() { > auto f = File("good.d"); > foreach(line; retroLines(f)) > writeln(line); > } > > ..but if you remove the '.dup' then it prints a similar mess. > > So, is there a bug in byLine(), or am I just using it wrong? The latter. File is re-using the buffer for each line, so you are seeing the data get overwritten. This is for performance reasons. Not everyone wants incur heap allocations for every line of a file As you showed, it's possible to get the desired behavior if you need it. The reverse would be impossible. Now, that being said, a nice addition would be to create a duper range that allows you to do one expression: foreach(char[] line; retro(array(duper(f.byLine())))) -Steve Copied from my .announce reply :)

Hi Steven, On Fri, 04 Jun 2010 15:43:18 -0400, Steven Schveighoffer wrote: > On Fri, 04 Jun 2010 15:35:06 -0400, Graham Fawcett <fawcett@uwindsor.ca> wrote: > >> Hi folks, >> >> I expected this program to print the lines of a file in reverse order: >> >> // bad.d >> import std.stdio; >> import std.range; >> void main() { >> auto f = File("bad.d"); >> foreach(char[] line; retro(array(f.byLine()))) >> writeln(line); >> } >> >> ...but instead it prints a jumble of line fragments. I suspect that it's because byLine() is not dup'ing the arrays it's reading from the file? >> >> This version works: >> >> // good.d >> import std.stdio; >> import std.range; >> Retro!(char[][]) retroLines(File f) { >> char[][] lines; >> foreach(line; f.byLine()) >> lines ~= line.dup; // note the .dup! >> return retro(lines); >> } >> void main() { >> auto f = File("good.d"); >> foreach(line; retroLines(f)) >> writeln(line); >> } >> >> ..but if you remove the '.dup' then it prints a similar mess. >> >> So, is there a bug in byLine(), or am I just using it wrong? > > The latter. File is re-using the buffer for each line, so you are seeing the data get overwritten. This is for performance reasons. Not everyone wants incur heap allocations for every line of a file As you showed, it's possible to get the desired behavior if you need it. The reverse would be impossible. > Thanks for the response. Yes, it makes mode to favour performance for the common case, and as you say it's not hard to resolve this issue. > Now, that being said, a nice addition would be to create a duper range that allows you to do one expression: > > foreach(char[] line; retro(array(duper(f.byLine())))) Yes -- duper (and iduper) for the win! Great idea. > -Steve > > Copied from my .announce reply :) Sorry again for the .announce posting. :P Graham

On 06/04/2010 02:58 PM, Graham Fawcett wrote: [snip] > Thanks for the response. Yes, it makes mode to favour performance for > the common case, and as you say it's not hard to resolve this issue. > >> Now, that being said, a nice addition would be to create a duper range >> that allows you to do one expression: >> >> foreach(char[] line; retro(array(duper(f.byLine())))) > > Yes -- duper (and iduper) for the win! Great idea. I wonder what we can do to automatically reject such programs. Andrei