Thread overview
How do I iteratively replace lines in a file?
Mar 19, 2011
Andrej Mitrovic
Mar 20, 2011
Jonathan M Davis
Mar 20, 2011
Ali Çehreli
Mar 20, 2011
Kai Meyer
Mar 20, 2011
Andrej Mitrovic
Mar 20, 2011
Kai Meyer
March 19, 2011
I'm trying to do something like the following:

File inputfile;
foreach (string name; dirEntries(r".\subdir\", SpanMode.shallow))
{
    if (!(isFile(name) && getExt(name) == "d"))
    {
        continue;
    }

    inputfile = File(name, "a+");

    foreach (line; inputfile.byLine)
    {
        if (line == "import foo.d")
        {
            inputfile.write("import bar.d");  // or ideally `line = "import bar.d"`
        }
    }
}

That obviously won't work. I think I might need to use the `fseek` function to keep track of where I am in the file, or something like that. File I/O in D is no fun..
March 20, 2011
On Saturday 19 March 2011 16:51:19 Andrej Mitrovic wrote:
> I'm trying to do something like the following:
> 
> File inputfile;
> foreach (string name; dirEntries(r".\subdir\", SpanMode.shallow))
> {
>     if (!(isFile(name) && getExt(name) == "d"))
>     {
>         continue;
>     }
> 
>     inputfile = File(name, "a+");
> 
>     foreach (line; inputfile.byLine)
>     {
>         if (line == "import foo.d")
>         {
>             inputfile.write("import bar.d");  // or ideally `line = "import bar.d"`
>         }
>     }
> }
> 
> That obviously won't work. I think I might need to use the `fseek` function to keep track of where I am in the file, or something like that. File I/O in D is no fun..

I think that most of the D file I/O stuff is built around the idea of reading in the whole file and writing out a whole file rather than editing a file - certainly the range-based stuff works that way at the moment. You can probably use std.stdio.File.seek to seek to the appropriate position and then write there, but I believe that all of the range-based stuff currently is really only for reading a file.

Personally, I only ever read in whole files and write out whole files without any kind of interleaving, but while that generally works great, it doesn't scale once you start dealing with large files. It's likely an area that D's file I/O could use some improvement. That may or may not need to be part of the stream stuff though.
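
A minimal sketch of the whole-file approach described above, assuming a recent Phobos where `std.file.readText`, `std.file.write` and `std.array.replace` are available. The helper name `replaceInFile` and the file name `example.d` are made up for illustration. Note that it replaces the text wherever it occurs, not only when it forms a whole line, and that it holds the entire file in memory, which is the scaling limit mentioned above.

```d
import std.array : replace;
import std.file : readText, write;

/// Reads the whole file, swaps every occurrence of `from` for `to`,
/// and writes the whole file back.
/// Hypothetical helper for illustration, not something from the thread.
void replaceInFile(string path, string from, string to)
{
    string text = readText(path);          // entire file in memory
    write(path, text.replace(from, to));   // overwrite with the edited copy
}

void main()
{
    // Create a small sample file, then edit it.
    write("example.d", "import foo.d\nvoid main() {}\n");
    replaceInFile("example.d", "import foo.d", "import bar.d");
    assert(readText("example.d") == "import bar.d\nvoid main() {}\n");
}
```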

- Jonathan M Davis
March 20, 2011
On 03/19/2011 04:51 PM, Andrej Mitrovic wrote:
> I'm trying to do something like the following:
>
> File inputfile;
> foreach (string name; dirEntries(r".\subdir\", SpanMode.shallow))
> {
>      if (!(isFile(name) && getExt(name) == "d"))
>      {
>          continue;
>      }
>
>      inputfile = File(name, "a+");
>
>      foreach (line; inputfile.byLine)
>      {
>          if (line == "import foo.d")
>          {
>              inputfile.write("import bar.d");  // or ideally `line = "import bar.d"`
>          }
>      }
> }
>
> That obviously won't work. I think I might need to use the `fseek` function to keep track of where I am in the file, or something like that.

That's not a good idea with text files.

Even for binary files, the file must have a well defined format. It is not possible to insert or remove bytes from a file due to low level reasons. The file systems that I am aware of don't provide such interfaces.

And writing after fseek would overwrite existing data.

Like Jonathan M Davis said, the best is to read from the source and write to the destination.
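
The point about overwriting can be seen in a small sketch. It assumes `std.stdio.File` with its `seek` and `rawWrite` members; the helper `overwriteAt`, the file name and the offsets are hypothetical. A replacement of the same length works, while a longer one eats the bytes that follow, here the newline.

```d
import std.file : readText, write;
import std.stdio : File;

/// Overwrites bytes starting at `offset` with `text`. Nothing is inserted:
/// the bytes previously at those positions are replaced.
/// Hypothetical helper for illustration.
void overwriteAt(string path, long offset, string text)
{
    auto f = File(path, "r+b");   // read/update mode, no truncation
    f.seek(offset);               // absolute position (SEEK_SET is the default)
    f.rawWrite(text);             // replaces text.length bytes in place
}   // f is closed when it goes out of scope

void main()
{
    write("demo.txt", "import foo.d\nint x;\n");

    // Same length ("foo" -> "bar"): a clean edit.
    overwriteAt("demo.txt", 7, "bar");
    assert(readText("demo.txt") == "import bar.d\nint x;\n");

    // Longer replacement: it overwrites "bar.d" and the newline after it.
    overwriteAt("demo.txt", 7, "longer");
    assert(readText("demo.txt") == "import longerint x;\n");
}
```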

Ali

March 20, 2011
On 03/19/2011 05:51 PM, Andrej Mitrovic wrote:
> I'm trying to do something like the following:
>
> File inputfile;
> foreach (string name; dirEntries(r".\subdir\", SpanMode.shallow))
> {
>      if (!(isFile(name) && getExt(name) == "d"))
>      {
>          continue;
>      }
>
>      inputfile = File(name, "a+");
>
>      foreach (line; inputfile.byLine)
>      {
>          if (line == "import foo.d")
>          {
>              inputfile.write("import bar.d");  // or ideally `line = "import bar.d"`
>          }
>      }
> }
>
> That obviously won't work. I think I might need to use the `fseek` function to keep track of where I am in the file, or something like that. File I/O in D is no fun..


The only problem with your approach is that a "line" is an abstract concept. In a filesystem, there are only blocks of bytes. When you write (flush) a byte to a file, the file transaction is actually an entire block at a time (ext3 defaults to a 4k block, for example). A line is just an array of bytes. In (relatively) fast memory, modifying a line is pretty transparent. On disk, if you open a 1GB file and add bytes at the very beginning, everything after them has to shift, so the entire file is quite likely to be written out again.

I would suggest you write out to a temporary file, and then move the file on top of the original file.

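foreach(name ...)
{
  inputfile = File(name, "r");
  outputfile = File("/tmp/" ~ name, "w");  // "w", so a stale temp file is not appended to
  foreach(line ...)
  {
    do something to line
    outputfile.writeln(line);  // assuming line comes from byLine, which strips the newline
  }
  inputfile.close();
  outputfile.close();
  rename("/tmp/" ~ name, name);
}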

This will allow you to manipulate line by line, but it won't be in-place. This is the type of approach that a lot of text editors take, and a very common workaround. If you were to encounter a language that allows you to read and write lines iteratively and in-place like this in a file, I'll bet you they are writing your changes to a temp file, and moving the file over the top of the original at the end (perhaps when you close()).
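
A runnable sketch of the temp-file-and-rename idea, assuming a recent Phobos. The helper `rewriteLines`, the `edit` callback and the ".tmp" suffix are hypothetical choices for this example. The temporary file is placed next to the original rather than in /tmp, on the assumption that a rename within one directory stays on one filesystem, whereas a rename from /tmp may cross filesystems and fail.

```d
import std.file : rename, write;
import std.stdio : File;

/// Rewrites `path` line by line through a temporary sibling file, then
/// renames the temporary file over the original.
/// Hypothetical helper for illustration.
void rewriteLines(string path, const(char)[] delegate(const(char)[]) edit)
{
    string tmp = path ~ ".tmp";
    {
        auto input = File(path, "r");
        auto output = File(tmp, "w");
        foreach (line; input.byLine)      // byLine strips the line terminator
            output.writeln(edit(line));   // writeln puts a newline back
    }   // both File handles are closed here, before the rename
    rename(tmp, path);                    // replaces the original
}

void main()
{
    write("sample.d", "import foo.d\nint x;\n");

    // Swap one import line, pass every other line through unchanged.
    rewriteLines("sample.d", (const(char)[] line) =>
        line == "import foo.d" ? "import bar.d" : line);
}
```

Declaring the two `File` variables inside a nested scope means they are closed before the rename without an explicit `close()` call.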

March 20, 2011
Yeah, I've already done exactly as you guys proposed. Note however that `inputfile` and `outputfile` should be declared inside the foreach loop. Either that or you have to call `close()` explicitly. If you don't do that, file handles don't get released, and you'll eventually get back a stdio error such as "too many file handles opened". You could lose files this way. I know this because it just happened yesterday while testing. :p

Anywho, I needed a quick script to append a semicolon to import lines because I managed to screw up some files when using sed to replace some lines. It's a quick hack but worked for me:

import std.stdio;
import std.file;
import std.path;
import std.string;

void main()
{
    File inputfile;
    File outputfile;
    string newname;

    foreach (string name; dirEntries(r".", SpanMode.breadth))
    {
        if (!(isFile(name) && getExt(name) == "d"))
        {
            continue;
        }

        newname = name.idup ~ "backup";
        if (exists(newname))
        {
            remove(newname);
        }

        rename(name, newname);

        inputfile = File(newname, "r");
        outputfile = File(name, "w");

        foreach (line; inputfile.byLine)
        {
            if ((line.startsWith("private import") ||
                 line.startsWith("import")) &&
                !line.endsWith(",") &&
                !line.endsWith(";"))
            {
                outputfile.writeln(line ~ ";");
            }
            else
            {
                outputfile.writeln(line);
            }
        }

        inputfile.close();
        outputfile.close();
    }

    foreach (string name; dirEntries(r".", SpanMode.breadth))
    {
        if (getExt(name) == "dbackup")
        {
            remove(name);
        }
    }
}
March 20, 2011
On 03/20/2011 09:46 AM, Andrej Mitrovic wrote:
> Yeah, I've already done exactly as you guys proposed. Note however
> that `inputfile` and `outputfile` should be declared inside the
> foreach loop. Either that or you have to call `close()` explicitly. If
> you don't do that, file handles don't get released, and you'll
> eventually get back a stdio error such as "too many file handles
> opened". You could lose files this way. I know this because it just
> happened yesterday while testing. :p
>
> Anywho, I needed a quick script to append a semicolon to import lines
> because I managed to screw up some files when using sed to replace
> some lines. It's a quick hack but worked for me:
>
> import std.stdio;
> import std.file;
> import std.path;
> import std.string;
>
> void main()
> {
>      File inputfile;
>      File outputfile;
>      string newname;
>
>      foreach (string name; dirEntries(r".", SpanMode.breadth))
>      {
>          if (!(isFile(name) && getExt(name) == "d"))
>          {
>              continue;
>          }
>
>          newname = name.idup ~ "backup";
>          if (exists(newname))
>          {
>              remove(newname);
>          }
>
>          rename(name, newname);
>
>          inputfile = File(newname, "r");
>          outputfile = File(name, "w");
>
>          foreach (line; inputfile.byLine)
>          {
>              if ((line.startsWith("private import") ||
>                   line.startsWith("import")) &&
>                  !line.endsWith(",") &&
>                  !line.endsWith(";"))
>              {
>                  outputfile.writeln(line ~ ";");
>              }
>              else
>              {
>                  outputfile.writeln(line);
>              }
>          }
>
>          inputfile.close();
>          outputfile.close();
>      }
>
>      foreach (string name; dirEntries(r".", SpanMode.breadth))
>      {
>          if (getExt(name) == "dbackup")
>          {
>              remove(name);
>          }
>      }
> }


Funny, I would have just fixed it with sed.

sed -i -r 's/^(import.*)/\1;/' *.d

In fact, I think sed is a great example of an application that applies a search and replace on a per-line basis. I'd be curious if somebody knows how its '-i' flag (for in-place) works. Based on the man page, I'll bet it opens the source read-only, and opens the destination write-only like Andrej's example.
       -i[SUFFIX], --in-place[=SUFFIX]
              edit files in place (makes backup if extension supplied)

The SUFFIX option just renames the original instead of deleting at the end.