Thread overview
Why doesn't std.file.exists follow symbolic links?
Jul 02, 2021
Jack Applegame
Jul 02, 2021
jfondren
Jul 02, 2021
Jack Applegame
Jul 02, 2021
jfondren
Jul 02, 2021
Jack Applegame
Jul 02, 2021
jfondren
Jul 02, 2021
Vladimir Panteleev
July 02, 2021
import std.stdio : writeln;
import std.file : exists, write, symlink, remove;

void main() {
    write("file.txt", "hello");
    symlink("file.txt", "link.txt");
    writeln(exists("link.txt")); // true
    remove("file.txt");
    writeln(exists("link.txt")); // true, why?
}

In other languages (including C++) similar functions follow symbolic links.

July 02, 2021

On Friday, 2 July 2021 at 12:09:20 UTC, Jack Applegame wrote:

> import std.stdio : writeln;
> import std.file : exists, write, symlink, remove;
>
> void main() {
>     write("file.txt", "hello");
>     symlink("file.txt", "link.txt");
>     writeln(exists("link.txt")); // true
>     remove("file.txt");
>     writeln(exists("link.txt")); // true, why?
> }
>
> In other languages (including C++) similar functions follow symbolic links.

Some thoughts:

  1. This is a dubious test anyway as the status of the file can change
    immediately after the test. If at all possible the better way to deal
    with a file system is to "ask for forgiveness" (gracefully react to
    errors) rather than "ask for permission" (use tests like this and then
    be surprised by an error that can still happen).

  2. Saying that a symlink "doesn't exist" when it clearly does exist
    could also be confusing.

  3. System software that's trying to make secure use of the filesystem
    should really be using the openat() and other *at syscalls with
    dir fds. The kernel APIs have developed a lot in the past few decades
    and one reason I prefer D over traditional 'scripting languages' is
    that those languages have all refused to track these developments, so
    e.g. D but not Perl can swap two files atomically (with renameat2),
    without worrying about race conditions where a process might notice
    that one of the files doesn't exist.

  4. Actually for the reasons above, if std.file.exists were freshly
    made I think it would also be completely fine to change it if indeed
    stat() is the more popular underlying call...

  5. ... but it's been like this since 2015. So people who wanted to
    know what the function actually did have already peeked into the
    library, saw it was lstat() on POSIX, and are now relying on that.

I would resolve this in the direction of clearly documenting the
interaction with symlinks.
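
For readers following along, the stat()/lstat() difference behind points 4 and 5 can be sketched roughly as below. This is a minimal, POSIX-only illustration and not a proposal for how Phobos should change. `existsFollowingLinks` is a hypothetical helper name (it is not part of Phobos); it calls the C `stat()` binding from druntime directly.

```d
// POSIX-only sketch. existsFollowingLinks is a hypothetical helper, not a Phobos function.
import core.sys.posix.sys.stat : stat, stat_t;
import std.file : exists, remove, symlink, write;
import std.stdio : writeln;
import std.string : toStringz;

/// True only if `path` resolves to an existing target: stat() follows symlinks.
bool existsFollowingLinks(string path)
{
    stat_t st;
    return stat(path.toStringz, &st) == 0;
}

void main()
{
    write("file.txt", "hello");
    symlink("file.txt", "link.txt");
    scope (exit) remove("link.txt"); // remove the link itself when main exits

    remove("file.txt"); // link.txt is now a dangling symlink

    writeln(exists("link.txt"));               // lstat(): the link itself still exists
    writeln(existsFollowingLinks("link.txt")); // stat(): the target is gone
}
```

With the dangling link.txt, the first call reports true (as in the original post) and the second reports false.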

July 02, 2021

On Friday, 2 July 2021 at 12:32:21 UTC, jfondren wrote:

> On Friday, 2 July 2021 at 12:09:20 UTC, Jack Applegame wrote:
>
>> import std.stdio : writeln;
>> import std.file : exists, write, symlink, remove;
>>
>> void main() {
>>     write("file.txt", "hello");
>>     symlink("file.txt", "link.txt");
>>     writeln(exists("link.txt")); // true
>>     remove("file.txt");
>>     writeln(exists("link.txt")); // true, why?
>> }
>>
>> In other languages (including C++) similar functions follow symbolic links.
>
> Some thoughts:
>
>   1. This is a dubious test anyway as the status of the file can change
>     immediately after the test. If at all possible the better way to deal
>     with a file system is to "ask for forgiveness" (gracefully react to
>     errors) rather than "ask for permission" (use tests like this and then
>     be surprised by an error that can still happen).

This is a completely different topic. The above code is just a demonstration that std.file.exists does not follow symbolic links, and not the real code.

>   2. Saying that a symlink "doesn't exist" when it clearly does exist
>     could also be confusing.

I do not think so. The symbolic link should be transparent by default.

>   3. System software that's trying to make secure use of the filesystem
>     should really be using the openat() and other *at syscalls with
>     dir fds. The kernel APIs have developed a lot in the past few decades
>     and one reason I prefer D over traditional 'scripting languages' is
>     that those languages have all refused to track these developments, so
>     e.g. D but not Perl can swap two files atomically (with renameat2),
>     without worrying about race conditions where a process might notice
>     that one of the files doesn't exist.

This is also a completely different topic.

>   4. Actually for the reasons above, if std.file.exists were freshly
>     made I think it would also be completely fine to change it if indeed
>     stat() is the more popular underlying call...
>
>   5. ... but it's been like this since 2015. So people who wanted to know what the function actually did have already peeked into the library, saw it was lstat() on POSIX, and are now relying on that.
>
> I would resolve this in the direction of clearly documenting the interaction with symlinks.

Maybe you're right. I don't know how to fix this correctly.

July 02, 2021

On Friday, 2 July 2021 at 12:09:20 UTC, Jack Applegame wrote:

> In other languages (including C++) similar functions follow symbolic links.

To try to answer the "why":

https://github.com/dlang/phobos/pull/1142

Looks like eight years ago I thought that not using lstat would somehow break code in that circumstance, but it's difficult to figure out the details given that the sands of time have eroded the previous iterations of that pull request.

July 02, 2021

On Friday, 2 July 2021 at 14:11:22 UTC, Jack Applegame wrote:

> This is a completely different topic. The above code is just a demonstration that std.file.exists does not follow symbolic links, and not the real code.
>
> ...
>
> This is also a completely different topic.

The unifying topic is "there is a correct way to work with the
filesystem, and exists() isn't it, so who cares if languages vary on
the implementation of a wrong way to work with the filesystem?"

A naive user of any implementation of exists() is going to have a lot
more to worry about. A non-naive user of it will be aware of how it is
implemented.
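
As a rough illustration of the "ask for forgiveness" style mentioned earlier in the thread, the read-or-create case can be written without calling exists() at all. This is only a sketch under stated assumptions: `readOrCreate` and its parameters are hypothetical names, and it treats any FileException from the read as "could not read", which is a simplification (a permissions error, for instance, would simply fail again on the write).

```d
// Sketch only. readOrCreate is a hypothetical helper, not a Phobos function.
import std.file : FileException, readText, write;

/// Try to read `path`; if that fails, create it with `defaultContent`.
string readOrCreate(string path, string defaultContent)
{
    try
    {
        // Opening the file follows symlinks, so a dangling link fails here
        // even though exists() would have reported it as present.
        return readText(path);
    }
    catch (FileException e)
    {
        // Simplification: every FileException is treated as "nothing to read".
        write(path, defaultContent);
        return defaultContent;
    }
}
```

Because the read itself is the test, there is no window between checking and opening, and how exists() treats symlinks no longer matters for this code path.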

July 02, 2021

On Friday, 2 July 2021 at 15:04:33 UTC, jfondren wrote:

> On Friday, 2 July 2021 at 14:11:22 UTC, Jack Applegame wrote:
>
>> This is a completely different topic. The above code is just a demonstration that std.file.exists does not follow symbolic links, and not the real code.
>
> ...
>
>> This is also a completely different topic.
>
> The unifying topic is "there is a correct way to work with the
> filesystem, and exists() isn't it, so who cares if languages vary on
> the implementation of a wrong way to work with the filesystem?"

I disagree. In many simple cases, this option is quite acceptable:

void main() {
    try {
        ...
        auto data = readData(file_name);
        ...
    } catch(Exception e) {
        // Fatal error
    }
}

auto readData(string file_name) {
    ...
    if(exists(file_name)) {
        ...
        read_file(file_name);
        ...
    } else {
        ...
        create_file(file_name);
        ...
    }
    ...
}

> A naive user of any implementation of exists() is going to have a lot
> more to worry about. A non-naive user of it will be aware of how it is
> implemented.

I am a "naive user" of exists() in production and have not encountered any problems with it.

Why do people think that any program should be written as if it will work on the International Space Station?

July 02, 2021

On Friday, 2 July 2021 at 15:30:43 UTC, Jack Applegame wrote:

> I am a "naive user" of exists() in production and have not encountered any problems with it.

OK, let's add a third category:

  1. someone who uses exists() without an awareness of race conditions.

(I argue this person has more to worry about than symlink resolution.)

  2. someone who uses exists() with an acceptance of race conditions, but whose
    familiarity with similar functionality from other languages results in an
    unpleasant surprise with D.

(I argue that this is a documentation problem. Incidentally, stuff like
https://github.com/dlang/phobos/blob/master/std/file.d#L1957 should really just
be in the generated Phobos docs. That's useful information and very much
like the topic at hand. Perhaps there are also people who expected exists()
to be implemented with access().)

  3. someone who distrusts these abstractions of the POSIX API and therefore
    doesn't use them without confirming exactly how they're implemented.

(I've offended you by presenting this as the "non-naive" counterpoint to #1.)