Jump to page: 1 2 3
Thread overview
[Issue 3848] New: functions in std.file don't take symbolic links into account
Feb 24, 2010
Jonathan M Davis
Feb 24, 2010
Jonathan M Davis
Feb 25, 2010
Jonathan M Davis
Feb 26, 2010
Jonathan M Davis
Feb 26, 2010
Jonathan M Davis
Feb 28, 2010
Jonathan M Davis
Mar 01, 2010
Sobirari Muhomori
Mar 03, 2010
Jonathan M Davis
Mar 03, 2010
Jonathan M Davis
Mar 03, 2010
Jonathan M Davis
Mar 03, 2010
Jonathan M Davis
Jun 22, 2010
Jonathan M Davis
Jun 22, 2010
Jonathan M Davis
Jun 22, 2010
Jonathan M Davis
Aug 08, 2010
Jonathan M Davis
Aug 08, 2010
Jonathan M Davis
Aug 08, 2010
Don
Aug 08, 2010
Jonathan M Davis
Sep 24, 2010
Jonathan M Davis
Sep 24, 2010
Jonathan M Davis
Oct 30, 2010
Jonathan M Davis
Oct 30, 2010
Jonathan M Davis
Jan 19, 2011
Jonathan M Davis
February 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3848

           Summary: functions in std.file don't take symbolic links into
                    account
           Product: D
           Version: 2.040
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: jmdavisProg@gmail.com


--- Comment #0 from Jonathan M Davis <jmdavisProg@gmail.com> 2010-02-24 00:24:31 PST ---
Created an attachment (id=572)
Changes for std.file.d to make it work with symlinks.

At present, if you call isfile() or isdir() on a symbolic link, it reports on what the link points to rather than the link itself. Presumably this is fine on Windows since as far as I know, Windows doesn't have symbolic links. But it is a big problem for Linux.

The other problem is rmdirRecursive(). At present, I believe that it currently would follow symbolic links to directories rather than just deleting the symlink itself. Ideally, it would not follow symlinks.



I have included a patch which I believe fixes the problems. Whether it's the
best solution to the problem, I don't know. I added getLinkAttributes() which
uses lstat instead of stat and make isdir() and isfile() use
getLinkAttributes() rather than getAttributes(). I also added issymlink() to go
with isdir() and isfile(). DirEntry was changed to use lstat as well. That
should fix, isfile(), isdir(), and rmdirRecursive(). Most of the other
functions like lastModified() continue to use fstat, so they continue to give
information on the file which is pointed to rather than the symlink, which I
think is what we want.

The one wart is that having DirEntry use lstat means that all of the attributes that it gives are for the symlink itself which is not always what we want. So, I added a followLink() function which returns a DirEntry with the values for stat if it's a symbolic link or just returns a copy of the DirEntry if it's not. followLink() can almost certainly been done in a cleaner manner. I'd have used init, but I don't know how to get dirent for a single file rather than a whole directory. So, it uses stat and then checks the mode to figure out which value to set d_type for. DT_WHT is currently not used (it's commented out) because I have no clue what it represents, so I don't know what mode value from stat it corresponds to.

In any case, if my patch isn't quite what you'd want to use, hopefully it's at least a step in the right direction. Hopefully the patch is in an appropriate format. It has changes for std.file.d

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei@metalanguage.com


--- Comment #1 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-02-24 05:11:32 PST ---
Thanks, I'll look into it. Just like you, I'm unclear whether isdir() vs.
isfile() should automatically dereference the link. I think we should keep the
current behavior and add an islink() test that tells whether the thing is
actually a link.

I think removal should remove the actual link.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3848



--- Comment #2 from Jonathan M Davis <jmdavisProg@gmail.com> 2010-02-24 08:46:30 PST ---
The main problem with having isdir() and isfile() continuing to use stat, and
having the programmer just call issymlink() to check whether it's a link is
that that's a call to stat _and_ a call to lstat, and from what I understand
stat and lstat are a bit expensive (presumably because they have to talk to
disk). Of course, if there's a high chance that a programmer will have to check
whether a symlink points to a file or a directory, then they'll end up calling
both isdir()/isfile() and issymlink() and so will have a call to both stat and
lstat anyway, so it may be better to keep isdir() and isfile() using stat.
Really, it depends on the most likely use case. rmdirRecursive() is definitely
more efficient using just lstat though.

Personally, I wish that there were a function which returned information for the file which is pointed to while telling you whether it's a symlink or not, but unfortunately, from what I've found, there is no such beast.

From an efficiency perspective, it would be nice if there were a function which returned a DirEntry for a single file so that you can query that as to whether a file is a file, dir, or symlink without calling stat or lstat every time you call isfile(), isdir(), or issymlink(), but I couldn't figure out how to do that, so I don't know if it can be reasonably done (since presumably it would be silly and expensive to get the contents of the directory holding the file just to get the DirEntry for the file itself).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 25, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3848



--- Comment #3 from Jonathan M Davis <jmdavisProg@gmail.com> 2010-02-24 23:31:35 PST ---
Actually, after playing around with this a bit more, I think that I'm leaning
towards it being more useful for isdir() and isfile() to use stat. It can
result in less efficient code, but it's a lot easier to deal with the symlinks
that way. If isdir() and isfile() use lstat, you have to call getAttributes()
on the symlink to find out what it is, and that means anding the return value
with the appropriate flags, and that's not terribly user-friendly.

In fact, it would actually be nice if getAttributes() returned an enum or struct rather a uint. I believe that it's quite OS-dependent as it is, and you have to look at file.d to have any clue what the return value represents. Ideally, you wouldn't have to look a function's source code to figure out how to deal with its return value. But the whole issue of getAttributes() is more or less a separate (albeit connected) issue.

In any case, I think that from a user-friendliness perspective, you're right in
your assessment that it would be better to have isdir() and isfile() use stat
rather than lstat. It does, however, mean that you have to be that much more
careful with isdir() or we risk recursive issues or removing stuff in the
directory that it points to rather than just the link in functions like
rmdirRecursive().

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #572 is|0                           |1
           obsolete|                            |


--- Comment #4 from Jonathan M Davis <jmdavisProg@gmail.com> 2010-02-26 00:32:57 PST ---
Created an attachment (id=573)
Next attempt at fixing problem.

Okay. I tried to fix this more cleanly this time. The result is that DirEntry has changed a fair bit, but now isdir() and isfile() use stat rather than lstat, and you can now get a DirEntry for a specific file without getting all the DirEntries for a directory. I don't know whether it will be to your liking, and I have not tested the Windows code at all, but from the testing I did of the linux code, it appears to work (though it could always use more unittests).

By making DirEntry as lazy/efficient as possible and making it possible to get one for any file, it should fix the efficiency issue of calling both stat and lstat as much as is possible (though it would still be nice if there were a posix function which did stat but also told you whether it was a symlink).

There are also versions of isdir(), isfile(), and issymlink() which take an
attributes (uint) value to make them the attributes values easier to use and
facilitate rmdirRecursive() only calling lstat. I didn't add any for block
devices and such, but perhaps they would be good to add as well.

DirEntry is using properties now with its member variables being private, allowing for guarantees as to the state of its members. I don't know whether you'd consider that a problem or not, but I think that it's cleaner.

I have attached the updated std.file.d.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3848



--- Comment #5 from Jonathan M Davis <jmdavisProg@gmail.com> 2010-02-26 00:35:59 PST ---
Created an attachment (id=574)
Patch file for next attempt at fixing problem (instead of the full file).

Here's a patch if you'd prefer that to the fully changed file. You'd need to apply dos2unix to the original file.d first for it work.

Hopefully this at least comes close to solving the problem for you and saves you some work. Thanks.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 28, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3848



--- Comment #6 from Jonathan M Davis <jmdavisProg@gmail.com> 2010-02-28 02:43:05 PST ---
I would point out that my changes appear to fix bug 1635, so fixing this bug could result in that one getting fixed at the same time.

I also noticed that I should have named the direntry() function dirEntry(). I
was not paying enough attention to the capitalization for dirEntries() and
thought that it was direntries(), so my attempt to make their capitalization
match failed.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 01, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3848



--- Comment #7 from Sobirari Muhomori <dfj1esp02@sneakemail.com> 2010-03-01 07:04:11 PST ---
(In reply to comment #1)
> Thanks, I'll look into it. Just like you, I'm unclear whether isdir() vs.
> isfile() should automatically dereference the link. I think we should keep the
> current behavior and add an islink() test that tells whether the thing is
> actually a link.
Actually windows gone the same way: if you want to examine symlink, you should state it explicitly.

(In reply to comment #2)
> The main problem with having isdir() and isfile() continuing to use stat, and
> having the programmer just call issymlink() to check whether it's a link is
> that that's a call to stat _and_ a call to lstat, and from what I understand
> stat and lstat are a bit expensive (presumably because they have to talk to
> disk).
They don't talk to disk, they talk to disk cache.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 03, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #573 is|0                           |1
           obsolete|                            |


--- Comment #8 from Jonathan M Davis <jmdavisProg@gmail.com> 2010-03-02 23:39:06 PST ---
Created an attachment (id=577)
Now fixes DirIterator and listdir() as well.

I don't know anything about the whole reading from disk reading from disk cache thing, but it's still I/O of some variety and, as I understand it, is not a cheap operation, so it's desirable to avoid extraneous stat and lstat calls regardless.

Also, I figured out that DirIterator and listdir() did not deal with symlinks correctly. In particular, they always followed them, which could be very bad in some situations. So, I'm attaching another version which fixes that problem. It also makes DirEntry on Linux a bit better. It occurred to me that we might as well just save the result from stat rather than copying some of its values to member variables to be used for the properties. That way, we can make the resulting struct available. It makes the DirEntry struct a bit larger, but it seems useful enough to me that it should be worth it. However, I didn't add any more of stat's properties (like inode or block size) to DirEntry since those are likely used rarely enough that the programmer can just access the stat struct in such cases. That and it allows DirEntry's API to be mostly the same between platforms.

As far as I've determined, this fix works on Linux (certainly, all of the unit tests pass, and all of the other testing that I've done looks good), but I haven't tested the Windows code at all. Most of the changes are to the Posix side though, so odds are that the Windows side is okay.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 03, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #574 is|0                           |1
           obsolete|                            |


--- Comment #9 from Jonathan M Davis <jmdavisProg@gmail.com> 2010-03-02 23:40:53 PST ---
Created an attachment (id=578)
Patch version of "Now fixes DirIterator and listdir() as well."

Here's the patch version of the most recent fix. Again, you'll need to run dos2unix on the original std.file.d before applying it.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
« First   ‹ Prev
1 2 3