Re: rm -r sometimes produces errors under NFS

bug-coreutils


From:	Linda Walsh
Subject:	Re: rm -r sometimes produces errors under NFS
Date:	Thu, 15 Mar 2007 16:26:52 -0700
User-agent:	Thunderbird 1.5.0.10 (Windows/20070221)



Jim Meyering wrote:

Vincent Lefevre <address@hidden> wrote:

I've attached the log. Here are the contents of the archive:


Your log shows ...
  ...
  access("test/config.h.in", W_OK)        = -1 ESTALE (Stale NFS file handle)
  unlink("/proc/self/fd/4/config.h.in")   = -1 ENOENT (No such file or directory)


and
Jim Meyering wrote:
> I based my statement on what I know of POSIX, e.g., from this part
> of the rm specification:
>       4. If the *current file* is a directory, ...
>       If the *current file* is not a directory, rm shall perform actions
>       equivalent to the unlink() function defined in the System
>       Interfaces volume of IEEE Std 1003.1-200x called with a pathname
>       of the current file used as the path argument. If this
>       fails for any reason, rm shall write a diagnostic message
>       to standard error, do nothing more with the current file,
>       and go on to any remaining files.

-----
        Not to reopen any wounds, but without a POSIX spec in front of me
I can't tell what is meant by "current file" in the passage you quoted.
I'm wondering whether this could be a _possible_ interpretation:
   The result of "access" (in your first quote) shows us that the tested
filename is not a valid identifier (ESTALE).  Could it not be argued that,
for purposes of determining the next "current file" to delete, ESTALE
invalidates the attempt to use the current name, and rm should skip to the
next file in the sequence?  I.e., the 2nd unlink should never be performed,
because the "access" call tells us that, for whatever reason, the name of the
file we are about to delete isn't a valid pathname at the point we test it
(via "access").

    The "ESTALE" doesn't even have to be returned from the server, I don't think.
Some time back, nfs used multiple "biosd" clients talking to multiple nfsd's on a
server, so you have at least 2 opportunities for a "local" fs-proxy to be out
of sync with a peer.  But certainly one nfsd to pass requests to a remote
server.  Isn't it possible the synchronization problem could happen like:

myprog ->  unlink(fn) -> biosd1 (-> nfsd1; biosd1 returns "ok"
                                     before "nfsd1" completes request)
myprog -> readdir -> biosd2 (returns cur state of dir before
                                      fn unlink status is known)
            <server returns status to biosd1, client state updated>
myprog ->  access(fn) -> biosd1 ( returns ESTALE)
...
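
The sequence above can be modelled with a toy state machine in C. This is
only a sketch of the ordering being hypothesized: every name in it
(toy_server, toy_client, toy_unlink, and so on) is made up for illustration,
it performs no real NFS traffic, and it assumes the client-side handle is
invalidated once the server completes the deferred unlink.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model only: hypothetical types, not real NFS client or server state. */
struct toy_server { bool file_exists; };
struct toy_client {
    bool handle_valid;    /* client's cached view of the file handle */
    bool unlink_pending;  /* unlink acknowledged but not yet applied */
};

/* Step 1: the first proxy reports "ok" before the server has acted. */
static int toy_unlink(struct toy_client *c)
{
    c->unlink_pending = true;
    return 0;
}

/* Step 2: a second proxy lists the directory from current server state. */
static bool toy_readdir_lists_file(const struct toy_server *s)
{
    return s->file_exists;
}

/* Step 3: the server completes the unlink and the client state is updated. */
static void toy_server_complete(struct toy_server *s, struct toy_client *c)
{
    if (c->unlink_pending) {
        s->file_exists = false;
        c->handle_valid = false;
        c->unlink_pending = false;
    }
}

/* Step 4: access on the name; returns 0 or an errno-style code. */
static int toy_access(const struct toy_client *c)
{
    return c->handle_valid ? 0 : ESTALE;
}
```

Running the four steps in order, the directory listing still contains the
name at step 2, yet step 4 reports ESTALE for it, which is the situation
shown in the log.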
-------
     I agree rm should report a failed "unlink", but if the "access" returns
"ESTALE", shouldn't that be taken as a hint that the filename we are trying to
access has been updated, and that an "unlink" should not be attempted (because
the filename is invalid)?
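
A minimal sketch of that suggestion in C, assuming a much simplified removal
step. The names rm_outcome, should_skip_unlink and remove_entry are
hypothetical and are not coreutils code; the real rm does considerably more
(for one, the log shows it unlinking via a /proc/self/fd/... path rather than
the plain name).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical outcome codes for this sketch; not part of coreutils. */
enum rm_outcome { RM_REMOVED, RM_SKIPPED_STALE, RM_FAILED };

/* Hypothetical helper: given access()'s return value and errno,
   say whether the unlink attempt should be skipped. */
static int should_skip_unlink(int access_ret, int access_errno)
{
    return access_ret == -1 && access_errno == ESTALE;
}

/* Try to remove one non-directory entry, treating ESTALE from
   access() as "this name is no longer valid; move on quietly". */
static enum rm_outcome remove_entry(const char *path)
{
    int ret = access(path, W_OK);
    int err = (ret == -1) ? errno : 0;  /* errno is only meaningful on failure */

    if (should_skip_unlink(ret, err))
        return RM_SKIPPED_STALE;

    if (unlink(path) == -1) {
        /* A failed unlink is still reported, as the quoted POSIX text requires. */
        fprintf(stderr, "rm: cannot remove '%s': %s\n", path, strerror(errno));
        return RM_FAILED;
    }
    return RM_REMOVED;
}
```

Only the ESTALE case is silenced here; any other access() failure still falls
through to unlink(), so a diagnostic is printed whenever the unlink itself
fails.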

Just a thought that might reduce error messages while still being POSIX
compliant?

Linda
