bug-coreutils

Re: rm -r sometimes produces errors under NFS


From: Jim Meyering
Subject: Re: rm -r sometimes produces errors under NFS
Date: Sat, 10 Mar 2007 12:41:27 +0100

Vincent Lefevre <address@hidden> wrote:
> On 2007-03-09 00:44:55 +0100, Jim Meyering wrote:
>> Realize that for most people (everyone except you, afaik),
>> rm works just fine.
>
> Yes, for most people, rm works fine. But the problem exists (I had
> it on 3 different NFS servers in the past few years). And for your
> information, other users have reported the same problem, e.g.
>
> http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=994291&admit=-682735245+1173400109463+28353475

That can't be relevant to any recent rewinddir-related change.
That report involves Redhat EL 3.0 U4 W, so they're probably
using a Red Hat-patched coreutils-4.5.3.  The comments imply
that at least one person there suspected that NFS caching was the
root cause.

>> Please step back a moment and consider whether you have an unusual
>> NFS setup, since you are the only one to report such a problem.
>
> Correction: I'm the only one who has reported it at the right place
> (well, perhaps not the right place, seeing how this problem is
> considered here...). It is well known that most users don't report
> bugs, or report them at a different place, more likely searching
> for an immediate workaround. This is also my case, sometimes. You
> can see here the first time I had this problem (this was with GNU
> fileutils 4.0p, in 2001):
>
> http://groups.google.com/group/fr.comp.os.unix/browse_thread/thread/2e526832a2f3947d/

That can't be related to rm's use of rewinddir, either.
Back in 2001, rm was not using the same algorithm.
You can avoid these "ENOENT" (No such file or directory) errors
simply by using -f.
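
To illustrate what ignoring ENOENT amounts to, here is a minimal Python sketch. It is not rm's code, and the name unlink_quietly is hypothetical; it only shows the idea of treating "No such file or directory" as a non-error while letting every other failure propagate.

```python
import os


def unlink_quietly(path):
    """Remove path. Return True if it was removed, False if it was already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        # ENOENT: the entry vanished or never existed. Under -f-like
        # semantics this is not reported as an error.
        return False
    # Any other OSError (EACCES, EBUSY, ...) is deliberately not caught.
    return True
```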

> Also note that the problem occurred much more frequently with the
> coreutils snapshot (6.8+) than with the current Debian version (5.97).
> And I doubt that many people use the snapshot version.
>
> And I'm also one of those who use the machines the most intensively
> (I'm often the only one to report bugs, but they are sometimes
> eventually identified and fixed).

And I appreciate the testing.
However, you may find that people are more responsive
and more willing to go out of their way to help you
if you cultivate a less abrasive manner.

>> Better still, write a script that will demonstrate the problem,
>> given a small number of inputs (e.g., directory, hostname) and ask
>> people to run it and report any problem they see.
>
> The problem is that it is difficult to reproduce under different
> conditions, in particular if the number of inputs is small. BTW,
> I can no longer reproduce the problem with my testcase that was
> 100% reproducible a few days ago (though under the same conditions

This points the finger squarely at NFS, imho.

> on my side, and the machines haven't rebooted). It probably depends
> on the load of the machine or the network (as very often, when the
> bug depends on race conditions).

Perhaps it depends on a hard-to-reproduce NFS bug, too.
That is why I am reluctant to make a significant change to solve
your problem without first hearing that it affects more people.
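
A script of the kind suggested earlier could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the function name try_rm_r, the tree shape, and the default sizes are not taken from any real test case, and nothing guarantees that a tree this small triggers the problem. It builds a throwaway tree under a directory you name (for example one on an NFS mount), runs "rm -r" on it, and returns rm's exit status and stderr.

```python
import os
import subprocess
import tempfile


def try_rm_r(parent, n_dirs=20, n_files=50):
    """Build a small tree under parent, run `rm -r` on it, return (status, stderr)."""
    # Hypothetical layout: n_dirs subdirectories, each holding n_files small files.
    root = tempfile.mkdtemp(prefix="rm-nfs-test-", dir=parent)
    for i in range(n_dirs):
        sub = os.path.join(root, "d%03d" % i)
        os.mkdir(sub)
        for j in range(n_files):
            with open(os.path.join(sub, "f%03d" % j), "w") as fh:
                fh.write("x")
    # Run whichever rm is first in PATH and capture its diagnostics.
    proc = subprocess.run(["rm", "-r", root], stderr=subprocess.PIPE, text=True)
    return proc.returncode, proc.stderr
```

A nonzero status or any text on stderr would be worth reporting, together with the rm version and the NFS client and server details.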

>> I admit that the "rm skips rmdir" may be technically contrary to POSIX,
>> but unless there's a more realistic way to trigger misbehavior, then I
>> won't try to change it.  However, if you develop a clean, non-invasive
>> patch to make rm conform to the letter of POSIX, and add a test script,
>> I'll consider it.
>
> A suggestion concerning the "rm skips rmdir": Consider that ENOENT
> errors should not block rmdir (and other errors do). Indeed such an

Of course.  If your NFS trouble turns out to be common, then this
will have to be addressed.  Until then, the only scenario for which
it makes a difference is highly contrived and harmless.  However,
if the fix is truly tiny and non-invasive, then it might be worth
making, independent of the NFS issue.

> "error" doesn't mean that an existing file couldn't be unlinked,
> just that the file didn't exist. And to implement that, only an
> additional flag is necessary, isn't it? (But I haven't looked at
> the coreutils source very deeply).
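
The flag idea in the quoted suggestion can be sketched in Python as follows. This is only an illustration of the proposed policy, not the coreutils implementation; the function remove_tree and the variable blocked are hypothetical names. ENOENT on an entry is ignored, while any other error sets the flag and causes the rmdir of the containing directory to be skipped.

```python
import os


def remove_tree(path):
    """Remove directory path recursively.

    Return True unless some error other than ENOENT was seen.
    """
    blocked = False  # the "additional flag": set only by non-ENOENT errors
    try:
        names = os.listdir(path)
    except FileNotFoundError:
        return True  # directory already gone: nothing left to do
    except OSError:
        return False
    for name in names:
        child = os.path.join(path, name)
        try:
            if os.path.isdir(child) and not os.path.islink(child):
                if not remove_tree(child):
                    blocked = True
            else:
                os.unlink(child)
        except FileNotFoundError:
            pass  # ENOENT: entry already gone, so it does not block rmdir
        except OSError:
            blocked = True  # a real failure: do not attempt rmdir here
    if blocked:
        return False
    try:
        os.rmdir(path)
    except FileNotFoundError:
        pass  # removed by someone else in the meantime
    except OSError:
        return False
    return True
```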



