bug-coreutils

Re: rm -r sometimes produces errors under NFS


From: Jim Meyering
Subject: Re: rm -r sometimes produces errors under NFS
Date: Wed, 07 Mar 2007 01:13:11 +0100

Vincent Lefevre <address@hidden> wrote:
> On 2007-03-06 23:41:30 +0100, Jim Meyering wrote:
>> Vincent Lefevre <address@hidden> wrote:
>> > No need to store names: if it's the second pass, all the files have
>> > already been unlinked.
>>
>> Not necessarily.  Have you looked at the code?
>> New files may have been added since the original opendir
>> or since the most recent rewinddir.  We'd need some way
>> to distinguish those new names from the ones we've already
>> successfully unlinked.
>
> But my point is that if there are new files, an unlink on them
> shouldn't return an ENOENT error ("No such file or directory").

You want to ignore only certain ENOENT errors.
With the current implementation, knowing which
to ignore would require recording which
names have been successfully unlinked.
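Here is a minimal C sketch of what "recording which names have been successfully unlinked" would involve. It assumes POSIX.1-2008 unlinkat(). The name_set type and the unlink_entry() helper are hypothetical and are not taken from coreutils. An ENOENT is ignored only when this process already removed that name itself, for example a stale entry returned again after rewinddir. Any other ENOENT is still reported.

```c
#define _POSIX_C_SOURCE 200809L  /* POSIX.1-2008: unlinkat(), strdup(), mkdtemp() */
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical record of the names this process has already unlinked
   from one directory: an unsorted array searched linearly. */
struct name_set {
    char **names;
    size_t len, cap;
};

static int name_set_contains(const struct name_set *s, const char *name)
{
    for (size_t i = 0; i < s->len; i++)
        if (strcmp(s->names[i], name) == 0)
            return 1;
    return 0;
}

static int name_set_add(struct name_set *s, const char *name)
{
    if (s->len == s->cap) {
        size_t cap = s->cap ? 2 * s->cap : 16;
        char **p = realloc(s->names, cap * sizeof *p);
        if (p == NULL)
            return -1;
        s->names = p;
        s->cap = cap;
    }
    char *copy = strdup(name);
    if (copy == NULL)
        return -1;
    s->names[s->len++] = copy;
    return 0;
}

static void name_set_free(struct name_set *s)
{
    for (size_t i = 0; i < s->len; i++)
        free(s->names[i]);
    free(s->names);
    s->names = NULL;
    s->len = s->cap = 0;
}

/* Unlink NAME relative to directory descriptor DFD.
   Returns 0 on success, and also when unlinkat() fails with ENOENT for a
   name we already removed ourselves.  Returns -1 with errno set for any
   other failure (including ENOENT for a name we never removed, and an
   allocation failure while recording a successful removal). */
static int unlink_entry(int dfd, const char *name, struct name_set *done)
{
    if (unlinkat(dfd, name, 0) == 0)
        return name_set_add(done, name);
    if (errno == ENOENT && name_set_contains(done, name))
        return 0;              /* stale entry: we removed it earlier */
    return -1;                 /* errno still holds unlinkat()'s error */
}
```

The linear search keeps the sketch short. A large directory would need a hash table, and that bookkeeping is part of the cost Jim is pointing at.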

>> >> > In fact, it isn't necessarily useful to remember anything.
>> >> > When rm attempts to remove a file in a recurse phase,
>> >> > no errors should be reported if the file doesn't exist.
>> >>
>> >> No.  Any POSIX-conforming rm implementation is required to
>> >> report such errors, unless you specify -f.
>> >
>> > Wrong. In the recurse phase, if rm tries to unlink a file, this means
>> > that the file has existed. So, this wouldn't be contrary to POSIX.
>>
>> Your conclusion is invalid.
>> What if some other process removed it first?
>
> AFAIK, POSIX doesn't say that there should be an error in this
> particular case,

Yes, it does.
This part of the standard is clear: "If this fails for any reason ..."

> and IMHO, it is better and more consistent *not*
> to return an error. Indeed, consider the following case:

Doing that would be inconsistent with other implementations, too.
Try to imagine why -f has the ENOENT-ignoring semantics.
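The following C sketch shows the per-file behavior under discussion for non-directory operands. Without -f, every failed unlink() gets a diagnostic on standard error, and rm goes on to the remaining files. With -f, ENOENT alone is skipped silently. The function name remove_files() is hypothetical. The diagnostic wording only imitates rm's output and is an assumption.

```c
#define _POSIX_C_SOURCE 200809L  /* POSIX.1-2008: mkdtemp() etc. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to unlink each of the N paths.  On failure, write a diagnostic and
   continue with the next path, unless FORCE is set and the error is
   ENOENT (the -f case).  Returns 0 if no diagnostic was written, else 1,
   which a caller could use as its exit status. */
static int remove_files(char *const paths[], size_t n, int force)
{
    int status = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlink(paths[i]) == 0)
            continue;
        if (force && errno == ENOENT)
            continue;                     /* -f: missing file is not an error */
        fprintf(stderr, "rm: cannot remove '%s': %s\n",
                paths[i], strerror(errno));
        status = 1;                       /* remember the failure, keep going */
    }
    return status;
}
```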

> 1. "rm -r dir" is started.
> 2. A second process removes some file in some subdirectory of dir,
>    and the "rm -r" process hasn't had the time to see it.
> 3. "rm -r" terminates (without any error).
>
> Why would you want "rm -r" to return an error if some file is removed
> by a second process between the time "rm -r" does the readdir and the
> time unlink is performed on this file by "rm -r", but have no errors
> in the case I've described above?

To know that your rm was competing with another file-removing process.
To conform with POSIX.
To be consistent with all other rm implementations.

This aspect of rm is *not* new.  It has always been this way,
and will most assuredly not change.  This is independent of your
NFS-related problems.

>> If you're still convinced you have a case, you're going to have to
>> start quoting the standard. I based my statement on what I know of
>> POSIX, e.g., from this part of the rm specification:
>>
>>       4. If the current file is a directory, ...
>>       If the current file is not a directory, rm shall perform actions
>>       equivalent to the unlink() function defined in the System
>>       Interfaces volume of IEEE Std 1003.1-200x called with a pathname
>>       of the current file used as the path argument. If this
>>       fails for any reason, rm shall write a diagnostic message
>>       to standard error, do nothing more with the current file,
>>       and go on to any remaining files.
>
> IMHO, the fact that unlink returns an ENOENT error in the recurse phase
> *because of the implementation algorithm* should not be regarded as a
> failure.

That implementation can rewinddir safely because readdir returned NULL
with no error.

> Also note that in the NFS case, the errors are due to the rewind, but
> Point 2c for rm in POSIX is[*]:
>
>   For each entry contained in file, other than dot or dot-dot, the
>   four steps listed here (1 to 4) shall be taken with the entry as
>   if it were a file operand. The rm utility shall not traverse
>   directories by following symbolic links into other parts of the
>   hierarchy, but shall remove the links themselves.
>
> [*] http://www.opengroup.org/onlinepubs/009695399/utilities/rm.html
> (I don't know if this is the latest version...)
>
> So, if you want a strict interpretation of "For each entry", I don't
> think a rewind is allowed.

We've come full circle (maybe twice :-).
It comes down to having consistent results from unlink and readdir.
When readdir returns NULL with no error, I can safely call rewinddir
and then use readdir to iterate through any new entries.
Your system violates that assumption.
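Here is a minimal C sketch of the assumption described above. It relies on POSIX.1-2008 dirfd() and unlinkat() and assumes a directory that holds only non-directories. The helper empty_directory() is hypothetical and is not the coreutils implementation. readdir() returns NULL both at end of stream and on error, so errno is cleared before each call. Only a clean end of stream leads to rewinddir() and a rescan for entries created in the meantime. On a system like the one in this report, the rescan would return names that were already unlinked, and unlinkat() would then fail with ENOENT.

```c
#define _POSIX_C_SOURCE 200809L  /* POSIX.1-2008: dirfd(), unlinkat(), mkdtemp() */
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Unlink every entry of DIRP except "." and "..", rescanning after each
   pass that removed something.  Subdirectories are not handled: unlinkat()
   fails on them and the function returns -1.  Returns 0 once a full pass
   finds nothing left, or -1 with errno set. */
static int empty_directory(DIR *dirp)
{
    int fd = dirfd(dirp);
    for (;;) {
        int removed = 0;
        for (;;) {
            errno = 0;                    /* tell end of stream from error */
            struct dirent *ent = readdir(dirp);
            if (ent == NULL)
                break;
            if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
                continue;
            if (unlinkat(fd, ent->d_name, 0) != 0)
                return -1;                /* caller reports; ENOENT not special */
            removed++;
        }
        if (errno != 0)
            return -1;                    /* readdir() failed: do not rewind */
        if (removed == 0)
            return 0;                     /* a full pass found nothing left */
        rewinddir(dirp);                  /* clean end of stream: rescan */
    }
}
```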

> Otherwise the consequences should be taken
> into account with care.
>
> Also, this doesn't explain why the directory itself isn't removed
> (after the "rm -r test", I get the errors, but an empty directory
> remains).

If any unlink fails, rm does not try to call rmdir on any
parent directory.  If your case (all unlinks failed with
ENOENT) were common, it might be worth handling it by attempting
the rmdir if that's the only type of failure.
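The handling suggested above, attempting the rmdir when every failure was ENOENT, might look like this hypothetical C helper. The only_enoent_ok flag does not exist in rm and is an assumption for illustration. With the flag clear, the helper mirrors the current behavior: any failed unlink leaves the parent directory in place. Every failure is still diagnosed either way.

```c
#define _POSIX_C_SOURCE 200809L  /* POSIX.1-2008: unlinkat(), mkdtemp() */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Unlink the N NAMES inside directory PATH (open as DFD), reporting each
   failure.  Then call rmdir(PATH) if nothing failed or, when the
   hypothetical ONLY_ENOENT_OK flag is set, if every failure was ENOENT.
   Returns rmdir()'s result, or -1 if the rmdir was not attempted. */
static int remove_entries_then_dir(const char *path, int dfd,
                                   char *const names[], size_t n,
                                   int only_enoent_ok)
{
    size_t failures = 0, enoent_failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlinkat(dfd, names[i], 0) == 0)
            continue;
        int err = errno;                  /* save before fprintf() */
        fprintf(stderr, "rm: cannot remove '%s/%s': %s\n",
                path, names[i], strerror(err));
        failures++;
        if (err == ENOENT)
            enoent_failures++;
    }
    int try_rmdir = failures == 0
                    || (only_enoent_ok && failures == enoent_failures);
    if (!try_rmdir)
        return -1;                        /* current behavior after a failure */
    return rmdir(path);
}
```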

>> > But there's still a race condition (unrelated to NFS) in the rm code.
>>
>> Something new?  Please give details.
>
> This is precisely what I've said above:
>
> 1. rm sees the filename in the directory stream.
> 2. A second process removes the file.
> 3. rm does an unlink on the filename and gets an ENOENT error.
>
> rm returns an error, but IMHO this is incorrect: since the file no longer
> exists at the time of the unlink, it should not be regarded as an entry
> of the directory (just as if the file had been removed before rm could
> see it in the recurse phase). In other words, to decide whether there is
> a failure, rm should use the latest information available, and here the
> latest information is given by the return code of unlink.

What you've described is the desired behavior.
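The three-step sequence above can be reproduced in a single process with this C sketch. A direct unlinkat() call stands in for the second process. The helper simulate_race() is hypothetical and exists only for testing. rm's own unlink fails with ENOENT and a diagnostic is written, which is the behavior described as desired.

```c
#define _POSIX_C_SOURCE 200809L  /* POSIX.1-2008: dirfd(), unlinkat(), mkdtemp() */
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Step 1: readdir() returns NAME.  Step 2: a competing remover deletes it
   (simulated here in the same process).  Step 3: rm's own unlinkat() fails
   and is reported.  Returns the errno from step 3, or 0 if the sequence
   could not be reproduced. */
static int simulate_race(DIR *dirp, const char *name)
{
    int fd = dirfd(dirp);
    struct dirent *ent;
    while ((ent = readdir(dirp)) != NULL) {
        if (strcmp(ent->d_name, name) != 0)
            continue;
        if (unlinkat(fd, ent->d_name, 0) != 0)   /* step 2: the competitor */
            return 0;
        if (unlinkat(fd, ent->d_name, 0) == 0)   /* step 3: rm's unlink */
            return 0;
        int err = errno;
        fprintf(stderr, "rm: cannot remove '%s': %s\n", name, strerror(err));
        return err;
    }
    return 0;
}
```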
