bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#11100: Racy code in copy.c


From: NeilBrown
Subject: bug#11100: Racy code in copy.c
Date: Thu, 29 Mar 2012 10:03:27 +1100

On Wed, 28 Mar 2012 18:07:51 +0200 Jim Meyering <address@hidden> wrote:

> NeilBrown wrote:
> > On Tue, 27 Mar 2012 15:40:25 +0200 Jim Meyering <address@hidden> wrote:
> >
> >> Philipp Thomas wrote:
> >> > I'd like to pass on observations from my collegue Neil Brown:
> >> >
> >> > in src/copy.c, copy_reg() is passed "bool *new_dst".
> >> >
> >> > This is 'false' if the file already exists, in which case it attempts to
> >> > open the file with O_WRONLY | O_TRUNC | O_BINARY.
> >> > If it is 'true', only then does it use O_CREAT (and others).
> >> >
> >> > Somewhere up the call chain - I'm not sure where - new_dst is set if 
> >> > 'stat'
> >> > on the file succeeds.  The above mentioned code assumes that the file 
> >> > still
> >> > exists.  This is racy - particularly for NFS where deletions from other
> >> > clients can take a while to appear.
> >>
> >> Thanks for the report.
> >> However, much of what cp does is mandated by the POSIX spec, including
> >> that inevitable-looking (POSIX-mandated) TOCTOU race.  Here's part of that
> >> spec, from 
> >> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cp.html:
> ...
> >> If you can find a way to make cp work sensibly in your specific case,
> >> yet without impacting any other use case, please let us know.
> >
> > The above doesn't specify how you determine if the dest file exists.
> > You could do this with stat plus open(O_WRONLY).
> > Only try the open on REG files.
> 
> It seems that any change here would be working around the non-POSIX
> nature of NFS: note that POSIX seems clear[*] that stat must tell us
> whether a file exists (barring "additional or alternate file access
> control mechanisms" which aren't an issue here).

stat() cannot tell whether a file "exists" - only whether it "existed" very
recently.  open() can ensure that the file still exists - though there is no
guarantee that the name still exists...

> 
> I.e., this is an RFE: avoid NFS races due to inherent
> POSIX-non-compliance of NFS, rather than a bug.

Maybe .....

Suppose a 'cp' and an 'rm' are racing.  i.e you cp to 'foo' at much the same
time that someone else does 'rm foo'.
What results are valid?
Certain it is valid for both to succeed and 'foo' not to exist if 'rm foo'
happened last.
Similarly it is valid for both to succeed and foo to be a newly created file
if 'cp' happened last.
But is it valid for 'rm' to succeed and 'cp' to report that it couldn't
create the target because "no such file or directory" ?

I would suggest not, though I doubt the spec is at all clear on this.
However that is what the current code allows.

NFS doesn't exactly introduce new behaviour, it just makes a particular race
condition much easier to hit.

So I'll not press that it is a "bug" in cp, but I would suggest that it is an
opportunity to improve behaviour in a corner-case.


> 
> >        if ((use_stat
> > -           ? stat (dst_name, &dst_sb)
> > +           ? (stat (dst_name, &dst_sb) < 0 ? -1 :
> > +         (fd = open (dst_name, O_WRONLY)) < 0 ? -1 : 0)
> >             : lstat (dst_name, &dst_sb))
> >            != 0)
> 
> At first glance, that might be reasonable: the additional open
> is incurred only after a failed stat.

This isn't what the proposed code does.  The open is incurred after a
*successful* stat (though it should only be tried if stat reported S_IFREG).
And it is not intended as an 'additional' open.  Rather the open that we
would expect anyway is being performed earlier to avoid a race condition.

> I'll look more closely in a week or two if no one else investigates.
> 
> One thing that would help a lot is a test case (even using gdb, strace,
> stap, inotify, fuse, etc.) that consistently triggers the failure without
> the need for an NFS set-up.

gdb cp
run cp foo # this succeeds ofcourse - 'foo' is a copy of 'cp'
break copy.c:1679  # this is 'have_dst_lstat = !use_stat;'
shell rm foo
c

Result:

/home/git/coreutils/src/cp: cannot create regular file 'foo': No such file or 
directory

With NFS, the 'rm' happens on a different machine and due to the caching
protocol of NFS the unlink is not visible to the 'stat', but it is to the
subsequent 'open' - it is as though it happened between those two calls.

NeilBrown

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]