bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#18499: Possible mv race for hardlinks (rhbz #1141368 )


From: Pádraig Brady
Subject: bug#18499: Possible mv race for hardlinks (rhbz #1141368 )
Date: Fri, 14 Nov 2014 23:36:47 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

On 13/11/14 16:06, Boris Ranto wrote:
> On Thu, 2014-11-13 at 15:41 +0000, Pádraig Brady wrote:
>> On 13/11/14 13:58, Boris Ranto wrote:
>>> On Wed, 2014-09-24 at 17:18 +0100, Pádraig Brady wrote:
>>>> On 09/18/2014 11:52 AM, Ondrej Vasik wrote:
>>>>> Hi,
>>>>> as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1141368 ,
>>>>> there is a possible race condition in mv in the case of hardlinks.
>>>>>
>>>>> Bug is reproducible even on ext4 filesystem (I guess it is filesystem
>>>>> independent), even with the latest coreutils. There was already attempt
>>>>> to fix the non-atomicity of mv for hardlinks
>>>>> ( http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/12985 ), from
>>>>> the current reproducer it looks like the fix is probably incomplete.
>>>>
>>>> The reason mv does the problematic unlink() is because it needs to simulate
>>>> the rename, as rename() has this IMHO incorrect, though defined and 
>>>> documented behavior:
>>>>
>>>>   "If oldpath and newpath are existing hard links referring to the same
>>>>    file, then rename() does nothing, and returns a success status."
>>>>
>>>> For arbitration of the rename between separate processes,
>>>> the kernel needs to be involved.  I noticed the very recent
>>>> renameat2() system call added to Linux which supports flags.
>>>> Miklos, do you think this is something that could be
>>>> handled by renameat2() either implicitly or explicitly with a flag?
>>>>
>>>> thanks,
>>>> Pádraig.
>>>>
>>>>
>>>>
>>>
>>> Hi all,
>>>
>>> I've looked into this and I believe that the whole idea that 'mv a b'
>>> where a and b are the same file (but different hard links) would unlink
>>> one of the files is flawed -- I think we should return an error message
>>> (something like we do for mv a a) and exit in this case. I'll add a
>>> little more of my reasoning, here:
>>>
>>> As Pádraig stated in his comment (and as is stated in rename man page):
>>>
>>>   "If oldpath and newpath are existing hard links referring to the same
>>>    file, then rename() does nothing, and returns a success status."
>>>
>>> This is probably due to this documented behaviour from rename man page:
>>>
>>>   "rename() renames a file, moving it between directories if
>>>    required. Any other  hard links to the file (as created using
>>>    link(2)) are unaffected."
>>>
>>> I.e. rename does not touch other hard links directly from its definition
>>> (hence, it can't do anything if it gets the same hard linked file
>>> twice). This actually means that (without a locking mechanism) there is
>>> no way for us to do mv a b for hard linked files atomically -- the two
>>> separate manual unlinks would race no matter what we do.
>>>
>>> The renameat2() function does not help here either (at least from what I
>>> could find about it). It only allows us to do atomic switch of the files
>>> which does not really help in this case.
>>
>> A flag could be added for this case though right?
>> I.E. to handle the case of racing renames,
>> one file would always be left in place.
>> In other words to support this functionality we would need
>> the kernel support from renameat().
>>
> 
> Yes, technically it could. Maybe, it would be worth a try to propose it
> for kernel.
> 
>>> Hence, I think that the best thing for us to do here is to print the
>>> "mv: 'a' and 'b' are the same hard linked file' and exit with an error
>>> code if we detect this behaviour. At least if we want to preserve the
>>> atomicity of the operation.
>>
>> We've three options that I see:
>>
>> 1. Simulate but be susceptible to data loss races (what we do now)
>> 2. Ignore rename() issue and leave both files present (what FreeBSD does)
>> 3. Fail with a "files are identical" error (what Solaris does)
>>

I see this was previously discussed in http://bugs.gnu.org/10686
and it was mentioned there that the 2008 POSIX standard explicitly
states that any of the three approaches above is appropriate.

> Personally, I like the Solaris way the best (if we can't get the system
> call that would assure the atomicity in this case) -- i.e. let the user
> know that we can't do that. It is also in tune with what the link man
> page says for hard links:
> 
>    "This  new  name  may be used exactly as the old one for any
>     operation;"
> 
> As we error out on 'mv a a', we should probably also threat 'mv a b'
> where a and b are the same file that way.
> 
> As opposed to FreeBSD solution, the user will know that he needs to run
> 'rm a' to accomplish what he actually wanted to do in the first place.
> 
>> Now 1 is also an "issue" when the source and dest files are on separate file 
>> systems.
>> I.E. mv will need to unlink() in that case.  So from a high level, mv is 
>> foregoing the
>> atomicity of the rename in this edge case to provide consistent behavior 
>> across
>> all cases.  The question boils down to whether atomicity in this edge case 
>> is required.
>> Could you expand on a use case that requires this atomicity?
>>
> There actually is a real world use case. The original red hat bugzilla
> (1141368) was filed because people were hitting this issue in a
> distributed environment (via GlusterFS) -- i.e. two people were renaming
> files at the same time from different locations.

Following those bugs indicates it was an internal test case that was failing.
I.E. it's an unlikely edge case.

I'm in a bit of a quandary with this one.
Seeing as this (consistent high level) behavior has been in use for the last 11 
years:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=fc6073d6
without a "real" customer complaint, I'm tending to leave as is,
but push for better behavior from the kernel.

I.E. add a RENAME_REPLACE flag to renameat2(), which will
do the rename atomically for hardlinks, so we could do something like:

  #if !HAVE_RENAME_REPLACE
  if (same_file (a, b))
    unlink (src); /* as we do now.  */
  #endif
  rename (a, b);

thanks,
Pádraig.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]