qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [


From: Paolo Bonzini
Subject: Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)
Date: Thu, 15 Feb 2018 17:42:39 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 15/02/2018 17:40, Vladimir Sementsov-Ogievskiy wrote:
> Hi all.
> 
> Two years later, is there are any news on the topic?
> 
> I can't understand the following thing:
> 
>  - FIEMAP without FLAG_SYNC is unsafe
>  - FIEMAP with FLAG_SYNC is safe but slow
>  - so, we've dropped FIEMAP and use only lseek. So, it means that lseek
> is safe _and_ fast (at least, faster than FIEMAP with FLAG_SYNC)...
> 
> but how is it possible, if lseek and fiemap share same code in the kernel?

Do they?  Maybe not for all file systems.

> Do your code with
> 
>     /* Found an extent, and we're inside it.  */
>     *next = f.fe.fe_logical + f.fe.fe_length;
>     if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
>         return BDRV_BLOCK_DATA|BDRV_BLOCK_ZERO;
>     } else {
>         return BDRV_BLOCK_DATA;
>     }
> 
> provide safe block_status based on FIEMAP without FLAG_SYNC?

No, in fact we found data corruption with FIEMAP.

Paolo

> ------
> may help: link to discussion start:
> http://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04641.html
> 
> 22.07.2016 13:41, Paolo Bonzini wrote:
>>>>> i.e. the use of fiemap to duplicate the exact layout of a file
>>>>> from userspace is only posisble if you can /guarantee/ the source
>>>>> file has not changed in any way during the copy operation at the
>>>>> pointin time you finalise the destination data copy.
>>>> We don't do exactly that, exactly because it's messy when you have
>>>> concurrent accesses (which shouldn't be done but you never know).
>>> Which means you *cannot make the assumption it won't happen*.
>>> FIEMAP is not guaranteed to tell you exactly where the data in the
>>> file is that you need to copy is and that nothing you can do from
>>> userspace changes that. I can't say it any clearer than that.
>> You've said it very clearly.  But I'm not saying "fix the damn
>> FIEMAP", I'm
>> saying "this is what we need, lseek doesn't provide it, FIEMAP comes
>> close but really doesn't".  If the solution is to fix FIEMAP, if it's
>> possible at all, that'd be great.  But any other solution is okay.
>>
>> Do you at least agree that it's possible to use the kind of information
>> in struct fiemap_extent (extent start, length and flags) in a way that is
>> not racy, or at least not any different from SEEK_DATA and SEEK_HOLE's
>> raciness?  I'm not saying that you'd get that information from FIEMAP.
>> It's just the kind of information that I'd like to get.
>>
>> (BTW, the documentation of FIEMAP is horrible.  It does not say at all
>> that FIEMAP_FLAG_SYNC is needed to return extents that match what
>> SEEK_HOLE/SEEK_DATA would return.  The obvious reading is that
>> FIEMAP_FLAG_SYNC would avoid returning FIEMAP_EXTENT_DELALLOC extents,
>> and that in turn would not be a problem if you don't need fe_physical.
>> Perhaps it would help if fiemap.txt said started with *why* would one
>> use FIEMAP, not just what it does).
>>
>>>> When doing a copy, we use(d to use) FIEMAP the same way as you'd use
>>>> lseek,
>>>> querying one extent at a time.  If you proceed this way, all of these
>>>> can cause the same races:
>>>>
>>>> - pread(ofs=10MB, len=10MB) returns all zeroes, so the 10MB..20MB is
>>>> not copied
>>>>
>>>> - pread(ofs=10MB, len=10MB) returns non-zero data, so the 10MB..20MB is
>>>> copied
>>>>
>>>> - lseek(SEEK_DATA, 10MB) returns 20MB, so the 10MB..20MB area is not
>>>> copied
>>>>
>>>> - lseek(SEEK_HOLE, 10MB) returns 20MB, so the 10MB..20MB area is
>>>> copied
>>>>
>>>> - ioctl(FIEMAP at 10MB) returns an extent starting at 20MB, so
>>>> the 10MB..20MB area is not copied
>>> No, FIEMAP is not guaranteed to behave like this. what is returned
>>> is filesystem dependent. Fielsystems that don't support holes will
>>> return data extents. Filesystems that support compression might
>>> return a compressed data extent rather than a hole. Encrypted files
>>> might not expose holes at all, so people can't easily find known
>>> plain text regions in the encrypted data. Filesystems could report
>>> holes as deduplicated data, etc.  What do you do when FIEMAP returns
>>> "OFFLINE" to indicate that the data is located elsewhere and will
>>> need to be retrieved by the HSM operating on top of the filesystem
>>> before layout can be determined?
>> lseek(SEEK_DATA) might also say you're not on a hole because the file
>> is compressed/encrypted/deduplicated/offline/whatnot.  So lseek is
>> also filesystem dependent, isn't it?  It also doesn't work on block
>> devices, so it's really file descriptor dependent.  That's not news.
>>
>> Of course read, lseek and FIEMAP will not return exactly the same
>> information.  The point is that they're subject to exactly the same
>> races, and that struct fiemap_extent *can* be parsed conservatively.
>> The code I attached to the previous message does that, if it finds any
>> extent kind other than an unwritten extent it just treats it as data.
>>
>> Based on this it would be nice to understand the reason why FIEMAP needs
>> FIEMAP_FLAG_SYNC to return meaningful values (the meaningful values
>> are there, they're just what lseek or read use), or to have a more
>> powerful function than just lseek(SEEK_DATA/SEEK_HOLE).  All we want is
>> to be able to distinguish between the three fallocate modes.
>>
>>> The assumptions being made about FIEMAP behaviour will only lead to
>>> user data corruption, as they already have several times in the past.
>> Indeed, FIEMAP as is ranks just above gets() in usability.  But there's
>> no reason why it has to be that way.
>>
>> Paolo
>>
> 
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]