qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [


From: Paolo Bonzini
Subject: Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)
Date: Fri, 22 Jul 2016 06:41:24 -0400 (EDT)

> > > i.e. the use of fiemap to duplicate the exact layout of a file
> > > from userspace is only posisble if you can /guarantee/ the source
> > > file has not changed in any way during the copy operation at the
> > > pointin time you finalise the destination data copy.
> > 
> > We don't do exactly that, exactly because it's messy when you have
> > concurrent accesses (which shouldn't be done but you never know).
> 
> Which means you *cannot make the assumption it won't happen*.
> FIEMAP is not guaranteed to tell you exactly where the data in the
> file is that you need to copy is and that nothing you can do from
> userspace changes that. I can't say it any clearer than that.

You've said it very clearly.  But I'm not saying "fix the damn FIEMAP", I'm
saying "this is what we need, lseek doesn't provide it, FIEMAP comes
close but really doesn't".  If the solution is to fix FIEMAP, if it's
possible at all, that'd be great.  But any other solution is okay.

Do you at least agree that it's possible to use the kind of information
in struct fiemap_extent (extent start, length and flags) in a way that is
not racy, or at least not any different from SEEK_DATA and SEEK_HOLE's
raciness?  I'm not saying that you'd get that information from FIEMAP.
It's just the kind of information that I'd like to get. 

(BTW, the documentation of FIEMAP is horrible.  It does not say at all
that FIEMAP_FLAG_SYNC is needed to return extents that match what
SEEK_HOLE/SEEK_DATA would return.  The obvious reading is that
FIEMAP_FLAG_SYNC would avoid returning FIEMAP_EXTENT_DELALLOC extents,
and that in turn would not be a problem if you don't need fe_physical.
Perhaps it would help if fiemap.txt said started with *why* would one
use FIEMAP, not just what it does).

> > When doing a copy, we use(d to use) FIEMAP the same way as you'd use lseek,
> > querying one extent at a time.  If you proceed this way, all of these
> > can cause the same races:
> > 
> > - pread(ofs=10MB, len=10MB) returns all zeroes, so the 10MB..20MB is
> > not copied
> > 
> > - pread(ofs=10MB, len=10MB) returns non-zero data, so the 10MB..20MB is
> > copied
> > 
> > - lseek(SEEK_DATA, 10MB) returns 20MB, so the 10MB..20MB area is not
> > copied
> > 
> > - lseek(SEEK_HOLE, 10MB) returns 20MB, so the 10MB..20MB area is
> > copied
> > 
> > - ioctl(FIEMAP at 10MB) returns an extent starting at 20MB, so
> > the 10MB..20MB area is not copied
> 
> No, FIEMAP is not guaranteed to behave like this. what is returned
> is filesystem dependent. Fielsystems that don't support holes will
> return data extents. Filesystems that support compression might
> return a compressed data extent rather than a hole. Encrypted files
> might not expose holes at all, so people can't easily find known
> plain text regions in the encrypted data. Filesystems could report
> holes as deduplicated data, etc.  What do you do when FIEMAP returns
> "OFFLINE" to indicate that the data is located elsewhere and will
> need to be retrieved by the HSM operating on top of the filesystem
> before layout can be determined?

lseek(SEEK_DATA) might also say you're not on a hole because the file
is compressed/encrypted/deduplicated/offline/whatnot.  So lseek is
also filesystem dependent, isn't it?  It also doesn't work on block
devices, so it's really file descriptor dependent.  That's not news.

Of course read, lseek and FIEMAP will not return exactly the same
information.  The point is that they're subject to exactly the same
races, and that struct fiemap_extent *can* be parsed conservatively.
The code I attached to the previous message does that, if it finds any
extent kind other than an unwritten extent it just treats it as data.

Based on this it would be nice to understand the reason why FIEMAP needs
FIEMAP_FLAG_SYNC to return meaningful values (the meaningful values
are there, they're just what lseek or read use), or to have a more
powerful function than just lseek(SEEK_DATA/SEEK_HOLE).  All we want is
to be able to distinguish between the three fallocate modes.

> The assumptions being made about FIEMAP behaviour will only lead to
> user data corruption, as they already have several times in the past.

Indeed, FIEMAP as is ranks just above gets() in usability.  But there's
no reason why it has to be that way.

Paolo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]