i.e. the use of fiemap to duplicate the exact layout of a file
from userspace is only posisble if you can /guarantee/ the source
file has not changed in any way during the copy operation at the
pointin time you finalise the destination data copy.
We don't do exactly that, exactly because it's messy when you have
concurrent accesses (which shouldn't be done but you never know).
Which means you *cannot make the assumption it won't happen*.
FIEMAP is not guaranteed to tell you exactly where the data in the
file is that you need to copy is and that nothing you can do from
userspace changes that. I can't say it any clearer than that.
You've said it very clearly. But I'm not saying "fix the damn
FIEMAP", I'm
saying "this is what we need, lseek doesn't provide it, FIEMAP comes
close but really doesn't". If the solution is to fix FIEMAP, if it's
possible at all, that'd be great. But any other solution is okay.
Do you at least agree that it's possible to use the kind of information
in struct fiemap_extent (extent start, length and flags) in a way that is
not racy, or at least not any different from SEEK_DATA and SEEK_HOLE's
raciness? I'm not saying that you'd get that information from FIEMAP.
It's just the kind of information that I'd like to get.
(BTW, the documentation of FIEMAP is horrible. It does not say at all
that FIEMAP_FLAG_SYNC is needed to return extents that match what
SEEK_HOLE/SEEK_DATA would return. The obvious reading is that
FIEMAP_FLAG_SYNC would avoid returning FIEMAP_EXTENT_DELALLOC extents,
and that in turn would not be a problem if you don't need fe_physical.
Perhaps it would help if fiemap.txt said started with *why* would one
use FIEMAP, not just what it does).
When doing a copy, we use(d to use) FIEMAP the same way as you'd use
lseek,
querying one extent at a time. If you proceed this way, all of these
can cause the same races:
- pread(ofs=10MB, len=10MB) returns all zeroes, so the 10MB..20MB is
not copied
- pread(ofs=10MB, len=10MB) returns non-zero data, so the 10MB..20MB is
copied
- lseek(SEEK_DATA, 10MB) returns 20MB, so the 10MB..20MB area is not
copied
- lseek(SEEK_HOLE, 10MB) returns 20MB, so the 10MB..20MB area is
copied
- ioctl(FIEMAP at 10MB) returns an extent starting at 20MB, so
the 10MB..20MB area is not copied
No, FIEMAP is not guaranteed to behave like this. what is returned
is filesystem dependent. Fielsystems that don't support holes will
return data extents. Filesystems that support compression might
return a compressed data extent rather than a hole. Encrypted files
might not expose holes at all, so people can't easily find known
plain text regions in the encrypted data. Filesystems could report
holes as deduplicated data, etc. What do you do when FIEMAP returns
"OFFLINE" to indicate that the data is located elsewhere and will
need to be retrieved by the HSM operating on top of the filesystem
before layout can be determined?
lseek(SEEK_DATA) might also say you're not on a hole because the file
is compressed/encrypted/deduplicated/offline/whatnot. So lseek is
also filesystem dependent, isn't it? It also doesn't work on block
devices, so it's really file descriptor dependent. That's not news.
Of course read, lseek and FIEMAP will not return exactly the same
information. The point is that they're subject to exactly the same
races, and that struct fiemap_extent *can* be parsed conservatively.
The code I attached to the previous message does that, if it finds any
extent kind other than an unwritten extent it just treats it as data.
Based on this it would be nice to understand the reason why FIEMAP needs
FIEMAP_FLAG_SYNC to return meaningful values (the meaningful values
are there, they're just what lseek or read use), or to have a more
powerful function than just lseek(SEEK_DATA/SEEK_HOLE). All we want is
to be able to distinguish between the three fallocate modes.
The assumptions being made about FIEMAP behaviour will only lead to
user data corruption, as they already have several times in the past.
Indeed, FIEMAP as is ranks just above gets() in usability. But there's
no reason why it has to be that way.
Paolo