qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [


From: Dave Chinner
Subject: Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)
Date: Thu, 21 Jul 2016 21:43:42 +1000
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jul 20, 2016 at 03:35:17PM +0200, Niels de Vos wrote:
> On Wed, Jul 20, 2016 at 10:30:25PM +1000, Dave Chinner wrote:
> > On Wed, Jul 20, 2016 at 05:19:37AM -0400, Paolo Bonzini wrote:
> > > Adding ext4 and XFS guys (Lukas and Dave respectively).  As a quick 
> > > recap, the
> > > issue here is the semantics of FIEMAP and SEEK_HOLE/SEEK_DATA, which we 
> > > use in
> > > "qemu-img map".  This command prints metadata about a virtual disk 
> > > image---which
> > > in the case of a raw image amounts to detecting holes and unwritten 
> > > extents.
> > > 
> > > First, it seems like SEEK_HOLE and SEEK_DATA really should be 
> > > "SEEK_NONZERO" and
> > > "SEEK_ZERO", on both ext4 and XFS.    You can see that unwritten extents 
> > > are
> > > reported by "qemu-img map" as holes:
> > 
> > Correctly so. seek hole/data knows nothing about the underlying
> > filesystem storage implementation. A "hole" is defined as a region
> > of zeros, and the filesystem is free call anything it kows for
> > certain contains zeros as a hole.
> > 
> > FYI, SEEK_HOLE/DATA were named after the the pre-existing Solaris
> > seek API that uses these definitions for hole and data....
> > 
> > >     $ dd if=/dev/urandom of=test.img bs=1M count=100
> > >     $ fallocate -z -o 10M -l 10M test.img
> > >     $ du -h test.img
> > >     $ qemu-img map --output=json test.img
> > >     [{ "start": 0, "length": 10485760, "depth": 0, "zero": false, "data": 
> > > true, "offset": 0},
> > >     { "start": 10485760, "length": 10485760, "depth": 0, "zero": true, 
> > > "data": false, "offset": 10485760},
> > >     { "start": 20971520, "length": 83886080, "depth": 0, "zero": false, 
> > > "data": true, "offset": 20971520}]
> > 
> > > 
> > > On the second line, zero=true data=false identifies a hole.  The right 
> > > output
> > > would either have zero=true data=true (unwritten extent) or just
> > > 
> > >     [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, 
> > > "data": true, "offset": 0},
> > >
> > > since the zero flag is advisory (it doesn't try to detect zeroes beyond 
> > > what the
> > > filesystem says).
> > 
> > Not from SEEK_HOLE/SEEK_DATA. All it conveys is "this is the start
> > of a range of zeros" and "this is the start of a range of data". And
> > for filesystems that don't specifically implement these seek
> > operations, zeros are considered data. i.e. SEEK_HOLE will take you
> > to the end of file, SEEK_DATA returns the current position....
> > 
> > i.e. unwritten extents contain no data, so they are semantically
> > identical to holes for the purposes of seeking and hence SEEK_DATA
> > can skip over them.
> > 
> > > The reason why we disabled FIEMAP was a combination of a corruption and 
> > > performance
> > > issue.  The data corruption bug was at 
> > > https://bugs.launchpad.net/qemu/+bug/1368815
> > > and it was reported on Ubuntu Trusty (kernel 3.13 based on the release 
> > > notes at
> > > https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes).  We corrected that by 
> > > using
> > > FIEMAP_FLAG_SYNC, based on a similar patch to coreutils.  This turned out 
> > > to be too
> > > slow, so we dropped FIEMAP altogether.
> > 
> > Yes, because FIEMAP output is only useful for diagnostic purposes as
> > it can be stale even before the syscall returns to userspace. i.e.
> > it can't be used in userspace for optimising file copies....
> 
> Oh... And I was surprised to learn that "cp" does use FIEMAP and not
> SEEK_HOLE/SEEK_DATA. You give a strong suggestion to fix that.
>   http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/extent-scan.c#n87

My understanding was that FIEMAP was disabled almost immediately
after the data corruption problems were understood, and it hasn't
been turned back on since. I don't see FIEMAP calls in strace when I
copy sparse files, even when I use --sparse=auto which is supposed
to trigger the sparse source file checks using FIEMAP.

So, yeah, while there's FIEMAP code present, that doesn't mean it's
actually used. And, yes, there are comments in coreutils commits in
2011 saying that the FIEMAP workarounds are temporary until
SEK_HOLE/DATA are supported, which were added to the kernel in 2011
soon after the 'cp-corrupts-data-with-fiemap' debacle came to light.
Five years later, and the fiemap code is stil there?

Cheers,

Dave.
-- 
Dave Chinner
address@hidden



reply via email to

[Prev in Thread] Current Thread [Next in Thread]