qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH for-3.0] file-posix: Fix write_zeroes with unmap


From: Nir Soffer
Subject: Re: [Qemu-devel] [PATCH for-3.0] file-posix: Fix write_zeroes with unmap on block devices
Date: Thu, 26 Jul 2018 20:34:27 +0300

On Thu, Jul 26, 2018 at 7:58 PM Kevin Wolf <address@hidden> wrote:

> Am 26.07.2018 um 18:46 hat Nir Soffer geschrieben:
> > On Thu, Jul 26, 2018 at 7:10 PM Eric Blake <address@hidden> wrote:
> >
> > > On 07/26/2018 10:33 AM, Kevin Wolf wrote:
> > >
> > > >
> > > > As far as I know, the comment you quoted is accurate for BLKDISCARD
> and
> > > > BLKZEROOUT, but not for the fallocate() flags.
> > > >
> > >
> > > I sure wish the man pages were more explicit on what guarantees each
> > > flag offers.
> > >
> > > >> Hmm - that thread also mentions FALLOC_FL_NO_HIDE_STALE, which is a
> new
> > > flag
> > > >> not present/documented on Fedora 28. I wonder if it helps, too.
> > > >>
> > > >>>
> > > >>> FALLOC_FL_ZERO_RANGE in contrast implements write_zeroes without
> unmap.
> > > >>
> > > >> I thought the opposite: FALLOC_FL_ZERO_RANGE guarantees that you
> read
> > > back
> > > >> 0, using whatever is most efficient under the hood (in the case of
> block
> > > >> devices, unmapping that reliably reads back as zero is favored).
> > > >
> > > > See the code I quoted above, FALLOC_FL_ZERO_RANGE calls
> > > > blkdev_issue_zeroout() with BLKDEV_ZERO_NOUNMAP internally.
> > >
> > > Having to resort to reading the kernel code to learn what the
> guarantees
> > > are is annoying (it's nice that GPL guarantees that we CAN do that, but
> > > it means the man pages could use some TLC so that we don't have to).
> > > Especially since the kernel code has changed over time.
> > >
> > > But your paste of current kernel code is rather hard to argue against.
> >
> > I don't think we should depend on undocumented kernel code.
>
> Yes, the man page says "filesystem blocks" instead of just "blocks". If
> you're in a nit-picking mood, that excludes block devices. But I think
> it's pretty obvious what the intended meaning is and how it consistently
> extends to block devices, and the code confirms it.
>
> That's by far not "undocumented kernel code" in my book.
>

Maybe it is more correct to say this is poorly documented.

The manual is very specific about supported filesystems, but there nothing
about
block devices:

   Deallocating file space


       Specifying  the  FALLOC_FL_PUNCH_HOLE flag (available since Linux
       2.6.38) in mode deallocates space (i.e., creates a hole) in the
       byte range starting at offset and continuing for len bytes.  Within
the
       specified range, partial filesystem blocks are zeroed, and whole
       filesystem blocks are removed from the file.  After a successful
call,
       subsequent  reads from this range will return zeroes.

       The  FALLOC_FL_PUNCH_HOLE  flag  must be ORed with
FALLOC_FL_KEEP_SIZE
       in mode; in other words, even when punching off the end of the file,
the
       file size (as reported by stat(2)) does not change.

       Not all filesystems support FALLOC_FL_PUNCH_HOLE; if a filesystem
       doesn't support the operation, an error is returned.  The operation
is
       supported  on  at  least  the  following filesystems:

       *  XFS (since Linux 2.6.38)
       *  ext4 (since Linux 3.0)
       *  Btrfs (since Linux 3.7)
       *  tmpfs(5) (since Linux 3.5)

I expect to see something like:

    * block devices (since Linux 4.9)

Anyway this looks the same level of documentation as FALLOC_FL_ZERO_RANGE,
so if we are ok with using it, FALLOC_FL_PUNCH_HOLE is fine.

Does this change mean that oVirt can start to enable discard for VM when
disks are using wipe-after-delete?

Currently we disable discard if a disk is marked for wiping, since discard
does not
guarantee that discarded ranges are zeroed.

Nir


reply via email to

[Prev in Thread] Current Thread [Next in Thread]