qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [Qemu-devel] [PATCH for-3.0] file-posix: Fix write_zero


From: Nir Soffer
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH for-3.0] file-posix: Fix write_zeroes with unmap on block devices
Date: Thu, 26 Jul 2018 20:34:27 +0300

On Thu, Jul 26, 2018 at 7:58 PM Kevin Wolf <address@hidden> wrote:
Am 26.07.2018 um 18:46 hat Nir Soffer geschrieben:
> On Thu, Jul 26, 2018 at 7:10 PM Eric Blake <address@hidden> wrote:
>
> > On 07/26/2018 10:33 AM, Kevin Wolf wrote:
> >
> > >
> > > As far as I know, the comment you quoted is accurate for BLKDISCARD and
> > > BLKZEROOUT, but not for the fallocate() flags.
> > >
> >
> > I sure wish the man pages were more explicit on what guarantees each
> > flag offers.
> >
> > >> Hmm - that thread also mentions FALLOC_FL_NO_HIDE_STALE, which is a new
> > flag
> > >> not present/documented on Fedora 28. I wonder if it helps, too.
> > >>
> > >>>
> > >>> FALLOC_FL_ZERO_RANGE in contrast implements write_zeroes without unmap.
> > >>
> > >> I thought the opposite: FALLOC_FL_ZERO_RANGE guarantees that you read
> > back
> > >> 0, using whatever is most efficient under the hood (in the case of block
> > >> devices, unmapping that reliably reads back as zero is favored).
> > >
> > > See the code I quoted above, FALLOC_FL_ZERO_RANGE calls
> > > blkdev_issue_zeroout() with BLKDEV_ZERO_NOUNMAP internally.
> >
> > Having to resort to reading the kernel code to learn what the guarantees
> > are is annoying (it's nice that GPL guarantees that we CAN do that, but
> > it means the man pages could use some TLC so that we don't have to).
> > Especially since the kernel code has changed over time.
> >
> > But your paste of current kernel code is rather hard to argue against.
>
> I don't think we should depend on undocumented kernel code.

Yes, the man page says "filesystem blocks" instead of just "blocks". If
you're in a nit-picking mood, that excludes block devices. But I think
it's pretty obvious what the intended meaning is and how it consistently
extends to block devices, and the code confirms it.

That's by far not "undocumented kernel code" in my book.

Maybe it is more correct to say this is poorly documented.

The manual is very specific about supported filesystems, but there nothing about
block devices:

   Deallocating file space                                                                                                                                                                    
       Specifying  the  FALLOC_FL_PUNCH_HOLE flag (available since Linux
       2.6.38) in mode deallocates space (i.e., creates a hole) in the 
       byte range starting at offset and continuing for len bytes.  Within the 
       specified range, partial filesystem blocks are zeroed, and whole
       filesystem blocks are removed from the file.  After a successful  call,
       subsequent  reads from this range will return zeroes.

       The  FALLOC_FL_PUNCH_HOLE  flag  must be ORed with FALLOC_FL_KEEP_SIZE
       in mode; in other words, even when punching off the end of the file, the 
       file size (as reported by stat(2)) does not change.

       Not all filesystems support FALLOC_FL_PUNCH_HOLE; if a filesystem
       doesn't support the operation, an error is returned.  The operation is
       supported  on  at  least  the  following filesystems:

       *  XFS (since Linux 2.6.38)
       *  ext4 (since Linux 3.0)
       *  Btrfs (since Linux 3.7)
       *  tmpfs(5) (since Linux 3.5)

I expect to see something like:

    * block devices (since Linux 4.9)

Anyway this looks the same level of documentation as FALLOC_FL_ZERO_RANGE,
so if we are ok with using it, FALLOC_FL_PUNCH_HOLE is fine.

Does this change mean that oVirt can start to enable discard for VM when
disks are using wipe-after-delete?

Currently we disable discard if a disk is marked for wiping, since discard does not
guarantee that discarded ranges are zeroed. 

Nir

reply via email to

[Prev in Thread] Current Thread [Next in Thread]