qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH for-3.0] file-posix: Fix write_zeroes with unmap


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH for-3.0] file-posix: Fix write_zeroes with unmap on block devices
Date: Thu, 26 Jul 2018 20:09:37 +0200
User-agent: Mutt/1.9.1 (2017-09-22)

Am 26.07.2018 um 19:34 hat Nir Soffer geschrieben:
> On Thu, Jul 26, 2018 at 7:58 PM Kevin Wolf <address@hidden> wrote:
> > Am 26.07.2018 um 18:46 hat Nir Soffer geschrieben:
> > > I don't think we should depend on undocumented kernel code.
> >
> > Yes, the man page says "filesystem blocks" instead of just "blocks". If
> > you're in a nit-picking mood, that excludes block devices. But I think
> > it's pretty obvious what the intended meaning is and how it consistently
> > extends to block devices, and the code confirms it.
> >
> > That's by far not "undocumented kernel code" in my book.
> 
> Maybe it is more correct to say this is poorly documented.
> 
> The manual is very specific about supported filesystems, but there
> nothing about block devices:

Well, as specific as "_at least_ the following filesystems" can be.

>        Not all filesystems support FALLOC_FL_PUNCH_HOLE; if a filesystem
>        doesn't support the operation, an error is returned.  The operation
> is
>        supported  on  at  least  the  following filesystems:
> 
>        *  XFS (since Linux 2.6.38)
>        *  ext4 (since Linux 3.0)
>        *  Btrfs (since Linux 3.7)
>        *  tmpfs(5) (since Linux 3.5)
> 
> I expect to see something like:
> 
>     * block devices (since Linux 4.9)

Yup, the man page does need an update. But it says "at least", so
technically it's not wrong.

> Anyway this looks the same level of documentation as FALLOC_FL_ZERO_RANGE,
> so if we are ok with using it, FALLOC_FL_PUNCH_HOLE is fine.
> 
> Does this change mean that oVirt can start to enable discard for VM when
> disks are using wipe-after-delete?
> 
> Currently we disable discard if a disk is marked for wiping, since
> discard does not guarantee that discarded ranges are zeroed.

How do you wipe the disk? Does this really involve a write_zeroes
operation in qemu? But even if so, we always implemented that correctly
(modulo (mostly theoretical?) inaccuracy of BLKDISCARDZEROS in older
kernels). This patch fixes that we fall back to inefficient explicit
zero writing instead of offloading with recent kernels, but we didn't
just discard.

The explanation I've heard before from your team was that you're wiping
the disk with regular writes, but they feared that this doesn't take
effect for blocks that have previously been discarded. This is bullshit
without any basis in reality and has never been correct. When you zero
out data regions, they read as zeros, period. It doens't make any
difference whether the block had been discarded previously.

So either way, I feel like I already explained it to your team a
thousand times, but once again: Get rid of that behaviour yesterday. It
doesn't make sense and it never made sense.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]