From: Kevin Wolf
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH for-3.0] file-posix: Fix write_zeroes with unmap on block devices
Date: Thu, 26 Jul 2018 17:33:56 +0200
User-agent: Mutt/1.9.1 (2017-09-22)
On 26.07.2018 at 17:23, Eric Blake wrote:
> On 07/26/2018 10:06 AM, Kevin Wolf wrote:
>
> > > > +#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> > > > +        ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> > > > +                           aiocb->aio_offset, aiocb->aio_nbytes);
> > >
> > > Umm, doesn't this have to use FALLOC_FL_ZERO_RANGE? FALLOC_FL_PUNCH_HOLE
> > > deallocs, but is not required to write zeroes.
> >
> > Yes, it is. See the man page:
> >
> > Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux
> > 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte
> > range starting at offset and continuing for len bytes. Within the
> > specified range, partial filesystem blocks are zeroed, and whole
> > filesystem blocks are removed from the file. After a successful
> > call, subsequent reads from this range will return zeroes.
>
> That's true for file-system fds, but not for block device fds.
It is true for block device fds, too. Look at fs/block_dev.c,
specifically blkdev_fallocate():
    switch (mode) {
    case FALLOC_FL_ZERO_RANGE:
    case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
        error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
                                     GFP_KERNEL, BLKDEV_ZERO_NOUNMAP);
        break;
    case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
        error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
                                     GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
        break;
    case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
        error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
                                     GFP_KERNEL, 0);
        break;
    default:
        return -EOPNOTSUPP;
    }
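To make the mode-to-operation mapping in the blkdev_fallocate() excerpt above easy to check, here is a small user-space mirror of that switch. It is only a sketch: the enum blk_op and both function names are hypothetical, and the FALLOC_FL_* values are the ones from include/uapi/linux/falloc.h, defined here only if no header has already provided them.

```c
#include <assert.h>

/* Mode bits as in include/uapi/linux/falloc.h (guarded, in case a system
 * header already defined them). */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE 0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02
#endif
#ifndef FALLOC_FL_NO_HIDE_STALE
#define FALLOC_FL_NO_HIDE_STALE 0x04
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE 0x10
#endif

/* Hypothetical labels for the block-layer call each mode ends up in. */
enum blk_op {
    BLK_OP_ZEROOUT_NOUNMAP,    /* blkdev_issue_zeroout(..., BLKDEV_ZERO_NOUNMAP) */
    BLK_OP_ZEROOUT_NOFALLBACK, /* blkdev_issue_zeroout(..., BLKDEV_ZERO_NOFALLBACK) */
    BLK_OP_DISCARD,            /* blkdev_issue_discard(...): a hint only */
    BLK_OP_UNSUPPORTED         /* fallocate() fails with EOPNOTSUPP */
};

/* Mirrors the switch in the quoted blkdev_fallocate() excerpt. */
enum blk_op blkdev_mode_to_op(int mode)
{
    switch (mode) {
    case FALLOC_FL_ZERO_RANGE:
    case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
        return BLK_OP_ZEROOUT_NOUNMAP;
    case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
        return BLK_OP_ZEROOUT_NOFALLBACK;
    case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
        return BLK_OP_DISCARD;
    default:
        return BLK_OP_UNSUPPORTED;
    }
}

/* Only the two zeroout paths promise that, on success, later reads of the
 * range return zeroes; a discard does not. */
int blk_op_reads_back_zero(enum blk_op op)
{
    switch (op) {
    case BLK_OP_ZEROOUT_NOUNMAP:
    case BLK_OP_ZEROOUT_NOFALLBACK:
        return 1;
    case BLK_OP_DISCARD:
    case BLK_OP_UNSUPPORTED:
        return 0;
    }
    assert(0 && "unknown blk_op");
    return 0;
}
```

In this mirror, PUNCH_HOLE | KEEP_SIZE lands on a zeroout call just like ZERO_RANGE does; only the extra NO_HIDE_STALE bit turns it into a plain discard, and a bare PUNCH_HOLE falls into the default branch.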
> As pointed out by Nir,
>
> > https://patchwork.kernel.org/patch/9903757/
> Which says, among other things:
>
> >> Do we also know that the blocks were discarded as we do with
> >> BLKDISCARD ?
> >
> > There never was a way to know for sure.
> >
> > ATA DSM TRIM and SCSI UNMAP are hints by definition. We attempted to
> > bend their semantics towards getting predictable behavior but ultimately
> > failed. Too many corner cases.
> >
> >> As I mentioned before. We relied on discard_zeroes_data in mkfs.ext4
> >> to make sure that inode tables are zeroed after discard.
> >
> > The point is that you shouldn't have an if (discard_zeroes_data)
> > conditional in the first place.
> >
> > - If you need to deallocate a block range and you don't care about its
> > contents in the future, use BLKDISCARD / FL_PUNCH_HOLE.
> >
> > - If you need to zero a block range, use BLKZEROOUT / FL_ZERO_RANGE.
>
> PUNCH_HOLE deallocates, but it can only guarantee reading back zeroes on file
> systems.
As far as I know, the comment you quoted is accurate for BLKDISCARD and
BLKZEROOUT, but not for the fallocate() flags.
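For comparison with the fallocate() flags, the BLKZEROOUT and BLKDISCARD ioctls that the quoted advice refers to each take a two-element array of 64-bit byte values, {start, length}. A minimal sketch, assuming a Linux build where <linux/fs.h> is available; blk_zeroout() and blk_discard() are hypothetical helper names, not QEMU functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARD, BLKZEROOUT */

/* Hypothetical helper: zero [offset, offset + len) on a block device fd.
 * Offset and length are in bytes and are expected to be aligned to the
 * device's logical block size. Returns 0 or -errno. */
int blk_zeroout(int fd, uint64_t offset, uint64_t len)
{
    uint64_t range[2] = { offset, len };

    assert(len > 0);
    if (ioctl(fd, BLKZEROOUT, range) < 0) {
        return -errno;
    }
    return 0;
}

/* Hypothetical helper: discard [offset, offset + len). Per the thread quoted
 * above, a discard is only a hint: do not rely on the range reading back as
 * zeroes afterwards. */
int blk_discard(int fd, uint64_t offset, uint64_t len)
{
    uint64_t range[2] = { offset, len };

    assert(len > 0);
    if (ioctl(fd, BLKDISCARD, range) < 0) {
        return -errno;
    }
    return 0;
}
```

Both helpers only make sense on a block device fd; on an invalid fd they simply report the ioctl() error.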
> Hmm - that thread also mentions FALLOC_FL_NO_HIDE_STALE, which is a new flag
> not present/documented on Fedora 28. I wonder if it helps, too.
>
> >
> > FALLOC_FL_ZERO_RANGE in contrast implements write_zeroes without unmap.
>
> I thought the opposite: FALLOC_FL_ZERO_RANGE guarantees that you read back
> 0, using whatever is most efficient under the hood (in the case of block
> devices, unmapping that reliably reads back as zero is favored).
See the code I quoted above: FALLOC_FL_ZERO_RANGE calls
blkdev_issue_zeroout() internally with BLKDEV_ZERO_NOUNMAP, i.e. it zeroes
without unmapping.
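Going by the kernel code quoted above, a write_zeroes implementation on a Linux fd could use PUNCH_HOLE | KEEP_SIZE when unmapping is allowed and ZERO_RANGE when it is not. If the mode is rejected with EOPNOTSUPP, it could fall back to writing zero bytes. This is a hedged sketch, not the code from the patch: zero_range() and its may_unmap parameter are hypothetical, and the 4 KiB zero buffer is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* (with _GNU_SOURCE) */
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make [offset, offset + len) read back as zeroes.
 * - may_unmap: PUNCH_HOLE | KEEP_SIZE (deallocates; on block devices this
 *   becomes a zeroout with BLKDEV_ZERO_NOFALLBACK per the quoted code)
 * - otherwise: ZERO_RANGE (zeroes without unmapping)
 * Returns 0 or -errno. */
int zero_range(int fd, off_t offset, off_t len, bool may_unmap)
{
    static const char zeroes[4096];
    int mode = may_unmap ? (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
                         : FALLOC_FL_ZERO_RANGE;

    assert(offset >= 0 && len >= 0);
    if (fallocate(fd, mode, offset, len) == 0) {
        return 0;
    }
    if (errno != EOPNOTSUPP) {
        return -errno;
    }

    /* Slow path: the kernel or file system rejected the mode, so write
     * zero bytes explicitly. */
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeroes) ? (size_t)len
                                                   : sizeof(zeroes);
        ssize_t n = pwrite(fd, zeroes, chunk, offset);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        offset += n;
        len -= n;
    }
    return 0;
}
```

The fallback matters most for the PUNCH_HOLE path on block devices: BLKDEV_ZERO_NOFALLBACK means the kernel will not emulate the zeroing itself, so a device without efficient write-zeroes support can make fallocate() fail instead.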
Kevin