[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PULL 5/7] file-posix: Support BDRV_REQ_NO_FALLBACK for
From: |
Eric Blake |
Subject: |
Re: [Qemu-devel] [PULL 5/7] file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes |
Date: |
Wed, 14 Aug 2019 21:44:28 -0500 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 |
On 3/26/19 10:51 AM, Kevin Wolf wrote:
> We know that the kernel implements a slow fallback code path for
> BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it.
> The other operations we call in the context of .bdrv_co_pwrite_zeroes
> should usually be quick, so no modification should be needed for them.
> If we ever notice that there are additional problematic cases, we can
> still make these conditional as well.
Are there cases where fallocate(FALLOC_FL_ZERO_RANGE) falls back to slow
writes? It may be fast on some file systems, but when used on a block
device, that may equally trigger slow fallbacks. The man page is not
clear on that fact; I suspect that there may be cases in there that need
to be made conditional (it would be awesome if the kernel folks would
give us another FALLOC_ flag when we want to guarantee no fallback).
By the way, is there an easy setup to prove (maybe some qemu-img convert
command on a specially-prepared source image) whether the no fallback
flag makes a difference? I'm about to cross-post a series of patches to
nbd/qemu/nbdkit/libnbd that adds a new NBD_CMD_FLAG_FAST_ZERO which fits
the bill of BDRV_REQ_NO_FALLBACK, but would like to include some
benchmark numbers in my cover letter if I can reproduce a setup where it
matters.
And this patch has a bug:
> +++ b/block/file-posix.c
> @@ -652,7 +652,7 @@ static int raw_open_common(BlockDriverState *bs, QDict
> *options,
> }
> #endif
>
> - bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
> + bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
> ret = 0;
> fail:
> if (filename && (bdrv_flags & BDRV_O_TEMPORARY)) {
> @@ -1500,14 +1500,19 @@ static ssize_t
> handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
{
int ret = -ENOTSUP;
BDRVRawState *s = aiocb->bs->opaque;
if (!s->has_write_zeroes) {
return -ENOTSUP;
> }
At this point, ret is -ENOTSUP.
>
> #ifdef BLKZEROOUT
> - do {
> - uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
> - if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
> - return 0;
> - }
> - } while (errno == EINTR);
> + /* The BLKZEROOUT implementation in the kernel doesn't set
> + * BLKDEV_ZERO_NOFALLBACK, so we can't call this if we have to avoid slow
> + * fallbacks. */
> + if (!(aiocb->aio_type & QEMU_AIO_NO_FALLBACK)) {
> + do {
> + uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
> + if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
> + return 0;
> + }
> + } while (errno == EINTR);
>
> - ret = translate_err(-errno);
> + ret = translate_err(-errno);
> + }
If the very first call to this function is with NO_FALLBACK, then this
'if' is skipped,
> #endif
>
> if (ret == -ENOTSUP) {
s->has_write_zeroes = false;
}
and we set s->has_write_zeroes to false, permanently disabling any
BLKZEROOUT attempts in future calls, even if the future calls no longer
pass the NO_FALLBACK flag.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature
- Re: [Qemu-devel] [PULL 5/7] file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes,
Eric Blake <=