Re: [Qemu-block] [Qemu-devel] Assertion failure on qcow2 disk with cluster_size != 64k
Tue, 25 Oct 2016 10:39:24 +0200
On 25.10.2016 at 01:06, Ed Swierk wrote:
> On Mon, Oct 24, 2016 at 2:21 PM, Eric Blake <address@hidden> wrote:
> > How are you getting max_transfer == 65536? I can't reproduce it with
> > the following setup:
> > $ qemu-img create -f qcow2 -o cluster_size=1M file 10M
> > $ qemu-io -f qcow2 -c 'w 7m 1k' file
> > $ qemu-io -f qcow2 -c 'w -z 8003584 2093056' file
> > although I did confirm that the above sequence was enough to get the
> > -ENOTSUP failure and fall into the code calculating max_transfer.
> > I'm guessing that you are using something other than a file system as
> > the backing protocol for your qcow2 image. But do you really have a
> > protocol that takes AT MOST 64k per transaction, while still trying to use a
> > cluster size of 1M in the qcow2 format? That's rather awkward, as it
> > means that you are required to do 16 transactions per cluster (the whole
> > point of using larger clusters is usually to get fewer transactions). I
> > think we need to get to a root cause of why you are seeing such a small
> > max_transfer, before I can propose the right patch, since I haven't been
> > able to reproduce it locally yet (although I admit I haven't tried to
> > see if blkdebug could reliably introduce artificial limits to simulate
> > your setup). And it may turn out that I just have to fix the
> > bdrv_co_do_pwrite_zeroes() code to loop multiple times if the size of
> > the unaligned head really does exceed the max_transfer size that the
> > underlying protocol is able to support, rather than assuming that the
> > unaligned head/tail always fit in a single fallback write.
> In this case I'm using a qcow2 image that's stored directly in a raw
> dm-crypt/LUKS container, which is itself a loop device on an ext4
> filesystem.
> It appears loop devices (with or without dm-crypt/LUKS) report a
> 255-sector maximum per request via the BLKSECTGET ioctl, which qemu
> rounds down to 64k in raw_refresh_limits(). However this maximum
> appears to be just a hint: bdrv_driver_pwritev() succeeds even with a
> 385024-byte buffer of zeroes.
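A quick sanity check of the arithmetic shows how a 255-sector limit ends up
as 64k: raw_refresh_limits() converts the sector count to bytes and rounds
down to a power of two. The sketch below mirrors that rounding (the names
here are illustrative, not qemu's actual code):

```python
BDRV_SECTOR_SIZE = 512

def pow2floor(n):
    """Largest power of two <= n."""
    p = 1
    while p * 2 <= n:
        p *= 2
    return p

max_sectors = 255  # as reported by BLKSECTGET on a loop device
limit = pow2floor(max_sectors * BDRV_SECTOR_SIZE)
print(limit)       # 65536, i.e. 64k
```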
I suppose what happens is that we get short writes, but the raw-posix
driver actually has a loop to deal with this, so eventually we return
with the whole thing written.
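For reference, a "keep going on short writes" loop looks roughly like this —
a minimal illustrative sketch in the spirit of what the raw-posix driver
does, not qemu's actual implementation:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer, retrying after short writes and EINTR. */
static ssize_t write_full(int fd, const void *buf, size_t count)
{
    size_t done = 0;
    while (done < count) {
        ssize_t n = write(fd, (const char *)buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted before any progress: retry */
            }
            return -1;      /* real error */
        }
        done += n;          /* short write: loop to write the rest */
    }
    return done;
}
```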
Considering the presence of this loop, maybe we shouldn't set
bs->bl.max_transfer at all for raw-posix. Hm, except that for Linux AIO
we might actually need it.
> As for the 1M cluster size, this is a temporary workaround for another
> qemu issue (the default qcow2 L2 table cache size performs well with
> random reads covering only up to 8 GB of image data with 64k clusters;
> beyond that the L2 table cache thrashes). I agree this is not an
> optimal configuration for writes.
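The 8 GB figure is consistent with the relationship documented in qemu's
docs/qcow2-cache.txt: each L2 entry is 8 bytes and maps one cluster, so a
fully used L2 cache covers l2_cache_size * cluster_size / 8 bytes of data.
A quick check (function name is mine, for illustration):

```python
def l2_coverage(l2_cache_bytes, cluster_size):
    """Bytes of image data addressable with a fully cached L2 range."""
    entries = l2_cache_bytes // 8   # 8-byte L2 entries in the cache
    return entries * cluster_size

# The default 1 MB L2 cache with 64k clusters covers exactly 8 GB,
# matching the point where random reads start to thrash:
print(l2_coverage(1 << 20, 64 * 1024) // 2**30, "GB")  # -> 8 GB
```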
You can configure the qcow2 cache size without changing the cluster
size (though of course larger clusters do keep the total metadata
smaller for large images):
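For example (values illustrative; this is a config fragment, the cache
options take size suffixes), enlarging the L2 cache on the -drive line
lets random reads over a wider range stay cached at the default 64k
cluster size — 4 MB of L2 cache covers 32 GB of data:

```shell
# l2-cache-size (and, similarly, cache-size / refcount-cache-size)
# can be set per drive without touching the image's cluster_size:
qemu-system-x86_64 \
    -drive file=test.qcow2,format=qcow2,l2-cache-size=4M
```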