From: Darren Kenny
Subject: [Qemu-devel] Q: Report of leaked clusters with qcow2 when disk is resized with a live VM
Date: Wed, 13 Sep 2017 12:48:16 +0100
User-agent: Postbox 5.0.19 (Windows/20170908)

Hi,

While testing QEMU 2.9, we observed that if you resize a qcow2 block device
while the VM is running, a subsequent qemu-img check reports leaked
clusters.

The steps to reproduce are:

- First create the test image:

    # /usr/bin/qemu-img create -f qcow2 test.qcow2 10G
    Formatting 'test.qcow2', fmt=qcow2 size=10737418240 encryption=off
    cluster_size=65536 lazy_refcounts=off refcount_bits=16

    # qemu-img check test.qcow2
    No errors were found on the image.

- Now run a VM. This one is based on Oracle Linux 7, but the distro really
  isn't important here, since the test disk is not even mounted in the VM at
  this point in time:

    # /usr/bin/qemu-kvm \
        -name 'test-vm' \
        -monitor stdio \
        -drive id=drive_image1,if=none,snapshot=on,format=qcow2,file=./ol73-64.qcow2 \
        -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
        -drive id=drive_test,if=none,format=qcow2,file=./stg.qcow2 \
        -device virtio-blk-pci,id=stg,drive=drive_test,bootindex=1,serial=TARGET_DISK0,bus=pci.0,addr=0x5 \
        -net bridge,br=br1 \
        -net nic,model=virtio,macaddr=52:54:00:90:91:92 \
        -m 4096 \
        -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
        -vnc :0

- Resize the image to 15G from the QEMU monitor (on stdio, after the above
  command):

    QEMU 2.5.0 monitor - type 'help' for more information
    (qemu) block_resize drive_test 15360

- Now, in a separate terminal, while leaving the VM running, check the image
  again from the host side:

    # qemu-img check ./test.qcow2
    Leaked cluster 3 refcount=1 reference=0

    1 leaked clusters were found on the image.
    This means waste of disk space, but no harm to data.
    Image end offset: 327680
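
For reference, the numbers in the steps above can be sanity-checked with a
little shell arithmetic. This is only a sketch: it assumes the monitor treats
the bare block_resize argument as megabytes (1M = 1024*1024), and the helper
names cluster_offset and count_leaked are made up for illustration, they are
not part of qemu-img.

```shell
#!/bin/sh
# Sanity checks for the numbers in the steps above.

cluster_size=65536                    # from the qemu-img create output
old_bytes=10737418240                 # 10G, from the qemu-img create output
# Assumption: block_resize takes a bare number as megabytes (1M = 1024*1024).
new_bytes=$((15360 * 1024 * 1024))

# Hypothetical helper: byte offset at which a given cluster index starts.
cluster_offset() {
    echo $(( $1 * cluster_size ))
}

# Hypothetical helper: count "Leaked cluster" lines in qemu-img check
# output read from stdin. grep -c exits non-zero when nothing matches,
# so the exit status is forced to success.
count_leaked() {
    grep -c '^ *Leaked cluster ' || true
}

echo "new virtual size: $new_bytes bytes"
echo "growth: $((new_bytes - old_bytes)) bytes"
echo "leaked cluster 3 starts at byte $(cluster_offset 3)"
echo "image end offset covers $((327680 / cluster_size)) clusters"
printf '    Leaked cluster 3 refcount=1 reference=0\n' | count_leaked
```

Against a real image, piping the check output into the helper (for example
"qemu-img check ./test.qcow2 | count_leaked") should give the same count as
the summary line that qemu-img prints.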

As it suggests above, this is not really corruption, but it is a bit
misleading, and could make people think there is an issue here
(hence the reason I've been asked to find a fix).

I then observed that if I powered down the VM, or even just quit the VM, a
subsequent check of the disk reported that everything was fine and that
there were no longer any leaked clusters.

Looking at the code in qcow2_truncate(), it appears that when prealloc has
the value PREALLOC_MODE_OFF we don't flush the metadata to disk, which seems
to be the case here.

If I ignore the if test, and always execute the block in block/qcow2.c,
lines 3250 to 3258:

  if (prealloc != PREALLOC_MODE_OFF) {
      /* Flush metadata before actually changing the image size */
      ret = bdrv_flush(bs);
      if (ret < 0) {
          error_setg_errno(errp, -ret,
                           "Failed to flush the preallocated area to disk");
          return ret;
      }
  }

so that the flush is always done, then the check succeeds even while the VM
is still running.

While I know that this resolves the issue, I can only imagine that there was
some reason for the prealloc != PREALLOC_MODE_OFF check being there in the
first place.

So, I'm hoping that someone here might be able to explain to me why that check
is needed, but also why it might be wrong to do the flush regardless of the
value of prealloc here.

If it is wrong to do that flush here, then would anyone have suggestions as to
an alternative solution to this issue?

Thanks,

Darren.



