qemu-discuss

Re: qemu-img measure


From: Nir Soffer
Subject: Re: qemu-img measure
Date: Thu, 23 Jul 2020 21:44:58 +0300

On Thu, Jul 23, 2020 at 7:55 PM Arik Hadas <ahadas@redhat.com> wrote:
>
>
>
> On Thu, Jul 23, 2020 at 7:31 PM Nir Soffer <nsoffer@redhat.com> wrote:
>>
>> On Thu, Jul 23, 2020 at 6:12 PM Arik Hadas <ahadas@redhat.com> wrote:
>>
>> The best place for this question is qemu-discuss, and CC Kevin and Stefan
>> (author of qemu-img measure).
>>
>> > @Nir Soffer does the following make any sense to you:
>> >
>> > [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img info 
>> > 73dde1fc-71c1-431a-8762-c2e71ec4cb93
>> > image: 73dde1fc-71c1-431a-8762-c2e71ec4cb93
>> > file format: raw
>> > virtual size: 15 GiB (16106127360 bytes)
>> > disk size: 8.65 GiB
>> >
>> > [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img measure -O 
>> > qcow2 73dde1fc-71c1-431a-8762-c2e71ec4cb93
>> > required size: 16108814336
>> > fully allocated size: 16108814336
>>
>> This means the file system does not report sparseness info, and without this
>> information qemu-img cannot give a safe estimate.
>>
>> I can reproduce this on NFS 3:
>>
>> $ mount | grep export/2
>> nfs1:/export/2 on /rhev/data-center/mnt/nfs1:_export_2 type nfs
>> (rw,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,mountaddr=192.168.122.30,mountvers=3,mountport=20048,mountproto=udp,local_lock=all,addr=192.168.122.30)
>>
>> $ cd /rhev/data-center/mnt/nfs1:_export_2
>>
>> $ truncate -s 1g empty.img
>>
>> $ qemu-img measure -O qcow2 empty.img
>> required size: 1074135040
>> fully allocated size: 1074135040
>>
>> $ qemu-img map --output json empty.img
>> [{ "start": 0, "length": 1073741824, "depth": 0, "zero": false,
>> "data": true, "offset": 0}]
>>
>> If we run qemu-img measure with strace, we can see:
>>
>> $ strace qemu-img measure -O qcow2 empty.img 2>&1 | grep SEEK_HOLE
>> lseek(9, 0, SEEK_HOLE)                  = 1073741824
>>
>> This means the byte range from 0 to 1073741824 is data.
>>
>> If we do the same on NFS 4.2:
>>
>> $ mount | grep export/1
>> nfs1:/export/1 on /rhev/data-center/mnt/nfs1:_export_1 type nfs4
>> (rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.122.23,local_lock=none,addr=192.168.122.30)
>>
>> $ cd /rhev/data-center/mnt/nfs1\:_export_1
>> $ qemu-img measure -O qcow2 empty.img
>> required size: 393216
>> fully allocated size: 1074135040
>>
>> Unfortunately oVirt's default is not NFS 4.2 yet, and we even warn about
>> changing the stupid default.
>>
>> > qemu-img convert -f raw -O qcow2 73dde1fc-71c1-431a-8762-c2e71ec4cb93 
>> > /tmp/arik.qcow2
>>
>> qemu-img convert detects zeros in the input file, so it can cope with
>> missing sparseness info. This is not free, of course: copying the image is
>> much slower when we have to read the entire image.
>
>
> It would have been great if 'measure' could also take zeros into account like
> 'convert' does, even if it means a longer execution time - otherwise, when we
> export VMs to OVAs on such file systems, we may end up allocating the virtual
> size within the OVA (at least when the base volume is a raw volume).

You can file an RFE for qemu-img.
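
Until such an option exists, a rough estimate is possible on the client side by
scanning the raw image for nonzero clusters. A minimal sketch, assuming the
default 64 KiB qcow2 cluster size and ignoring the small qcow2 metadata
overhead:

import sys

# Count 64 KiB clusters that contain at least one nonzero byte - roughly
# the data a zero-detecting "measure" would have to allocate.
CLUSTER = 64 * 1024

allocated = 0
with open(sys.argv[1], "rb") as f:
    while True:
        buf = f.read(CLUSTER)
        if not buf:
            break
        if any(buf):            # any nonzero byte in this cluster
            allocated += 1

print("estimated data size:", allocated * CLUSTER)

This reads the entire image, so it costs about as much as the zero detection
done by qemu-img convert.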

>> > [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img measure -O 
>> > qcow2 /tmp/arik.qcow2
>> > required size: 9359720448
>> > fully allocated size: 16108814336
>>
>> Now we have a qcow2 image, so we don't depend on the file system capabilities.
>> This is the advantage of using an advanced image format.
>>
>> > shouldn't the 'measure' command be a bit smarter than that? :)
>>
>> I think it cannot be smarter, but maybe qemu folks have a better answer.
>>
>> To measure, qemu-img needs to know how the data is laid out on disk, in order
>> to compute the number of clusters in the qcow2 image. Without help from the
>> filesystem, the only way to do this is to read the entire image.
>>
>> The solution in oVirt is to allocate the required size (possibly overallocating)
>> and, after the conversion has finished, reduce the volume to the actual
>> required size using:
>> http://ovirt.github.io/ovirt-engine-sdk/4.4/services.m.html#ovirtsdk4.services.StorageDomainDiskService.reduce
>>
>> This is much faster than reading the entire image twice.
>
>
> That's sort of what we started with - creating temporary volumes that were
> then copied to the OVA.
> But this took a long time and consumed space on the storage domains, so at
> some point we switched to using the 'measure' command - thinking it would
> give us the same result as if it were invoked on the 'collapsed' qcow2 volume...
> I guess the apparent size of the 'collapsed' qcow2 volume will be closer to
> the disk size than to the virtual size - would it make more sense to allocate
> the space within the OVA according to the disk size (with some buffer) then?

You cannot predict the size without knowing how many qcow2 clusters you
have, so the only safe value is what qemu-img measure reports.
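
For the numbers above, the gap between the virtual size (16106127360) and the
fully allocated size (16108814336) is just the qcow2 metadata. A rough
back-of-envelope, assuming the default 64 KiB clusters and 16-bit refcounts,
and ignoring the L1 table, header and cluster alignment:

virtual = 16106127360
cluster = 64 * 1024

clusters = virtual // cluster        # 245760 clusters to describe
l2_tables = clusters * 8             # one 8-byte L2 entry per cluster
refcounts = clusters * 2             # one 16-bit refcount entry per cluster

print(l2_tables + refcounts)         # ~2.3 MiB, most of the ~2.6 MiB gap

The rest of the required size is the data clusters themselves, and that is
exactly the part you cannot know without sparseness info or reading the image.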

The root cause is people using NFS < 4.2 or raw preallocated volumes
on block storage.
We cannot measure such images or copy/download them efficiently. People who made
these choices need to deal with the consequences.
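
A quick way to check whether a given mount reports sparseness at all, without
strace - a minimal sketch using Python's os.SEEK_HOLE against the truncated
empty.img from above:

import os

# On a filesystem that reports holes, SEEK_HOLE on a fully sparse file
# returns 0; on NFS < 4.2 it returns the file size, as in the strace above.
fd = os.open("empty.img", os.O_RDONLY)
print(os.lseek(fd, 0, os.SEEK_HOLE))
os.close(fd)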

I think creating the OVA on the server using block storage is not the best
approach now. When we added this feature, we did not have a good way to
download complete disk contents, but in 4.4 we support downloading a complete
disk - collapsing all the snapshots into a single image, converting the image
format on the fly, and doing all of this while the VM is running, without
downtime.

We can start a backup, download all the disks, add ovf, and make a tarball.
Here is a working example you can try now:

$ mkdir disks

$ ./backup_vm.py full --engine-url https://engine3 --username admin@internal \
    --password-file engine3-password --cafile engine3.pem --backup-dir disks \
    ed2e2c59-36d3-41e2-ac7e-f4d33eb69ad4
[   0.0 ] Starting full backup for VM ed2e2c59-36d3-41e2-ac7e-f4d33eb69ad4
[   1.3 ] Waiting until backup cbfd61fa-b4f5-40e2-9aef-eb538a7010c7 is ready
[   3.3 ] Created checkpoint 'fcd24ad4-06ba-4981-a645-2f1fda841fd3'
(to use in --from-checkpoint-uuid for the next incremental backup)
[   3.3 ] Creating image transfer for disk 58daea80-1229-4c6b-b33c-1a4e568c8ad7
[   4.4 ] Image transfer 5a86c808-57b3-4b88-85f7-cc17a87d6646 is ready
Formatting 'disks/58daea80-1229-4c6b-b33c-1a4e568c8ad7.202007232141.full.qcow2',
fmt=qcow2 size=6442450944 cluster_size=65536 lazy_refcounts=off
refcount_bits=16
[ 100.00% ] 6.00 GiB, 9.00 seconds, 682.94 MiB/s
[  13.4 ] Finalizing image transfer
[  18.4 ] Creating image transfer for disk b815fac3-cf93-4d7f-8ae2-2cdf3176e18e
[  19.5 ] Image transfer 3a5a5106-719f-4771-b8a5-5512b1ed60c6 is ready
Formatting 'disks/b815fac3-cf93-4d7f-8ae2-2cdf3176e18e.202007232141.full.qcow2',
fmt=qcow2 size=1073741824000 cluster_size=65536 lazy_refcounts=off
refcount_bits=16
[ 100.00% ] 1000.00 GiB, 0.09 seconds, 10.65 TiB/s
[  19.6 ] Finalizing image transfer
[  20.7 ] Full backup completed successfully

$ ls -lh disks/
total 2.2G
-rw-r--r--. 1 nsoffer nsoffer 2.2G Jul 23 21:41
58daea80-1229-4c6b-b33c-1a4e568c8ad7.202007232141.full.qcow2
-rw-r--r--. 1 nsoffer nsoffer 208K Jul 23 21:41
b815fac3-cf93-4d7f-8ae2-2cdf3176e18e.202007232141.full.qcow2

backup_vm.py is here:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_vm.py
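
The "add ovf, and make a tarball" step can be as simple as this sketch, using
Python's tarfile. "vm.ovf" is a placeholder for the OVF descriptor you
generate; the OVF spec expects the descriptor to be the first member of the
archive:

import os
import tarfile

# Pack the OVF descriptor and the downloaded disks into an OVA
# (an OVA is a plain tar archive with the descriptor first).
with tarfile.open("vm.ova", "w") as ova:
    ova.add("vm.ovf", arcname="vm.ovf")
    for name in sorted(os.listdir("disks")):
        ova.add(os.path.join("disks", name), arcname=name)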

To do this efficiently with dumb storage we will have to do zero detection on
the imageio side, and support a streaming format. We discussed this in the past
but did not have a good enough use case to implement it; maybe we should.
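
For reference, the kind of zero detection meant here is simple in principle -
a sketch that copies a raw stream into a sparse file, seeking over zero chunks
instead of writing them (chunk size and file names are arbitrary):

import os
import sys

CHUNK = 1024 * 1024

with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
    while True:
        buf = src.read(CHUNK)
        if not buf:
            break
        if any(buf):
            dst.write(buf)
        else:
            dst.seek(len(buf), os.SEEK_CUR)   # leave a hole
    dst.truncate()   # fix the size if the image ends with zeros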

Nir

