Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on comp

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on comp

From:	Eric Blake
Subject:	Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images
Date:	Wed, 21 Feb 2018 09:59:25 -0600
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 02/21/2018 09:00 AM, Eric Blake wrote:

On 02/21/2018 04:04 AM, Alberto Garcia wrote:

On Tue 20 Feb 2018 11:24:59 PM CET, Eric Blake wrote:

I was also preparing a patch to change this, but you arrived first :-)

So, it's time to cut back on the waste.  A compressed cluster
will NEVER occupy more than an uncompressed cluster

And here, csize can only get smaller than the length picked bynb_csectors. Therefore, csize is GUARANTEED to be <= c->sector_size.
- Solution a: check that csize < s->cluster_size and return an error if
   it's not. However! although QEMU won't produce an image with a
   compressed cluster that is larger than the uncompressed one, the qcow2
   on-disk format in principle allows for that, so arguably we should
   accept it.
No, the qcow2 on-disk format does not have enough bits reserved forthat. It is impossible to store an inflated cluster, because you don'thave enough bits to record it.


Okay, the spec is WRONG, compared to our code base.

That said, we MAY have a bug, more likely to be visible in 512-byteclusters but possible on any size. While the 'number sectors' field ISsufficient for any compressed cluster starting at offset 0 in thecluster, we run into issues if the starting offset is later in thecluster. That is, even though the compressed length of a 512-bytecluster is <= 512 (or we wouldn't compress it), if we start at offset510, we NEED to read the next cluster to get the rest of the compressedstream - but at 512-byte clusters, there are 0 bits reserved for 'numbersectors'. Per the above calculations with the example offset of 510,nb_csectors would be 1 (it can't be anything else for a 512-byte clusterimage), and csize would then be 2 bytes, which is insufficient forreading back enough data to reconstruct the cluster.

In fact, here's a demonstration of a discrepancy, where qemu-img andJohn's qcheck tool [1] disagree about the validity of an image createdby qemu (although it may just be that qcheck hasn't yet learned aboutcompressed clusters):


[1]https://github.com/jnsnow/qcheck

$ f=12345678
$ f=$f$f$f$f # 32
$ f=$f$f$f$f # 128
$ f=$f$f$f$f # 512
$ f=$f$f$f$f # 2k
$ f=$f$f$f$f # 8k
$ f=$f$f$f$f # 32k
$ f=$f$f$f$f # 128k
$ printf "$f" > data
$ qemu-img convert -c -f raw -O qcow2 -o cluster_size=512 \
  data data.qcow2
$ qemu-img check data.qcow2
No errors were found on the image.
256/256 = 100.00% allocated, 100.00% fragmented, 100.00% compressed clusters
Image end offset: 18944
$ ./qcheck data.qcow2
...
== L2 Tables ==

== L2 cluster l1[0] : 0x0000000000000800 ==
Corrupt entries: 63; Non-zero entries: 64; Corrupt:Non-zero ratio: 0.984375
== L2 cluster l1[1] : 0x0000000000000e00 ==
Corrupt entries: 63; Non-zero entries: 64; Corrupt:Non-zero ratio: 0.984375
== L2 cluster l1[2] : 0x0000000000001400 ==
Corrupt entries: 63; Non-zero entries: 64; Corrupt:Non-zero ratio: 0.984375
== L2 cluster l1[3] : 0x0000000000001a00 ==
Corrupt entries: 63; Non-zero entries: 64; Corrupt:Non-zero ratio: 0.984375
L2 tables: Could not complete analysis, 257 problems found


== Reference Count Analysis ==

Refcount analysis: 00 vacant clusters
Refcount analysis: 04 leaked clusters
Refcount analysis: 00 ghost clusters
Refcount analysis: 04 miscounted clusters
Refcount analysis: 8 problems found


== Cluster Counts ==

Metadata: 0x1000
Data: 0x800
Leaked: 0x800
Vacant: 0x0
total: 0x2000
qcheck: 73 problems found

Not true. It is (cluster_bits - 9) or (cluster_size / 512). Remember,x = 62 - (cluster_bits - 8); for a 512-byte cluster, x = 61. The'number sectors' field is then bits x+1 - 61 (but you can't have abitfield occupying bit 62 upto 61; especially since bit 62 is the bitfor compressed cluster).

So instead of blindly reading the spec, I'm now going to single-steppingthrough the 'qemu-img convert' command line above, to see what REALLYhappens:


Line numbers from commit a6e0344fa0

$ gdb --args ./qemu-img convert -c -f raw -O qcow2 -o cluster_size=512data data.qcow2

...
(gdb) b qcow2_alloc_bytes
Breakpoint 1 at 0x57610: file block/qcow2-refcount.c, line 1052.
(gdb) r
Thread 1 "qemu-img" hit Breakpoint 1, qcow2_alloc_bytes (
    address@hidden, address@hidden)
    at block/qcow2-refcount.c:1052
1052    {
(gdb)

So we are compressing 512 bytes down to 15 every time, which means thatafter 34 clusters compressed, we should be at offset 510. Let's resumedebugging:


(gdb) c 34
Will ignore next 33 crossings of breakpoint 1.  Continuing.
[Thread 0x7fffe3cfe700 (LWP 32229) exited]
[New Thread 0x7fffe3cfe700 (LWP 32300)]
[New Thread 0x7fffe25ed700 (LWP 32301)]

Thread 1 "qemu-img" hit Breakpoint 1, qcow2_alloc_bytes (
    address@hidden, address@hidden)
    at block/qcow2-refcount.c:1052
1052    {
(gdb) n
1053        BDRVQcow2State *s = bs->opaque;
(gdb)
1058        BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES);
(gdb)
1059        assert(size > 0 && size <= s->cluster_size);
(gdb) p s->free_byte_offset
$2 = 3070
(gdb) p 3070%512
$3 = 510
...
(gdb)
1076        free_in_cluster = s->cluster_size - offset_into_cluster(s, offset);
(gdb)
1078            if (!offset || free_in_cluster < size) {
(gdb) p free_in_cluster
$4 = 2

1079 int64_t new_cluster = alloc_clusters_noref(bs,s->cluster_size);

(gdb)
1080                if (new_cluster < 0) {
(gdb)
1084                if (new_cluster == 0) {
(gdb)

1091 if (!offset || ROUND_UP(offset, s->cluster_size) !=new_cluster) {

(gdb)
1095                    free_in_cluster += s->cluster_size;
(gdb)
1099            assert(offset);

so we got a contiguous cluster, and our goal is to let the caller bleedthe compressed cluster into to the tail of the current sector and intothe head of the next cluster. Continuing:


(gdb) fin
Run till exit from #0  qcow2_alloc_bytes (address@hidden,
    address@hidden) at block/qcow2-refcount.c:1118
[Thread 0x7fffe25ed700 (LWP 32301) exited]
[Thread 0x7fffe3cfe700 (LWP 32300) exited]
qcow2_alloc_compressed_cluster_offset (address@hidden,
    address@hidden, address@hidden)
    at block/qcow2-cluster.c:768
768         if (cluster_offset < 0) {
Value returned is $5 = 3070

(gdb) n
773         nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
(gdb)
774                       (cluster_offset >> 9);
(gdb)
773         nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
(gdb)
777                           ((uint64_t)nb_csectors << s->csize_shift);
(gdb) l
772     
773         nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
774                       (cluster_offset >> 9);
775     
776         cluster_offset |= QCOW_OFLAG_COMPRESSED |
777                           ((uint64_t)nb_csectors << s->csize_shift);
778     
779         /* update L2 table */
780     
781         /* compressed clusters never have the copied flag */
(gdb) p nb_csectors
$6 = 1
(gdb) p s->csize_shift
$7 = 61
(gdb) p/x cluster_offset
$8 = 0xbfe
(gdb) n
776         cluster_offset |= QCOW_OFLAG_COMPRESSED |
(gdb)
783         BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
(gdb) p/x cluster_offset
$9 = 0x6000000000000bfe

Where is s->csize_shift initialized?  In qcow2_do_open():

    s->csize_shift = (62 - (s->cluster_bits - 8));
    s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
    s->cluster_offset_mask = (1LL << s->csize_shift) - 1;

Revisiting the wording in the spec:

Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):

    Bit  0 -  x:    Host cluster offset. This is usually _not_ aligned to a
                    cluster boundary!

       x+1 - 61:    Compressed size of the images in sectors of 512 bytes

which says bits 0-61 are the host cluster offset, and 62-61 is thenumber of sectors. But our code sets s->csize_shift to divide thisdifferently, at 0-60 and 61-61. Which means your earlier claim thatthere are enough 'number sector' bits to allow for up to 2*cluster_sizeas the size of the compressed stream (rather than my claim of exactlycluster_size) is right, and other implementations CAN inflate a cluster(if we don't document otherwise), and that even if they DON'T inflate,they can STILL cause a read larger than a cluster size when the offsetis near the tail of one sector (most likely at 512-byte clusters, butremotely possible at other cluster sizes as well).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] [PATCH 0/2] qcow2: minor compression improvements, Eric Blake, 2018/02/20
- [Qemu-devel] [PATCH 1/2] qcow2: Prefer byte-based calls into bs->file, Eric Blake, 2018/02/20
  - Re: [Qemu-devel] [PATCH 1/2] qcow2: Prefer byte-based calls into bs->file, Alberto Garcia, 2018/02/21
- [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Eric Blake, 2018/02/20
  - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Alberto Garcia, 2018/02/21
    - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Eric Blake, 2018/02/21
    - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Alberto Garcia, 2018/02/21
    - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Eric Blake <=
    - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, John Snow, 2018/02/21
  - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Kevin Wolf, 2018/02/21
    - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Eric Blake, 2018/02/21
    - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Kevin Wolf, 2018/02/21
    - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Eric Blake, 2018/02/21
    - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Kevin Wolf, 2018/02/21
    - Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images, Alberto Garcia, 2018/02/22

Prev by Date: Re: [Qemu-devel] [PATCH v1 0/3] s390x/sclp: 64 bit event masks
Next by Date: Re: [Qemu-devel] [PATCH v7 09/23] monitor: allow using IO thread for parsing
Previous by thread: Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images
Next by thread: Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images
Index(es):
- Date
- Thread