qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH v7 6/6] qcow2: Avoid memory over-allocation on c


From: Kevin Wolf
Subject: Re: [Qemu-block] [PATCH v7 6/6] qcow2: Avoid memory over-allocation on compressed images
Date: Fri, 29 Jun 2018 11:03:17 +0200
User-agent: Mutt/1.9.1 (2017-09-22)

Am 28.06.2018 um 21:07 hat Eric Blake geschrieben:
> When reading a compressed image, we were allocating s->cluster_data
> to 32*cluster_size + 512 (possibly over 64 megabytes, for an image
> with 2M clusters).  Let's check out the history:
> 
> Back when qcow2 was first written, we used s->cluster_data for
> everything, including copy_sectors() and encryption, where we want
> to operate on more than one cluster at once.  Obviously, at that
> point, the buffer had to be aligned for other users, even though
> compression itself doesn't require any alignment (the fact that
> the compressed data generally starts mid-sector means that aligning
> our buffer buys us nothing - either the protocol already supports
> byte-based access into whatever offset we want, or we are already
> using a bounce buffer to read a full sector, and copying into
> our destination no longer requires alignment).
> 
> But commit 1b9f1491 (v1.1!) changed things to allocate parallel
> buffers on demand rather than sharing a single buffer, for encryption
> and COW, leaving compression as the final client of s->cluster_data.
> That use was still preserved, because if a single compressed cluster
> is read more than once, we reuse the cache instead of decompressing
> it a second time (someday, we may come up with better caching to
> avoid wasting repeated decompressions while still being more parallel,
> but that is a task for another patch; the XXX comment in
> qcow2_co_preadv for QCOW2_CLUSTER_COMPRESSED is telling).
> 
> Much later, in commit de82815d (v2.2), we noticed that a 64M
> allocation is prone to failure, so we switched over to a graceful
> memory allocation error message.  Elsewhere in the code, we do
> g_malloc(2 * cluster_size) without ever checking for failure, but
> even 4M starts to be large enough that trying to be nice is worth
> the effort, so we want to keep that aspect.
> 
> Then even later, in 3e4c7052 (2.11), we realized that allocating
> a large buffer up front for every qcow2 image is expensive, and
> switched to lazy allocation only for images that actually had
> compressed clusters.  But in the process, we never even bothered
> to check whether what we were allocating still made sense in its
> new context!
> 
> So, it's time to cut back on the waste.  A compressed cluster
> written by qemu will NEVER occupy more than an uncompressed
> cluster, but based on mid-sector alignment, we may still need
> to read 1 cluster + 1 sector in order to recover enough bytes
> for the decompression.  Meanwhile, third-party producers of
> qcow2 may not be as smart, and gzip DOES document that because
> the compression stream adds metadata, and because of the
> pigeonhole principle, there are worst case scenarios where
> attempts to compress will actually inflate an image, by up to
> 0.015% (at most 1 sector larger for an unfortunate 2M
> compression), meaning we should realistically expect to
> sometimes need to read 1 cluster + 2 sectors.
> 
> However, the qcow2 spec permits an all-ones sector count, plus
> a full sector containing the initial offset, for a maximum read
> of 2 full clusters.  Thanks to the way decompression works, even
> though such a value is too large for the actual compressed data
> (for all but 512-byte clusters), it really doesn't matter if we
> read too much (gzip ignores slop, once it has decoded a full
> cluster).  So it's easier to just allocate cluster_data to be as
> large as we can ever possibly see; even if it still wastes up to
> 2M on any image created by qcow2, that's still an improvment of
> 60M less waste than pre-patch.
> 
> Signed-off-by: Eric Blake <address@hidden>
> Reviewed-by: Alberto Garcia <address@hidden>
> 
> ---
> v5: fix math error in commit message [Max]
> v4: fix off-by-one in assertion and commit message [Berto]
> v3: tighten allocation [Berto]
> v2: actually check allocation failure (previous version meant
> to use g_malloc, but ended up posted with g_try_malloc without
> checking); add assertions outside of conditional, improve
> commit message to better match reality now that qcow2 spec bug
> has been fixed
> ---
>  block/qcow2-cluster.c | 27 ++++++++++++++++++---------
>  block/qcow2.c         |  2 +-
>  2 files changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 4701fbc7a12..b98fe683726 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1599,20 +1599,29 @@ int qcow2_decompress_cluster(BlockDriverState *bs, 
> uint64_t cluster_offset)
>          sector_offset = coffset & 511;
>          csize = nb_csectors * 512 - sector_offset;
> 
> -        /* Allocate buffers on first decompress operation, most images are
> -         * uncompressed and the memory overhead can be avoided.  The buffers
> -         * are freed in .bdrv_close().
> +        /* Allocate buffers on the first decompress operation; most
> +         * images are uncompressed and the memory overhead can be
> +         * avoided.  The buffers are freed in .bdrv_close().  qemu
> +         * never writes an inflated cluster, and gzip itself never
> +         * inflates a problematic cluster by more than 0.015%, but the
> +         * qcow2 format allows up to 1 byte short of 2 full clusters
> +         * when including the sector containing offset.  gzip ignores
> +         * trailing slop, so just allocate that much up front rather
> +         * than reject third-party images with overlarge csize.
>           */
> +        assert(!!s->cluster_data == !!s->cluster_cache);
> +        assert(csize <= 2 * s->cluster_size);
>          if (!s->cluster_data) {
> -            /* one more sector for decompressed data alignment */
> -            s->cluster_data = qemu_try_blockalign(bs->file->bs,
> -                    QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
> +            s->cluster_data = g_try_malloc(2 * s->cluster_size);
>              if (!s->cluster_data) {
>                  return -ENOMEM;
>              }
> -        }
> -        if (!s->cluster_cache) {
> -            s->cluster_cache = g_malloc(s->cluster_size);
> +            s->cluster_cache = g_try_malloc(s->cluster_size);
> +            if (!s->cluster_cache) {
> +                g_free(s->cluster_data);
> +                s->cluster_data = NULL;
> +                return -ENOMEM;
> +            }
>          }

I wonder how much of a difference s->cluster_cache really makes. I
wouldn't expect guests to access the same cluster twice in a row.

Maybe the easiest solution would be switching to temporary buffers that
would have the exact size we need and would be freed at the end of the
request?

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]