From: Alberto Garcia
Subject: Re: [Qemu-devel] [RFC] Re-evaluating subcluster allocation for qcow2 images
Date: Thu, 27 Jun 2019 17:38:56 +0200
User-agent: Notmuch/0.18.2 (http://notmuchmail.org) Emacs/24.4.1 (i586-pc-linux-gnu)

On Thu 27 Jun 2019 04:19:25 PM CEST, Denis Lunev wrote:

> Right now QCOW2 with the default cluster size (64k) is not very
> efficient for fast performance with big disks. Nowadays people use
> really BIG images, and 1-2-3-8 TB disks are really common.
> Unfortunately people want fast random I/O too. Thus the metadata cache
> should be in memory, as otherwise we will get IOPS halved (one
> operation for the metadata read and one operation for the real read).
> For an 8 TB image this results in 1 GB of RAM for that. For 1 MB
> clusters we get 64 MB, which is much more reasonable.

Correct, the L2 metadata size is a well-known problem that has been
discussed extensively, and that has received plenty of attention.
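
For reference, the figures quoted above follow from each standard L2
entry being 8 bytes and mapping exactly one cluster. A rough sketch of
the arithmetic (the function name is made up for illustration, and it
ignores L1 tables and refcount metadata):

```python
L2_ENTRY_SIZE = 8  # bytes per standard qcow2 L2 entry

KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40


def l2_metadata_bytes(virtual_size, cluster_size):
    """Bytes of L2 tables needed to map the whole virtual disk.

    Hypothetical helper: one L2 entry per cluster, nothing else counted.
    """
    clusters = -(-virtual_size // cluster_size)  # ceiling division
    return clusters * L2_ENTRY_SIZE


print(l2_metadata_bytes(8 * TiB, 64 * KiB) // MiB)  # 1024 (1 GB)
print(l2_metadata_bytes(8 * TiB, 1 * MiB) // MiB)   # 64
```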

> Though with 1 MB clusters the reclaim process becomes much, much
> worse. I cannot give an exact number, unfortunately.  AFAIR the image
> occupies 30-50% more space. Guys, I would appreciate it if you could
> correct me here with real numbers.

Correct: the cluster is the smallest unit of allocation, so a 16KB
write to an empty area of the image will always allocate a complete
1MB cluster.
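
To put a number on that, here is a small sketch of how much space a
single write into an empty region ends up allocating. The helper name
is hypothetical and it only models whole-cluster allocation, not any
real qcow2 code path:

```python
KiB, MiB = 2**10, 2**20


def allocated_bytes(offset, length, cluster_size):
    """Bytes allocated by one write (length > 0) into an empty region.

    Every cluster touched by [offset, offset + length) is allocated whole.
    """
    first = offset // cluster_size
    last = (offset + length - 1) // cluster_size
    return (last - first + 1) * cluster_size


write = 16 * KiB
for cluster_size in (64 * KiB, 1 * MiB):
    alloc = allocated_bytes(0, write, cluster_size)
    # columns: cluster size (KiB), allocated (KiB), allocated / written
    print(cluster_size // KiB, alloc // KiB, alloc // write)
# 64 64 4
# 1024 1024 64
```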

> Thus with respect to these patterns subclusters could give us the
> benefits of fast random I/O and a good reclaim rate.

Exactly, but that fast random I/O would only happen when allocating new
clusters. Once the clusters are allocated it doesn't provide any
additional performance benefit.

> I would consider 64k cluster/8k subcluster too extreme.  In reality
> we would end up with a completely fragmented image very soon.

You mean because of the 64k cluster size, or because of the 8k
subcluster size? If it's the former, yes. If it's the latter, it can be
solved by preallocating the cluster with fallocate(). But then you would
lose the benefit of the good reclaim rate.
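
A minimal sketch of that preallocation idea, on a plain file rather
than in qcow2 code: reserve the whole cluster-aligned range the first
time any part of it is written. The helper name and the 64k cluster
size are assumptions for illustration; os.posix_fallocate() is only
available on Unix-like systems.

```python
import os
import tempfile

CLUSTER_SIZE = 64 * 1024  # assumed cluster size for the example


def preallocate_cluster(fd, write_offset, cluster_size=CLUSTER_SIZE):
    """Reserve the whole cluster containing write_offset.

    Hypothetical helper: returns the cluster's start offset. The space
    for the full cluster is reserved even if only 8k of it is written,
    which is exactly why the reclaim benefit is lost.
    """
    start = (write_offset // cluster_size) * cluster_size
    os.posix_fallocate(fd, start, cluster_size)
    return start


with tempfile.TemporaryFile() as f:
    # An 8k write landing in the fourth cluster of an empty file.
    start = preallocate_cluster(f.fileno(), 3 * CLUSTER_SIZE + 8192)
    print(start, os.fstat(f.fileno()).st_size)  # 196608 262144
```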

Berto
