qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: QEMU RBD is slow with QCOW2 images


From: Stefano Garzarella
Subject: Re: QEMU RBD is slow with QCOW2 images
Date: Thu, 4 Mar 2021 18:32:54 +0100

On Thu, Mar 04, 2021 at 03:59:17PM +0100, Kevin Wolf wrote:
Am 04.03.2021 um 15:08 hat Stefano Garzarella geschrieben:
On Thu, Mar 04, 2021 at 01:05:02PM +0100, Kevin Wolf wrote:
> Am 03.03.2021 um 18:40 hat Stefano Garzarella geschrieben:
> > Hi Jason,
> > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> > writing data is very slow compared to a raw file.
> >
> > Comparing raw vs QCOW2 image creation with RBD I found that we use a
> > different object size, for the raw file I see '4 MiB objects', for
> > QCOW2 I
> > see '64 KiB objects' as reported on comment 14 [2].
> > This should be the main issue of slowness, indeed forcing in the code 4 MiB
> > object size also for QCOW2 increased the speed a lot.
> >
> > Looking better I discovered that for raw files, we call rbd_create() with
> > obj_order = 0 (if 'cluster_size' options is not defined), so the default
> > object size is used.
> > Instead for QCOW2, we use obj_order = 16, since the default 'cluster_size'
> > defined for QCOW2, is 64 KiB.
>
> Hm, the QemuOpts-based image creation is messy, but why does the rbd
> driver even see the cluster_size option?
>
> The first thing qcow2_co_create_opts() does is splitting the passed
> QemuOpts into options it will process on the qcow2 layer and options
> that are passed to the protocol layer. So if you pass a cluster_size
> option, qcow2 should take it for itself and not pass it to rbd.
>
> If it is passed to rbd, I think that's a bug in the qcow2 driver.

IIUC qcow2 properyl remove it, but when rbd uses qemu_opt_get_size_del(opts,
BLOCK_OPT_CLUSTER_SIZE, 0) the default value of qcow2 format is returned.

Going in depth in qemu_opt_get_size_helper(), I found that qemu_opt_find()
properly returns a NULL pointer, but then we call find_default_by_name()
that returns the default value of qcow2 format (64k).

Ugh, I see why. We're passing the protocol driver a QemuOpts that was
created for a QemuOptsList with the qcow2 default, not for its own
QemuOptsList. This is wrong.

Note that the QemuOptsList is not qcow2_create_opts itself, but a list
that is created with qemu_opts_append() to combine qcow2 and rbd options
into a new QemuOptsList. For overlapping options, the format wins.

I don't think you can change the QemuOptsList of an existing QemuOpts,
nor is there a clone operation that could just copy all options into a
new QemuOpts created for the rbd QemuOptsList, so maybe the easiest
hack^Wsolution would be converting to QDict and back...

Do you mean something like this? (I'll send a proper patch when everything is a little clearer to me :-)

diff --git a/block.c b/block.c
index a1f3cecd75..74b02b32dc 100644
--- a/block.c
+++ b/block.c
@@ -671,13 +671,33 @@ out:
 int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
 {
     BlockDriver *drv;
+    QemuOpts *new_opts;
+    QDict *qdict;
+    int ret;

     drv = bdrv_find_protocol(filename, true, errp);
     if (drv == NULL) {
         return -ENOENT;
     }

-    return bdrv_create(drv, filename, opts, errp);
+    if (!drv->create_opts) {
+        error_setg(errp, "Driver '%s' does not support image creation",
+                   drv->format_name);
+        return -ENOTSUP;
+    }
+
+    qdict = qemu_opts_to_qdict(opts, NULL);
+    new_opts = qemu_opts_from_qdict(drv->create_opts, qdict, errp);
+    if (new_opts == NULL) {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    ret = bdrv_create(drv, filename, new_opts, errp);
+out:
+    qemu_opts_del(new_opts);
+    qobject_unref(qdict);
+    return ret;
 }


> > Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
> > size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
> > QemuOpts calling qemu_opts_to_qdict_filtered().
> > For some reason that I have yet to understand, after this deletion, however
> > remains in QemuOpts the default value of 'cluster_size' for qcow2 (64 KiB),
> > that it's used in qemu_rbd_co_create_opts()
>
> So it seems you came to a similar conclusion. We need to find out where
> the 64k come from and just fix that so that rbd uses its default.

Yes, I tried debugging above, but I'm not sure how to fix it.

Maybe a new parameter in qemu_opt_get_size_helper() to prevent it from
looking for the default value.
Or we should prevent the default value from being added to the
opts->list->desc, but that part is still not very clear to me.

opts->list is already wrong, I think this is what we need to fix.

> > At this point my doubts are:
> > Does it make sense to use the same cluster_size as qcow2 as object_size in
> > RBD?
> > If we want to keep the 2 options separated, how can it be done? > > Should we
> > rename the option in block/rbd.c?
>
> My lazy answer is that you could just use QMP blockdev-create, where you
> create layer by layer separately.
>
> What could possibly be done for the QemuOpts is using the dotted syntax
> like for opening, so you could specify file.cluster_size=... for the
> protocol layer (or data_file.cluster_size=... for the external data
> file etc.)

This would be cool :-)

I'm almost sure that compatibility will make this more complicated than
it sounds, but we could have a try.

Eheh I see your point.

Thanks,
Stefano




reply via email to

[Prev in Thread] Current Thread [Next in Thread]