Re: [Qemu-devel] [PATCH 02/23] block: New BlockBackend


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH 02/23] block: New BlockBackend
Date: Thu, 11 Sep 2014 12:03:56 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Kevin Wolf <address@hidden> writes:

> On 10.09.2014 at 10:13, Markus Armbruster wrote:
>> A block device consists of a frontend device model and a backend.
>> 
>> A block backend has a tree of block drivers doing the actual work.
>> The tree is managed by the block layer.
>> 
>> We currently use a single abstraction BlockDriverState both for tree
>> nodes and the backend as a whole.  Drawbacks:
>> 
>> * Its API includes both stuff that makes sense only at the block
>>   backend level (root of the tree) and stuff that's only for use
>>   within the block layer.  This makes the API bigger and more complex
>>   than necessary.  Moreover, it's not obvious which interfaces are
>>   meant for device models, and which really aren't.
>> 
>> * Since device models keep a reference to their backend, the backend
>>   object can't just be destroyed.  But for media change, we need to
>>   replace the tree.  Our solution is to make the BlockDriverState
>>   generic, with actual driver state in a separate object, pointed to
>>   by member opaque.  That lets us replace the tree by deinitializing
>>   and reinitializing its root.  This special need of the root makes
>>   the data structure awkward everywhere in the tree.
>> 
>> The general plan is to separate the APIs into "block backend", for use
>> by device models, monitor and whatever other code dealing with block
>> backends, and "block driver", for use by the block layer and whatever
>> other code (if any) dealing with trees and tree nodes.
>> 
>> Code dealing with block backends, device models in particular, should
>> become completely oblivious of BlockDriverState.  This should let us
>> clean up both APIs, and the tree data structures.
>> 
>> This commit is a first step.  It creates a minimal "block backend"
>> API: type BlockBackend and functions to create, destroy and find them.
>> BlockBackend objects are created and destroyed, but not yet used for
>> anything; that'll come shortly.
>> 
>> BlockBackend is reference-counted.  Its reference count never exceeds
>> one so far, but that's going to change.
>> 
>> Signed-off-by: Markus Armbruster <address@hidden>
>> ---
>>  block/Makefile.objs            |   2 +-
>>  block/block-backend.c          | 110 +++++++++++++++++++++++++++++++++++++++++
>>  blockdev.c                     |  10 +++-
>>  hw/block/xen_disk.c            |  11 +++++
>>  include/qemu/typedefs.h        |   1 +
>>  include/sysemu/block-backend.h |  26 ++++++++++
>>  qemu-img.c                     |  46 +++++++++++++++++
>>  qemu-io.c                      |   8 +++
>>  qemu-nbd.c                     |   3 +-
>>  9 files changed, 214 insertions(+), 3 deletions(-)
>>  create mode 100644 block/block-backend.c
>>  create mode 100644 include/sysemu/block-backend.h
>> 
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index f45f939..a70140b 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -5,7 +5,7 @@ block-obj-y += qed-check.o
>>  block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
>>  block-obj-$(CONFIG_QUORUM) += quorum.o
>>  block-obj-y += parallels.o blkdebug.o blkverify.o
>> -block-obj-y += snapshot.o qapi.o
>> +block-obj-y += block-backend.o snapshot.o qapi.o
>>  block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
>>  block-obj-$(CONFIG_POSIX) += raw-posix.o
>>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> new file mode 100644
>> index 0000000..833f7d9
>> --- /dev/null
>> +++ b/block/block-backend.c
>> @@ -0,0 +1,110 @@
>> +/*
>> + * QEMU Block backends
>> + *
>> + * Copyright (C) 2014 Red Hat, Inc.
>> + *
>> + * Authors:
>> + *  Markus Armbruster <address@hidden>,
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> + * later.  See the COPYING file in the top-level directory.
>> + */
>
> I think we still have the long-term plan of exposing a block layer
> library that can be consumed by libvirt. As the usage in qemu-io/img/nbd
> shows, this will probably have to use BlockBackends, so this code is part
> of the block layer core.
>
> Considering this, using the LGPL would be more practical. Can you please
> make this change for v2? (Personally, I would have used the MIT license
> that the rest of the block layer uses, which also makes copying code
> around cleaner license-wise, but I know you dislike it.)

I do.

Having to accept the Lesser GPL's leeching loophole annoys me, but the
libvirt licensing boat sailed long ago.

>> +#include "sysemu/block-backend.h"
>> +#include "block/block_int.h"
>> +
>> +struct BlockBackend {
>> +    char *name;
>> +    int refcnt;
>> +    QTAILQ_ENTRY(BlockBackend) link; /* for blk_backends */
>> +};
>> +
>> +static QTAILQ_HEAD(, BlockBackend) blk_backends =
>> +    QTAILQ_HEAD_INITIALIZER(blk_backends);
>> +
>> +/**
>> + * blk_new:
>> + * @name: name, must not be %NULL or empty
>> + * @errp: return location for an error to be set on failure, or %NULL
>> + *
>> + * Create a new BlockBackend, with a reference count of one.  Fail if
>> + * @name already exists.
>> + *
>> + * Returns: the BlockBackend on success, %NULL on failure
>> + */
>> +BlockBackend *blk_new(const char *name, Error **errp)
>> +{
>> +    BlockBackend *blk = g_new0(BlockBackend, 1);
>> +
>> +    assert(name && name[0]);
>> +    if (blk_by_name(name)) {
>> +        error_setg(errp, "Device with id '%s' already exists", name);
>> +        return NULL;
>
> blk is leaked here.

Fixed.
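
One way to close a leak like this is to run the duplicate check before
allocating anything.  What follows is only a minimal sketch under stated
assumptions, not necessarily the fix that goes into v2: a plain singly
linked list stands in for QTAILQ, libc stands in for GLib, the Error **
parameter is dropped, and new entries go to the head of the list rather
than the tail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the patch's BlockBackend (assumption: a
 * singly linked list instead of QTAILQ, libc instead of GLib). */
typedef struct BlockBackend {
    char *name;
    int refcnt;
    struct BlockBackend *next;
} BlockBackend;

static BlockBackend *blk_backends;

static BlockBackend *blk_by_name(const char *name)
{
    BlockBackend *blk;

    for (blk = blk_backends; blk; blk = blk->next) {
        if (!strcmp(name, blk->name)) {
            return blk;
        }
    }
    return NULL;
}

/* Returns NULL if @name is already taken.  Nothing is allocated before
 * the duplicate check, so the failure path has nothing to free. */
static BlockBackend *blk_new(const char *name)
{
    BlockBackend *blk;

    assert(name && name[0]);
    if (blk_by_name(name)) {
        return NULL;
    }
    blk = calloc(1, sizeof(*blk));
    assert(blk);
    blk->name = malloc(strlen(name) + 1);
    assert(blk->name);
    strcpy(blk->name, name);
    blk->refcnt = 1;
    blk->next = blk_backends;   /* head insertion; the patch appends */
    blk_backends = blk;
    return blk;
}
```

The alternative is to keep the early allocation and free it on the error
path; moving the allocation just keeps the failure branch shorter.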

>> +    }
>> +    blk->name = g_strdup(name);
>> +    blk->refcnt = 1;
>> +    QTAILQ_INSERT_TAIL(&blk_backends, blk, link);
>> +    return blk;
>> +}
>> +
>> +static void blk_delete(BlockBackend *blk)
>> +{
>> +    assert(!blk->refcnt);
>> +    QTAILQ_REMOVE(&blk_backends, blk, link);
>> +    g_free(blk->name);
>> +    g_free(blk);
>> +}
>> +
>> +/**
>> + * blk_ref:
>> + *
>> + * Increment @blk's reference count.
>> + */
>> +void blk_ref(BlockBackend *blk)
>> +{
>> +    blk->refcnt++;
>> +}
>> +
>> +/**
>> + * blk_unref:
>> + *
>> + * Decrement @blk's reference count.  If this drops it to zero,
>> + * destroy @blk.
>> + */
>> +void blk_unref(BlockBackend *blk)
>> +{
>> +    if (blk) {
>> +        g_assert(blk->refcnt > 0);
>
> You're mixing assert() and g_assert() in this patch. Any reason for
> this?

Stupidity?

>       If not, I think plain assert() is clearly in the majority in the
> overall codebase.

Fixed.

>> +        if (!--blk->refcnt) {
>> +            blk_delete(blk);
>> +        }
>> +    }
>> +}
>> +
>> +const char *blk_name(BlockBackend *blk)
>> +{
>> +    return blk->name;
>> +}
>> +
>> +BlockBackend *blk_by_name(const char *name)
>> +{
>> +    BlockBackend *blk;
>> +
>> +    QTAILQ_FOREACH(blk, &blk_backends, link) {
>> +        if (!strcmp(name, blk->name)) {
>> +            return blk;
>> +        }
>> +    }
>> +    return NULL;
>> +}
>
> No comment for these two non-static functions?

I considered the abysmal signal-to-noise ratio of their GTK-Doc-style
function comments, and balked.

Considering we're not using this style in the block layer much, what do
you think about me abandoning this GTK-Doc business, and adding
*concise* function comments to all my new public functions instead?

>> +/**
>> + * blk_next:
>> + *
>> + * Returns: the first BlockBackend if @blk is null, else @blk's next
>> + * sibling, which is %NULL for the last BlockBackend
>> + */
>> +BlockBackend *blk_next(BlockBackend *blk)
>> +{
>> +    return blk ? QTAILQ_NEXT(blk, link) : QTAILQ_FIRST(&blk_backends);
>> +}
>> diff --git a/blockdev.c b/blockdev.c
>> index 9fbd888..86596bc 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>
> Okay, so here the hard part starts: As long as the BB is completely
> unused, it's very hard to review at which places one must be created and
> deleted.
>
> What was your approach to systematically find all of them?

Good question!  Fortunately, I have an answer ready :)

We want to create a BB exactly when we're creating a named BDS.  "Named"
in the sense of "in bdrv_states".

We want to destroy a BB exactly when we're destroying the BDS that
motivated its creation.

This is a baby step towards having named BDSes owned by a BB.  That'll
be done by PATCH 05.

The places creating a named BDS are all clearly visible in PATCH 01,
because I rename the function doing that to bdrv_new_named().

This patch adds a blk_new() right next to every bdrv_new_named(), except
for qemu-img.c.  qemu-img.c calls bdrv_new_named() in bdrv_new_open().
I can't easily call blk_new() there, because the callers need the new BB
to be able to destroy it, but I can't easily return the new BB in
addition to the new BDS.  So I call blk_new() right before every
bdrv_new_open() instead.

BB destruction isn't quite as obvious, because destruction of named and
nameless BDSes looks the same in the code.  Either you examine every
bdrv_unref() call, figure out whether its BDS is named, and if so, where
to get the BB that needs to be unreferenced there.  Or you figure out,
for every allocation of a named BDS, where it can be destroyed, and add
the BB destruction there.  That's what I did.
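
The pairing rule can be illustrated with a small sketch.  Everything in
it is a simplified stand-in: the blk_*/bdrv_* functions take no names or
Error **, Drive, drive_open() and drive_close() are hypothetical helpers,
and the live_bb/live_bds counters exist only to make leaks visible.  The
point is that every path that gives up the named BDS also drops the BB
created next to it, including the failure path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in types; only object lifetime matters for this sketch. */
typedef struct BlockBackend { int refcnt; } BlockBackend;
typedef struct BlockDriverState { int refcnt; } BlockDriverState;

static int live_bb, live_bds;   /* leak detectors, not part of QEMU */

static BlockBackend *blk_new(void)
{
    BlockBackend *blk = calloc(1, sizeof(*blk));

    assert(blk);
    blk->refcnt = 1;
    live_bb++;
    return blk;
}

static void blk_unref(BlockBackend *blk)
{
    if (blk && !--blk->refcnt) {
        free(blk);
        live_bb--;
    }
}

/* @fail simulates bdrv_new_named() failing. */
static BlockDriverState *bdrv_new_named(bool fail)
{
    BlockDriverState *bs;

    if (fail) {
        return NULL;
    }
    bs = calloc(1, sizeof(*bs));
    assert(bs);
    bs->refcnt = 1;
    live_bds++;
    return bs;
}

static void bdrv_unref(BlockDriverState *bs)
{
    if (bs && !--bs->refcnt) {
        free(bs);
        live_bds--;
    }
}

/* Hypothetical owner of one named BDS and the BB created alongside it. */
typedef struct Drive {
    BlockBackend *blk;
    BlockDriverState *bs;
} Drive;

static bool drive_open(Drive *d, bool fail_bdrv)
{
    d->blk = blk_new();
    d->bs = bdrv_new_named(fail_bdrv);
    if (!d->bs) {
        blk_unref(d->blk);      /* the failure path must drop the BB too */
        d->blk = NULL;
        return false;
    }
    return true;
}

static void drive_close(Drive *d)
{
    bdrv_unref(d->bs);
    blk_unref(d->blk);          /* destroyed exactly where the BDS is */
    d->bs = NULL;
    d->blk = NULL;
}
```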

>> @@ -30,6 +30,7 @@
>>   * THE SOFTWARE.
>>   */
>>  
>> +#include "sysemu/block-backend.h"
>>  #include "sysemu/blockdev.h"
>>  #include "hw/block/block.h"
>>  #include "block/blockjob.h"
>> @@ -221,6 +222,7 @@ void drive_del(DriveInfo *dinfo)
>>      }
>>  
>>      bdrv_unref(dinfo->bdrv);
>> +    blk_unref(blk_by_name(dinfo->id));
>>      g_free(dinfo->id);
>>      QTAILQ_REMOVE(&drives, dinfo, next);
>>      g_free(dinfo->serial);
>> @@ -301,6 +303,7 @@ static DriveInfo *blockdev_init(const char *file, QDict *bs_opts,
>>      int ro = 0;
>>      int bdrv_flags = 0;
>>      int on_read_error, on_write_error;
>> +    BlockBackend *blk;
>>      DriveInfo *dinfo;
>>      ThrottleConfig cfg;
>>      int snapshot = 0;
>> @@ -456,6 +459,10 @@ static DriveInfo *blockdev_init(const char *file, QDict *bs_opts,
>>      }
>>  
>>      /* init */
>> +    blk = blk_new(qemu_opts_id(opts), errp);
>> +    if (!blk) {
>> +        goto early_err;
>> +    }
>>      dinfo = g_malloc0(sizeof(*dinfo));
>>      dinfo->id = g_strdup(qemu_opts_id(opts));
>>      dinfo->bdrv = bdrv_new_named(dinfo->id, &error);
>> @@ -525,6 +532,7 @@ err:
>>  bdrv_new_err:
>>      g_free(dinfo->id);
>>      g_free(dinfo);
>> +    blk_unref(blk);
>>  early_err:
>>      qemu_opts_del(opts);
>>  err_no_opts:
>> @@ -1770,7 +1778,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
>>       */
>>      if (bdrv_get_attached_dev(bs)) {
>>          bdrv_make_anon(bs);
>> -
>> +        blk_unref(blk_by_name(id));
>>          /* Further I/O must not pause the guest */
>>          bdrv_set_on_error(bs, BLOCKDEV_ON_ERROR_REPORT,
>>                            BLOCKDEV_ON_ERROR_REPORT);
>
> Won't we unref the BB a second time now when unplugging the device?
> (drive_del() called in blockdev_auto_del())

Short answer: you're right, there's a bug, and I'll fix it.

Long answer: this part is hairy, because the drive_del command is badly
designed.

For historical reasons, unplugging a device model destroys the block
backends it's attached to, and this is the only way to destroy block
backends.

Aside: we're not carrying that misfeature forward to blockdev-add.

For some device models, the guest can prevent unplug.  Some users need a
way to forcibly revoke device model access to the block backend then, so
the underlying images can be safely used for something else.

drive_del lets you do that.  Unfortunately, it conflates revoking access
with destroying the backend.

Commit 9063f81 made drive_del immediately destroy the root BDS.  Nice:
the device name becomes available for reuse immediately.  Not so nice:
the device model's pointer to the root BDS dangles, and we're prone to
crash when the memory gets reused.

Commit d22b2f4 fixed that by hiding the root BDS instead of destroying
it.  Destruction only happens on unplug.  "Hiding" means removing it
from bdrv_states and graph_bdrv_states; see bdrv_make_anon().

We should've limited the command to revoking access, avoiding this silly
hiding business.

The obvious thing to do here is match the mess: hide the BB along with
the BDS here, delete it in blockdev_auto_del().

Trouble is that hiding it makes it hard to find in blockdev_auto_del().

I tried to avoid the need to find it there by destroying it here.  On
unplug, drive_del()'s blk_unref(blk_by_name(dinfo->id)) won't do
anything, because blk_by_name() returns NULL.  *Except* when the user
has since added *another* BB with the same name!  Oops...

Simplest possible solution: I hide the BB here, and *leak* it (with a
fat FIXME comment) until it becomes easy enough to find.  I guess I can
find it right in the next patch.
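
The name-reuse problem can be reproduced with a toy registry.  This is a
sketch under assumptions, not QEMU code: the list is a plain singly
linked list, names are fixed-size arrays, and
unplug_after_name_reuse() is a hypothetical helper that replays the
sequence "drive_del destroys the BB, the user adds a new BB with the same
name, unplug unrefs by name".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy registry standing in for blk_backends. */
typedef struct BlockBackend {
    char name[32];
    int refcnt;
    struct BlockBackend *next;
} BlockBackend;

static BlockBackend *blk_backends;

static BlockBackend *blk_by_name(const char *name)
{
    BlockBackend *blk;

    for (blk = blk_backends; blk; blk = blk->next) {
        if (!strcmp(name, blk->name)) {
            return blk;
        }
    }
    return NULL;
}

static BlockBackend *blk_new(const char *name)
{
    BlockBackend *blk;

    if (blk_by_name(name)) {
        return NULL;
    }
    blk = calloc(1, sizeof(*blk));
    assert(blk && strlen(name) < sizeof(blk->name));
    strcpy(blk->name, name);
    blk->refcnt = 1;
    blk->next = blk_backends;
    blk_backends = blk;
    return blk;
}

static void blk_unref(BlockBackend *blk)
{
    BlockBackend **p;

    if (!blk) {
        return;
    }
    assert(blk->refcnt > 0);
    if (--blk->refcnt) {
        return;
    }
    /* Unlink from the registry, then free. */
    for (p = &blk_backends; *p && *p != blk; p = &(*p)->next) {
    }
    assert(*p == blk);
    *p = blk->next;
    free(blk);
}

/* Returns 1 if the unplug-time unref destroyed the *new* backend. */
static int unplug_after_name_reuse(void)
{
    blk_new("disk0");                   /* original backend */
    blk_unref(blk_by_name("disk0"));    /* drive_del destroys it early */
    assert(!blk_by_name("disk0"));      /* an unref by name is now a no-op */
    blk_new("disk0");                   /* user reuses the name */
    blk_unref(blk_by_name("disk0"));    /* unplug: hits the new backend */
    return blk_by_name("disk0") == NULL;
}
```

Looking the BB up by name at unplug time is what goes wrong in this
sketch; holding a pointer to the BB would sidestep the lookup.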

>> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
>> index 8bac7ff..730a021 100644
>> --- a/hw/block/xen_disk.c
>> +++ b/hw/block/xen_disk.c
>> @@ -39,6 +39,7 @@
>>  #include "hw/xen/xen_backend.h"
>>  #include "xen_blkif.h"
>>  #include "sysemu/blockdev.h"
>> +#include "sysemu/block-backend.h"
>>  
>>  /* ------------------------------------------------------------- */
>>  
>> @@ -852,12 +853,18 @@ static int blk_connect(struct XenDevice *xendev)
>>      blkdev->dinfo = drive_get(IF_XEN, 0, index);
>>      if (!blkdev->dinfo) {
>>          Error *local_err = NULL;
>> +        BlockBackend *blk;
>>          BlockDriver *drv;
>>  
>>          /* setup via xenbus -> create new block driver instance */
>>          xen_be_printf(&blkdev->xendev, 2, "create new bdrv (xenbus setup)\n");
>> +        blk = blk_new(blkdev->dev, NULL);
>> +        if (!blk) {
>> +            return -1;
>> +        }
>>          blkdev->bs = bdrv_new_named(blkdev->dev, NULL);
>>          if (!blkdev->bs) {
>> +            blk_unref(blk);
>>              return -1;
>>          }
>>  
>> @@ -868,6 +875,7 @@ static int blk_connect(struct XenDevice *xendev)
>>                            error_get_pretty(local_err));
>>              error_free(local_err);
>>              bdrv_unref(blkdev->bs);
>> +            blk_unref(blk);
>>              blkdev->bs = NULL;
>>              return -1;
>>          }
>> @@ -983,6 +991,9 @@ static void blk_disconnect(struct XenDevice *xendev)
>>      if (blkdev->bs) {
>>          bdrv_detach_dev(blkdev->bs, blkdev);
>>          bdrv_unref(blkdev->bs);
>> +        if (!blkdev->dinfo) {
>> +            blk_unref(blk_by_name(blkdev->dev));
>> +        }
>>          blkdev->bs = NULL;
>>      }
>>      xen_be_unbind_evtchn(&blkdev->xendev);
>> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
>> index 5f20b0e..198da2e 100644
>> --- a/include/qemu/typedefs.h
>> +++ b/include/qemu/typedefs.h
>> @@ -35,6 +35,7 @@ typedef struct MachineClass MachineClass;
>>  typedef struct NICInfo NICInfo;
>>  typedef struct HCIInfo HCIInfo;
>>  typedef struct AudioState AudioState;
>> +typedef struct BlockBackend BlockBackend;
>>  typedef struct BlockDriverState BlockDriverState;
>>  typedef struct DriveInfo DriveInfo;
>>  typedef struct DisplayState DisplayState;
>> diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
>> new file mode 100644
>> index 0000000..3f8371c
>> --- /dev/null
>> +++ b/include/sysemu/block-backend.h
>> @@ -0,0 +1,26 @@
>> +/*
>> + * QEMU Block backends
>> + *
>> + * Copyright (C) 2014 Red Hat, Inc.
>> + *
>> + * Authors:
>> + *  Markus Armbruster <address@hidden>,
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> + * later.  See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef BLOCK_BACKEND_H
>> +#define BLOCK_BACKEND_H
>> +
>> +#include "qemu/typedefs.h"
>> +#include "qapi/error.h"
>> +
>> +BlockBackend *blk_new(const char *name, Error **errp);
>> +void blk_ref(BlockBackend *blk);
>> +void blk_unref(BlockBackend *blk);
>> +const char *blk_name(BlockBackend *blk);
>> +BlockBackend *blk_by_name(const char *name);
>> +BlockBackend *blk_next(BlockBackend *blk);
>> +
>> +#endif
>> diff --git a/qemu-img.c b/qemu-img.c
>> index 4490a22..bad3f64 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>
> Won't comment on each hunk in qemu-img, but in many cases, on
> bdrv_new_open() failure, blk is leaked.

I'll check them systematically.

>> diff --git a/qemu-nbd.c b/qemu-nbd.c
>> index a56ebfc..94b9b49 100644
>> --- a/qemu-nbd.c
>> +++ b/qemu-nbd.c
>> @@ -17,7 +17,7 @@
>>   */
>>  
>>  #include "qemu-common.h"
>> -#include "block/block.h"
>> +#include "sysemu/block-backend.h"
>>  #include "block/block_int.h"
>>  #include "block/nbd.h"
>>  #include "qemu/main-loop.h"
>> @@ -687,6 +687,7 @@ int main(int argc, char **argv)
>>          drv = NULL;
>>      }
>>  
>> +    blk_new("hda", &error_abort);
>>      bs = bdrv_new_named("hda", &error_abort);
>>  
>>      srcpath = argv[optind];
>
> Where is the matching blk_unref?

Right next to the bdrv_unref(): nowhere :)

If you like, I can throw in a preliminary patch adding the bdrv_unref().
Then add the matching blk_unref() in this patch.


