Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
Date: Wed, 10 Nov 2010 18:39:37 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

Ryan Harper <address@hidden> writes:
> * Markus Armbruster <address@hidden> [2010-11-10 06:48]:
>> One real question, and a couple of nits.
>>
>> Ryan Harper <address@hidden> writes:
>>
>> > Block hot unplug is racy since the guest is required to acknowledge the ACPI
>> > unplug event; this may not happen synchronously with the device removal
>> > command
>>
>> Well, I wouldn't call unplug "racy". It just takes an unpredictable
>> length of time, possibly forever. To make a race, you need to throw in
>> a client assuming (incorrectly) that unplug is instantaneous, as
>> described in your next paragraph.
>>
>> Moreover, all PCI unplug is that way, not just block.
>>
>> > This series aims to close a gap where by mgmt applications that assume the
>> > block resource has been removed without confirming that the guest has
>> > acknowledged the removal may re-assign the underlying device to a second
>> > guest
>> > leading to data leakage.
>>
>> Yes, the incorrect assumption is a problem. But with that fixed (in the
>> management application), we run right into the next problem: there is no
>> way for the management application to reliably disconnect the guest from
>> a block device. And that's the problem you're fixing.
>
> Yeah, that's the right way to word it; providing a method to forcibly
> disconnect the guest from the host device.
>>
>> > This series introduces a new montor command to decouple asynchornous device
>>
>> Typos "montor" and "asynchornous". You might want to use a spell
>> checker :)
>>
>> Lines are a bit long. Recommend wrap at column 70.
>>
>> > removal from restricting guest access to a block device. We do this by
>> > creating
>> > a new monitor command drive_del which maps to a bdrv_unplug() command which
>> > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete,
>> > subsequent
>> > IO is rejected from the device and the guest will get IO errors but
>> > continue to
>> > function. In addition to preventing further IO, we clean up state pointers
>> > between host (BlockDriverState) and guest (DeviceInfo).
>> >
>> > A subsequent device removal command can be issued to remove the device, to
>> > which
>> > the guest may or maynot respond, but as long as the unplugged bit is set,
>> > no IO
>>
>> "maynot" is not a word.
>>
>> > will be submitted.
>>
>> This suggests doing drive_del before device_del, which makes the device
>> go through a "broken device" state on its way to unplug. If the guest
>> accesses the device in that state, it gets I/O errors. Not nice.
>>
>> Instead, I'd recommend device_del, wait for the device to go away,
>> drive_del on time out. If the guest reacts to the ACPI unplug promptly,
>> it's never exposed to the "broken device" state. Note: if the drive_del
>> fails because the device doesn't exist, we lost the race with the
>> automatic destruction, which is harmless. Ignore that error.
>
> Honestly, other than describing what happens if you sever the connection
> when the guest isn't aware of it; I don't want to try to capture how the
> mgmt layer implements the removal.
>
> One may want to force the disconnect before attempting to remove the
> device; or the other way around; that's really the mgmt layer's call.
Fair enough.
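
A minimal sketch of the ordering recommended earlier in this thread
(device_del, wait for the device to go away, drive_del on timeout,
ignore a not-found error). MockGuest, the mock_* functions and
unplug_disk() are hypothetical stand-ins for illustration only; they
are not QEMU or management-library APIs, and the polling loop stands in
for whatever wait mechanism a management application actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one guest with one hot-pluggable disk. */
typedef struct {
    bool device_present;   /* guest-visible device still exists */
    bool drive_present;    /* host-side drive still exists */
    int  polls_until_ack;  /* polls before the guest acks; < 0 means never */
} MockGuest;

typedef enum { MON_OK = 0, MON_ERR_NOT_FOUND = -1 } MonResult;
typedef enum { UNPLUG_CLEAN, UNPLUG_FORCED } UnplugPath;

/* device_del only *requests* the unplug; completion is up to the guest. */
static MonResult mock_device_del(MockGuest *g)
{
    return g->device_present ? MON_OK : MON_ERR_NOT_FOUND;
}

/* One poll. Once the guest acks, the device and its drive are destroyed. */
static bool mock_device_gone(MockGuest *g)
{
    if (g->device_present && g->polls_until_ack == 0) {
        g->device_present = false;
        g->drive_present = false;   /* automatic destruction */
    } else if (g->polls_until_ack > 0) {
        g->polls_until_ack--;
    }
    return !g->device_present;
}

/* drive_del forcibly disconnects the guest from the host drive. */
static MonResult mock_drive_del(MockGuest *g)
{
    if (!g->drive_present) {
        return MON_ERR_NOT_FOUND;
    }
    g->drive_present = false;
    return MON_OK;
}

/* Ask nicely first; force the disconnect only if the guest never acks. */
static UnplugPath unplug_disk(MockGuest *g, int max_polls)
{
    int i;

    assert(max_polls >= 0);
    (void)mock_device_del(g);
    for (i = 0; i < max_polls; i++) {
        if (mock_device_gone(g)) {
            return UNPLUG_CLEAN;    /* guest never saw a broken device */
        }
    }
    /* Timed out. MON_ERR_NOT_FOUND here means we lost the race with
     * the automatic destruction, which is harmless, so ignore it. */
    (void)mock_drive_del(g);
    return UNPLUG_FORCED;
}
```

With a guest that acks after a couple of polls, unplug_disk() takes the
clean path; with a guest that never acks, it falls back to the forced
disconnect and the device is left in the "broken device" state.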
>> > Signed-off-by: Ryan Harper <address@hidden>
>> > ---
>> > block.c | 7 +++++++
>> > block.h | 1 +
>> > blockdev.c | 36 ++++++++++++++++++++++++++++++++++++
>> > blockdev.h | 1 +
>> > hmp-commands.hx | 18 ++++++++++++++++++
>> > 5 files changed, 63 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/block.c b/block.c
>> > index 6b505fb..c76a796 100644
>> > --- a/block.c
>> > +++ b/block.c
>> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int
>> > removable)
>> > }
>> > }
>> >
>> > +void bdrv_unplug(BlockDriverState *bs)
>> > +{
>> > +    qemu_aio_flush();
>> > +    bdrv_flush(bs);
>> > +    bdrv_close(bs);
>> > +}
>> > +
>>
>> Unless we expect more users, I'd inline this into its only caller.
>> Matter of taste.
>
> Works for me.
>
>>
>> > int bdrv_is_removable(BlockDriverState *bs)
>> > {
>> > return bs->removable;
>> > diff --git a/block.h b/block.h
>> > index 78ecfac..581414c 100644
>> > --- a/block.h
>> > +++ b/block.h
>> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs,
>> > BlockErrorAction on_read_error,
>> > BlockErrorAction on_write_error);
>> > BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
>> > void bdrv_set_removable(BlockDriverState *bs, int removable);
>> > +void bdrv_unplug(BlockDriverState *bs);
>> > int bdrv_is_removable(BlockDriverState *bs);
>> > int bdrv_is_read_only(BlockDriverState *bs);
>> > int bdrv_is_sg(BlockDriverState *bs);
>> > diff --git a/blockdev.c b/blockdev.c
>> > index 6cb179a..ee8c2ec 100644
>> > --- a/blockdev.c
>> > +++ b/blockdev.c
>> > @@ -14,6 +14,8 @@
>> > #include "qemu-option.h"
>> > #include "qemu-config.h"
>> > #include "sysemu.h"
>> > +#include "hw/qdev.h"
>> > +#include "block_int.h"
>> >
>> > static QTAILQ_HEAD(drivelist, DriveInfo) drives =
>> > QTAILQ_HEAD_INITIALIZER(drives);
>> >
>> > @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device,
>> > }
>> > return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
>> > }
>> > +
>> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
>> > +{
>> > +    const char *id = qdict_get_str(qdict, "id");
>> > +    BlockDriverState *bs;
>> > +    Property *prop;
>> > +
>> > +    bs = bdrv_find(id);
>> > +    if (!bs) {
>> > +        qerror_report(QERR_DEVICE_NOT_FOUND, id);
>> > +        return -1;
>> > +    }
>> > +
>> > +    /* quiesce block driver; prevent further io */
>> > +    bdrv_unplug(bs);
>> > +
>> > +    /* clean up guest state from pointing to host resource by
>> > +     * finding and removing DeviceState "drive" property */
>> > +    for (prop = bs->peer->info->props; prop && prop->name; prop++) {
>> > +        if ((prop->info->type == PROP_TYPE_DRIVE) &&
>> > +            (*(BlockDriverState **)qdev_get_prop_ptr(bs->peer, prop) == bs)) {
>> > +            if (prop->info->free) {
>> > +                prop->info->free(bs->peer, prop);
>> > +            }
Your use of prop->info->free() in this context is wrong. More below.
>>
>> Does this null the drive property? I doubt it. Quick check in the
>> debugger?
>>
>> The free callbacks generally don't zap the properties, because they run
>> from qdev_free().
>
> To be honest; I didn't see anything that looked like "remove this
> property" in the qdev api. Any pointers?

The closest we have is indeed the Property method free(), but that's not
quite right. It's really only for use by qdev_free().

> should I be calling qdev_free() on the dev?

No, because then the whole device is gone, not just the property :)

> I don't quite understand
> the distinction between the info list of properties and the device
> itself, nor specifically what we need to remove in the drive_del()
> operation versus the device_del() portion.
device_del / qdev_free() destroy a qdev, such as a "virtio-blk-pci"
device (C type VirtIOPCIProxy).

drive_del destroys something else, namely the block device host part
(BlockDriverState + DriveInfo). Obviously, it needs to zap all
pointers to the host part along with it. Specifically, it needs to zap
the device's pointer to it.

Example: if a "virtio-blk-pci" device is using drive "foo", then
"drive_del foo" needs to zap its member block.bs.

Complication: we don't (want to) know what kind of device exactly is
using the drive. But we do know that a drive property must be
describing it.

So we search the properties (for (prop...)) for a drive property
(prop->info->type == PROP_TYPE_DRIVE) that points to this drive (... ==
bs).

Result:

    BlockDriverState *bs;
    Property *prop;
    BlockDriverState **ptr;
    [...]
    for (prop = bs->peer->info->props; prop && prop->name; prop++) {
        if (prop->info->type == PROP_TYPE_DRIVE) {
            ptr = qdev_get_prop_ptr(bs->peer, prop);
            if (*ptr == bs) {
                /* drop both links: host -> guest, then guest -> host */
                bdrv_detach(bs, bs->peer);
                *ptr = NULL;
                break;
            }
        }
    }

Aside: arguably, bdrv_detach() should zap *both* pointers, i.e. also do
the *ptr = NULL. Not your problem to fix.

Only then are we ready to destroy the host part:

    drive_uninit(drive_get_by_blockdev(bs));

Does this help?
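
For readers without a QEMU tree at hand, here is a self-contained
sketch of the same search-and-zap pattern. Every type and function in
it (the simplified Property, MockBlockDevice, get_prop_ptr(),
drive_detach_from_peer()) is a hypothetical, reduced stand-in written
for this illustration; the real qdev structures carry more fields and
the real code should use qdev_get_prop_ptr() and bdrv_detach() as
shown above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct BlockDriverState BlockDriverState;
typedef struct DeviceState DeviceState;

typedef enum { PROP_TYPE_UINT32, PROP_TYPE_DRIVE } PropertyType;

/* Simplified property descriptor: where a field lives in the device. */
typedef struct {
    const char  *name;     /* a NULL name terminates the table */
    PropertyType type;
    size_t       offset;   /* byte offset of the field in the device */
} Property;

struct BlockDriverState {
    const char  *id;
    DeviceState *peer;     /* guest device using this drive, or NULL */
};

struct DeviceState {
    const Property *props;
};

/* A mock guest device; the generic part must be the first member. */
typedef struct {
    DeviceState       qdev;
    unsigned          queue_size;
    BlockDriverState *bs;  /* the "drive" property */
} MockBlockDevice;

static const Property mock_props[] = {
    { "queue-size", PROP_TYPE_UINT32, offsetof(MockBlockDevice, queue_size) },
    { "drive",      PROP_TYPE_DRIVE,  offsetof(MockBlockDevice, bs) },
    { NULL,         PROP_TYPE_UINT32, 0 },
};

/* Locate a property's storage without knowing the device's C type. */
static void *get_prop_ptr(DeviceState *dev, const Property *prop)
{
    return (char *)dev + prop->offset;
}

/* Sever both links between a drive and whatever device is using it.
 * Returns 1 if a link was found and cleared, 0 otherwise. */
static int drive_detach_from_peer(BlockDriverState *bs)
{
    DeviceState *dev = bs->peer;
    const Property *prop;
    BlockDriverState **ptr;

    if (!dev) {
        return 0;
    }
    for (prop = dev->props; prop && prop->name; prop++) {
        if (prop->type != PROP_TYPE_DRIVE) {
            continue;
        }
        ptr = get_prop_ptr(dev, prop);
        if (*ptr == bs) {
            bs->peer = NULL;   /* host no longer points at the guest */
            *ptr = NULL;       /* guest no longer points at the host */
            return 1;
        }
    }
    return 0;
}
```

The point of going through the property table is that the caller never
needs to know the concrete device type; only the drive-typed field
that points at this particular drive is touched, and all other
properties are left alone.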