qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from devic


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
Date: Tue, 02 Nov 2010 10:40:32 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

Ryan Harper <address@hidden> writes:

> * Markus Armbruster <address@hidden> [2010-10-29 11:11]:
>> Ryan Harper <address@hidden> writes:
>> 
>> > * Markus Armbruster <address@hidden> [2010-10-29 09:13]:
>> >> [Note cc: Michael]
>> >> 
>> >> Ryan Harper <address@hidden> writes:
>> >> 
>> >> 
>> >> If I understand your patch correctly, the difference between your
>> >> drive_unplug and my blockdev_del is as follows:
>> >> 
>> >> * drive_unplug forcefully severs the connection between the host part of
>> >>   the block device and its BlockDriverState.  A shell of the host part
>> >>   remains, to be cleaned up later.  You need forceful disconnect
>> >>   operation to be able to revoke access to an image whether the guest
>> >>   cooperates or not.  Fair enough.
>> >> 
>> >> * blockdev_del deletes a host part.  My current version fails when the
>> >>   host part is in use.  I patterned that after netdev_del, which used to
>> >>   work that way, until commit 2ffcb18d:
>> >> 
>> >>     Make netdev_del delete the netdev even when it's in use
>> >>     
>> >>     To hot-unplug guest and host part of a network device, you do:
>> >>     
>> >>         device_del NIC-ID
>> >>         netdev_del NETDEV-ID
>> >>     
>> >>     For PCI devices, device_del merely tells ACPI to unplug the device.
>> >>     The device goes away for real only after the guest processed the ACPI
>> >>     unplug event.
>> >>     
>> >>     You have to wait until then (e.g. by polling info pci) before you can
>> >>     unplug the netdev.  Not good.
>> >>     
>> >>     Fix by removing the "in use" check from do_netdev_del().  Deleting a
>> >>     netdev while it's in use is safe; packets simply get routed to the bit
>> >>     bucket.
>> >> 
>> >>   Isn't this the very same problem that's behind your drive_unplug?
>> >
>> > Yes it is.
>> >
>> >> 
>> >> I'd like to have some consistency among net, block and char device
>> >> commands, i.e. a common set of operations that work the same for all of
>> >> them.  Can we agree on such a set?
>> >
>> > Yeah; the current trouble (or at least what I perceive to be trouble) is
>> > that in the case where the guest responds to device_del induced ACPI
>> > removal event; the current qdev code already does the host-side device
>> > tear down.  Not sure if it is OK to do a blockdev_del() immediately
>> > after the device_del.  What happens when we do:
>> >
>> > device_del
>> > ACPI to guest
>> > blockdev_del /* removes host-side device */
>> 
>> Fails in my tree, because the blockdev's still in use.  See below.
>> 
>> > guest responds to ACPI
>> > qdev calls pci device removal code
>> > qemu attempts to destroy the associated host-side block
>> >
>> > That may just work today; and if not, it shouldn't be hard to fix up the
>> > code to check for NULLs
>> 
>> I hate the automatic deletion of host part along with the guest part.
>> device_del should undo device_add.  {block,net,char}dev_{add,del} should
>> be similarly paired.
>
> Agreed.
>> 
>> In my blockdev branch, I keep the automatic delete only for backwards
>> compatibility: if you create the drive with drive_add, it gets
>> auto-deleted, but if you use blockdev_add, it stays around.
>
> But what to do about the case where we're doing drive_add and then a
> device_del()  That's the urgent situation that needs to be resolved.

What's the exact problem we need to solve urgently?

Is it "provide means to cut the connection to the host part immediately,
even with an uncooperative guest"?

Does this need to be separate from device_del?

>> >> Even if your drive_unplug shouldn't fit in that set, we might want it as
>> >> a stop-gap.  Depends on how urgent the need for it is.  Yet another
>> >> special-purpose command to be deprecated later.
>> >
>> > The fix is urgent; but I'm willing to spin a couple patches if it helps
>> > get this into better shape.
>> 
>> Can we agree on a common solution for block and net?  That's why I cc'ed
>> Michael.
>
> I didn't see a good way to have block behave the same as net; though I
> do agree that it would be good to have this be common, long term.

If we can't make them behave 100% the same, then the next best thing is
to offer a preferred way to do things that works similarly enough to let
users ignore the differences.

Possible preferred ways to revoke access to a host part:

A. device_del

   Need to make device_del cut the connection right away instead of when
   the guest completes unplug.

   device_del changes behavior.  Any problems with that?

   Not an option if we need "cut the connection" to be separate from
   device_del.

B. FOO_del

   Got netdev_del.

   Need drive_del.  If drive is in use, replace it by a special "dead
   drive" without a device name (so the ID becomes available for new
   drives), then delete the original.

   Wart: drive_del doesn't work reliably after device_del, because the
   drive is auto-deleted when the guest completes unplug.  Not a problem
   for my blockdev_add/blockdev_del work-in-progress, because host parts
   created with blockdev_add don't auto-delete.

C. FOO_unplug

   You got a patch for drive_unplug.

   Need netdev_unplug.

   By the way, I hate "unplug", because it suggests relation to hot
   unplug.  What about "disconnect"?

Any preferences?

>> Currently, we have two different ways:
>> 
>> * The netdev way: "del" always succeeds
>> 
>>   How can it succeed if the host part is in use?
>> 
>>   If all device models are prepared to deal with a missing host part, we
>>   can delete it right away.
>> 
>>   Else, we need to replace it with a suitable zombie, which is
>>   auto-deleted when it goes out of use.  Such zombies are not be visible
>>   elsewhere, in particular, the ID becomes available immediately.
>> 
>> * The unplug way: "del" fails while in use, "unplug" always succeeds
>> 
>>   Feels a bit cleaner to me.  But changing netdev_del might not be
>>   acceptable.
>> 
>> Either way works for me as an user interface.  But I'd rather not have
>> both.
>> 
>> Next, we need to consider how to integrate this with the automatic
>> deletion of drives on qdev destruction.  That's too late for unplug, we
>> want that right in device_del.  I'd leave the stupid automatic delete
>> where it is now, in qdev destruction.  The C API need unplug and delete
>> separate for that.
>> 
>> 
>> Regardless of the way we choose, we need to think very clearly on how
>> exactly device models should behave when their host part is missing or a
>> zombie, and how that behavior appears in the guest.
>> 
>> For net, making it look exactly like a yanked out network cable would
>> make sense to me.
>> 
>> What about block?
>
> It seems to me that for block it's like cdrom with no disk, floppy with
> no media, hard disk that's gone bad.  I think we we throw EIO back; it's
> handled gracefully enough.  This is what happens when you do a
> drive_unplug with my patch; the application using the device gets IO
> errors.  That's expected if a drive were to suddently fail (which is
> what this looks like).  And certainly there is some responsibility
> at the mgmt console to ensure you're not unplugging a drive that you are
> currently using.

Total drive failure works for me.

"No media" is cute, but it's possible only for drives with removable
media.  I'd rather have all drives behave the same, whether their media
is removable or not.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]