[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] RFC: Partial workaround for buggy guest virtio-balloon
From: |
Anthony Liguori |
Subject: |
Re: [Qemu-devel] RFC: Partial workaround for buggy guest virtio-balloon driver |
Date: |
Thu, 08 Nov 2012 07:11:13 -0600 |
User-agent: |
Notmuch/0.13.2+93~ged93d79 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) |
David Gibson <address@hidden> writes:
> Linux kernel commits 1a87228f5f1d316002c7c161316f5524592be766
> "virtio_balloon: Fix endian bug" and
> 3ccc9372ed0fab33d20f10be3c1efd5776ff5913 "virtio_balloon: fix handling
> of PAGE_SIZE != 4k" fixed two serious bugs in their (guest side)
> handling of the virtio balloon. In practice, these bugs only affected
> powerpc guests, which is big-endian and frequently configured for 64k
> base page size. Attempting to use the balloon with the buggy guest
> would usually result in an immediate guest crash.
You should create a new feature VIRTIO_BALLOON_F_ENDIAN_SAFE,
advertise it in the host, and add a guest kernel patch to ack it in
newer kernels.
Older kernels won't ack this feature which gives you a safe way to to
disable the driver on a big endian host.
You won't get support for 3.4 kernels but it's much nicer to handle it
this way.
Regards,
Anthony Liguori
>
> The bugs are fixed now, of course and the balloon works fine with
> kernels v3.4 and later, but unfortunately there are many distro
> releases still in use which still have buggy kernels.
>
> The nature of the page size bug makes it impossible to work around
> from the host side (there simply isn't enough information supplied to
> operate the balloon correctly). However, it *is* possible with some
> fiddling to safely detect the endian bug in the guest kernel, and
> disable the balloon in this case. The two fixes were applied to the
> mainline kernel almost consecutively, so there are no released kernels
> with one fix but not the other, meaning we can use the presence of the
> endian bug as a proxy for the presence of the page size bug.
>
> This patch implements such a test, working as follows.
>
> For a fixed guest kernel:
> 1. qemu sets a state variable to "TESTING" and the initial balloon
> target size to 16 (4k pages).
> 2. When the guest balloon driver starts, it sees the target, finds
> either 16 unused 4k pages or 1 unused 64k page (depending on
> guest config) and submits the 16 resulting 4k pfns to the
> host. qemu, in TESTING state, ignores the submitted pages for
> now.
> 4. The guest then updates the 'actual' field in the balloon config
> space to 16. qemu sees this and determines that the guest is not
> buggy, it moves to CLEANUP state, and sets the target balloon
> size back to 0.
> 5. The guest sees the target go back to 0, and reclaims its page(s)
> from the balloon. qemu continues to ignore the page addresses
> for now in CLEANUP state.
> 6. The guest updates the actual field to 0. qemu now considers
> cleanup complete and moves to GOOD state. The balloon now
> operates normally.
>
> For a buggy kernel:
> 1. qemu sets a state variable to "TESTING" and the initial balloon
> target size to 16 (4k pages).
> 2. When the guest balloon driver starts it sees the non-zero target,
> and misinterprets it as 268435456 (endian bug). It starts trying
> to find pages to free.
> 3. The guest is probably newly booted, so it almost certainly finds
> 256 pages easily and submits incorrect addresses for them (page
> size bug) to the host. qemu, in TESTING state ignores the (wrong)
> addresses for now.
> 4. The guest updates the actual field in config space to 256. qemu
> sees this, and determines that the guest is buggy. It moves to
> BUGGY state and sets the balloon target back to zero.
> 5. The guest, before attempting to find the next batch of pages to
> free, rechecks the target and discovers it is now zero. It
> reclaims the pages by submitting more wrong addresses, which qemu
> ignores.
> 6. The balloon is now disabled, if the user attempts to change the
> balloon size, qemu print an error message and otherwise ignore
> it.
>
>
> I'm aware that this patch needs a bunch more comments (largely based
> on the info above), but otherwise do people think this is a reasonable
> approach. It doesn't (and can't) fix the balloon for buggy kernels,
> but it does make the failure mode a lot less ugly.
>
> From dbc721f5e50a39ca3b40d81f060d8bb0e6312995 Mon Sep 17 00:00:00 2001
> From: David Gibson <address@hidden>
> Date: Thu, 8 Nov 2012 14:49:38 +1100
> Subject: [PATCH] Detection of buggy guest balloon
>
> ---
> hw/virtio-balloon.c | 77
> +++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 72 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
> index dd1a650..6a9cd3f 100644
> --- a/hw/virtio-balloon.c
> +++ b/hw/virtio-balloon.c
> @@ -37,8 +37,16 @@ typedef struct VirtIOBalloon
> VirtQueueElement stats_vq_elem;
> size_t stats_vq_offset;
> DeviceState *qdev;
> +
> + int guest_bug_state;
> +#define GUEST_BUG_TESTING 0
> +#define GUEST_BUG_CLEANUP 1
> +#define GUEST_BUG_BUGGY 2
> +#define GUEST_BUG_GOOD 3
> } VirtIOBalloon;
>
> +#define GUEST_BUG_TARGET 16
> +
> static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev)
> {
> return (VirtIOBalloon *)vdev;
> @@ -84,6 +92,11 @@ static void virtio_balloon_handle_output(VirtIODevice
> *vdev, VirtQueue *vq)
> pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT;
> offset += 4;
>
> + if (s->guest_bug_state != GUEST_BUG_GOOD) {
> + /* Still bug testing, not ready to use balloon yet */
> + continue;
> + }
> +
> /* FIXME: remove get_system_memory(), but how? */
> section = memory_region_find(get_system_memory(), pa, 1);
> if (!section.size || !memory_region_is_ram(section.mr))
> @@ -134,8 +147,23 @@ static void virtio_balloon_get_config(VirtIODevice
> *vdev, uint8_t *config_data)
> {
> VirtIOBalloon *dev = to_virtio_balloon(vdev);
> struct virtio_balloon_config config;
> + uint32_t num_pages;
> +
> + switch (dev->guest_bug_state) {
> + case GUEST_BUG_TESTING:
> + num_pages = GUEST_BUG_TARGET;
> + break;
>
> - config.num_pages = cpu_to_le32(dev->num_pages);
> + case GUEST_BUG_CLEANUP:
> + case GUEST_BUG_BUGGY:
> + num_pages = 0;
> + break;
> +
> + default:
> + num_pages = dev->num_pages;
> + }
> +
> + config.num_pages = cpu_to_le32(num_pages);
> config.actual = cpu_to_le32(dev->actual);
>
> memcpy(config_data, &config, 8);
> @@ -147,11 +175,41 @@ static void virtio_balloon_set_config(VirtIODevice
> *vdev,
> VirtIOBalloon *dev = to_virtio_balloon(vdev);
> struct virtio_balloon_config config;
> uint32_t oldactual = dev->actual;
> +
> memcpy(&config, config_data, 8);
> dev->actual = le32_to_cpu(config.actual);
> - if (dev->actual != oldactual) {
> - qemu_balloon_changed(ram_size -
> - (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
> +
> + switch (dev->guest_bug_state) {
> + case GUEST_BUG_TESTING:
> + if (dev->actual == 0) {
> + /* Both buggy and non-buggy guests write a 0 before going
> + * on to write a meaningful value */
> + break;
> + }
> +
> + if (dev->actual > GUEST_BUG_TARGET) {
> + fprintf(stderr, "virtio-balloon: Buggy guest detected, disabling
> balloon\n");
> + dev->guest_bug_state = GUEST_BUG_BUGGY;
> + } else {
> + dev->guest_bug_state = GUEST_BUG_CLEANUP;
> + }
> + /* Changing bug state implicitly alters the config */
> + virtio_notify_config(&dev->vdev);
> + break;
> +
> + case GUEST_BUG_CLEANUP:
> + if (dev->actual == 0) {
> + /* Cleanup completed, proceed with normal operation */
> + dev->guest_bug_state = GUEST_BUG_GOOD;
> + virtio_notify_config(&dev->vdev);
> + }
> + break;
> +
> + default:
> + if (dev->actual != oldactual) {
> + qemu_balloon_changed(ram_size -
> + (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
> + }
> }
> }
>
> @@ -194,12 +252,21 @@ static void virtio_balloon_to_target(void *opaque,
> ram_addr_t target)
> {
> VirtIOBalloon *dev = opaque;
>
> + if (dev->guest_bug_state == GUEST_BUG_BUGGY) {
> + fprintf(stderr, "Guest is buggy, cannot use balloon\n");
> + }
> +
> if (target > ram_size) {
> target = ram_size;
> }
> if (target) {
> dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT;
> - virtio_notify_config(&dev->vdev);
> +
> + /* If we're still testing for guest bugs, delay the change
> + * interrupt until we've finished that */
> + if (dev->guest_bug_state == GUEST_BUG_GOOD) {
> + virtio_notify_config(&dev->vdev);
> + }
> }
> }
>
> --
> 1.7.10.4
>
>
>
> --
> David Gibson | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_
> _other_
> | _way_ _around_!
> http://www.ozlabs.org/~dgibson