qemu-devel

From: Michael S. Tsirkin
Subject: Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user
Date: Thu, 6 Feb 2020 03:32:38 -0500

On Wed, Jan 15, 2020 at 09:57:06PM -0500, Raphael Norwitz wrote:
> The current vhost-user implementation in Qemu imposes a limit on the
> maximum number of memory slots exposed to a VM using a vhost-user
> device. This change provides a new protocol feature
> VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and
> allows a VM with a vhost-user device to expose a configurable number of
> memory slots, up to the ACPI defined maximum. Existing backends which
> do not support this protocol feature are unaffected.

Hmm, the ACPI maximum seems to be up to 512 - is that too much to fit in a
single message? Can't we just increase the number (after negotiating with the
remote) and be done with it, instead of add/remove? Or is there another reason
to prefer add/remove?

> 
> This feature works by using three new messages,
> VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
> VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
> number of memory slots the backend is willing to accept when the
> backend is initialized. Then, when the memory tables are set or updated,
> a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
> are sent to transmit the regions to map and/or unmap instead of trying
> to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
> message.
> 
> The vhost_user struct maintains a shadow state of the VM’s memory
> regions. When the memory tables are modified, the
> vhost_user_set_mem_table() function compares the new device memory state
> to the shadow state and only sends regions which need to be unmapped or
> mapped in. The regions which must be unmapped are sent first, followed
> by the new regions to be mapped in. After all the messages have been
> sent, the shadow state is set to the current virtual device state.
> 
> The current feature implementation does not work with postcopy migration
> and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> also been negotiated.

Hmm, what would it take to lift these restrictions?
Conflicting features like this make it very hard for users to make
an informed choice about what to support.
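One hypothetical way to avoid the REPLY_ACK conflict - a sketch only, not what
this patch implements, reusing the vhost_user_write()/process_message_reply()
helpers visible in the quoted file - would be to request an ack per
ADD_MEM_REG/REM_MEM_REG message instead of per memory-table update:

    /* Hypothetical helper, not in the patch: send one ADD/REM region
     * message and, if VHOST_USER_PROTOCOL_F_REPLY_ACK was negotiated,
     * wait for the slave to ack that individual update. */
    static int send_region_msg_maybe_ack(struct vhost_dev *dev,
                                         VhostUserMsg *msg, int fd,
                                         bool reply_supported)
    {
        if (reply_supported) {
            msg->hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
        }
        if (vhost_user_write(dev, msg, &fd, 1) < 0) {
            return -1;
        }
        if (reply_supported) {
            /* Block until the slave acks this particular region change. */
            return process_message_reply(dev, msg);
        }
        return 0;
    }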


> Signed-off-by: Raphael Norwitz <address@hidden>
> Signed-off-by: Peter Turschmid <address@hidden>
> Suggested-by: Mike Cui <address@hidden>
> ---
>  docs/interop/vhost-user.rst |  43 ++++++++
>  hw/virtio/vhost-user.c      | 254 ++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 275 insertions(+), 22 deletions(-)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 5f8b3a4..ae9acf2 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -786,6 +786,7 @@ Protocol features
>    #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
>    #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
>    #define VHOST_USER_PROTOCOL_F_RESET_DEVICE   13
> +  #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS   14
>  
>  Master message types
>  --------------------
> @@ -1205,6 +1206,48 @@ Master message types
>    Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
>    feature is set by the backend.
>  
> +``VHOST_USER_GET_MAX_MEM_SLOTS``
> +  :id: 35
> +  :equivalent ioctl: N/A
> +  :slave payload: u64
> +
> +  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> +  successfully negotiated, this message is submitted by master to the
> +  slave. The slave should return the message with a u64 payload
> +  containing the maximum number of memory slots for QEMU to expose to
> +  the guest. This message is not supported with postcopy migration or if
> +  the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> +
> +``VHOST_USER_ADD_MEM_REG``
> +  :id: 36
> +  :equivalent ioctl: N/A
> +  :slave payload: memory region
> +
> +  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> +  successfully negotiated, this message is submitted by master to the slave.
> +  The message payload contains a memory region descriptor struct, describing
> +  a region of guest memory which the slave device must map in. When the
> +  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
> +  negotiated, along with the VHOST_USER_REM_MEM_REG message, this message is
> +  used to set and update the memory tables of the slave device. This message
> +  is not supported with postcopy migration or if the
> +  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> +
> +``VHOST_USER_REM_MEM_REG``
> +  :id: 37
> +  :equivalent ioctl: N/A
> +  :slave payload: memory region
> +
> +  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> +  successfully negotiated, this message is submitted by master to the slave.
> +  The message payload contains a memory region descriptor struct, describing
> +  a region of guest memory which the slave device must unmap. When the
> +  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
> +  negotiated, along with the VHOST_USER_ADD_MEM_REG message, this message is
> +  used to set and update the memory tables of the slave device. This message
> +  is not supported with postcopy migration or if the
> +  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> +
>  Slave message types
>  -------------------
>  
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index af83fdd..fed6d02 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -35,11 +35,29 @@
>  #include <linux/userfaultfd.h>
>  #endif
>  
> -#define VHOST_MEMORY_MAX_NREGIONS    8
> +#define VHOST_MEMORY_LEGACY_NREGIONS    8

Hardly legacy when this is intended to always be used e.g. with
postcopy, right?

>  #define VHOST_USER_F_PROTOCOL_FEATURES 30
>  #define VHOST_USER_SLAVE_MAX_FDS     8
>  
>  /*
> + * Set maximum number of RAM slots supported to
> + * the maximum number supported by the target
> + * hardware platform.
> + */
> +#if defined(TARGET_X86) || defined(TARGET_X86_64) || \
> +    defined(TARGET_ARM) || defined(TARGET_ARM_64)
> +#include "hw/acpi/acpi.h"
> +#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS
> +
> +#elif defined(TARGET_PPC) || defined(TARGET_PPC_64)
> +#include "hw/ppc/spapr.h"
> +#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS
> +
> +#else
> +#define VHOST_USER_MAX_RAM_SLOTS 512
> +#endif
> +
> +/*
>   * Maximum size of virtio device config space
>   */
>  #define VHOST_USER_MAX_CONFIG_SIZE 256
> @@ -59,6 +77,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
>      VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
>      VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
> +    VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS = 14,
>      VHOST_USER_PROTOCOL_F_MAX
>  };
>  
> @@ -100,6 +119,9 @@ typedef enum VhostUserRequest {
>      VHOST_USER_SET_INFLIGHT_FD = 32,
>      VHOST_USER_GPU_SET_SOCKET = 33,
>      VHOST_USER_RESET_DEVICE = 34,
> +    VHOST_USER_GET_MAX_MEM_SLOTS = 35,
> +    VHOST_USER_ADD_MEM_REG = 36,
> +    VHOST_USER_REM_MEM_REG = 37,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>  
> @@ -121,9 +143,14 @@ typedef struct VhostUserMemoryRegion {
>  typedef struct VhostUserMemory {
>      uint32_t nregions;
>      uint32_t padding;
> -    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> +    VhostUserMemoryRegion regions[VHOST_MEMORY_LEGACY_NREGIONS];
>  } VhostUserMemory;
>  
> +typedef struct VhostUserMemRegMsg {
> +    uint32_t padding;
> +    VhostUserMemoryRegion region;
> +} VhostUserMemRegMsg;
> +
>  typedef struct VhostUserLog {
>      uint64_t mmap_size;
>      uint64_t mmap_offset;
> @@ -182,6 +209,7 @@ typedef union {
>          struct vhost_vring_state state;
>          struct vhost_vring_addr addr;
>          VhostUserMemory memory;
> +        VhostUserMemRegMsg mem_reg;
>          VhostUserLog log;
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
> @@ -210,7 +238,7 @@ struct vhost_user {
>      int slave_fd;
>      NotifierWithReturn postcopy_notifier;
>      struct PostCopyFD  postcopy_fd;
> -    uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> +    uint64_t           postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS];
>      /* Length of the region_rb and region_rb_offset arrays */
>      size_t             region_rb_len;
>      /* RAMBlock associated with a given region */
> @@ -222,6 +250,13 @@ struct vhost_user {
>  
>      /* True once we've entered postcopy_listen */
>      bool               postcopy_listen;
> +
> +    /* Maximum number of RAM slots supported by the backend */
> +    uint64_t memory_slots;
> +
> +    /* Our current regions */
> +    int num_shadow_regions;
> +    VhostUserMemoryRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS];
>  };
>  
>  static bool ioeventfd_enabled(void)
> @@ -370,7 +405,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd)
>  static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
>                                     struct vhost_log *log)
>  {
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_USER_MAX_RAM_SLOTS];
>      size_t fd_num = 0;
>      bool shmfd = virtio_has_feature(dev->protocol_features,
>                                      VHOST_USER_PROTOCOL_F_LOG_SHMFD);
> @@ -429,7 +464,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
>          fd = memory_region_get_fd(mr);
>          if (fd > 0) {
>              if (postcopy) {
> -                assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
> +                assert(*fd_num < VHOST_MEMORY_LEGACY_NREGIONS);
>                  trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
>                                                        reg->memory_size,
>                                                        reg->guest_phys_addr,
> @@ -437,7 +472,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
>                                                        offset);
>                  u->region_rb_offset[i] = offset;
>                  u->region_rb[i] = mr->ram_block;
> -            } else if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
> +            } else if (*fd_num == VHOST_MEMORY_LEGACY_NREGIONS) {
>                  error_report("Failed preparing vhost-user memory table msg");
>                  return -1;
>              }
> @@ -474,7 +509,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>                                               struct vhost_memory *mem)
>  {
>      struct vhost_user *u = dev->opaque;
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_MEMORY_LEGACY_NREGIONS];
>      size_t fd_num = 0;
>      VhostUserMsg msg_reply;
>      int region_i, msg_i;
> @@ -523,7 +558,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>      }
>  
>      memset(u->postcopy_client_bases, 0,
> -           sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> +           sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
>  
>      /* They're in the same order as the regions that were sent
>       * but some of the regions were skipped (above) if they
> @@ -564,18 +599,151 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>      return 0;
>  }
>  
> +static inline bool reg_equal(VhostUserMemoryRegion *shadow_reg,
> +                             struct vhost_memory_region *vdev_reg)
> +{
> +    if (shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr &&
> +        shadow_reg->userspace_addr == vdev_reg->userspace_addr &&
> +        shadow_reg->memory_size == vdev_reg->memory_size) {
> +        return true;
> +    } else {
> +        return false;
> +    }
> +}
> +
> +static int vhost_user_send_add_remove_regions(struct vhost_dev *dev,
> +                                              struct vhost_memory *mem,
> +                                              VhostUserMsg *msg)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    int i, j, fd;
> +    bool found[VHOST_USER_MAX_RAM_SLOTS] = {};
> +    bool matching = false;
> +    struct vhost_memory_region *reg;
> +    ram_addr_t offset;
> +    MemoryRegion *mr;
> +
> +    /*
> +     * Ensure the VHOST_USER_PROTOCOL_F_REPLY_ACK has not been
> +     * negotiated and no postcopy migration is in progress.
> +     */
> +    assert(!virtio_has_feature(dev->protocol_features,
> +                               VHOST_USER_PROTOCOL_F_REPLY_ACK));
> +    if (u->postcopy_listen && u->postcopy_fd.handler) {
> +        error_report("Postcopy migration is not supported when the "
> +                     "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
> +                     "has been negotiated");
> +        return -1;
> +    }
> +
> +    msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
> +    msg->hdr.size += sizeof(VhostUserMemoryRegion);
> +
> +    /*
> +     * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
> +     * which are not found not in the device's memory state.

double negation - could not parse this.

> +     */
> +    for (i = 0; i < u->num_shadow_regions; ++i) {
> +        reg = dev->mem->regions;
> +
> +        for (j = 0; j < dev->mem->nregions; j++) {
> +            reg = dev->mem->regions + j;
> +
> +            assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> +            mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> +                                         &offset);
> +            fd = memory_region_get_fd(mr);
> +
> +            if (reg_equal(&u->shadow_regions[i], reg)) {
> +                matching = true;
> +                found[j] = true;
> +                break;
> +            }
> +        }
> +
> +        if (fd > 0 && !matching) {
> +            msg->hdr.request = VHOST_USER_REM_MEM_REG;
> +            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> +            msg->payload.mem_reg.region.memory_size = reg->memory_size;
> +            msg->payload.mem_reg.region.guest_phys_addr =
> +                reg->guest_phys_addr;
> +            msg->payload.mem_reg.region.mmap_offset = offset;
> +
> +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> +                return -1;
> +            }
> +        }
> +    }
> +
> +    /*
> +     * Send messages to add regions present in the device which are not
> +     * in our shadow state.
> +     */
> +    for (i = 0; i < dev->mem->nregions; ++i) {
> +        reg = dev->mem->regions + i;
> +
> +        /*
> +         * If the region was in both the shadow and vdev state we don't
> +         * need to send a VHOST_USER_ADD_MEM_REG message for it.
> +         */
> +        if (found[i]) {
> +            continue;
> +        }
> +
> +        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> +        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> +                                     &offset);
> +        fd = memory_region_get_fd(mr);
> +
> +        if (fd > 0) {
> +            msg->hdr.request = VHOST_USER_ADD_MEM_REG;
> +            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> +            msg->payload.mem_reg.region.memory_size = reg->memory_size;
> +            msg->payload.mem_reg.region.guest_phys_addr =
> +                reg->guest_phys_addr;
> +            msg->payload.mem_reg.region.mmap_offset = offset;
> +
> +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> +                return -1;
> +            }
> +        }
> +    }
> +
> +    /* Make our shadow state match the updated device state. */
> +    u->num_shadow_regions = dev->mem->nregions;
> +    for (i = 0; i < dev->mem->nregions; ++i) {
> +        reg = dev->mem->regions + i;
> +        u->shadow_regions[i].guest_phys_addr = reg->guest_phys_addr;
> +        u->shadow_regions[i].userspace_addr = reg->userspace_addr;
> +        u->shadow_regions[i].memory_size = reg->memory_size;
> +    }
> +
> +    return 0;
> +}
> +
>  static int vhost_user_set_mem_table(struct vhost_dev *dev,
>                                      struct vhost_memory *mem)
>  {
>      struct vhost_user *u = dev->opaque;
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_MEMORY_LEGACY_NREGIONS];
>      size_t fd_num = 0;
>      bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
>      bool reply_supported = virtio_has_feature(dev->protocol_features,
>                                                VHOST_USER_PROTOCOL_F_REPLY_ACK);
> +    bool config_slots =
> +        virtio_has_feature(dev->protocol_features,
> +                           VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS);
>  
>      if (do_postcopy) {
> -        /* Postcopy has enough differences that it's best done in it's own
> +        if (config_slots) {
> +            error_report("Postcopy migration not supported with "
> +                         "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
> +                         "enabled.");
> +            return -1;
> +        }
> +
> +        /*
> +         * Postcopy has enough differences that it's best done in it's own
>           * version
>           */
>          return vhost_user_set_mem_table_postcopy(dev, mem);
> @@ -589,17 +757,22 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>          msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
>      }
>  
> -    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> -                                          false) < 0) {
> -        return -1;
> -    }
> -
> -    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> -        return -1;
> -    }
> +    if (config_slots && !reply_supported) {
> +        if (vhost_user_send_add_remove_regions(dev, mem, &msg) < 0) {
> +            return -1;
> +        }
> +    } else {
> +        if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> +                                              false) < 0) {
> +            return -1;
> +        }
> +        if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> +            return -1;
> +        }
>  
> -    if (reply_supported) {
> -        return process_message_reply(dev, &msg);
> +        if (reply_supported) {
> +            return process_message_reply(dev, &msg);
> +        }
>      }
>  
>      return 0;
> @@ -764,7 +937,7 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
>                                  VhostUserRequest request,
>                                  struct vhost_vring_file *file)
>  {
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_USER_MAX_RAM_SLOTS];
>      size_t fd_num = 0;
>      VhostUserMsg msg = {
>          .hdr.request = request,
> @@ -866,6 +1039,23 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
>      return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
>  }
>  
> +static int vhost_user_get_max_memslots(struct vhost_dev *dev,
> +                                       uint64_t *max_memslots)
> +{
> +    uint64_t backend_max_memslots;
> +    int err;
> +
> +    err = vhost_user_get_u64(dev, VHOST_USER_GET_MAX_MEM_SLOTS,
> +                             &backend_max_memslots);
> +    if (err < 0) {
> +        return err;
> +    }
> +
> +    *max_memslots = MIN(backend_max_memslots, VHOST_USER_MAX_RAM_SLOTS);
> +
> +    return *max_memslots;
> +}
> +
>  static int vhost_user_set_owner(struct vhost_dev *dev)
>  {
>      VhostUserMsg msg = {
> @@ -1439,6 +1629,24 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
>                           "slave-req protocol features.");
>              return -1;
>          }
> +
> +        /* get max memory regions if backend supports configurable RAM slots */
> +        if (!virtio_has_feature(dev->protocol_features,
> +                                VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS)) {
> +            u->memory_slots = VHOST_MEMORY_LEGACY_NREGIONS;
> +        } else if (virtio_has_feature(dev->protocol_features,
> +                           VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> +            error_report("The VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol "
> +                         "feature is not supported when the "
> +                         "VHOST_USER_PROTOCOL_F_REPLY_ACK features has also "
> +                         "been negotiated");
> +            return -1;
> +        } else {
> +            err = vhost_user_get_max_memslots(dev, &u->memory_slots);
> +            if (err < 0) {
> +                return err;
> +            }
> +        }
>      }
>  
>      if (dev->migration_blocker == NULL &&
> @@ -1502,7 +1710,9 @@ static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx)
>  
>  static int vhost_user_memslots_limit(struct vhost_dev *dev)
>  {
> -    return VHOST_MEMORY_MAX_NREGIONS;
> +    struct vhost_user *u = dev->opaque;
> +
> +    return u->memory_slots;
>  }
>  
>  static bool vhost_user_requires_shm_log(struct vhost_dev *dev)
> -- 
> 1.8.3.1



