[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user
From: |
Michael S. Tsirkin |
Subject: |
Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user |
Date: |
Thu, 6 Feb 2020 03:32:38 -0500 |
On Wed, Jan 15, 2020 at 09:57:06PM -0500, Raphael Norwitz wrote:
> The current vhost-user implementation in Qemu imposes a limit on the
> maximum number of memory slots exposed to a VM using a vhost-user
> device. This change provides a new protocol feature
> VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and
> allows a VM with a vhost-user device to expose a configurable number of
> memory slots, up to the ACPI defined maximum. Existing backends which
> do not support this protocol feature are unaffected.
Hmm ACPI maximum seems to be up to 512 - is this too much to fit in a
single message? So can't we just increase the number (after negotiating
with remote) and be done with it, instead of add/remove? Or is there
another reason to prefer add/remove?
>
> This feature works by using three new messages,
> VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
> VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
> number of memory slots the backend is willing to accept when the
> backend is initialized. Then, when the memory tables are set or updated,
> a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
> are sent to transmit the regions to map and/or unmap instead of trying
> to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
> message.
>
> The vhost_user struct maintains a shadow state of the VM’s memory
> regions. When the memory tables are modified, the
> vhost_user_set_mem_table() function compares the new device memory state
> to the shadow state and only sends regions which need to be unmapped or
> mapped in. The regions which must be unmapped are sent first, followed
> by the new regions to be mapped in. After all the messages have been
> sent, the shadow state is set to the current virtual device state.
>
> The current feature implementation does not work with postcopy migration
> and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> also been negotiated.
Hmm what would it take to lift the restrictions?
conflicting features like this makes is very hard for users to make
an informed choice what to support.
> Signed-off-by: Raphael Norwitz <address@hidden>
> Signed-off-by: Peter Turschmid <address@hidden>
> Suggested-by: Mike Cui <address@hidden>
> ---
> docs/interop/vhost-user.rst | 43 ++++++++
> hw/virtio/vhost-user.c | 254
> ++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 275 insertions(+), 22 deletions(-)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 5f8b3a4..ae9acf2 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -786,6 +786,7 @@ Protocol features
> #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
> #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
> #define VHOST_USER_PROTOCOL_F_RESET_DEVICE 13
> + #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS 14
>
> Master message types
> --------------------
> @@ -1205,6 +1206,48 @@ Master message types
> Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
> feature is set by the backend.
>
> +``VHOST_USER_GET_MAX_MEM_SLOTS``
> + :id: 35
> + :equivalent ioctl: N/A
> + :slave payload: u64
> +
> + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> + successfully negotiated, this message is submitted by master to the
> + slave. The slave should return the message with a u64 payload
> + containing the maximum number of memory slots for QEMU to expose to
> + the guest. This message is not supported with postcopy migration or if
> + the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> +
> +``VHOST_USER_ADD_MEM_REG``
> + :id: 36
> + :equivalent ioctl: N/A
> + :slave payload: memory region
> +
> + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> + successfully negotiated, this message is submitted by master to the slave.
> + The message payload contains a memory region descriptor struct, describing
> + a region of guest memory which the slave device must map in. When the
> + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> successfully
> + negotiated, along with the VHOST_USER_REM_MEM_REG message, this message is
> + used to set and update the memory tables of the slave device. This message
> + is not supported with postcopy migration or if the
> + VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> +
> +``VHOST_USER_REM_MEM_REG``
> + :id: 37
> + :equivalent ioctl: N/A
> + :slave payload: memory region
> +
> + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> + successfully negotiated, this message is submitted by master to the slave.
> + The message payload contains a memory region descriptor struct, describing
> + a region of guest memory which the slave device must unmap. When the
> + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> successfully
> + negotiated, along with the VHOST_USER_ADD_MEM_REG message, this message is
> + used to set and update the memory tables of the slave device. This message
> + is not supported with postcopy migration or if the
> + VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> +
> Slave message types
> -------------------
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index af83fdd..fed6d02 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -35,11 +35,29 @@
> #include <linux/userfaultfd.h>
> #endif
>
> -#define VHOST_MEMORY_MAX_NREGIONS 8
> +#define VHOST_MEMORY_LEGACY_NREGIONS 8
Hardly legacy when this is intended to always be used e.g. with
postcopy, right?
> #define VHOST_USER_F_PROTOCOL_FEATURES 30
> #define VHOST_USER_SLAVE_MAX_FDS 8
>
> /*
> + * Set maximum number of RAM slots supported to
> + * the maximum number supported by the target
> + * hardware plaform.
> + */
> +#if defined(TARGET_X86) || defined(TARGET_X86_64) || \
> + defined(TARGET_ARM) || defined(TARGET_ARM_64)
> +#include "hw/acpi/acpi.h"
> +#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS
> +
> +#elif defined(TARGET_PPC) || defined(TARGET_PPC_64)
> +#include "hw/ppc/spapr.h"
> +#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS
> +
> +#else
> +#define VHOST_USER_MAX_RAM_SLOTS 512
> +#endif
> +
> +/*
> * Maximum size of virtio device config space
> */
> #define VHOST_USER_MAX_CONFIG_SIZE 256
> @@ -59,6 +77,7 @@ enum VhostUserProtocolFeature {
> VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
> VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
> VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
> + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS = 14,
> VHOST_USER_PROTOCOL_F_MAX
> };
>
> @@ -100,6 +119,9 @@ typedef enum VhostUserRequest {
> VHOST_USER_SET_INFLIGHT_FD = 32,
> VHOST_USER_GPU_SET_SOCKET = 33,
> VHOST_USER_RESET_DEVICE = 34,
> + VHOST_USER_GET_MAX_MEM_SLOTS = 35,
> + VHOST_USER_ADD_MEM_REG = 36,
> + VHOST_USER_REM_MEM_REG = 37,
> VHOST_USER_MAX
> } VhostUserRequest;
>
> @@ -121,9 +143,14 @@ typedef struct VhostUserMemoryRegion {
> typedef struct VhostUserMemory {
> uint32_t nregions;
> uint32_t padding;
> - VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> + VhostUserMemoryRegion regions[VHOST_MEMORY_LEGACY_NREGIONS];
> } VhostUserMemory;
>
> +typedef struct VhostUserMemRegMsg {
> + uint32_t padding;
> + VhostUserMemoryRegion region;
> +} VhostUserMemRegMsg;
> +
> typedef struct VhostUserLog {
> uint64_t mmap_size;
> uint64_t mmap_offset;
> @@ -182,6 +209,7 @@ typedef union {
> struct vhost_vring_state state;
> struct vhost_vring_addr addr;
> VhostUserMemory memory;
> + VhostUserMemRegMsg mem_reg;
> VhostUserLog log;
> struct vhost_iotlb_msg iotlb;
> VhostUserConfig config;
> @@ -210,7 +238,7 @@ struct vhost_user {
> int slave_fd;
> NotifierWithReturn postcopy_notifier;
> struct PostCopyFD postcopy_fd;
> - uint64_t postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> + uint64_t postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS];
> /* Length of the region_rb and region_rb_offset arrays */
> size_t region_rb_len;
> /* RAMBlock associated with a given region */
> @@ -222,6 +250,13 @@ struct vhost_user {
>
> /* True once we've entered postcopy_listen */
> bool postcopy_listen;
> +
> + /* Maximum number of RAM slots supported by the backend */
> + uint64_t memory_slots;
> +
> + /* Our current regions */
> + int num_shadow_regions;
> + VhostUserMemoryRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS];
> };
>
> static bool ioeventfd_enabled(void)
> @@ -370,7 +405,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev, int
> fd)
> static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
> struct vhost_log *log)
> {
> - int fds[VHOST_MEMORY_MAX_NREGIONS];
> + int fds[VHOST_USER_MAX_RAM_SLOTS];
> size_t fd_num = 0;
> bool shmfd = virtio_has_feature(dev->protocol_features,
> VHOST_USER_PROTOCOL_F_LOG_SHMFD);
> @@ -429,7 +464,7 @@ static int vhost_user_fill_set_mem_table_msg(struct
> vhost_user *u,
> fd = memory_region_get_fd(mr);
> if (fd > 0) {
> if (postcopy) {
> - assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
> + assert(*fd_num < VHOST_MEMORY_LEGACY_NREGIONS);
> trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
> reg->memory_size,
> reg->guest_phys_addr,
> @@ -437,7 +472,7 @@ static int vhost_user_fill_set_mem_table_msg(struct
> vhost_user *u,
> offset);
> u->region_rb_offset[i] = offset;
> u->region_rb[i] = mr->ram_block;
> - } else if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
> + } else if (*fd_num == VHOST_MEMORY_LEGACY_NREGIONS) {
> error_report("Failed preparing vhost-user memory table msg");
> return -1;
> }
> @@ -474,7 +509,7 @@ static int vhost_user_set_mem_table_postcopy(struct
> vhost_dev *dev,
> struct vhost_memory *mem)
> {
> struct vhost_user *u = dev->opaque;
> - int fds[VHOST_MEMORY_MAX_NREGIONS];
> + int fds[VHOST_MEMORY_LEGACY_NREGIONS];
> size_t fd_num = 0;
> VhostUserMsg msg_reply;
> int region_i, msg_i;
> @@ -523,7 +558,7 @@ static int vhost_user_set_mem_table_postcopy(struct
> vhost_dev *dev,
> }
>
> memset(u->postcopy_client_bases, 0,
> - sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> + sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
>
> /* They're in the same order as the regions that were sent
> * but some of the regions were skipped (above) if they
> @@ -564,18 +599,151 @@ static int vhost_user_set_mem_table_postcopy(struct
> vhost_dev *dev,
> return 0;
> }
>
> +static inline bool reg_equal(VhostUserMemoryRegion *shadow_reg,
> + struct vhost_memory_region *vdev_reg)
> +{
> + if (shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr &&
> + shadow_reg->userspace_addr == vdev_reg->userspace_addr &&
> + shadow_reg->memory_size == vdev_reg->memory_size) {
> + return true;
> + } else {
> + return false;
> + }
> +}
> +
> +static int vhost_user_send_add_remove_regions(struct vhost_dev *dev,
> + struct vhost_memory *mem,
> + VhostUserMsg *msg)
> +{
> + struct vhost_user *u = dev->opaque;
> + int i, j, fd;
> + bool found[VHOST_USER_MAX_RAM_SLOTS] = {};
> + bool matching = false;
> + struct vhost_memory_region *reg;
> + ram_addr_t offset;
> + MemoryRegion *mr;
> +
> + /*
> + * Ensure the VHOST_USER_PROTOCOL_F_REPLY_ACK has not been
> + * negotiated and no postcopy migration is in progress.
> + */
> + assert(!virtio_has_feature(dev->protocol_features,
> + VHOST_USER_PROTOCOL_F_REPLY_ACK));
> + if (u->postcopy_listen && u->postcopy_fd.handler) {
> + error_report("Postcopy migration is not supported when the "
> + "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
> + "has been negotiated");
> + return -1;
> + }
> +
> + msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
> + msg->hdr.size += sizeof(VhostUserMemoryRegion);
> +
> + /*
> + * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
> + * which are not found not in the device's memory state.
double negation - could not parse this.
> + */
> + for (i = 0; i < u->num_shadow_regions; ++i) {
> + reg = dev->mem->regions;
> +
> + for (j = 0; j < dev->mem->nregions; j++) {
> + reg = dev->mem->regions + j;
> +
> + assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> + mr = memory_region_from_host((void
> *)(uintptr_t)reg->userspace_addr,
> + &offset);
> + fd = memory_region_get_fd(mr);
> +
> + if (reg_equal(&u->shadow_regions[i], reg)) {
> + matching = true;
> + found[j] = true;
> + break;
> + }
> + }
> +
> + if (fd > 0 && !matching) {
> + msg->hdr.request = VHOST_USER_REM_MEM_REG;
> + msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> + msg->payload.mem_reg.region.memory_size = reg->memory_size;
> + msg->payload.mem_reg.region.guest_phys_addr =
> + reg->guest_phys_addr;
> + msg->payload.mem_reg.region.mmap_offset = offset;
> +
> + if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> + return -1;
> + }
> + }
> + }
> +
> + /*
> + * Send messages to add regions present in the device which are not
> + * in our shadow state.
> + */
> + for (i = 0; i < dev->mem->nregions; ++i) {
> + reg = dev->mem->regions + i;
> +
> + /*
> + * If the region was in both the shadow and vdev state we don't
> + * need to send a VHOST_USER_ADD_MEM_REG message for it.
> + */
> + if (found[i]) {
> + continue;
> + }
> +
> + assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> + mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> + &offset);
> + fd = memory_region_get_fd(mr);
> +
> + if (fd > 0) {
> + msg->hdr.request = VHOST_USER_ADD_MEM_REG;
> + msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> + msg->payload.mem_reg.region.memory_size = reg->memory_size;
> + msg->payload.mem_reg.region.guest_phys_addr =
> + reg->guest_phys_addr;
> + msg->payload.mem_reg.region.mmap_offset = offset;
> +
> + if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> + return -1;
> + }
> + }
> + }
> +
> + /* Make our shadow state match the updated device state. */
> + u->num_shadow_regions = dev->mem->nregions;
> + for (i = 0; i < dev->mem->nregions; ++i) {
> + reg = dev->mem->regions + i;
> + u->shadow_regions[i].guest_phys_addr = reg->guest_phys_addr;
> + u->shadow_regions[i].userspace_addr = reg->userspace_addr;
> + u->shadow_regions[i].memory_size = reg->memory_size;
> + }
> +
> + return 0;
> +}
> +
> static int vhost_user_set_mem_table(struct vhost_dev *dev,
> struct vhost_memory *mem)
> {
> struct vhost_user *u = dev->opaque;
> - int fds[VHOST_MEMORY_MAX_NREGIONS];
> + int fds[VHOST_MEMORY_LEGACY_NREGIONS];
> size_t fd_num = 0;
> bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
> bool reply_supported = virtio_has_feature(dev->protocol_features,
>
> VHOST_USER_PROTOCOL_F_REPLY_ACK);
> + bool config_slots =
> + virtio_has_feature(dev->protocol_features,
> + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS);
>
> if (do_postcopy) {
> - /* Postcopy has enough differences that it's best done in it's own
> + if (config_slots) {
> + error_report("Postcopy migration not supported with "
> + "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
> + "enabled.");
> + return -1;
> + }
> +
> + /*
> + * Postcopy has enough differences that it's best done in it's own
> * version
> */
> return vhost_user_set_mem_table_postcopy(dev, mem);
> @@ -589,17 +757,22 @@ static int vhost_user_set_mem_table(struct vhost_dev
> *dev,
> msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
> }
>
> - if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> - false) < 0) {
> - return -1;
> - }
> -
> - if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> - return -1;
> - }
> + if (config_slots && !reply_supported) {
> + if (vhost_user_send_add_remove_regions(dev, mem, &msg) < 0) {
> + return -1;
> + }
> + } else {
> + if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> + false) < 0) {
> + return -1;
> + }
> + if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> + return -1;
> + }
>
> - if (reply_supported) {
> - return process_message_reply(dev, &msg);
> + if (reply_supported) {
> + return process_message_reply(dev, &msg);
> + }
> }
>
> return 0;
> @@ -764,7 +937,7 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
> VhostUserRequest request,
> struct vhost_vring_file *file)
> {
> - int fds[VHOST_MEMORY_MAX_NREGIONS];
> + int fds[VHOST_USER_MAX_RAM_SLOTS];
> size_t fd_num = 0;
> VhostUserMsg msg = {
> .hdr.request = request,
> @@ -866,6 +1039,23 @@ static int vhost_user_get_features(struct vhost_dev
> *dev, uint64_t *features)
> return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
> }
>
> +static int vhost_user_get_max_memslots(struct vhost_dev *dev,
> + uint64_t *max_memslots)
> +{
> + uint64_t backend_max_memslots;
> + int err;
> +
> + err = vhost_user_get_u64(dev, VHOST_USER_GET_MAX_MEM_SLOTS,
> + &backend_max_memslots);
> + if (err < 0) {
> + return err;
> + }
> +
> + *max_memslots = MIN(backend_max_memslots, VHOST_USER_MAX_RAM_SLOTS);
> +
> + return *max_memslots;
> +}
> +
> static int vhost_user_set_owner(struct vhost_dev *dev)
> {
> VhostUserMsg msg = {
> @@ -1439,6 +1629,24 @@ static int vhost_user_backend_init(struct vhost_dev
> *dev, void *opaque)
> "slave-req protocol features.");
> return -1;
> }
> +
> + /* get max memory regions if backend supports configurable RAM slots
> */
> + if (!virtio_has_feature(dev->protocol_features,
> + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS)) {
> + u->memory_slots = VHOST_MEMORY_LEGACY_NREGIONS;
> + } else if (virtio_has_feature(dev->protocol_features,
> + VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> + error_report("The VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol
> "
> + "feature is not supported when the "
> + "VHOST_USER_PROTOCOL_F_REPLY_ACK features has also "
> + "been negotiated");
> + return -1;
> + } else {
> + err = vhost_user_get_max_memslots(dev, &u->memory_slots);
> + if (err < 0) {
> + return err;
> + }
> + }
> }
>
> if (dev->migration_blocker == NULL &&
> @@ -1502,7 +1710,9 @@ static int vhost_user_get_vq_index(struct vhost_dev
> *dev, int idx)
>
> static int vhost_user_memslots_limit(struct vhost_dev *dev)
> {
> - return VHOST_MEMORY_MAX_NREGIONS;
> + struct vhost_user *u = dev->opaque;
> +
> + return u->memory_slots;
> }
>
> static bool vhost_user_requires_shm_log(struct vhost_dev *dev)
> --
> 1.8.3.1
- Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user,
Michael S. Tsirkin <=