

From: Jason Wang
Subject: Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Date: Thu, 12 Apr 2018 11:37:35 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0



On 2018-04-12 09:57, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
>> On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
>>> On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
>>>> On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
>>>>> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>>>>> feature for vhost-user. By default, vhost-user backend needs
>>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>>>>>> With this protocol feature negotiated, QEMU will provide all
>>>>>> the IOTLBs to vhost-user backend without waiting for the
>>>>>> queries from backend. This is helpful when using a hardware
>>>>>> accelerator which is not able to handle unknown IOVAs at the
>>>>>> vhost-user backend.
>>>>>>
>>>>>> Signed-off-by: Tiwei Bie <address@hidden>
>>>>> This is potentially a large amount of data to be sent
>>>>> on a socket.
>>>> If we take the hardware accelerator out of this picture, we
>>>> will find that it's actually a question of "pre-loading" vs
>>>> "lazy-loading". I think neither of them is perfect.
>>>>
>>>> For "pre-loading", as you said, we may have a tough start.
>>>> But for "lazy-loading", we can't have predictable performance.
>>>> A sudden, unexpected performance drop may happen at any time,
>>>> because we may meet an unknown IOVA at any time in this case.
>>> That's how hardware behaves too though. So we can expect guests
>>> to try to optimize locality.
>> The difference is that the software implementation needs to
>> query the mappings via a socket, and that's much slower.
> If you are proposing this new feature as an optimization,
> then I'd like to see numbers showing the performance gains.
>
> It's definitely possible to optimize things out.  Pre-loading isn't
> where I would start optimizing though.  For example, DPDK could have its
> own VTD emulation, then it could access guest memory directly.

Having vtd emulation in dpdk has many disadvantages:

- vendor lock-in: it can only work for Intel
- duplication of code and bugs
- a huge number of new message types would need to be invented

So I tend to go the opposite way: link dpdk to qemu.



>>>> Once we meet an unknown IOVA, the backend's data path will need
>>>> to stop and query the mapping of the IOVA via the socket and
>>>> wait for the reply. And the latency is not negligible (sometimes
>>>> it's even unacceptable), especially in the high performance
>>>> network case. So maybe it's better to make both of them available
>>>> to the vhost backend.
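
For reference, the "lazy-loading" miss path being discussed follows the vhost-user IOTLB messages: on an unknown IOVA the backend sends a VHOST_IOTLB_MISS and blocks until QEMU answers with a covering VHOST_IOTLB_UPDATE. Below is a minimal backend-side sketch of that flow; the message layout matches struct vhost_iotlb_msg from the spec, but the transport and cache helpers are hypothetical placeholders, not real QEMU or DPDK APIs.

/* Sketch of the lazy-loading IOTLB miss path. Helper functions
 * (send_iotlb_msg, recv_iotlb_msg, iotlb_cache_*) are hypothetical. */
#include <stdint.h>
#include <stdbool.h>

#define VHOST_IOTLB_MISS   1
#define VHOST_IOTLB_UPDATE 2

struct vhost_iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t  perm;   /* VHOST_ACCESS_RO/WO/RW */
    uint8_t  type;   /* MISS, UPDATE, INVALIDATE, ... */
};

/* Hypothetical slave-channel transport helpers. */
bool send_iotlb_msg(const struct vhost_iotlb_msg *msg);
bool recv_iotlb_msg(struct vhost_iotlb_msg *msg);

/* Hypothetical backend-local IOTLB cache. */
bool iotlb_cache_lookup(uint64_t iova, uint8_t perm, uint64_t *uaddr);
void iotlb_cache_insert(const struct vhost_iotlb_msg *update);

/* Translate an IOVA; on a miss the data path stalls right here
 * until QEMU replies -- the latency referred to above. */
static bool translate_iova(uint64_t iova, uint8_t perm, uint64_t *uaddr)
{
    struct vhost_iotlb_msg msg = { .iova = iova, .perm = perm,
                                   .type = VHOST_IOTLB_MISS };

    if (iotlb_cache_lookup(iova, perm, uaddr)) {
        return true;                      /* fast path: known IOVA */
    }
    if (!send_iotlb_msg(&msg)) {          /* slow path: ask QEMU */
        return false;
    }
    for (;;) {
        if (!recv_iotlb_msg(&msg)) {
            return false;
        }
        if (msg.type == VHOST_IOTLB_UPDATE) {
            iotlb_cache_insert(&msg);
            if (iova >= msg.iova && iova < msg.iova + msg.size) {
                break;                    /* the missing entry arrived */
            }
        }
    }
    return iotlb_cache_lookup(iova, perm, uaddr);
}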

>>> I had an impression that a hardware accelerator was using
>>> VFIO anyway. Given this, can't we have QEMU program
>>> the shadow IOMMU tables into VFIO directly?
>> I think it's a good idea! Currently, my concern about it is
>> that the hardware device may not use an IOMMU and it may have
>> its own builtin address translation unit. And it will be a pain
>> for device vendors to teach VFIO to be able to work with the
>> builtin address translation unit.
> I think such drivers would have to integrate with VFIO somehow.
> Otherwise, what is the plan for assigning such devices then?
>> Such devices are just for vhost data path acceleration.
> That's not true I think.  E.g. RDMA devices have an on-card MMU.

>> They have many available queue pairs, the switch logic
>> will be done among those queue pairs. And different queue
>> pairs will serve different VMs directly.
>>
>> Best regards,
>> Tiwei Bie
> The way I would do it is attach different PASID values to
> different queues. This way you can use the standard IOMMU
> to enforce protection.
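
To make the PASID suggestion concrete: the device would tag each queue pair's DMA with a distinct PASID, and the host IOMMU would hold a separate translation table per PASID, so one physical device can serve several VMs while the standard IOMMU enforces isolation. The sketch below is purely illustrative; no standard userspace API for per-queue PASID binding exists in this time frame, and every function name is hypothetical.

/* Illustrative only: per-queue PASID setup; all names hypothetical. */
#include <stdint.h>

uint32_t hyp_iommu_alloc_pasid(void);
int hyp_iommu_bind_table(uint32_t pasid, int vm_iommu_ctx);

struct accel_queue_pair {
    int      qp_index;   /* queue pair serving one VM */
    uint32_t pasid;      /* tag carried in this queue's DMA requests */
};

/* Give each queue pair its own PASID so the host IOMMU translates
 * (and isolates) its DMA using the owning VM's mappings. */
static int assign_qp_to_vm(struct accel_queue_pair *qp, int vm_iommu_ctx)
{
    qp->pasid = hyp_iommu_alloc_pasid();
    return hyp_iommu_bind_table(qp->pasid, vm_iommu_ctx);
}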

So that's just shared virtual memory on the host, which can share an IOVA address space between a specific queue pair and a process. I'm not sure how hard it would be for existing vhost-user backends to support this.

Thanks





>>>>>> Best regards,
>>>>>> Tiwei Bie
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> The idea of this patch is to let QEMU push all the IOTLBs
>>>>>> to vhost-user backend without waiting for the queries from
>>>>>> the backend. Because the hardware accelerator at the vhost-user
>>>>>> backend may not be able to handle unknown IOVAs.
>>>>>>
>>>>>> This is just an RFC for now. It seems that it doesn't work
>>>>>> as expected when the guest is using a kernel driver (to handle
>>>>>> this case, it seems that some RAM regions' events also need
>>>>>> to be listened to). Any comments would be appreciated! Thanks!

  docs/interop/vhost-user.txt       |  9 ++++++++
  hw/virtio/vhost-backend.c         |  7 ++++++
  hw/virtio/vhost-user.c            |  8 +++++++
  hw/virtio/vhost.c                 | 47 ++++++++++++++++++++++++++++++++++++---
  include/hw/virtio/vhost-backend.h |  3 +++
  5 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 534caab18a..73e07f9dad 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -380,6 +380,7 @@ Protocol features
  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
  #define VHOST_USER_PROTOCOL_F_CONFIG         9
+#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10
 
  Master message types
  --------------------
@@ -797,3 +798,11 @@ resilient for selective requests.
  For the message types that already solicit a reply from the client, the
  presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
  no behavioural change. (See the 'Communication' section for details.)
+
+VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
+------------------------------------
+By default, vhost-user backend needs to query the IOTLBs from QEMU after
+meeting unknown IOVAs. With this protocol feature negotiated, QEMU will
+provide all the IOTLBs to vhost backend without waiting for the queries
+from backend. This is helpful when using a hardware accelerator which is
+not able to handle unknown IOVAs at the vhost-user backend.
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 7f09efab8b..d72994e9a5 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
          qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL);
  }
+static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev)
+{
+    return false;
+}
+
  static const VhostOps kernel_ops = {
          .backend_type = VHOST_BACKEND_TYPE_KERNEL,
          .vhost_backend_init = vhost_kernel_init,
@@ -264,6 +269,8 @@ static const VhostOps kernel_ops = {
  #endif /* CONFIG_VHOST_VSOCK */
          .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
          .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg,
+        .vhost_backend_need_all_device_iotlb =
+                                vhost_kernel_need_all_device_iotlb,
  };
 
  int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 38da8692bb..30e0b1c13b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -47,6 +47,7 @@ enum VhostUserProtocolFeature {
      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
      VHOST_USER_PROTOCOL_F_CONFIG = 9,
+    VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10,
      VHOST_USER_PROTOCOL_F_MAX
  };
@@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
      return 0;
  }
+static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev)
+{
+    return virtio_has_feature(dev->protocol_features,
+                              VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB);
+}
+
  const VhostOps user_ops = {
          .backend_type = VHOST_BACKEND_TYPE_USER,
          .vhost_backend_init = vhost_user_init,
@@ -1611,4 +1618,5 @@ const VhostOps user_ops = {
          .vhost_set_config = vhost_user_set_config,
          .vhost_crypto_create_session = vhost_user_crypto_create_session,
          .vhost_crypto_close_session = vhost_user_crypto_close_session,
        .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb,
  };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f51bf573d5..70922ee998 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -48,6 +48,9 @@ static unsigned int used_memslots;
  static QLIST_HEAD(, vhost_dev) vhost_devices =
      QLIST_HEAD_INITIALIZER(vhost_devices);
+static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa,
+                                      uint64_t *uaddr, uint64_t *len);
+
  bool vhost_has_free_slot(void)
  {
      unsigned int slots_limit = ~0U;
@@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener,
      vhost_region_add_section(dev, section);
  }
-static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
  {
      struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n);
      struct vhost_dev *hdev = iommu->hdev;
      hwaddr iova = iotlb->iova + iommu->iommu_offset;
+    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+        uint64_t uaddr, len;
+
+        rcu_read_lock();
+
+        if (iotlb->target_as != NULL) {
+            if (vhost_memory_region_lookup(hdev, iotlb->translated_addr,
+                        &uaddr, &len)) {
+                error_report("Fail to lookup the translated address "
+                        "%"PRIx64, iotlb->translated_addr);
+                goto out;
+            }
+
+            len = MIN(iotlb->addr_mask + 1, len);
+            iova = iova & ~iotlb->addr_mask;
+
+            if (vhost_backend_update_device_iotlb(hdev, iova, uaddr,
+                                                  len, iotlb->perm)) {
+                error_report("Fail to update device iotlb");
+                goto out;
+            }
+        }
+out:
+        rcu_read_unlock();
+        return;
+    }
+
      if (vhost_backend_invalidate_device_iotlb(hdev, iova,
                                                iotlb->addr_mask + 1)) {
          error_report("Fail to invalidate device iotlb");
@@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener,
      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                           iommu_listener);
      struct vhost_iommu *iommu;
+    IOMMUNotifierFlag flags;
      Int128 end;
 
      if (!memory_region_is_iommu(section->mr)) {
@@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener,
      end = int128_add(int128_make64(section->offset_within_region),
                       section->size);
      end = int128_sub(end, int128_one());
-    iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify,
-                        IOMMU_NOTIFIER_UNMAP,
+
+    if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) {
+        flags = IOMMU_NOTIFIER_ALL;
+    } else {
+        flags = IOMMU_NOTIFIER_UNMAP;
+    }
+
+    iommu_notifier_init(&iommu->n, vhost_iommu_map_notify,
+                        flags,
                          section->offset_within_region,
                          int128_get64(end));
      iommu->mr = section->mr;
@@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener,
      memory_region_register_iommu_notifier(section->mr, &iommu->n);
      QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
      /* TODO: can replay help performance here? */
+    if (flags == IOMMU_NOTIFIER_ALL) {
+        memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n);
+    }
  }
 
  static void vhost_iommu_region_del(MemoryListener *listener,
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 5dac61f9ea..602fd987a4 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev,
  typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
                                               uint64_t session_id);
+typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev);
+
  typedef struct VhostOps {
      VhostBackendType backend_type;
      vhost_backend_init vhost_backend_init;
@@ -138,6 +140,7 @@ typedef struct VhostOps {
      vhost_set_config_op vhost_set_config;
      vhost_crypto_create_session_op vhost_crypto_create_session;
      vhost_crypto_close_session_op vhost_crypto_close_session;
+    vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb;
  } VhostOps;
 
  extern const VhostOps user_ops;
--
2.11.0
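
As a counterpoint to the lazy-loading sketch earlier in the thread: with VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB negotiated, a backend no longer needs a miss path at all; it only consumes the updates and invalidations QEMU pushes (and replays on region add). Below is a hedged sketch of such a backend-side handler; the hw_translation_* helpers are hypothetical stand-ins for programming an accelerator's translation unit and are not part of this patch.

/* Sketch: backend-side handling when QEMU pushes all IOTLB entries.
 * The hw_translation_* hooks are hypothetical. */
#include <stdint.h>

#define VHOST_IOTLB_UPDATE     2
#define VHOST_IOTLB_INVALIDATE 3

struct vhost_iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t  perm;
    uint8_t  type;
};

/* Hypothetical hooks that program the accelerator's own
 * translation tables ahead of any DMA. */
void hw_translation_insert(uint64_t iova, uint64_t uaddr,
                           uint64_t size, uint8_t perm);
void hw_translation_remove(uint64_t iova, uint64_t size);

/* Called for every IOTLB message pushed by QEMU; because the full
 * mapping is provided up front, the data path never has to stop
 * and query over the socket. */
static void handle_pushed_iotlb(const struct vhost_iotlb_msg *msg)
{
    switch (msg->type) {
    case VHOST_IOTLB_UPDATE:
        hw_translation_insert(msg->iova, msg->uaddr, msg->size, msg->perm);
        break;
    case VHOST_IOTLB_INVALIDATE:
        hw_translation_remove(msg->iova, msg->size);
        break;
    default:
        break; /* a backend with this feature never sends MISS */
    }
}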



