qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] Supporting emulation of IOMMUs


From: David Gibson
Subject: [Qemu-devel] Supporting emulation of IOMMUs
Date: Thu, 21 Apr 2011 17:03:47 +1000
User-agent: Mutt/1.5.20 (2009-06-14)

A few months ago, Eduard - Gabriel Munteanu posted a series of patches
implementing support for emulating the AMD PCI IOMMU
(http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03196.html).

In fact, this series implemented a general DMA/IOMMU layer which can
be used by any device model, and one translation backend for this
implementing the AMD specific PCI IOMMU.

These patches don't seem to have gone anywhere for the last few
months, however, and so far I've been unable to contact the author
(trying again with this mail).

I have an interest in this code, because the pSeries machine will also
need IOMMU emulation support.  At present we only support virtual
devices, through the PAPR interface, and we have support for the
hypervisor-controller IOMMU translation in the PAPR VIO code.
However, we want to add PCI device support and this will also need
IOMMU translation.

The series seems to have the right basic approach, so if the author is
indeed MIA, I was planning to pick up the patches and resubmit them
(with support for the pSeries IOMMU added).

Before I do that, I was hoping to get some consensus that this is the
right way to go.  For reference, I have an updated version of the
first patch (which adds the core IOMMU layer) below.

From: Eduard - Gabriel Munteanu <address@hidden>
Date: Fri, 4 Feb 2011 01:32:55 +0200
Subject: [PATCH] Generic DMA memory access interface

This introduces replacements for memory access functions like
cpu_physical_memory_read(). The new interface can handle address
translation and access checking through an IOMMU.

David Gibson: I have made several bugfixes and cleanups to Eduard's
original patch.

 * dma_memory_rw() was incorrectly using (uninitialized) plen instead
   of len in the fallback to no-IOMMU case.

 * the dma_memory_map() tracking was storing the guest physical
   address of each mapping, but not the qemu user virtual address.
   However in unmap() it was then attempting to lookup by virtual
   using a completely bogus cast.

 * The dma_memory_rw() function is moved from dma_rw.h to dma_rw.c, it
   was a bit too much code for an inline.

 * IOMMU support is now available on all target platforms, not just
   i386, but is configurable (--enable-iommu/--disable-iommu).  Stubs
   are used so that individual drivers can use the new dma interface
   and it will turn into old-style cpu physical accesses at no cost on
   IOMMU-less builds.

Signed-off-by: Eduard - Gabriel Munteanu <address@hidden>
Signed-off-by: David Gibson <address@hidden>
---
 Makefile.target |    1 +
 configure       |   12 ++++
 hw/dma_rw.c     |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/dma_rw.h     |  156 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 316 insertions(+), 0 deletions(-)
 create mode 100644 hw/dma_rw.c
 create mode 100644 hw/dma_rw.h

diff --git a/Makefile.target b/Makefile.target
index 95f5eda..c3d36c6 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -198,6 +198,7 @@ obj-$(CONFIG_REALLY_VIRTFS) += virtio-9p.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
+obj-$(CONFIG_IOMMU) += dma_rw.o
 LIBS+=-lz
 
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
diff --git a/configure b/configure
index da2da04..fa6f4d5 100755
--- a/configure
+++ b/configure
@@ -130,6 +130,7 @@ xen=""
 linux_aio=""
 attr=""
 vhost_net=""
+iommu="no"
 xfs=""
 
 gprof="no"
@@ -723,6 +724,10 @@ for opt do
   ;;
   --enable-vhost-net) vhost_net="yes"
   ;;
+  --enable-iommu) iommu="yes"
+  ;;
+  --disable-iommu) iommu="no"
+  ;;
   --disable-opengl) opengl="no"
   ;;
   --enable-opengl) opengl="yes"
@@ -934,6 +939,8 @@ echo "  --enable-docs            enable documentation build"
 echo "  --disable-docs           disable documentation build"
 echo "  --disable-vhost-net      disable vhost-net acceleration support"
 echo "  --enable-vhost-net       enable vhost-net acceleration support"
+echo "  --disable-iommu          disable IOMMU emulation support"
+echo "  --enable-vhost-net       enable IOMMU emulation support"
 echo "  --enable-trace-backend=B Set trace backend"
 echo "                           Available backends:" 
$("$source_path"/scripts/tracetool --list-backends)
 echo "  --with-trace-file=NAME   Full PATH,NAME of file to store traces"
@@ -2608,6 +2615,7 @@ echo "madvise           $madvise"
 echo "posix_madvise     $posix_madvise"
 echo "uuid support      $uuid"
 echo "vhost-net support $vhost_net"
+echo "IOMMU support     $iommu"
 echo "Trace backend     $trace_backend"
 echo "Trace output file $trace_file-<pid>"
 echo "spice support     $spice"
@@ -3412,6 +3420,10 @@ if test "$target_softmmu" = "yes" -a \( \
   echo "CONFIG_NEED_MMU=y" >> $config_target_mak
 fi
 
+if test "$iommu" = "yes" ; then
+  echo "CONFIG_IOMMU=y" >> $config_target_mak
+fi
+
 if test "$gprof" = "yes" ; then
   echo "TARGET_GPROF=yes" >> $config_target_mak
   if test "$target_linux_user" = "yes" ; then
diff --git a/hw/dma_rw.c b/hw/dma_rw.c
new file mode 100644
index 0000000..627835c
--- /dev/null
+++ b/hw/dma_rw.c
@@ -0,0 +1,147 @@
+/*
+ * Generic DMA memory access interface.
+ *
+ * Copyright (c) 2011 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "dma_rw.h"
+#include "range.h"
+
+void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
+                   void *buf, dma_addr_t len, int is_write)
+{
+    dma_addr_t paddr, plen;
+    int err;
+
+    /*
+     * Fast-path non-iommu.
+     * More importantly, makes it obvious what this function does.
+     */
+    if (NO_IOMMU(dev)) {
+        cpu_physical_memory_rw(addr, buf, len, is_write);
+        return;
+    }
+
+    while (len) {
+        err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
+        if (err) {
+            return;
+        }
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len) {
+            plen = len;
+        }
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    }
+}
+
+typedef struct DMAMemoryMap DMAMemoryMap;
+struct DMAMemoryMap {
+    dma_addr_t              addr;
+    size_t                  len;
+    void *                  buf;
+    DMAInvalidateMapFunc    *invalidate;
+    void                    *invalidate_opaque;
+
+    QLIST_ENTRY(DMAMemoryMap) list;
+};
+
+void dma_invalidate_memory_range(DMADevice *dev,
+                                 dma_addr_t addr, dma_addr_t len)
+{
+    DMAMemoryMap *map;
+
+    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
+        if (ranges_overlap(addr, len, map->addr, map->len)) {
+            map->invalidate(map->invalidate_opaque);
+            QLIST_REMOVE(map, list);
+            free(map);
+        }
+    }
+}
+
+void *dma_memory_map(DMADevice *dev, DMAInvalidateMapFunc *cb, void *opaque,
+                     dma_addr_t addr, dma_addr_t *len, int is_write)
+{
+    int err;
+    target_phys_addr_t paddr, plen;
+    void *buf;
+
+    if (NO_IOMMU(dev)) {
+        return cpu_physical_memory_map(addr, len, is_write);
+    }
+
+    plen = *len;
+    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
+    if (err) {
+        return NULL;
+    }
+
+    /*
+     * If this is true, the virtual region is contiguous,
+     * but the translated physical region isn't. We just
+     * clamp *len, much like cpu_physical_memory_map() does.
+     */
+    if (plen < *len) {
+        *len = plen;
+    }
+
+    buf = cpu_physical_memory_map(paddr, len, is_write);
+
+    /* We treat maps as remote TLBs to cope with stuff like AIO. */
+    if (cb) {
+        DMAMemoryMap *map;
+
+        map = qemu_malloc(sizeof(DMAMemoryMap));
+        map->addr = addr;
+        map->len = *len;
+        map->buf = buf;
+        map->invalidate = cb;
+        map->invalidate_opaque = opaque;
+        
+        QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
+    }
+
+    return buf;
+}
+
+void dma_memory_unmap(DMADevice *dev, void *buffer, dma_addr_t len,
+                      int is_write, dma_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+    if (!NO_IOMMU(dev)) {
+        DMAMemoryMap *map;
+
+        QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
+            if ((map->buf == buffer) && (map->len == len)) {
+                QLIST_REMOVE(map, list);
+                free(map);
+            }
+        }
+    }
+}
+
diff --git a/hw/dma_rw.h b/hw/dma_rw.h
new file mode 100644
index 0000000..58a3d0f
--- /dev/null
+++ b/hw/dma_rw.h
@@ -0,0 +1,156 @@
+#ifndef DMA_RW_H
+#define DMA_RW_H
+
+#include "qemu-common.h"
+
+typedef uint64_t dma_addr_t;
+
+typedef struct DMAMmu DMAMmu;
+typedef struct DMADevice DMADevice;
+
+typedef int DMATranslateFunc(DMADevice *dev,
+                             dma_addr_t addr,
+                             dma_addr_t *paddr,
+                             dma_addr_t *len,
+                             int is_write);
+
+typedef void DMAInvalidateMapFunc(void *);
+
+#ifndef CONFIG_IOMMU
+struct DMAMmu {
+};
+
+struct DMADevice {
+};
+
+#define NO_IOMMU(_dev) (1)
+
+static inline void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
+                                 void *buf, dma_addr_t len, int is_write)
+{
+    cpu_physical_memory_rw(addr, buf, len, is_write);
+}
+
+static inline void *dma_memory_map(DMADevice *dev,
+                                   DMAInvalidateMapFunc *cb, void *opaque,
+                                   dma_addr_t addr, dma_addr_t *len,
+                                   int is_write)
+{
+    return cpu_physical_memory_map(addr, len, is_write);
+}
+
+static inline void dma_memory_unmap(DMADevice *dev,
+                                    void *buffer, dma_addr_t len,
+                                    int is_write, dma_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+}
+
+#define DEFINE_DMA_LD(suffix, size)                                       \
+static inline uint##size##_t                                              \
+dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
+{                                                                         \
+    return ld##suffix##_phys(addr);                                       \
+}
+
+#define DEFINE_DMA_ST(suffix, size)                                       \
+static inline void                                                        \
+dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)       \
+{                                                                         \
+    st##suffix##_phys(addr, val);                                         \
+}
+
+#else
+struct DMAMmu {
+    DeviceState *iommu;
+    DMATranslateFunc *translate;
+    QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps;
+};
+
+struct DMADevice {
+    DMAMmu *mmu;
+};
+
+#define NO_IOMMU(_dev) (!(_dev) || !(_dev)->mmu)
+
+void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
+                   void *buf, dma_addr_t len, int is_write);
+
+void *dma_memory_map(DMADevice *dev,
+                     DMAInvalidateMapFunc *cb, void *opaque,
+                     dma_addr_t addr, dma_addr_t *len,
+                     int is_write);
+void dma_memory_unmap(DMADevice *dev,
+                      void *buffer, dma_addr_t len,
+                      int is_write, dma_addr_t access_len);
+
+void dma_invalidate_memory_range(DMADevice *dev,
+                                 dma_addr_t addr, dma_addr_t len);
+
+/* warning: like the corresponding ldX_phys / stX_phys functions, these
+ * DMA accessors can only handle aligned accesses */
+
+#define DEFINE_DMA_LD(suffix, size)                                       \
+static inline uint##size##_t                                              \
+dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
+{                                                                         \
+    int err;                                                              \
+    dma_addr_t paddr, plen;                                               \
+                                                                          \
+    if (NO_IOMMU(dev)) {                                                  \
+        return ld##suffix##_phys(addr);                                   \
+    }                                                                     \
+                                                                          \
+    err = dev->mmu->translate(dev, addr, &paddr, &plen, 0);               \
+    if (err || (plen < size / 8))                                         \
+        return 0;                                                         \
+                                                                          \
+    return ld##suffix##_phys(paddr);                                      \
+}
+
+#define DEFINE_DMA_ST(suffix, size)                                       \
+static inline void                                                        \
+dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)       \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    if (NO_IOMMU(dev)) {                                                  \
+        st##suffix##_phys(addr, val);                                     \
+        return;                                                           \
+    }                                                                     \
+    err = dev->mmu->translate(dev, addr, &paddr, &plen, 1);               \
+    if (err || (plen < size / 8))                                         \
+        return;                                                           \
+                                                                          \
+    st##suffix##_phys(paddr, val);                                        \
+}
+#endif /* CONFIG_IOMMU */
+
+DEFINE_DMA_LD(ub, 8)
+DEFINE_DMA_LD(uw, 16)
+DEFINE_DMA_LD(l, 32)
+DEFINE_DMA_LD(q, 64)
+
+DEFINE_DMA_ST(b, 8)
+DEFINE_DMA_ST(w, 16)
+DEFINE_DMA_ST(l, 32)
+DEFINE_DMA_ST(q, 64)
+
+static inline void dma_memory_read(DMADevice *dev,
+                                   dma_addr_t addr,
+                                   void *buf,
+                                   dma_addr_t len)
+{
+    dma_memory_rw(dev, addr, buf, len, 0);
+}
+
+static inline void dma_memory_write(DMADevice *dev,
+                                    dma_addr_t addr,
+                                    const void *buf,
+                                    dma_addr_t len)
+{
+    dma_memory_rw(dev, addr, (void *) buf, len, 1);
+}
+
+#endif /* DMA_RW_H */
-- 
1.7.1


-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



reply via email to

[Prev in Thread] Current Thread [Next in Thread]