qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC v3 21/29] vhost: Add VhostIOVATree


From: Eugenio Perez Martin
Subject: Re: [RFC v3 21/29] vhost: Add VhostIOVATree
Date: Wed, 14 Jul 2021 11:14:35 +0200

On Wed, Jul 14, 2021 at 8:54 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Jul 14, 2021 at 5:04 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2021/6/1 下午4:15, Eugenio Perez Martin 写道:
> > > On Mon, May 31, 2021 at 11:40 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> 在 2021/5/20 上午12:28, Eugenio Pérez 写道:
> > >>> This tree is able to look for a translated address from a IOVA address.
> > >>>
> > >>> At first glance is similar to util/iova-tree. However, SVQ working on
> > >>> devices with limited IOVA space need more capabilities, like allocating
> > >>> IOVA chunks or perform reverse translations (qemu addresses to iova).
> > >>>
> > >>> Starting a sepparated implementation. Knowing than insertions/deletions
> > >>> will not be as frequent as searches,
> > >>
> > >> This might not be true if vIOMMU is enabled.
> > >>
> > > Right.
> > >
> > >>> it uses an ordered array at
> > >>> implementation.
> > >>
> > >> I wonder how much overhead could g_array be if it needs to grow.
> > >>
> > > I didn't do any tests, actually. But I see this VhostIOVATree as a
> > > replaceable tool, just to get the buffer translations to work. So I'm
> > > both ok with change it now and ok to delay it, since they should not
> > > be hard to do.
> > >
> > >>>    A different name could be used, but ordered
> > >>> searchable array is a little bit long though.
> > >>
> > >> Note that we had a very good example for this. That is the kernel iova
> > >> allocator which is implemented via rbtree.
> > >>
> > >> Instead of figuring out g_array vs g_tree stuffs, I would simple go with
> > >> g_tree first (based on util/iova-tree) and borrow the well design kernel
> > >> iova allocator API to have a generic IOVA one instead of coupling it
> > >> with vhost. It could be used by other userspace driver in the future:
> > >>
> > >> init_iova_domain()/put_iova_domain();
> > >>
> > >> alloc_iova()/free_iova();
> > >>
> > >> find_iova();
> > >>
> > > We could go that way, but then the iova-tree should be extended to
> > > support both translations (iova->translated_addr is now implemented in
> > > iova-tree, the reverse is not). If I understood you correctly,
> > > borrowing the kernel iova allocator would give us both, right?
> >
> >
> > No the reverse lookup is done via a specific IOMMU driver if I
> > understand it correctly.
> >
> > And if the mapping is 1:1 we can just use two iova-tree I guess.
> >
>
> I did try with two IOVATree, but the usage of the reversed one is
> confusing at best. To reuse most of the code, .iova needs to be
> .translated_addr, and the opposite. Maybe I can try again using a
> wrapper structure that reverses them on each operation (insert,
> search, ...).
>
> Thanks!
>

Another feature is also needed that IOVATree does not support:
Allocation, as the searching of a free hole in iova range [1]. Linux's
alloc_iova is actually the equivalent of vhost_iova_tree_insert in
this commit, that expects iova addresses to be specified, not
"allocated".

My first attempt to solve that was to add a second (third?) tree with
the free space. But this again complicates the code a lot, and misses
a few optimization opportunities (like the need of searching in many
trees instead of just one, and then reuse the result [2]. Maybe
iova_tree_foreach can be modified to achieve this, but then more code
needs to be modified.

[1] Range themselves are not supported in IOVATree natively, but I
think they are more or less straightforward to implement in
vhost-vdpa.

> >
> > >
> > > Note that it is not coupled to vhost at all except in the name: all
> > > the implementations only work with hwaddr and void pointers memory.
> > > Just to illustrate the point, I think it could be a drop-in
> > > replacement for iova-tree at this moment (with all the
> > > drawbacks/advantages of an array vs tree).
> >
> >
> > Ok.
> >
> > Thanks
> >
> >
> > >
> > >> Another reference is the iova allocator that is implemented in VFIO.
> > > I will check this too.
> > >
> > > Thanks!
> > >
> > >
> > >> Thanks
> > >>
> > >>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> ---
> > >>>    hw/virtio/vhost-iova-tree.h |  50 ++++++++++
> > >>>    hw/virtio/vhost-iova-tree.c | 188 
> > >>> ++++++++++++++++++++++++++++++++++++
> > >>>    hw/virtio/meson.build       |   2 +-
> > >>>    3 files changed, 239 insertions(+), 1 deletion(-)
> > >>>    create mode 100644 hw/virtio/vhost-iova-tree.h
> > >>>    create mode 100644 hw/virtio/vhost-iova-tree.c
> > >>>
> > >>> diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
> > >>> new file mode 100644
> > >>> index 0000000000..2a44af8b3a
> > >>> --- /dev/null
> > >>> +++ b/hw/virtio/vhost-iova-tree.h
> > >>> @@ -0,0 +1,50 @@
> > >>> +/*
> > >>> + * vhost software live migration ring
> > >>> + *
> > >>> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> > >>> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> > >>> + *
> > >>> + * SPDX-License-Identifier: GPL-2.0-or-later
> > >>> + */
> > >>> +
> > >>> +#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
> > >>> +#define HW_VIRTIO_VHOST_IOVA_TREE_H
> > >>> +
> > >>> +#include <gmodule.h>
> > >>> +
> > >>> +#include "exec/memory.h"
> > >>> +
> > >>> +typedef struct VhostDMAMap {
> > >>> +    void *translated_addr;
> > >>> +    hwaddr iova;
> > >>> +    hwaddr size;                /* Inclusive */
> > >>> +    IOMMUAccessFlags perm;
> > >>> +} VhostDMAMap;
> > >>> +
> > >>> +typedef enum VhostDMAMapNewRC {
> > >>> +    VHOST_DMA_MAP_OVERLAP = -2,
> > >>> +    VHOST_DMA_MAP_INVALID = -1,
> > >>> +    VHOST_DMA_MAP_OK = 0,
> > >>> +} VhostDMAMapNewRC;
> > >>> +
> > >>> +/**
> > >>> + * VhostIOVATree
> > >>> + *
> > >>> + * Store and search IOVA -> Translated mappings.
> > >>> + *
> > >>> + * Note that it cannot remove nodes.
> > >>> + */
> > >>> +typedef struct VhostIOVATree {
> > >>> +    /* Ordered array of reverse translations, IOVA address to qemu 
> > >>> memory. */
> > >>> +    GArray *iova_taddr_map;
> > >>> +} VhostIOVATree;
> > >>> +
> > >>> +void vhost_iova_tree_new(VhostIOVATree *iova_rm);
> > >>> +void vhost_iova_tree_destroy(VhostIOVATree *iova_rm);
> > >>> +
> > >>> +const VhostDMAMap *vhost_iova_tree_find_taddr(const VhostIOVATree 
> > >>> *iova_rm,
> > >>> +                                              const VhostDMAMap *map);
> > >>> +VhostDMAMapNewRC vhost_iova_tree_insert(VhostIOVATree *iova_rm,
> > >>> +                                        VhostDMAMap *map);
> > >>> +
> > >>> +#endif
> > >>> diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
> > >>> new file mode 100644
> > >>> index 0000000000..dfd7e448b5
> > >>> --- /dev/null
> > >>> +++ b/hw/virtio/vhost-iova-tree.c
> > >>> @@ -0,0 +1,188 @@
> > >>> +/*
> > >>> + * vhost software live migration ring
> > >>> + *
> > >>> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> > >>> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> > >>> + *
> > >>> + * SPDX-License-Identifier: GPL-2.0-or-later
> > >>> + */
> > >>> +
> > >>> +#include "qemu/osdep.h"
> > >>> +#include "vhost-iova-tree.h"
> > >>> +
> > >>> +#define G_ARRAY_NOT_ZERO_TERMINATED false
> > >>> +#define G_ARRAY_NOT_CLEAR_ON_ALLOC false
> > >>> +
> > >>> +/**
> > >>> + * Inserts an element after an existing one in garray.
> > >>> + *
> > >>> + * @array      The array
> > >>> + * @prev_elem  The previous element of array of NULL if prepending

[2] If I remember correctly.

> > >>> + * @map        The DMA map
> > >>> + *
> > >>> + * It provides the aditional advantage of being type safe over
> > >>> + * g_array_insert_val, which accepts a reference pointer instead of a 
> > >>> value
> > >>> + * with no complains.
> > >>> + */
> > >>> +static void vhost_iova_tree_insert_after(GArray *array,
> > >>> +                                         const VhostDMAMap *prev_elem,
> > >>> +                                         const VhostDMAMap *map)
> > >>> +{
> > >>> +    size_t pos;
> > >>> +
> > >>> +    if (!prev_elem) {
> > >>> +        pos = 0;
> > >>> +    } else {
> > >>> +        pos = prev_elem - &g_array_index(array, typeof(*prev_elem), 0) 
> > >>> + 1;
> > >>> +    }
> > >>> +
> > >>> +    g_array_insert_val(array, pos, *map);
> > >>> +}
> > >>> +
> > >>> +static gint vhost_iova_tree_cmp_iova(gconstpointer a, gconstpointer b)
> > >>> +{
> > >>> +    const VhostDMAMap *m1 = a, *m2 = b;
> > >>> +
> > >>> +    if (m1->iova > m2->iova + m2->size) {
> > >>> +        return 1;
> > >>> +    }
> > >>> +
> > >>> +    if (m1->iova + m1->size < m2->iova) {
> > >>> +        return -1;
> > >>> +    }
> > >>> +
> > >>> +    /* Overlapped */
> > >>> +    return 0;
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Find the previous node to a given iova
> > >>> + *
> > >>> + * @array  The ascending ordered-by-translated-addr array of 
> > >>> VhostDMAMap
> > >>> + * @map    The map to insert
> > >>> + * @prev   Returned location of the previous map
> > >>> + *
> > >>> + * Return VHOST_DMA_MAP_OK if everything went well, or 
> > >>> VHOST_DMA_MAP_OVERLAP if
> > >>> + * it already exists. It is ok to use this function to check if a 
> > >>> given range
> > >>> + * exists, but it will use a linear search.
> > >>> + *
> > >>> + * TODO: We can use bsearch to locate the entry if we save the state 
> > >>> in the
> > >>> + * needle, knowing that the needle is always the first argument to
> > >>> + * compare_func.
> > >>> + */
> > >>> +static VhostDMAMapNewRC vhost_iova_tree_find_prev(const GArray *array,
> > >>> +                                                  GCompareFunc 
> > >>> compare_func,
> > >>> +                                                  const VhostDMAMap 
> > >>> *map,
> > >>> +                                                  const VhostDMAMap 
> > >>> **prev)
> > >>> +{
> > >>> +    size_t i;
> > >>> +    int r;
> > >>> +
> > >>> +    *prev = NULL;
> > >>> +    for (i = 0; i < array->len; ++i) {
> > >>> +        r = compare_func(map, &g_array_index(array, typeof(*map), i));
> > >>> +        if (r == 0) {
> > >>> +            return VHOST_DMA_MAP_OVERLAP;
> > >>> +        }
> > >>> +        if (r < 0) {
> > >>> +            return VHOST_DMA_MAP_OK;
> > >>> +        }
> > >>> +
> > >>> +        *prev = &g_array_index(array, typeof(**prev), i);
> > >>> +    }
> > >>> +
> > >>> +    return VHOST_DMA_MAP_OK;
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Create a new IOVA tree
> > >>> + *
> > >>> + * @tree  The IOVA tree
> > >>> + */
> > >>> +void vhost_iova_tree_new(VhostIOVATree *tree)
> > >>> +{
> > >>> +    assert(tree);
> > >>> +
> > >>> +    tree->iova_taddr_map = g_array_new(G_ARRAY_NOT_ZERO_TERMINATED,
> > >>> +                                       G_ARRAY_NOT_CLEAR_ON_ALLOC,
> > >>> +                                       sizeof(VhostDMAMap));
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Destroy an IOVA tree
> > >>> + *
> > >>> + * @tree  The iova tree
> > >>> + */
> > >>> +void vhost_iova_tree_destroy(VhostIOVATree *tree)
> > >>> +{
> > >>> +    g_array_unref(g_steal_pointer(&tree->iova_taddr_map));
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Perform a search on a GArray.
> > >>> + *
> > >>> + * @array Glib array
> > >>> + * @map Map to look up
> > >>> + * @compare_func Compare function to use
> > >>> + *
> > >>> + * Return The found element or NULL if not found.
> > >>> + *
> > >>> + * This can be replaced with g_array_binary_search (Since glib 2.62) 
> > >>> when that
> > >>> + * is common enough.
> > >>> + */
> > >>> +static const VhostDMAMap *vhost_iova_tree_bsearch(const GArray *array,
> > >>> +                                                  const VhostDMAMap 
> > >>> *map,
> > >>> +                                                  GCompareFunc 
> > >>> compare_func)
> > >>> +{
> > >>> +    return bsearch(map, array->data, array->len, sizeof(*map), 
> > >>> compare_func);
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Find the translated address stored from a IOVA address
> > >>> + *
> > >>> + * @tree  The iova tree
> > >>> + * @map   The map with the memory address
> > >>> + *
> > >>> + * Return the stored mapping, or NULL if not found.
> > >>> + */
> > >>> +const VhostDMAMap *vhost_iova_tree_find_taddr(const VhostIOVATree 
> > >>> *tree,
> > >>> +                                              const VhostDMAMap *map)
> > >>> +{
> > >>> +    return vhost_iova_tree_bsearch(tree->iova_taddr_map, map,
> > >>> +                                  vhost_iova_tree_cmp_iova);
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Insert a new map
> > >>> + *
> > >>> + * @tree  The iova tree
> > >>> + * @map   The iova map
> > >>> + *
> > >>> + * Returns:
> > >>> + * - VHOST_DMA_MAP_OK if the map fits in the container
> > >>> + * - VHOST_DMA_MAP_INVALID if the map does not make sense (like size 
> > >>> overflow)
> > >>> + * - VHOST_DMA_MAP_OVERLAP if the tree already contains that map
> > >>> + * Can query the assignated iova in map.
> > >>> + */
> > >>> +VhostDMAMapNewRC vhost_iova_tree_insert(VhostIOVATree *tree,
> > >>> +                                        VhostDMAMap *map)
> > >>> +{
> > >>> +    const VhostDMAMap *prev;
> > >>> +    int find_prev_rc;
> > >>> +
> > >>> +    if (map->translated_addr + map->size < map->translated_addr ||
> > >>> +        map->iova + map->size < map->iova || map->perm == IOMMU_NONE) {
> > >>> +        return VHOST_DMA_MAP_INVALID;
> > >>> +    }
> > >>> +
> > >>> +    /* Check for duplicates, and save position for insertion */
> > >>> +    find_prev_rc = vhost_iova_tree_find_prev(tree->iova_taddr_map,
> > >>> +                                             vhost_iova_tree_cmp_iova, 
> > >>> map,
> > >>> +                                             &prev);
> > >>> +    if (find_prev_rc == VHOST_DMA_MAP_OVERLAP) {
> > >>> +        return VHOST_DMA_MAP_OVERLAP;
> > >>> +    }
> > >>> +
> > >>> +    vhost_iova_tree_insert_after(tree->iova_taddr_map, prev, map);
> > >>> +    return VHOST_DMA_MAP_OK;
> > >>> +}
> > >>> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> > >>> index 8b5a0225fe..cb306b83c6 100644
> > >>> --- a/hw/virtio/meson.build
> > >>> +++ b/hw/virtio/meson.build
> > >>> @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: 
> > >>> files('vhost-stub.c'))
> > >>>
> > >>>    virtio_ss = ss.source_set()
> > >>>    virtio_ss.add(files('virtio.c'))
> > >>> -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 
> > >>> 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
> > >>> +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 
> > >>> 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
> > >>>    virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: 
> > >>> files('vhost-user.c'))
> > >>>    virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: 
> > >>> files('vhost-vdpa.c'))
> > >>>    virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: 
> > >>> files('virtio-balloon.c'))
> >




reply via email to

[Prev in Thread] Current Thread [Next in Thread]