qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC 4/4] MemoryRegion with EOI callbacks for VFIO Plat


From: Eric Auger
Subject: Re: [Qemu-devel] [RFC 4/4] MemoryRegion with EOI callbacks for VFIO Platform devices
Date: Wed, 23 Apr 2014 17:00:47 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

Hi Alvise,

Thank you for the patch. Indeed I am very interested in further
discussing the vfio-platform integration with you.

On 04/17/2014 07:29 PM, Alvise Rigo wrote:
> The user can specify the location of the memory region (register) used
> by the guest driver to clear the pending interrupt; if enable, this
> mechanism will substitute the default one (timer based).

actually the current mechanism is that any read/write access to the
device memory region (currently #0) after an IRQ hit is trapped and
interpreted as an attempt of the guest driver to reset the interrupt
status register. This is quite coarse!

To me the timer mechanism is another thing that makes possible to turn
off this trapping after a while if no IRQ is pending anymore.
> 
> The region is provided as command line property "intclr-region" of the
> vfio-platform device. The property is a string "region_index;offset;size" 
> where:
> 
>     region_index:   is the index of the memory region where the register
>                     lives,
>     offset:         offset of the register in the region,
>     size:           size of the register.
> 
> example:
>     -device vfio-platform,...,intclr-region="0;0x2c;4"

I fully understand the rationale behind reducing the memory region that
is trapped for detecting device IRQ status reset.

However the code as of today does not always reduce the trapped region
size. Let me share with you my understanding:

given the granularity of the MMU settings (4kB or 64kB granule) it is
not possible to trap just a register. In practice defining such IO
region will force a whole page to be unmapped from stage2. and both
Midway xgmac and PL330 node region is 4kB. This means that when
providing such option, this is the whole register space (for those IPs)
that is ALWAYS unmapped from stage2 point of view. This means upon guest
access KVM_RUN exists and handle_mmio is entered. Then in
cpu_physical_memory_rw,  address_space_rw, memory_access_is_direct
checks whether the memory region is IO or RAM leading either to
io_mem_read/write (calling QEMU device callbacks) or qemu_get_ram_ptr
and memcpy (using "direct" host access).

My conclusion is your patch improves the precision of the EOI however it
may decrease the perf in case where the guest could benefit from a
direct nested stage1 + stage2 access to the HW device memory. This is
where the timer mechanism can play a role, restoring the mmap after a
while. So to me your patch is not antagonist with timer mechanism and
you should restore it. Besides, as documented in PCI code also, in case
IPs are generating a very high throughput of IRQ/s, the mmap region is
not so much helpful. However I guess that for less intensive IRQ IPs,
the mmap region definitively brings all its value and shall remain.

Let me know if you share the same understanding.

A last comment on the option passing. Philosophy of the previous patch
was to hide much of the complexity to the end-user. Passing the IRQ
status register address and size requires some knowledge from the
end-user. I don't know what will be the community drift. There is a
separate thread related to "vl.c: Enable adding devices to the system
bus" patch,
https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01797.html.


Best Regards

Eric


> Signed-off-by: Alvise Rigo <address@hidden>
> ---
>  hw/vfio/platform.c | 158 
> ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 150 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index c4a4286..9dae311 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -40,6 +40,7 @@
>  
>  #include "vfio-common.h"
>  
> +/*#define DEBUG_VFIO 1*/
>  #ifdef DEBUG_VFIO
>  #define DPRINTF(fmt, ...) \
>      do { fprintf(stderr, "vfio: %s: " fmt, __func__, ## __VA_ARGS__); } \
> @@ -65,6 +66,7 @@ typedef struct VFIORegion {
>      size_t size;
>      uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
>      uint8_t nr; /* cache the region number for debug */
> +    MemoryRegion *intclr;
>  } VFIORegion;
>  
>  
> @@ -99,6 +101,8 @@ typedef struct VFIODevice {
>      QLIST_ENTRY(VFIODevice) next;
>      struct VFIOGroup *group;
>      QLIST_HEAD(, VFIOINTp) intp_list;
> +    char *intclr_region_str; /* clear interrupt region string */
> +    bool has_intclr_region;
>  } VFIODevice;
>  
>  
> @@ -120,6 +124,48 @@ void vfio_get_props(SysBusDevice *s, char **pname,
>       *psize = vdev->regions[0].size;
>  }
>  
> +static int parse_clrint_string(const char *intclr_str, uint32_t *id_reg,
> +                                    hwaddr *off_reg, uint64_t *size_reg)
> +{
> +    char *str_ptr = g_strdup(intclr_str);
> +    char *idx, *off, *size;
> +
> +    if (strlen(intclr_str) < 5) {
> +        return -1;
> +    }
> +
> +    idx = str_ptr;
> +    str_ptr = strchr(str_ptr, ';');
> +    if (!str_ptr || idx == str_ptr) {
> +        return -1;
> +    }
> +    *str_ptr = '\0';
> +
> +    off = ++str_ptr;
> +    str_ptr = strchr(str_ptr, ';');
> +    if (!str_ptr || off == str_ptr) {
> +        return -1;
> +    }
> +    *str_ptr = '\0';
> +
> +    size = ++str_ptr;
> +    if (!*size) {
> +        return -1;
> +    }
> +
> +    *id_reg = strtol(idx, NULL, 10);
> +    *off_reg = strtol(off, NULL, 0);
> +    *size_reg = strtol(size, NULL, 10);
> +
> +    if (errno == EINVAL || errno == ERANGE) {
> +        return -1;
> +    }
> +
> +    DPRINTF("intclr region - id: %u offset: 0x%"HWADDR_PRIx" size: 
> %"PRIx64"\n",
> +                                                *id_reg, *off_reg, 
> *size_reg);
> +    return 0;
> +}
> +
>  static void vfio_disable_irqindex(VFIODevice *vdev, int index)
>  {
>      struct vfio_irq_set irq_set = {
> @@ -175,7 +221,7 @@ static void vfio_intp_mmap_enable(void *opaque)
>  
>  
>  /*
> - * The fd handler
> + * The fd handlers
>   */
>  
>  
> @@ -216,7 +262,22 @@ static void vfio_intp_interrupt(void *opaque)
>  
>  }
>  
> +static void vfio_clrint_with_region(void *opaque)
> +{
> +    int ret;
> +    VFIOINTp *intp = (VFIOINTp *)opaque;
>  
> +    ret = event_notifier_test_and_clear(&intp->interrupt);
> +    if (!ret) {
> +        DPRINTF("Error when clearing fd=%d\n",
> +                event_notifier_get_fd(&intp->interrupt));
> +    }
> +
> +    intp->pending = true;
> +
> +    /* trigger the virtual IRQ */
> +    qemu_set_irq(intp->qemuirq, 1);
> +}
>  
>  static void vfio_irq_eoi(VFIODevice *vdev)
>  {
> @@ -245,10 +306,57 @@ static void vfio_irq_eoi(VFIODevice *vdev)
>      }
>  
>      return;
> +}
> +
> +static uint64_t vfio_intclr_reg_read(void *opaque, hwaddr addr, unsigned 
> size)
> +{
> +    VFIORegion *region = opaque;
> +
> +    uint32_t data;
> +
> +    if (size != 4) {
> +        hw_error("unsupported size during intclr read");
> +    }
>  
> +    if (pread(region->fd, &data, size, region->fd_offset
> +                + region->intclr->addr + addr) != size) {
> +        error_report("(0x%"HWADDR_PRIx", %d) failed: %m",
> +                     addr, size);
> +        return (uint64_t)-1;
> +    }
> +
> +    DPRINTF("(region %d, addr= 0x%"HWADDR_PRIx", data=%d) = 0x%"PRIx32"\n",
> +            region->nr, addr, size, data);
> +
> +    vfio_irq_eoi(container_of(region, VFIODevice, regions[region->nr]));
> +
> +    return data;
>  }
>  
> +static void vfio_intclr_reg_write(void *opaque, hwaddr addr,
> +                               uint64_t data, unsigned size)
> +{
> +    VFIORegion *region = opaque;
> +
> +    uint32_t buf = data;
> +
> +    if (pwrite(region->fd, &buf, size, region->fd_offset
> +                + region->intclr->addr + addr) != size) {
> +        error_report("(0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
> +                     addr, data, size);
> +    }
> +
> +    DPRINTF("(region %d, addr=0x%"HWADDR_PRIx", data= 0x%"PRIx64", %d)\n",
> +            region->nr, addr, data, size);
>  
> +    vfio_irq_eoi(container_of(region, VFIODevice, regions[region->nr]));
> +}
> +
> +static const struct MemoryRegionOps clrint_ops = {
> +    .read = vfio_intclr_reg_read,
> +    .write = vfio_intclr_reg_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +};
>  
>  static int vfio_enable_intp(VFIODevice *vdev, unsigned int index)
>  {
> @@ -268,6 +376,7 @@ static int vfio_enable_intp(VFIODevice *vdev, unsigned 
> int index)
>                                      vfio_intp_mmap_enable, intp);
>  
>      /* TO DO: currently each IRQ has the same mmap timeout */
> +
>      intp->mmap_timeout = intp->vdev->mmap_timeout;
>  
>      sysbus_init_irq(sbdev, &intp->qemuirq);
> @@ -302,7 +411,11 @@ static int vfio_enable_intp(VFIODevice *vdev, unsigned 
> int index)
>  
>      DPRINTF("register fd=%d/irq index=%d to kernel\n", *pfd, index);
>  
> -    qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
> +    if (vdev->has_intclr_region) {
> +        qemu_set_fd_handler(*pfd, vfio_clrint_with_region, NULL, intp);
> +    } else {
> +        qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
> +    }
>  
>      /* pass the index/fd binding to the kernel driver so that it
>       * triggers this fd on HW IRQ
> @@ -323,7 +436,6 @@ static int vfio_enable_intp(VFIODevice *vdev, unsigned 
> int index)
>      QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
>  
>      return 0;
> -
>  }
>  
>  
> @@ -380,7 +492,7 @@ static int vfio_mmap_region(VFIODevice *vdev, VFIORegion 
> *region,
>              goto error;
>          }
>  
> -        memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map);
> +        memory_region_init_ptr(submem, OBJECT(vdev), name, size, *map);
>      }
>  
>      memory_region_add_subregion(mem, offset, submem);
> @@ -501,10 +613,10 @@ static void vfio_map_region(VFIODevice *vdev, int nr)
>  
>  static void vfio_unmap_region(VFIODevice *vdev, int nr)
>  {
> -VFIORegion *region = &vdev->regions[nr];
> -memory_region_del_subregion(&region->mem, &region->mmap_mem);
> -munmap(region->mmap, memory_region_size(&region->mmap_mem));
> -memory_region_destroy(&region->mmap_mem);
> +    VFIORegion *region = &vdev->regions[nr];
> +    memory_region_del_subregion(&region->mem, &region->mmap_mem);
> +    munmap(region->mmap, memory_region_size(&region->mmap_mem));
> +    memory_region_destroy(&region->mmap_mem);
>  }
>  
>  
> @@ -669,6 +781,17 @@ static void vfio_platform_realize(DeviceState *dev, 
> Error **errp)
>          }
>      }
>  
> +    hwaddr intclr_off = 0;
> +    uint64_t intclr_size = 0;
> +    uint32_t id = 0;
> +    if (vdev->intclr_region_str) {
> +        ret = parse_clrint_string(vdev->intclr_region_str, &id, &intclr_off,
> +                                                              &intclr_size);
> +    } else {
> +        ret = -1;
> +    }
> +    vdev->has_intclr_region = (!ret) ? true : false;
> +
>      ret = vfio_get_device(group, path, vdev);
>      if (ret) {
>          error_report("vfio: failed to get device %s", path);
> @@ -678,6 +801,24 @@ static void vfio_platform_realize(DeviceState *dev, 
> Error **errp)
>  
>      for (i = 0; i < vdev->num_regions; i++) {
>          vfio_map_region(vdev, i);
> +
> +        if (id == i && vdev->has_intclr_region) {
> +            char *clrint_region_name = NULL;
> +            /* add clear interrupt region */
> +            MemoryRegion *irq_reg = g_new(MemoryRegion, 1);
> +            vdev->regions[i].intclr = irq_reg;
> +
> +            clrint_region_name = g_strconcat(vdev->name, " intr.reg", NULL);
> +            memory_region_init_io(irq_reg, OBJECT(vdev), &clrint_ops,
> +                    &vdev->regions[i], clrint_region_name,
> +                    intclr_size);
> +
> +            memory_region_add_subregion(&vdev->regions[i].mmap_mem,
> +                    intclr_off, irq_reg);
> +
> +            g_free(clrint_region_name);
> +        }
> +
>          sysbus_init_mmio(sbdev, &vdev->regions[i].mem);
>      }
>  }
> @@ -729,6 +870,7 @@ typedef struct VFIOPlatformDeviceClass {
>  static Property vfio_platform_dev_properties[] = {
>  DEFINE_PROP_STRING("vfio_device", VFIODevice, name),
>  DEFINE_PROP_STRING("compat", VFIODevice, compat),
> +DEFINE_PROP_STRING("intclr-region", VFIODevice, intclr_region_str),
>  DEFINE_PROP_UINT32("mmap-timeout-ms", VFIODevice, mmap_timeout, 1100),
>  DEFINE_PROP_END_OF_LIST(),
>  };
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]