qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC 4/4] MemoryRegion with EOI callbacks for VFIO Plat


From: Alvise Rigo
Subject: Re: [Qemu-devel] [RFC 4/4] MemoryRegion with EOI callbacks for VFIO Platform devices
Date: Fri, 25 Apr 2014 11:07:59 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

Hi Eric,

Thank you for reviewing it.

On 23/04/2014 17:00, Eric Auger wrote:
> Hi Alvise,
> 
> Thank you for the patch. Indeed I am very interested in further
> discussing the vfio-platform integration with you.
> 
> On 04/17/2014 07:29 PM, Alvise Rigo wrote:
>> The user can specify the location of the memory region (register) used
>> by the guest driver to clear the pending interrupt; if enable, this
>> mechanism will substitute the default one (timer based).
> 
> actually the current mechanism is that any read/write access to the
> device memory region (currently #0) after an IRQ hit is trapped and
> interpreted as an attempt of the guest driver to reset the interrupt
> status register. This is quite coarse!
> 
> To me the timer mechanism is another thing that makes possible to turn
> off this trapping after a while if no IRQ is pending anymore.
>>
>> The region is provided as command line property "intclr-region" of the
>> vfio-platform device. The property is a string "region_index;offset;size" 
>> where:
>>
>>     region_index:   is the index of the memory region where the register
>>                     lives,
>>     offset:         offset of the register in the region,
>>     size:           size of the register.
>>
>> example:
>>     -device vfio-platform,...,intclr-region="0;0x2c;4"
> 
> I fully understand the rationale behind reducing the memory region that
> is trapped for detecting device IRQ status reset.
> 
> However the code as of today does not always reduce the trapped region
> size. Let me share with you my understanding:
> 
> given the granularity of the MMU settings (4kB or 64kB granule) it is
> not possible to trap just a register. In practice defining such IO
> region will force a whole page to be unmapped from stage2. and both
> Midway xgmac and PL330 node region is 4kB. This means that when
> providing such option, this is the whole register space (for those IPs)
> that is ALWAYS unmapped from stage2 point of view. This means upon guest
> access KVM_RUN exists and handle_mmio is entered. Then in
> cpu_physical_memory_rw,  address_space_rw, memory_access_is_direct
> checks whether the memory region is IO or RAM leading either to
> io_mem_read/write (calling QEMU device callbacks) or qemu_get_ram_ptr
> and memcpy (using "direct" host access).

You are right, this solution will cause the guest to trap every time it
will access the device memory: I was not considering the MMU stage2
unmapping behaviour.

However, if we keep the time mechanism together with the interrupt clear
memory region, some good things could happen: in a scenario with high
throughput of interrupts, such a solution could definitively be an
improvement, allowing us to reflect the behaviour of the real interrupt
with the emulated one. Moreover, in case of a high throughput of IRQ/s
and accesses as well, at every access the callback will not check every
IRQ/s to see if it's pending, saving a lot of work.

On the contrary, in a scenario where there is a mild traffic of
interrupts, we should try to keep the mmap region as more as possible
like you suggested, relaying however on the interrupt clear memory
region to trap only the access that is in charge to disable the
interrupt. In fact, what I noticed with the DMA330 is that the kernel
driver was accessing two registers of the device during the timer window
before actually writing to the interrupt clear register: this means that
the EOI is made at the first access and not a the third.
In the first scenario the timer could play a fundamental role, which is
to define the temporal window where interrupts could possibly hit again
and renew the timer (in order to avoid subsequent enable/disable of the
mmap memory region).
Is then needed a timer for each IRQ?

I will update the patch with these last ideas to have in case a base for
further discussion.

Best regards,
alvise

> 
> My conclusion is your patch improves the precision of the EOI however it
> may decrease the perf in case where the guest could benefit from a
> direct nested stage1 + stage2 access to the HW device memory. This is
> where the timer mechanism can play a role, restoring the mmap after a
> while. So to me your patch is not antagonist with timer mechanism and
> you should restore it. Besides, as documented in PCI code also, in case
> IPs are generating a very high throughput of IRQ/s, the mmap region is
> not so much helpful. However I guess that for less intensive IRQ IPs,
> the mmap region definitively brings all its value and shall remain.
> 
> Let me know if you share the same understanding.
> 
> A last comment on the option passing. Philosophy of the previous patch
> was to hide much of the complexity to the end-user. Passing the IRQ
> status register address and size requires some knowledge from the
> end-user. I don't know what will be the community drift. There is a
> separate thread related to "vl.c: Enable adding devices to the system
> bus" patch,
> https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01797.html.
> 
> 
> Best Regards
> 
> Eric
> 
> 
>> Signed-off-by: Alvise Rigo <address@hidden>
>> ---
>>  hw/vfio/platform.c | 158 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 150 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
>> index c4a4286..9dae311 100644
>> --- a/hw/vfio/platform.c
>> +++ b/hw/vfio/platform.c
>> @@ -40,6 +40,7 @@
>>  
>>  #include "vfio-common.h"
>>  
>> +/*#define DEBUG_VFIO 1*/
>>  #ifdef DEBUG_VFIO
>>  #define DPRINTF(fmt, ...) \
>>      do { fprintf(stderr, "vfio: %s: " fmt, __func__, ## __VA_ARGS__); } \
>> @@ -65,6 +66,7 @@ typedef struct VFIORegion {
>>      size_t size;
>>      uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
>>      uint8_t nr; /* cache the region number for debug */
>> +    MemoryRegion *intclr;
>>  } VFIORegion;
>>  
>>  
>> @@ -99,6 +101,8 @@ typedef struct VFIODevice {
>>      QLIST_ENTRY(VFIODevice) next;
>>      struct VFIOGroup *group;
>>      QLIST_HEAD(, VFIOINTp) intp_list;
>> +    char *intclr_region_str; /* clear interrupt region string */
>> +    bool has_intclr_region;
>>  } VFIODevice;
>>  
>>  
>> @@ -120,6 +124,48 @@ void vfio_get_props(SysBusDevice *s, char **pname,
>>       *psize = vdev->regions[0].size;
>>  }
>>  
>> +static int parse_clrint_string(const char *intclr_str, uint32_t *id_reg,
>> +                                    hwaddr *off_reg, uint64_t *size_reg)
>> +{
>> +    char *str_ptr = g_strdup(intclr_str);
>> +    char *idx, *off, *size;
>> +
>> +    if (strlen(intclr_str) < 5) {
>> +        return -1;
>> +    }
>> +
>> +    idx = str_ptr;
>> +    str_ptr = strchr(str_ptr, ';');
>> +    if (!str_ptr || idx == str_ptr) {
>> +        return -1;
>> +    }
>> +    *str_ptr = '\0';
>> +
>> +    off = ++str_ptr;
>> +    str_ptr = strchr(str_ptr, ';');
>> +    if (!str_ptr || off == str_ptr) {
>> +        return -1;
>> +    }
>> +    *str_ptr = '\0';
>> +
>> +    size = ++str_ptr;
>> +    if (!*size) {
>> +        return -1;
>> +    }
>> +
>> +    *id_reg = strtol(idx, NULL, 10);
>> +    *off_reg = strtol(off, NULL, 0);
>> +    *size_reg = strtol(size, NULL, 10);
>> +
>> +    if (errno == EINVAL || errno == ERANGE) {
>> +        return -1;
>> +    }
>> +
>> +    DPRINTF("intclr region - id: %u offset: 0x%"HWADDR_PRIx" size: 
>> %"PRIx64"\n",
>> +                                                *id_reg, *off_reg, 
>> *size_reg);
>> +    return 0;
>> +}
>> +
>>  static void vfio_disable_irqindex(VFIODevice *vdev, int index)
>>  {
>>      struct vfio_irq_set irq_set = {
>> @@ -175,7 +221,7 @@ static void vfio_intp_mmap_enable(void *opaque)
>>  
>>  
>>  /*
>> - * The fd handler
>> + * The fd handlers
>>   */
>>  
>>  
>> @@ -216,7 +262,22 @@ static void vfio_intp_interrupt(void *opaque)
>>  
>>  }
>>  
>> +static void vfio_clrint_with_region(void *opaque)
>> +{
>> +    int ret;
>> +    VFIOINTp *intp = (VFIOINTp *)opaque;
>>  
>> +    ret = event_notifier_test_and_clear(&intp->interrupt);
>> +    if (!ret) {
>> +        DPRINTF("Error when clearing fd=%d\n",
>> +                event_notifier_get_fd(&intp->interrupt));
>> +    }
>> +
>> +    intp->pending = true;
>> +
>> +    /* trigger the virtual IRQ */
>> +    qemu_set_irq(intp->qemuirq, 1);
>> +}
>>  
>>  static void vfio_irq_eoi(VFIODevice *vdev)
>>  {
>> @@ -245,10 +306,57 @@ static void vfio_irq_eoi(VFIODevice *vdev)
>>      }
>>  
>>      return;
>> +}
>> +
>> +static uint64_t vfio_intclr_reg_read(void *opaque, hwaddr addr, unsigned 
>> size)
>> +{
>> +    VFIORegion *region = opaque;
>> +
>> +    uint32_t data;
>> +
>> +    if (size != 4) {
>> +        hw_error("unsupported size during intclr read");
>> +    }
>>  
>> +    if (pread(region->fd, &data, size, region->fd_offset
>> +                + region->intclr->addr + addr) != size) {
>> +        error_report("(0x%"HWADDR_PRIx", %d) failed: %m",
>> +                     addr, size);
>> +        return (uint64_t)-1;
>> +    }
>> +
>> +    DPRINTF("(region %d, addr= 0x%"HWADDR_PRIx", data=%d) = 0x%"PRIx32"\n",
>> +            region->nr, addr, size, data);
>> +
>> +    vfio_irq_eoi(container_of(region, VFIODevice, regions[region->nr]));
>> +
>> +    return data;
>>  }
>>  
>> +static void vfio_intclr_reg_write(void *opaque, hwaddr addr,
>> +                               uint64_t data, unsigned size)
>> +{
>> +    VFIORegion *region = opaque;
>> +
>> +    uint32_t buf = data;
>> +
>> +    if (pwrite(region->fd, &buf, size, region->fd_offset
>> +                + region->intclr->addr + addr) != size) {
>> +        error_report("(0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
>> +                     addr, data, size);
>> +    }
>> +
>> +    DPRINTF("(region %d, addr=0x%"HWADDR_PRIx", data= 0x%"PRIx64", %d)\n",
>> +            region->nr, addr, data, size);
>>  
>> +    vfio_irq_eoi(container_of(region, VFIODevice, regions[region->nr]));
>> +}
>> +
>> +static const struct MemoryRegionOps clrint_ops = {
>> +    .read = vfio_intclr_reg_read,
>> +    .write = vfio_intclr_reg_write,
>> +    .endianness = DEVICE_LITTLE_ENDIAN,
>> +};
>>  
>>  static int vfio_enable_intp(VFIODevice *vdev, unsigned int index)
>>  {
>> @@ -268,6 +376,7 @@ static int vfio_enable_intp(VFIODevice *vdev, unsigned 
>> int index)
>>                                      vfio_intp_mmap_enable, intp);
>>  
>>      /* TO DO: currently each IRQ has the same mmap timeout */
>> +
>>      intp->mmap_timeout = intp->vdev->mmap_timeout;
>>  
>>      sysbus_init_irq(sbdev, &intp->qemuirq);
>> @@ -302,7 +411,11 @@ static int vfio_enable_intp(VFIODevice *vdev, unsigned 
>> int index)
>>  
>>      DPRINTF("register fd=%d/irq index=%d to kernel\n", *pfd, index);
>>  
>> -    qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
>> +    if (vdev->has_intclr_region) {
>> +        qemu_set_fd_handler(*pfd, vfio_clrint_with_region, NULL, intp);
>> +    } else {
>> +        qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
>> +    }
>>  
>>      /* pass the index/fd binding to the kernel driver so that it
>>       * triggers this fd on HW IRQ
>> @@ -323,7 +436,6 @@ static int vfio_enable_intp(VFIODevice *vdev, unsigned 
>> int index)
>>      QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
>>  
>>      return 0;
>> -
>>  }
>>  
>>  
>> @@ -380,7 +492,7 @@ static int vfio_mmap_region(VFIODevice *vdev, VFIORegion 
>> *region,
>>              goto error;
>>          }
>>  
>> -        memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map);
>> +        memory_region_init_ptr(submem, OBJECT(vdev), name, size, *map);
>>      }
>>  
>>      memory_region_add_subregion(mem, offset, submem);
>> @@ -501,10 +613,10 @@ static void vfio_map_region(VFIODevice *vdev, int nr)
>>  
>>  static void vfio_unmap_region(VFIODevice *vdev, int nr)
>>  {
>> -VFIORegion *region = &vdev->regions[nr];
>> -memory_region_del_subregion(&region->mem, &region->mmap_mem);
>> -munmap(region->mmap, memory_region_size(&region->mmap_mem));
>> -memory_region_destroy(&region->mmap_mem);
>> +    VFIORegion *region = &vdev->regions[nr];
>> +    memory_region_del_subregion(&region->mem, &region->mmap_mem);
>> +    munmap(region->mmap, memory_region_size(&region->mmap_mem));
>> +    memory_region_destroy(&region->mmap_mem);
>>  }
>>  
>>  
>> @@ -669,6 +781,17 @@ static void vfio_platform_realize(DeviceState *dev, 
>> Error **errp)
>>          }
>>      }
>>  
>> +    hwaddr intclr_off = 0;
>> +    uint64_t intclr_size = 0;
>> +    uint32_t id = 0;
>> +    if (vdev->intclr_region_str) {
>> +        ret = parse_clrint_string(vdev->intclr_region_str, &id, &intclr_off,
>> +                                                              &intclr_size);
>> +    } else {
>> +        ret = -1;
>> +    }
>> +    vdev->has_intclr_region = (!ret) ? true : false;
>> +
>>      ret = vfio_get_device(group, path, vdev);
>>      if (ret) {
>>          error_report("vfio: failed to get device %s", path);
>> @@ -678,6 +801,24 @@ static void vfio_platform_realize(DeviceState *dev, 
>> Error **errp)
>>  
>>      for (i = 0; i < vdev->num_regions; i++) {
>>          vfio_map_region(vdev, i);
>> +
>> +        if (id == i && vdev->has_intclr_region) {
>> +            char *clrint_region_name = NULL;
>> +            /* add clear interrupt region */
>> +            MemoryRegion *irq_reg = g_new(MemoryRegion, 1);
>> +            vdev->regions[i].intclr = irq_reg;
>> +
>> +            clrint_region_name = g_strconcat(vdev->name, " intr.reg", NULL);
>> +            memory_region_init_io(irq_reg, OBJECT(vdev), &clrint_ops,
>> +                    &vdev->regions[i], clrint_region_name,
>> +                    intclr_size);
>> +
>> +            memory_region_add_subregion(&vdev->regions[i].mmap_mem,
>> +                    intclr_off, irq_reg);
>> +
>> +            g_free(clrint_region_name);
>> +        }
>> +
>>          sysbus_init_mmio(sbdev, &vdev->regions[i].mem);
>>      }
>>  }
>> @@ -729,6 +870,7 @@ typedef struct VFIOPlatformDeviceClass {
>>  static Property vfio_platform_dev_properties[] = {
>>  DEFINE_PROP_STRING("vfio_device", VFIODevice, name),
>>  DEFINE_PROP_STRING("compat", VFIODevice, compat),
>> +DEFINE_PROP_STRING("intclr-region", VFIODevice, intclr_region_str),
>>  DEFINE_PROP_UINT32("mmap-timeout-ms", VFIODevice, mmap_timeout, 1100),
>>  DEFINE_PROP_END_OF_LIST(),
>>  };
>>
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]