qemu-devel
Re: [Qemu-devel] [RFC PATCH] memory: Don't use memcpy for ram marked as skip_dump


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [RFC PATCH] memory: Don't use memcpy for ram marked as skip_dump
Date: Sat, 22 Oct 2016 05:14:21 -0400 (EDT)


----- Original Message -----
> From: "Alex Williamson" <address@hidden>
> To: address@hidden
> Cc: address@hidden, "thorsten kohfeldt" <address@hidden>
> Sent: Friday, October 21, 2016 7:11:44 PM
> Subject: [RFC PATCH] memory: Don't use memcpy for ram marked as skip_dump
> 
> With a vfio assigned device we lay down a base MemoryRegion registered
> as an IO region, giving us read & write accessors.  If the region
> supports mmap, we lay down a higher priority sub-region MemoryRegion
> on top of the base layer initialized as a RAM pointer to the mmap.
> Finally, if we have any quirks for the device (i.e. address ranges that
> need additional virtualization support), we put another IO sub-region
> on top of the mmap MemoryRegion.  When this is flattened, we now
> potentially have sub-page mmap MemoryRegions exposed which cannot be
> directly mapped through KVM.
> 
> This is as expected, but a subtle detail of this is that we end up
> with two different access mechanisms through QEMU.  If we disable the
> mmap MemoryRegion, we make use of the IO MemoryRegion and service
> accesses using pread and pwrite to the vfio device file descriptor.
> If the mmap MemoryRegion is enabled and we end up in one of these
> sub-page gaps, QEMU handles the access as RAM, using memcpy to the
> mmap.  Using the mmap through QEMU is a subtle difference, but it's
> fine; the problem is the memcpy.  My assumption is that memcpy makes
> no guarantees about access width and potentially uses all sorts of
> optimized memory transfers that are not intended for talking to device
> MMIO.  It turns out that this has been a problem for Realtek NIC
> assignment, which has such a quirk that creates a sub-page mmap
> MemoryRegion access.
> 
> My proposal to fix this is to leverage the skip_dump flag that we
> already use for special handling of these device-backed MMIO ranges.
> When skip_dump is set for a MemoryRegion, we mark memory access as
> non-direct and automatically insert MemoryRegionOps with basic
> semantics to handle accesses.  Note that we allow at most dword
> accesses because some devices don't particularly like qword accesses
> (Realtek NICs are such a device).  This also fixes memory
> inspection via the xp command in the QEMU monitor.
> 
> Please comment.  Is this the best way to solve this problem?  Thanks

Looks good to me.

Paolo

> 
> Reported-by: Thorsten Kohfeldt <address@hidden>
> Signed-off-by: Alex Williamson <address@hidden>
> ---
>  include/exec/memory.h |    6 ++++--
>  memory.c              |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 10d7eac..a4c3acf 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1464,9 +1464,11 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr);
>  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
>  {
>      if (is_write) {
> -        return memory_region_is_ram(mr) && !mr->readonly;
> +        return memory_region_is_ram(mr) &&
> +               !mr->readonly && !memory_region_is_skip_dump(mr);
>      } else {
> -        return memory_region_is_ram(mr) || memory_region_is_romd(mr);
> +        return (memory_region_is_ram(mr) && !memory_region_is_skip_dump(mr)) ||
> +               memory_region_is_romd(mr);
>      }
>  }
>  
> diff --git a/memory.c b/memory.c
> index 58f9269..7ed7ca9 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1136,6 +1136,46 @@ const MemoryRegionOps unassigned_mem_ops = {
>      .endianness = DEVICE_NATIVE_ENDIAN,
>  };
>  
> +static uint64_t skip_dump_mem_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    uint64_t val = (uint64_t)~0;
> +
> +    switch (size) {
> +    case 1:
> +        val = *(uint8_t *)(opaque + addr);
> +        break;
> +    case 2:
> +        val = *(uint16_t *)(opaque + addr);
> +        break;
> +    case 4:
> +        val = *(uint32_t *)(opaque + addr);
> +        break;
> +    }
> +
> +    return val;
> +}
> +
> +static void skip_dump_mem_write(void *opaque, hwaddr addr, uint64_t data,
> +                                unsigned size)
> +{
> +    switch (size) {
> +    case 1:
> +        *(uint8_t *)(opaque + addr) = (uint8_t)data;
> +        break;
> +    case 2:
> +        *(uint16_t *)(opaque + addr) = (uint16_t)data;
> +        break;
> +    case 4:
> +        *(uint32_t *)(opaque + addr) = (uint32_t)data;
> +        break;
> +    }
> +}
> +
> +const MemoryRegionOps skip_dump_mem_ops = {
> +    .read = skip_dump_mem_read,
> +    .write = skip_dump_mem_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
>  bool memory_region_access_valid(MemoryRegion *mr,
>                                  hwaddr addr,
>                                  unsigned size,
> @@ -1366,6 +1406,10 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
>  void memory_region_set_skip_dump(MemoryRegion *mr)
>  {
>      mr->skip_dump = true;
> +    if (mr->ram && mr->ops == &unassigned_mem_ops) {
> +        mr->ops = &skip_dump_mem_ops;
> +        mr->opaque = mr->ram_block->host;
> +    }
>  }
>  
>  void memory_region_init_alias(MemoryRegion *mr,
> 
> 


