[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MM
From: |
Alexey Kardashevskiy |
Subject: |
Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions |
Date: |
Mon, 26 Feb 2018 19:36:34 +1100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 |
On 19/02/18 13:46, Alexey Kardashevskiy wrote:
> On 16/02/18 16:28, David Gibson wrote:
>> On Wed, Feb 14, 2018 at 08:55:41AM -0700, Alex Williamson wrote:
>>> On Wed, 14 Feb 2018 19:09:16 +1100
>>> Alexey Kardashevskiy <address@hidden> wrote:
>>>
>>>> On 14/02/18 12:33, David Gibson wrote:
>>>>> On Tue, Feb 13, 2018 at 07:20:56PM +1100, Alexey Kardashevskiy wrote:
>>>>>> On 13/02/18 16:41, David Gibson wrote:
>>>>>>> On Tue, Feb 13, 2018 at 04:36:30PM +1100, David Gibson wrote:
>>>>>>>> On Tue, Feb 13, 2018 at 12:15:52PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>> On 13/02/18 03:06, Alex Williamson wrote:
>>>>>>>>>> On Mon, 12 Feb 2018 18:05:54 +1100
>>>>>>>>>> Alexey Kardashevskiy <address@hidden> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 12/02/18 16:19, David Gibson wrote:
>>>>>>>>>>>> On Fri, Feb 09, 2018 at 06:55:01PM +1100, Alexey Kardashevskiy
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> At the moment if vfio_memory_listener is registered in the system
>>>>>>>>>>>>> memory
>>>>>>>>>>>>> address space, it maps/unmaps every RAM memory region for DMA.
>>>>>>>>>>>>> It expects system page size aligned memory sections so
>>>>>>>>>>>>> vfio_dma_map
>>>>>>>>>>>>> would not fail and so far this has been the case. A mapping
>>>>>>>>>>>>> failure
>>>>>>>>>>>>> would be fatal. A side effect of such behavior is that some MMIO
>>>>>>>>>>>>> pages
>>>>>>>>>>>>> would not be mapped silently.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However we are going to change MSIX BAR handling so we will end
>>>>>>>>>>>>> having
>>>>>>>>>>>>> non-aligned sections in vfio_memory_listener (more details is in
>>>>>>>>>>>>> the next patch) and vfio_dma_map will exit QEMU.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to avoid fatal failures on what previously was not a
>>>>>>>>>>>>> failure and
>>>>>>>>>>>>> was just silently ignored, this checks the section alignment to
>>>>>>>>>>>>> the smallest supported IOMMU page size and prints an error if not
>>>>>>>>>>>>> aligned;
>>>>>>>>>>>>> it also prints an error if vfio_dma_map failed despite the page
>>>>>>>>>>>>> size check.
>>>>>>>>>>>>> Both errors are not fatal; only MMIO RAM regions are checked
>>>>>>>>>>>>> (aka "RAM device" regions).
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the amount of errors printed is overwhelming, the MSIX
>>>>>>>>>>>>> relocation
>>>>>>>>>>>>> could be used to avoid excessive error output.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is unlikely to cause any behavioral change.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>>>>>>>>>>
>>>>>>>>>>>> There are some relatively superficial problems noted below.
>>>>>>>>>>>>
>>>>>>>>>>>> But more fundamentally, this feels like it's extending an existing
>>>>>>>>>>>> hack past the point of usefulness.
>>>>>>>>>>>>
>>>>>>>>>>>> The explicit check for is_ram_device() here has always bothered me
>>>>>>>>>>>> -
>>>>>>>>>>>> it's not like a real bus bridge magically knows whether a target
>>>>>>>>>>>> address maps to RAM or not.
>>>>>>>>>>>>
>>>>>>>>>>>> What I think is really going on is that even for systems without an
>>>>>>>>>>>> IOMMU, it's not really true to say that the PCI address space maps
>>>>>>>>>>>> directly onto address_space_memory. Instead, there's a large, but
>>>>>>>>>>>> much less than 2^64 sized, "upstream window" at address 0 on the
>>>>>>>>>>>> PCI
>>>>>>>>>>>> bus, which is identity mapped to the system bus. Details will vary
>>>>>>>>>>>> with the system, but in practice we expect nothing but RAM to be in
>>>>>>>>>>>> that window. Addresses not within that window won't be mapped to
>>>>>>>>>>>> the
>>>>>>>>>>>> system bus but will just be broadcast on the PCI bus and might be
>>>>>>>>>>>> picked up as a p2p transaction.
>>>>>>>>>>>
>>>>>>>>>>> Currently this p2p works only via the IOMMU, direct p2p is not
>>>>>>>>>>> possible as
>>>>>>>>>>> the guest needs to know physical MMIO addresses to make p2p work
>>>>>>>>>>> and it
>>>>>>>>>>> does not.
>>>>>>>>>>
>>>>>>>>>> /me points to the Direct Translated P2P section of the ACS spec,
>>>>>>>>>> though
>>>>>>>>>> it's as prone to spoofing by the device as ATS. In any case, p2p
>>>>>>>>>> reflected from the IOMMU is still p2p and offloads the CPU even if
>>>>>>>>>> bandwidth suffers vs bare metal depending on if the data doubles back
>>>>>>>>>> over any links. Thanks,
>>>>>>>>>
>>>>>>>>> Sure, I was just saying that p2p via IOMMU won't be as simple as
>>>>>>>>> broadcast
>>>>>>>>> on the PCI bus, IOMMU needs to be programmed in advance to make this
>>>>>>>>> work,
>>>>>>>>> and current that broadcast won't work for the passed through devices.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Well, sure, p2p in a guest with passthrough devices clearly needs to
>>>>>>>> be translated through the IOMMU (and p2p from a passthrough to an
>>>>>>>> emulated device is essentially impossible).
>>>>>>>>
>>>>>>>> But.. what does that have to do with this code. This is the memory
>>>>>>>> area watcher, looking for memory regions being mapped directly into
>>>>>>>> the PCI space. NOT IOMMU regions, since those are handled separately
>>>>>>>> by wiring up the IOMMU notifier. This will only trigger if RAM-like,
>>>>>>>> non-RAM regions are put into PCI space *not* behind an IOMMMU.
>>>>>>>
>>>>>>> Duh, sorry, realised I was mixing up host and guest IOMMU. I guess
>>>>>>> the point here is that this will map RAM-like devices into the host
>>>>>>> IOMMU when there is no guest IOMMU, allowing p2p transactions between
>>>>>>> passthrough devices (though not from passthrough to emulated devices).
>>>>>>
>>>>>> Correct.
>>>>>>
>>>>>>>
>>>>>>> The conditions still seem kind of awkward to me, but I guess it makes
>>>>>>> sense.
>>>>>>
>>>>>> Is it the time to split this listener to RAM-listener and PCI bus
>>>>>> listener?
>>>>>
>>>>> I'm not really sure what you mean by that.
>>>>>
>>>>>> On x86 it listens on the "memory" AS, on spapr - on the
>>>>>> "address@hidden" AS, this will just create more confusion over time...
>>>>>
>>>>> Hm, it's still logically the same AS in each case - the PCI address
>>>>> space. It's just that in the x86 case that happens to be the same as
>>>>> the memory AS (at least when there isn't a guest side IOMMU).
>>>>
>>>> Hm. Ok.
>>>>
>>>>> I do wonder if that's really right even for x86 without IOMMU. The
>>>>> PCI address space is identity mapped to the "main" memory address
>>>>> space there, but I'm not sure it's mapped th *all* of the main memory
>>>>
>>>> What should have been in the place of "th" above? :)
>>>>
>>>>> address space, probably just certain ranges of it. That's what I was
>>>>> suggesting earlier in the thread.
>>>>
>>>> afaict vfio is trying to mmap every RAM MR == KVM memory slot, no ranges or
>>>> anything like that. I am trying to figure out what is not correct with the
>>>> patch. Thanks,
>>>
>>> It is possible for x86 systems to have translation applied to MMIO vs
>>> RAM such that the processor view and device view of MMIO are different,
>>> putting them into separate address spaces, but this is not typical and
>>> not available on the class of chipsets that QEMU emulates for PC.
>>> Therefore I think it's correct that MMIO and RAM all live in one big
>>> happy, flat address space as they do now (assuming the guest IOMMU is
>>> not present or disabled). Thanks,
>>
>> Ah.. I think the thing I was missing is that on PC (at least with
>> typical chipsets) *all* MMIO essentially comes from PCI space. Even
>> the legacy devices are essentially ISA which is treated as being on a
>> bridge under the PCI space. On non-x86 there are often at least a
>> handful of MMIO devices that aren't within the PCI space - say, the
>> PCI host bridge itself at least. x86 avoids that by using the
>> separate IO space, which is also a bit weird in that it's
>> simultaneously direct attached to the cpu (and so doesn't need bridge
>> configuration), but also identified with the legay IO space treated as
>> being under two bridges (host->PCI, PCI->ISA) for other purposes.
>>
>> Maybe it's just me, but I find it makes more sense to me if I think of
>> it as the two ASes being equal because on PC the system address map
>> (address_space_memory) is made equal to the PCI address map, rather
>> than the PCI address map being made equal to the system one.
>
> It makes more sense to me too, we just do not exploit/expose this on SPAPR
> - the PCI address space only has 2xIOMMU and 1xMSI windows and that's it,
> no MMIO which is mapped to the system AS only (adding those to the PCI AS
> is little tricky as mentioned elsewhere).
>
> Anyway, I still wonder - what needs to be addressed in the 2/4 patch in
> order to proceed?
Ping?
--
Alexey
signature.asc
Description: OpenPGP digital signature
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, (continued)
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, Alex Williamson, 2018/02/12
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, Alexey Kardashevskiy, 2018/02/12
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, David Gibson, 2018/02/13
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, David Gibson, 2018/02/13
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, Alexey Kardashevskiy, 2018/02/13
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, David Gibson, 2018/02/13
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, Alexey Kardashevskiy, 2018/02/14
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, Alex Williamson, 2018/02/14
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, David Gibson, 2018/02/16
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions, Alexey Kardashevskiy, 2018/02/18
- Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions,
Alexey Kardashevskiy <=
[Qemu-ppc] [PATCH qemu v7 3/4] vfio-pci: Allow mmap of MSIX BAR, Alexey Kardashevskiy, 2018/02/09
[Qemu-ppc] [PATCH qemu v7 4/4] ppc/spapr, vfio: Turn off MSIX emulation for VFIO devices, Alexey Kardashevskiy, 2018/02/09
[Qemu-ppc] [PATCH qemu v7 1/4] linux-headers: update to f1517df8701c, Alexey Kardashevskiy, 2018/02/09
Re: [Qemu-ppc] [Qemu-devel] [PATCH qemu v7 0/4] vfio-pci: Allow mmap of MSIX BAR, no-reply, 2018/02/09
Re: [Qemu-ppc] [Qemu-devel] [PATCH qemu v7 0/4] vfio-pci: Allow mmap of MSIX BAR, no-reply, 2018/02/09