qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7 3/3] sPAPR: EEH support for VFIO PCI device


From: Gavin Shan
Subject: Re: [Qemu-devel] [PATCH v7 3/3] sPAPR: EEH support for VFIO PCI device
Date: Wed, 28 May 2014 22:33:14 +1000
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, May 28, 2014 at 01:24:15PM +0200, Alexander Graf wrote:
>
>On 28.05.14 06:12, Gavin Shan wrote:
>>On Wed, May 28, 2014 at 12:41:37AM +0200, Alexander Graf wrote:
>>>On 28.05.14 00:27, Benjamin Herrenschmidt wrote:
>>>>On Wed, 2014-05-28 at 00:22 +0200, Alexander Graf wrote:
>>.../...
>>
>>>>In any case, the above isn't the problem, we register rtas functions
>>>>called rtas_ibm_*, that's fine.
>>>They're called rtas_ibm, not rtas_vfio. They simply live in the wrong
>>>file for starters. Once you start moving them out of *vfio*.c and
>>>make sure you don't sneak in vfio headers into spapr_rtas.c all the
>>>abstraction should come naturally.
>>>
>>>>The problem you have is with what's inside these functions, correct ?
>>>>You want some kind of PCI Dev ops... am I understanding properly ?
>>>>Instead of having them call the VFIO ioctl's directly.
>>>We have a nice bus hierarchy. The RTAS call tells you the PHB's ID.
>>>The machine knows about all of its PHBs, so you can find that one.
>>>
>>>From the PHB we can then find a PCIDevice by identifier if we
>>>received one in the RTAS call.
>>>
>>>If we received a PE, we need to act on a virtual PE object (not
>>>available yet I think) which then would call at least into each VFIO
>>>container associated with that guest PE. We don't want to call into
>>>devices here, since we could have 2 VFIO devices in one PE and don't
>>>want to do a PE level operation twice.
>>>
>>>If we want to simplify things (and I'm fine with that), we can also
>>>restrict a "guest PE" to only span at most a single VFIO container
>>>context and unlimited emulated devices. That way we don't need to
>>>worry about potential side effects.
>>>
>>>I can't tell you what the "guest PE" would best be modeled at. Maybe
>>>a simple list/hash inside the PHB is the answer. But we need to
>>>ensure that the guest knows about its scope of action. And we need to
>>>make sure that we maintain abstraction levels to keep at least a
>>>minimum level of sanity for the reader.
>>>
>>sPAPRVFIOPHBState has one-to-one mapping with container, which has
>>one-to-one mapping relationship with IOMMU group on sPAPR platform.
>>So sPAPRVFIOPHBState is representing "guest PE".
>
>So each PHB only spans a single PE?
>

Yep.

>>
>>Yes. RTAS calls from guest, 2 types of address information as input there:
>>PHB BUID+device's PCI config address, or PHB BUID+PE address. After removing
>>the logic of translating device's PCI config address to PE address from
>>host to QEMU, we don't have to care much about the first case (PHB BUID+
>>device's PCI config address). That means we only have to care "PE address",
>>which is basically the IOMMU group ID (plus fixed offset) of 
>>sPAPRPHBVFIOState.
>>So the BUID can be used to locate sPAPRPHBVFIOState and "PE address" can
>>help doing same thing.
>
>... then we can just ignore the "PE address", no?
>

Yes, we can ignore that. But I plan to use it for more sanity check, which
isn't harmful.

>>
>>As you mentioned above, spapr_rtas.c can't include "vfio.h". I guess I have
>>to rework on PCIDevice/PCIDeviceClass for a bit as follows. Alex, could you
>>help confirming it's what you're expecting to see?
>
>I depends. If you need a PCI device level callback, then yes. If you
>don't need a PCI level callback, just have one in the PHB. I'm not
>quite sure how you can find the "VFIO container" that belongs to that
>PHB though, but I'm sure you'll figure that out :).
>

Yeah, I'll have PHB level callback after having a discussion with
Alexey who gave good idea to make the abstraction as follows. Please
let me know if I'm in wrong direction again because I don't have QEMU
experience at all:

        RTAS call -> hw/ppc/spapr_pci.c where RTAS calls get registered
                  -> sPAPRPHBClass::rtas_eeh_handler() in 
hw/ppc/spapr_pci_vfio.c
                  -> vfio_container_ioctl() in hw/misc/vfio.c


If you think it's right way to go, I'm going to change the code accordingly
and send it to you for review after Alexey's quick scan. That would save
you a bit time in future though it already took you much time.

>>
>>- In include/hw/pci/pci.h, to have following callback for PCIDeviceClass
>>
>>   /* It's basically mapping ioctl commands to another set of macros */
>>   #define PCI_DEV_EEH_OP_ENABLE    0
>>   #define PCI_DEV_EEH_OP_DISABLE   1
>>   #define PCI_DEV_EEH_OP_GET_STATE 2
>>   #define PCI_DEV_EEH_OP_PE_RESET  3
>>   #define PCI_DEV_EEH_OP_PE_CONFIGURE 4
>>
>>   #define PCI_DEV_EEH_OPT_HRSET     0
>>   #define PCI_DEV_EEH_OPT_FRESET    1
>>           :
>>   int (*eeh_handler)(PCIDevice *pdev, int opcode, int option);
>
>If we need a PCI device level callback, yes.
>
>>
>>- In hw/ppc/spapr_pci_vfio.c, to register EEH RTAS calls there and I
>>   shouldn't include linux-headers/vfio.h there. When EEH RTAS calls
>>   come in, find the PE according to the PE address and pick up any
>>   one of the included PCI device and call into eeh_handler(). If the
>>   input address is device's PCI config address, find the PCI device
>>   and call to its eeh_handler() callback.
>
>Uh. The RTAS flow should go through a few different files that
>basically represent the different layers on a real machine.
>
>The spapr_rtas.c file should register the RTAS calls, as that one
>emulates the RTAS code that would usually run in guest code. That
>code determines and calls into the PHB.
>
>The PHB is implemented in hw/ppc/spapr_pci.c, so that's where the EEH
>calls on the PHB level belong. If the target of a call is a PCI
>device, the PHB looks up the device, its class and calls into an EEH
>call inside the PCI device. If the target is a PE, the PHB needs to
>call out on the TCE entity representing the PE. If that is backed by
>a VFIO IOMMU, it should get called somehow.
>

Ah, It's almost same to the above idea that Alexey shared. The different
point is that I'm going to have PHB level callback as sPAPRVFIOPHBState
could be regarded as "PE" :-)

Thanks,
Gavin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]