qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator an


From: Alexey Kardashevskiy
Subject: Re: [Qemu-ppc] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
Date: Tue, 27 May 2014 14:51:28 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

On 05/23/2014 12:25 AM, Alexey Kardashevskiy wrote:
> On 05/22/2014 08:57 PM, Alexander Graf wrote:
>>
>> On 22.05.14 12:53, Alexey Kardashevskiy wrote:
>>> On 05/22/2014 05:16 PM, Alexander Graf wrote:>
>>>>> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <address@hidden>:
>>>>>
>>>>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
>>>>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote:
>>>>>>>
>>>>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>>>>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>>>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX
>>>>>>>>>>>>>>>> interrupt as
>>>>>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a
>>>>>>>>>>>>>>>> problem for
>>>>>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>>>>>>>>> reload or
>>>>>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of
>>>>>>>>>>>>>>>> devices
>>>>>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no
>>>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>>> reason
>>>>>>>>>>>>>>>> for that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the
>>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>> IRQ
>>>>>>>>>>>>>>>> number
>>>>>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no
>>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags
>>>>>>>>>>>>>>>> telling
>>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>>> type
>>>>>>>>>>>>>>>> of interrupt for which device it has configured in order to
>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>> error if
>>>>>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to
>>>>>>>>>>>>>>>> disable
>>>>>>>>>>>>>>>> MSI
>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>> it has not enabled.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled
>>>>>>>>>>>>>>>> devices
>>>>>>>>>>>>>>>> per PHB,
>>>>>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This changes migration stream as it fixes
>>>>>>>>>>>>>>>> vmstate_spapr_pci_msi::name
>>>>>>>>>>>>>>>> which was
>>>>>>>>>>>>>>>> wrong since the beginning.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I
>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>> remove
>>>>>>>>>>>>>>>> msi/msix
>>>>>>>>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>>>>>>>>> MSIX and
>>>>>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>>>>>>>>> limitation,
>>>>>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your
>>>>>>>>>>>>>>> implementation?
>>>>>>>>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>>>>>>>>> simplify
>>>>>>>>>>>>>> code by introducing one :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX
>>>>>>>>>>>>>> enabled
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>> it does not seem to be used by anyone => cannot debug and
>>>>>>>>>>>>>> confirm.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when
>>>>>>>>>>>>>> MSI is
>>>>>>>>>>>>>> already
>>>>>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>>>>>>>>> says MSI
>>>>>>>>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>>>>>>>>> rather not impose one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>>>>>>>>> the same time?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alex
>>>>>>>>>>>> No, and the PCI spec says:
>>>>>>>>>>>>       A function is permitted to implement both MSI and MSI-X, but
>>>>>>>>>>>> system
>>>>>>>>>>>>       software is
>>>>>>>>>>>>       prohibited from enabling both at the same time. If system
>>>>>>>>>>>> software
>>>>>>>>>>>>       enables both at the same time, the result is undefined.
>>>>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>>>>>>>> Heh. This solves just half of the problem - I still have to keep
>>>>>>>>>> track of
>>>>>>>>>> what device got MSI/MSIX configured via that ibm,change-msi
>>>>>>>>>> interface. I
>>>>>>>>>> was hoping I can store such flag somewhere in a device PCI config
>>>>>>>>>> space
>>>>>>>>>> but
>>>>>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are
>>>>>>>>>> made.
>>>>>>>>>> And I cannot rely on address/data fields much as the guest can
>>>>>>>>>> change them
>>>>>>>>>> (I already use them to store IRQ numbers and btw it is missing
>>>>>>>>>> checks when
>>>>>>>>>> I read them back for disposal, I'll fix in next round).
>>>>>>>>>>
>>>>>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI
>>>>>>>>>> config space
>>>>>>>>>> and on "disable" check if it is not zero, then configuration took
>>>>>>>>>> place,
>>>>>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did
>>>>>>>>>> any change
>>>>>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>>>>>>>>> cases), I compare .data with what ICS can possibly have and either
>>>>>>>>>> reject
>>>>>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>>>>>>>>> stupid guest. Would that be acceptable?
>>>>>>>>> Can't you prohibit the guest from writing to the MSI configuration
>>>>>>>>> registers itself? Then you don't need to do sanity checks.
>>>>>>>> I could for emulated devices but VFIO uses the same code. For example,
>>>>>>>> there is an IBM SCSI IPR card which does a "self test". For that, it
>>>>>>>> saves
>>>>>>>> MSIX BAR content, does reboot via some backdoor interface and
>>>>>>>> restores MSIX
>>>>>>>> BAR. It has been solved for VFIO in the host kernel by restoring
>>>>>>>> MSIX data
>>>>>>>> from cached values when guest is trying to restore it with what it
>>>>>>>> thinks
>>>>>>>> is actual MSIX data (it is virtualized because of x86). But there is
>>>>>>>> cache
>>>>>>> We already have a cache because we don't access the real PCI
>>>>>>> registers with
>>>>>>> msi_set_message(), no?
>>>>>>
>>>>>> For emulated devices there is no cache. And in any case the guest is
>>>>>> allowed to write to it... Who knows what AIX does? I do not.
>>>>>
>>>>> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok
>>>>> but
>>>>> how to migrate such thing? Temporary cache somewhere and then unpack
>>>>> it? Or
>>>>> use old style migration callbacks?
>>>> Could you try to introduce a new vmstate type that just serializes and
>>>> deserializes hash tables? Maybe there is already a serialization
>>>> function for it in glib?
>>> I have not found any (most likely I do not know how to search there),
>>> I added mine, here are VMSTATE_HASH + its use for SPAPR.
>>>
>>>
>>> Is this a movement to right direction? I need to put key/value sizes
>>> into VMSTATE definition somehow but do not really want to touch
>>> VMStateField.
>>
>> I'm not sure. Juan?
> 
> 
> I also tried to simplify this particular thing more by assuming that the
> key is always "int" and put size of value to VMStateField::size. But there
> is no way to get the size in VMStateInfo::get(). Or I could do a
> "subsection" and try implementing has as an array (and have get/put defined
> for items, should work) but in this case I'll need to cache number of
> elements of the hash table somewhere so I'll need a wrapper around GHashTable.
> 
> Making generalized version is not obvious for me :(

Juan?
Alex?
Anybody?
Ping.

How do I migrate GHashTable? If I am allowed to use custom and bit more
polished get/put from "[PATCH 1/2] vmstate: Add helper to enable GHashTable
migration", I'll be fine.

Thanks!



-- 
Alexey



reply via email to

[Prev in Thread] Current Thread [Next in Thread]