From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
Date: Wed, 28 May 2014 10:34:42 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

On 05/28/2014 09:55 AM, Alexander Graf wrote:
> 
> On 27.05.14 06:51, Alexey Kardashevskiy wrote:
>> On 05/23/2014 12:25 AM, Alexey Kardashevskiy wrote:
>>> On 05/22/2014 08:57 PM, Alexander Graf wrote:
>>>> On 22.05.14 12:53, Alexey Kardashevskiy wrote:
>>>>> On 05/22/2014 05:16 PM, Alexander Graf wrote:
>>>>>>> On 22.05.2014 at 08:53, Alexey Kardashevskiy <address@hidden> wrote:
>>>>>>>
>>>>>>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
>>>>>>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote:
>>>>>>>>>
>>>>>>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>>>>>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>>>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MSIX
>>>>>>>>>>>>>>>>>> interrupts as XICS used to be unable to reuse interrupts,
>>>>>>>>>>>>>>>>>> which becomes a problem for dynamic MSI reconfiguration,
>>>>>>>>>>>>>>>>>> which happens on guest driver reload or PCI hot (un)plug.
>>>>>>>>>>>>>>>>>> Another problem is that PHB has a limit on the number of
>>>>>>>>>>>>>>>>>> devices supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and
>>>>>>>>>>>>>>>>>> there is no good reason for that.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This makes use of the new XICS ability to reuse interrupts.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This removes the cached MSI configuration from SPAPR PHB:
>>>>>>>>>>>>>>>>>> the first IRQ number of a device is stored in its MSI/MSIX
>>>>>>>>>>>>>>>>>> config space, so there is no need to store it anywhere
>>>>>>>>>>>>>>>>>> else. From now on, SPAPR PHB only keeps flags telling what
>>>>>>>>>>>>>>>>>> type of interrupt it has configured for which device, in
>>>>>>>>>>>>>>>>>> order to return an error if (for example) MSIX was enabled
>>>>>>>>>>>>>>>>>> and the guest is trying to disable MSI which it has not
>>>>>>>>>>>>>>>>>> enabled.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This removes the limit on the maximum number of
>>>>>>>>>>>>>>>>>> MSIX-enabled devices per PHB; now XICS and PCI bus capacity
>>>>>>>>>>>>>>>>>> are the only limitations.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This changes the migration stream as it fixes
>>>>>>>>>>>>>>>>>> vmstate_spapr_pci_msi::name, which was wrong since the
>>>>>>>>>>>>>>>>>> beginning.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This fixes traces to be more informative.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I
>>>>>>>>>>>>>>>>>> could remove the msi/msix bitmaps from this patch; would
>>>>>>>>>>>>>>>>>> that make sense?
>>>>>>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose
>>>>>>>>>>>>>>>>> between MSIX and MSI, or could it theoretically have both
>>>>>>>>>>>>>>>>> enabled? Is this a PCI limitation, a PAPR/XICS limitation,
>>>>>>>>>>>>>>>>> or just a limitation of your implementation?
>>>>>>>>>>>>>>>> My implementation does not have this limitation; I asked
>>>>>>>>>>>>>>>> whether I could simplify the code by introducing one :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and
>>>>>>>>>>>>>>>> MSIX enabled, but it does not seem to be used by anyone =>
>>>>>>>>>>>>>>>> I cannot debug and confirm it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The PAPR spec assumes that if the guest tries enabling MSIX
>>>>>>>>>>>>>>>> when MSI is already enabled, this is a "change", not
>>>>>>>>>>>>>>>> enabling both types. But it also says MSI and MSIX vector
>>>>>>>>>>>>>>>> numbers are not shared. Hm.
>>>>>>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and
>>>>>>>>>>>>>>> I'd rather not impose one.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and*
>>>>>>>>>>>>>>> MSI-X at the same time?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alex
>>>>>>>>>>>>>> No, and the PCI spec says:
>>>>>>>>>>>>>>        A function is permitted to implement both MSI and MSI-X,
>>>>>>>>>>>>>>        but system software is prohibited from enabling both at
>>>>>>>>>>>>>>        the same time. If system software enables both at the
>>>>>>>>>>>>>>        same time, the result is undefined.
>>>>>>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>>>>>>>>>> Heh. This solves just half of the problem - I still have to
>>>>>>>>>>>> keep track of which device got MSI/MSIX configured via that
>>>>>>>>>>>> ibm,change-msi interface. I was hoping I could store such a
>>>>>>>>>>>> flag somewhere in a device's PCI config space, but the MSI/MSIX
>>>>>>>>>>>> enable bit is no good as it is not set when those calls are
>>>>>>>>>>>> made. And I cannot rely much on the address/data fields as the
>>>>>>>>>>>> guest can change them (I already use them to store IRQ numbers,
>>>>>>>>>>>> and btw checks are missing when I read them back for disposal;
>>>>>>>>>>>> I'll fix that in the next round).
>>>>>>>>>>>>
>>>>>>>>>>>> Or, on an "enable" event I could put the IRQ numbers into .data
>>>>>>>>>>>> of the MSI config space, and on "disable" check if it is
>>>>>>>>>>>> non-zero, meaning configuration took place; then I can remove
>>>>>>>>>>>> my msi[]/msix[] flag arrays. If the guest did any change to the
>>>>>>>>>>>> MSI/MSIX config space (it does not on SPAPR except in weird
>>>>>>>>>>>> selftest cases), I compare .data with what ICS can possibly
>>>>>>>>>>>> have and either reject the "disable" or handle it, and if it
>>>>>>>>>>>> breaks XICS - that's too bad for the stupid guest. Would that
>>>>>>>>>>>> be acceptable?
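
(Illustration: a hypothetical helper for the scheme described above; it
reads back whatever is currently in the MSI .data field using QEMU's
standard PCI config-space accessors. The helper name is made up, it is
not from the actual patch.)

#include "hw/pci/pci.h"
#include "hw/pci/pci_regs.h"

/* Read the MSI .data field of a device; a non-zero value would mean an
 * "enable" event stashed an IRQ number there earlier. */
static uint16_t spapr_msi_read_cached_data(PCIDevice *pdev)
{
    uint16_t flags = pci_get_word(pdev->config + pdev->msi_cap +
                                  PCI_MSI_FLAGS);
    int data_off = (flags & PCI_MSI_FLAGS_64BIT) ?
                   PCI_MSI_DATA_64 : PCI_MSI_DATA_32;

    return pci_get_word(pdev->config + pdev->msi_cap + data_off);
}
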
>>>>>>>>>>> Can't you prohibit the guest from writing to the MSI
>>>>>>>>>>> configuration registers altogether? Then you don't need to do
>>>>>>>>>>> sanity checks.
>>>>>>>>>> I could for emulated devices, but VFIO uses the same code. For
>>>>>>>>>> example, there is an IBM SCSI IPR card which does a "self test".
>>>>>>>>>> For that, it saves the MSIX BAR content, reboots via some
>>>>>>>>>> backdoor interface and restores the MSIX BAR. It has been solved
>>>>>>>>>> for VFIO in the host kernel by restoring MSIX data from cached
>>>>>>>>>> values when the guest is trying to restore it with what it thinks
>>>>>>>>>> is actual MSIX data (it is virtualized because of x86). But there
>>>>>>>>>> is cache
>>>>>>>>> We already have a cache because we don't access the real PCI
>>>>>>>>> registers with msi_set_message(), no?
>>>>>>>> For emulated devices there is no cache. And in any case the guest is
>>>>>>>> allowed to write to it... Who knows what AIX does? I do not.
>>>>>>> Tried a GHashTable for keeping bus:dev.fn <-> (irq, num); it works
>>>>>>> more or less ok, but how do I migrate such a thing? Cache it
>>>>>>> temporarily somewhere and then unpack it? Or use old-style migration
>>>>>>> callbacks?
>>>>>> Could you try to introduce a new vmstate type that just serializes and
>>>>>> deserializes hash tables? Maybe there is already a serialization
>>>>>> function for it in glib?
>>>>> I have not found any (most likely I do not know how to search there),
>>>>> so I added my own; here are VMSTATE_HASH + its use for SPAPR.
>>>>>
>>>>>
>>>>> Is this a move in the right direction? I need to put the key/value
>>>>> sizes into the VMSTATE definition somehow but do not really want to
>>>>> touch VMStateField.
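
(Illustration: a minimal sketch of what such a VMSTATE_HASH helper could
look like, assuming int keys and fixed-size values, and the VMStateInfo
callback signatures of that era. The names and sizes here, such as
vmstate_info_ghash and GHASH_VALUE_SIZE, are made up for the sketch and
are not taken from the actual patch.)

#include <glib.h>
#include "migration/vmstate.h"
#include "migration/qemu-file.h"

#define GHASH_VALUE_SIZE 8   /* assumed fixed size of one hash value */

static void ghash_put_entry(gpointer key, gpointer value, gpointer opaque)
{
    QEMUFile *f = opaque;

    qemu_put_be32(f, GPOINTER_TO_INT(key));
    qemu_put_buffer(f, (const uint8_t *)value, GHASH_VALUE_SIZE);
}

/* .put: write the number of entries, then each key/value pair */
static void ghash_put(QEMUFile *f, void *pv, size_t size)
{
    GHashTable *hash = *(GHashTable **)pv;   /* pv points at the field */

    qemu_put_be32(f, g_hash_table_size(hash));
    g_hash_table_foreach(hash, ghash_put_entry, f);
}

/* .get: read the count back and repopulate the (already created) table */
static int ghash_get(QEMUFile *f, void *pv, size_t size)
{
    GHashTable *hash = *(GHashTable **)pv;
    uint32_t i, count = qemu_get_be32(f);

    for (i = 0; i < count; i++) {
        gpointer key = GINT_TO_POINTER((int)qemu_get_be32(f));
        uint8_t *value = g_malloc(GHASH_VALUE_SIZE);

        qemu_get_buffer(f, value, GHASH_VALUE_SIZE);
        g_hash_table_insert(hash, key, value);
    }
    return 0;
}

static const VMStateInfo vmstate_info_ghash = {
    .name = "ghash",
    .get  = ghash_get,
    .put  = ghash_put,
};
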
>>>> I'm not sure. Juan?
>>>
>>> I also tried to simplify this particular thing further by assuming that
>>> the key is always "int" and putting the size of the value into
>>> VMStateField::size. But there is no way to get the size in
>>> VMStateInfo::get(). Or I could do a "subsection" and try implementing
>>> the hash as an array (and have get/put defined for items, which should
>>> work), but in this case I'll need to cache the number of elements of
>>> the hash table somewhere, so I'll need a wrapper around GHashTable.
>>>
>>> Making a generalized version is not obvious to me :(
>> Juan?
>> Alex?
>> Anybody?
>> Ping.
>>
>> How do I migrate a GHashTable? If I am allowed to use the custom and a
>> bit more polished get/put from "[PATCH 1/2] vmstate: Add helper to
>> enable GHashTable migration", I'll be fine.
> 
> Yeah, I think it's ok to be custom in this case. Or another crazy idea -
> could you flatten the hash table into an array of structs that you can
> describe using VMState? You could then convert the GHashTable to/from
> the flat array with pre_save/post_load hooks.


An array is exactly what I am trying to get rid of. Ok, I'll remove the
hashmap altogether and implement a dynamic flat array (yay, yet another
bicycle!).
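
(Illustration: a sketch of the flatten-to-array approach, assuming a
hypothetical SpaprPhbState whose live data is a GHashTable mapping a PCI
config address to (first IRQ, number of vectors). All names are
illustrative, not the actual QEMU code; the flat msi_devs[] array exists
only for migration and could be described with the usual VMSTATE_ macros
for variable-length struct arrays.)

#include <glib.h>
#include "migration/vmstate.h"

typedef struct SpaprPciMsi {
    uint32_t first_irq;              /* first allocated IRQ number   */
    uint32_t nr;                     /* number of vectors            */
} SpaprPciMsi;

typedef struct SpaprPciMsiMig {
    uint32_t key;                    /* PCI config address           */
    SpaprPciMsi value;
} SpaprPciMsiMig;

typedef struct SpaprPhbState {
    /* ... */
    GHashTable *msi;                 /* live data                    */
    int32_t msi_devs_num;            /* only valid during migration  */
    SpaprPciMsiMig *msi_devs;
} SpaprPhbState;

/* Flatten the hash table into msi_devs[] just before the state is sent */
static void spapr_pci_pre_save(void *opaque)
{
    SpaprPhbState *sphb = opaque;
    GHashTableIter iter;
    gpointer key, value;
    int i = 0;

    sphb->msi_devs_num = g_hash_table_size(sphb->msi);
    sphb->msi_devs = g_new(SpaprPciMsiMig, sphb->msi_devs_num);

    g_hash_table_iter_init(&iter, sphb->msi);
    while (g_hash_table_iter_next(&iter, &key, &value)) {
        sphb->msi_devs[i].key = GPOINTER_TO_UINT(key);
        sphb->msi_devs[i].value = *(SpaprPciMsi *)value;
        i++;
    }
}

/* Rebuild the hash table from msi_devs[] on the destination */
static int spapr_pci_post_load(void *opaque, int version_id)
{
    SpaprPhbState *sphb = opaque;
    int i;

    for (i = 0; i < sphb->msi_devs_num; i++) {
        SpaprPciMsi *v = g_memdup(&sphb->msi_devs[i].value, sizeof(*v));

        g_hash_table_insert(sphb->msi,
                            GUINT_TO_POINTER(sphb->msi_devs[i].key), v);
    }
    g_free(sphb->msi_devs);
    sphb->msi_devs = NULL;
    return 0;
}
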


> The benefit of that would be that the data becomes more easily
> introspectable with tools like my live migration parser.

Wow. What is that parser?


-- 
Alexey


