qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
Date: Wed, 28 May 2014 02:41:09 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.5.0


On 28.05.14 02:34, Alexey Kardashevskiy wrote:
On 05/28/2014 09:55 AM, Alexander Graf wrote:
On 27.05.14 06:51, Alexey Kardashevskiy wrote:
On 05/23/2014 12:25 AM, Alexey Kardashevskiy wrote:
On 05/22/2014 08:57 PM, Alexander Graf wrote:
On 22.05.14 12:53, Alexey Kardashevskiy wrote:
On 05/22/2014 05:16 PM, Alexander Graf wrote:>
Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <address@hidden>:

On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
On 05/21/2014 08:35 PM, Alexander Graf wrote:

On 21.05.14 12:13, Alexey Kardashevskiy wrote:
On 05/21/2014 07:50 PM, Alexander Graf wrote:
On 21.05.14 11:33, Alexey Kardashevskiy wrote:
On 05/21/2014 07:13 PM, Alexander Graf wrote:
On 21.05.14 11:11, Michael S. Tsirkin wrote:
On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
On 21.05.14 10:52, Alexey Kardashevskiy wrote:
On 05/21/2014 06:40 PM, Alexander Graf wrote:
On 15.05.14 11:59, Alexey Kardashevskiy wrote:
Currently SPAPR PHB keeps track of all allocated MSI/MISX
interrupt as
XICS used to be unable to reuse interrupts which becomes a
problem for
dynamic MSI reconfiguration which is happening on guest
driver
reload or
PCI hot (un)plug. Another problem is that PHB has a limit of
devices
supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no
good
reason
for that.

This makes use of new XICS ability to reuse interrupts.

This removes cached MSI configuration from SPAPR PHB so the
first
IRQ
number
of a device is stored in MSI/MSIX config space so there
is no
need to
store
this anywhere else. From now on, SPAPR PHB only keeps flags
telling
what
type
of interrupt for which device it has configured in order to
return
error if
(for example) MSIX was enabled and the guest is trying to
disable
MSI
which
it has not enabled.

This removes a limit for the maximum number of MSIX-enabled
devices
per PHB,
now XICS and PCI bus capacity are the only limitation.

This changes migration stream as it fixes
vmstate_spapr_pci_msi::name
which was
wrong since the beginning.

This fixed traces to be more informative.

Signed-off-by: Alexey Kardashevskiy <address@hidden>
---

In reality either MSIX or MSI is enabled, never both. So I
could
remove
msi/msix
bitmaps from this patch, would it make sense?
Is this a hard requirement? Does a device have to choose
between
MSIX and
MSI or could it theoretically have both enabled? Is this a
PCI
limitation,
a PAPR/XICS limitation or just a limitation of your
implementation?
My implementation does not have this limitation, I asked if
I can
simplify
code by introducing one :)

I cannot see any reason why PCI cannot have both MSI and MSIX
enabled
but
it does not seem to be used by anyone => cannot debug and
confirm.

PAPR spec assumes that if the guest tries enabling MSIX when
MSI is
already
enabled, this is a "change", not enabling both types. But
it also
says MSI
and MSIX vector numbers are not shared. Hm.
Yeah, I'm not aware of any limitation on hardware here and I'd
rather not impose one.

Michael, do you know of any hardware that uses MSI *and*
MSI-X at
the same time?


Alex
No, and the PCI spec says:
        A function is permitted to implement both MSI and
MSI-X, but
system
        software is
        prohibited from enabling both at the same time. If system
software
        enables both at the same time, the result is undefined.
Ah, cool. So yes Alexey, feel free to impose it :).
Heh. This solves just half of the problem - I still have to keep
track of
what device got MSI/MSIX configured via that ibm,change-msi
interface. I
was hoping I can store such flag somewhere in a device PCI config
space
but
MSI/MSIX enable bit is not good as it is not set when those
calls are
made.
And I cannot rely on address/data fields much as the guest can
change them
(I already use them to store IRQ numbers and btw it is missing
checks when
I read them back for disposal, I'll fix in next round).

Or on "enable" event I could put IRQ numbers to .data of MSI
config space
and on "disable" check if it is not zero, then configuration took
place,
then I can remove my msi[]/msix[] flag arrays. If the guest did
any change
to MSI/MSIX config space (it does not on SPAPR except weird
selftest
cases), I compare .data with what ICS can possibly have and either
reject
"disable" or handle it and if it breaks XICS - that's too bad
for the
stupid guest. Would that be acceptable?
Can't you prohibit the guest from writing to the MSI configuration
registers itself? Then you don't need to do sanity checks.
I could for emulated devices but VFIO uses the same code. For
example,
there is an IBM SCSI IPR card which does a "self test". For that, it
saves
MSIX BAR content, does reboot via some backdoor interface and
restores MSIX
BAR. It has been solved for VFIO in the host kernel by restoring
MSIX data
from cached values when guest is trying to restore it with what it
thinks
is actual MSIX data (it is virtualized because of x86). But there is
cache
We already have a cache because we don't access the real PCI
registers with
msi_set_message(), no?
For emulated devices there is no cache. And in any case the guest is
allowed to write to it... Who knows what AIX does? I do not.
Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok
but
how to migrate such thing? Temporary cache somewhere and then unpack
it? Or
use old style migration callbacks?
Could you try to introduce a new vmstate type that just serializes and
deserializes hash tables? Maybe there is already a serialization
function for it in glib?
I have not found any (most likely I do not know how to search there),
I added mine, here are VMSTATE_HASH + its use for SPAPR.


Is this a movement to right direction? I need to put key/value sizes
into VMSTATE definition somehow but do not really want to touch
VMStateField.
I'm not sure. Juan?
I also tried to simplify this particular thing more by assuming that the
key is always "int" and put size of value to VMStateField::size. But there
is no way to get the size in VMStateInfo::get(). Or I could do a
"subsection" and try implementing has as an array (and have get/put defined
for items, should work) but in this case I'll need to cache number of
elements of the hash table somewhere so I'll need a wrapper around
GHashTable.

Making generalized version is not obvious for me :(
Juan?
Alex?
Anybody?
Ping.

How do I migrate GHashTable? If I am allowed to use custom and bit more
polished get/put from "[PATCH 1/2] vmstate: Add helper to enable GHashTable
migration", I'll be fine.
Yeah, I think it's ok to be custom in this case. Or another crazy idea -
could you flatten the hash table into an array of structs that you can
describe using VMState? You could then convert from the flat array to/from
the GHashTable with pre_load/post_load hooks.

Array is exactly what I am trying to get rid of. Ok, I'll remove hashmap at
all and implement dynamic flat array (yay, yet another bicycle!).

Huh? The array would only live during the migration. It would be size=0 during normal execution, but in a pre_save hook we could make size = hash.length() and reuse the existing, working VMState infrastructure.



The benefit of that would be that the data becomes more easily
introspectible with tools like my live migration parser.
Wow. What is that parser?

https://lists.gnu.org/archive/html/qemu-devel/2013-10/msg02949.html

I still need to rewrite that thing to always send the JSON description of the stream after the migration's EOF marker. And implement support for the HTAB stream ;). But overall it works reasonably well.


Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]