[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 0/7] FWNMI fixes / changes
From: |
Greg Kurz |
Subject: |
Re: [PATCH 0/7] FWNMI fixes / changes |
Date: |
Thu, 5 Mar 2020 11:19:11 +0100 |
On Thu, 5 Mar 2020 14:59:31 +1000
Nicholas Piggin <address@hidden> wrote:
> Hi,
>
Hi Nick,
I'm afraid you haven't Cc'd qemu-devel which is mandatory for patch
submission, while qemu-ppc is barely a tag for mail filters.
Cheers,
--
Greg
> Since RFC, I addressed review comments (thank you), I've also dropped
> the MCE changes for now just to concentrate on the basics.
>
> The major problem is PAPR requires system reset to also be delivered
> to the address registered by nmi-register when FWNMI is enabled.
>
> This series first allows FWNMI to work on TCG, because sresets can be
> injected under TCG (unlike MCEs) so it becomes important. Then it adds
> sreset to the migration state. Then it wire up sreset delivery to the
> sreset fwnmi address.
>
> One remaining issue we have now is that Linux and IBM's proprietary
> hypervisor seem to deviate from PAPR when delivering fwnmi sreset
> interrupts. PAPR says all registers should be set as at interrupt
> time, as opposed to MCE delivery where r3 is specified to contain the
> address of memory which contains the original r3 and an error record.
>
> PowerVM delivers fwnmi sreset with this r3 scheme, and Linux tries to
> restore r3 from that address. Linux also calls "ibm,nmi-interlock" after
> loading r3 from that address, whereas PAPR does not require it. PowerVM
> seems to ignore the call and doesn't synchronize anything.
>
> I've asked PowerVM developers about this and waiting to see what comes
> of that before going further here. This is still broken on Linux at
> worst we can corrupt r3 with a random value from the rtas blob, at
> best we get:
>
> [ 43.148121] FWNMI: corrupt r3 0x0000000000000000
> [ 43.148231] FWNMI: nmi-interlock failed: -3
> [ 43.148286] FWNMI: corrupt r3 0x0000000000000000
> [ 43.148296] FWNMI: nmi-interlock failed: -3
>
> when handling sresets. However that problem is not introduced by this
> patch, because the Linux kernel fwnmi and 0x100 sreset entry points are
> identical, and this bug happens inside a common handler in an
> 'if (fwnmi_active)' branch.
>
> We can probably provide this compatible behaviour without being totally
> broken with respect to sreset on multiple CPUs concurrently or nesting
> inside/outside of a machine check handler, if we provide a small
> per-cpu area for the r3 save. That would require an RTAS/SLOF change to
> allocate more space, so I haven't implemented that yet.
>
> Thanks,
> Nick
>
>
> Nicholas Piggin (7):
> ppc/spapr: Fix FWNMI machine check failure handling
> ppc/spapr: Change FWNMI names
> ppc/spapr: Add FWNMI System Reset state
> ppc/spapr: Fix FWNMI machine check interrupt delivery
> ppc/spapr: Allow FWNMI on TCG
> target/ppc: allow ppc_cpu_do_system_reset to take an alternate vector
> ppc/spapr: Implement FWNMI System Reset delivery
>
> hw/ppc/spapr.c | 35 ++++++++------
> hw/ppc/spapr_caps.c | 19 ++++----
> hw/ppc/spapr_events.c | 36 ++++----------
> hw/ppc/spapr_rtas.c | 29 ++++++++----
> include/hw/ppc/spapr.h | 28 +++++++----
> target/ppc/cpu.h | 3 +-
> target/ppc/excp_helper.c | 79 ++++++++++++++++++++++---------
> tests/qtest/libqos/libqos-spapr.h | 2 +-
> 8 files changed, 135 insertions(+), 96 deletions(-)
>
- [PATCH 2/7] ppc/spapr: Change FWNMI names, (continued)
- [PATCH 2/7] ppc/spapr: Change FWNMI names, Nicholas Piggin, 2020/03/05
- [PATCH 3/7] ppc/spapr: Add FWNMI System Reset state, Nicholas Piggin, 2020/03/05
- [PATCH 4/7] ppc/spapr: Fix FWNMI machine check interrupt delivery, Nicholas Piggin, 2020/03/05
- [PATCH 5/7] ppc/spapr: Allow FWNMI on TCG, Nicholas Piggin, 2020/03/05
- [PATCH 6/7] target/ppc: allow ppc_cpu_do_system_reset to take an alternate vector, Nicholas Piggin, 2020/03/05
- [PATCH 7/7] ppc/spapr: Implement FWNMI System Reset delivery, Nicholas Piggin, 2020/03/05
- Re: [PATCH 0/7] FWNMI fixes / changes,
Greg Kurz <=