qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [Qemu-devel] [PATCH v3 4/4] target-ppc: Handle ibm, nmi-r


From: Aravinda Prasad
Subject: Re: [Qemu-ppc] [Qemu-devel] [PATCH v3 4/4] target-ppc: Handle ibm, nmi-register RTAS call
Date: Tue, 11 Nov 2014 12:14:31 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6


On Tuesday 11 November 2014 08:46 AM, David Gibson wrote:
> On Wed, Nov 05, 2014 at 12:43:15PM +0530, Aravinda Prasad wrote:
>> This patch adds FWNMI support in qemu for powerKVM
>> guests by handling the ibm,nmi-register rtas call.
>> Whenever OS issues ibm,nmi-register RTAS call, the
>> machine check notification address is saved and the
>> machine check interrupt vector 0x200 is patched to
>> issue a private hcall.
>>
>> This patch also handles the cases when multi-processors
>> experience machine check at or about the same time.
>> As per PAPR, subsequent processors serialize waiting
>> for the first processor to issue the ibm,nmi-interlock call.
>> The second processor retries if the first processor which
>> received a machine check is still reading the error log
>> and is yet to issue ibm,nmi-interlock call.
>>
>> Signed-off-by: Aravinda Prasad <address@hidden>
> 
> [snip]
>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>> +                                  sPAPREnvironment *spapr,
>> +                                  uint32_t token, uint32_t nargs,
>> +                                  target_ulong args,
>> +                                  uint32_t nret, target_ulong rets)
>> +{
>> +    int i;
>> +    uint32_t ori_inst = 0x60630000;
>> +    uint32_t branch_inst = 0x48000002;
>> +    target_ulong guest_machine_check_addr;
>> +    uint32_t trampoline[TRAMPOLINE_INSTS];
>> +    int total_inst = sizeof(trampoline) / sizeof(uint32_t);
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> 
> You should sanity check the RTAS arguments before doing anything - in
> particular verify that nargs and nrets have the expected values.

ok.

> 
>> +
>> +    /* Store the system reset and machine check address */
>> +    guest_machine_check_addr = rtas_ld(args, 1);
>> +
>> +    /*
>> +     * Read the trampoline instructions from RTAS Blob and patch
>> +     * the KVMPPC_H_REPORT_MC_ERR hcall number and the guest
>> +     * machine check address before copying to 0x200 vector
>> +     */
>> +    cpu_physical_memory_read(spapr->rtas_addr + RTAS_TRAMPOLINE_OFFSET,
>> +                             trampoline, sizeof(trampoline));
>> +
>> +    /* Safety Check */
>> +    QEMU_BUILD_BUG_ON(sizeof(trampoline) > MC_INTERRUPT_VECTOR_SIZE);
>> +
>> +    /* Update the KVMPPC_H_REPORT_MC_ERR value in trampoline */
>> +    ori_inst |= KVMPPC_H_REPORT_MC_ERR;
>> +    memcpy(&trampoline[TRAMPOLINE_ORI_INST_INDEX], &ori_inst,
>> +            sizeof(ori_inst));
> 
> Given that we already code the KVMPPC_H_RTAS value directly into the
> .S file, I don't think it's worth the trouble of patching the
> H_REPORT_MC_ERR value.  As Alex says, it has to stay the same for
> migration anyway.

Yes. patching of KVMPPC_H_REPORT_MC_ERR value will go-off.

> 
>> +    /*
>> +     * Sanity check guest_machine_check_addr to prevent clobbering
>> +     * operator value in branch instruction
>> +     */
>> +    if (guest_machine_check_addr & BRANCH_INST_MASK) {
>> +        fprintf(stderr, "Unable to register ibm,nmi_register: "
>> +                "Invalid machine check handler address\n");
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * Update the branch instruction in trampoline
>> +     * with the absolute machine check address requested by OS.
>> +     */
>> +    branch_inst |= guest_machine_check_addr;
>> +    memcpy(&trampoline[TRAMPOLINE_BR_INST_INDEX], &branch_inst,
>> +            sizeof(branch_inst));
>> +
>> +    /* Handle all Host/Guest LE/BE combinations */
>> +    if ((*pcc->interrupts_big_endian)(cpu)) {
>> +        for (i = 0; i < total_inst; i++) {
>> +            trampoline[i] = cpu_to_be32(trampoline[i]);
>> +        }
>> +    } else {
>> +        for (i = 0; i < total_inst; i++) {
>> +            trampoline[i] = cpu_to_le32(trampoline[i]);
>> +        }
>> +    }
>> +
>> +    /* Patch 0x200 NMI interrupt vector memory area of guest */
>> +    cpu_physical_memory_write(MC_INTERRUPT_VECTOR, trampoline,
>> +                              sizeof(trampoline));
>> +
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>> +                                   sPAPREnvironment *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
>> +{
> 
> Again you should sanity check the arguments - at least check nargs and
> nrets.

will do

> 
>> +    /*
>> +     * VCPU issuing ibm,nmi-interlock is done with NMI handling,
>> +     * hence unset mc_in_progress.
>> +     */
>> +    mc_in_progress = 0;
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>>  static struct rtas_call {
>>      const char *name;
>>      spapr_rtas_fn fn;
>> @@ -419,6 +506,12 @@ static void core_rtas_register_types(void)
>>                          rtas_ibm_set_system_parameter);
>>      spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
>>                          rtas_ibm_os_term);
>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER,
>> +                        "ibm,nmi-register",
>> +                        rtas_ibm_nmi_register);
>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK,
>> +                        "ibm,nmi-interlock",
>> +                        rtas_ibm_nmi_interlock);
>>  }
>>  
>>  type_init(core_rtas_register_types)
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index a2d67e9..98d0a6c 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -384,8 +384,10 @@ int spapr_allocate_irq_block(int num, bool lsi, bool 
>> msi);
>>  #define RTAS_GET_SENSOR_STATE                   (RTAS_TOKEN_BASE + 0x1D)
>>  #define RTAS_IBM_CONFIGURE_CONNECTOR            (RTAS_TOKEN_BASE + 0x1E)
>>  #define RTAS_IBM_OS_TERM                        (RTAS_TOKEN_BASE + 0x1F)
>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x20)
>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x21)
>>  
>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x20)
>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x22)
>>  
>>  /* RTAS ibm,get-system-parameter token values */
>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>> @@ -488,4 +490,17 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const 
>> char *propname,
>>  #define RTAS_TRAMPOLINE_OFFSET   0x200
>>  #define RTAS_ERRLOG_OFFSET       0x800
>>  
>> +/* Machine Check Trampoline related macros
>> + *
>> + * These macros should co-relate to the code we
>> + * have in pc-bios/spapr-rtas/spapr-rtas.S
>> + */
>> +#define TRAMPOLINE_INSTS           17
>> +#define TRAMPOLINE_ORI_INST_INDEX  2
>> +#define TRAMPOLINE_BR_INST_INDEX   15
>> +
>> +/* Machine Check Interrupt related macros */
>> +#define MC_INTERRUPT_VECTOR           0x200
>> +#define MC_INTERRUPT_VECTOR_SIZE      0x100
>> +
>>  #endif /* !defined (__HW_SPAPR_H__) */
>> diff --git a/pc-bios/spapr-rtas/spapr-rtas.S 
>> b/pc-bios/spapr-rtas/spapr-rtas.S
>> index 903bec2..c315332 100644
>> --- a/pc-bios/spapr-rtas/spapr-rtas.S
>> +++ b/pc-bios/spapr-rtas/spapr-rtas.S
>> @@ -35,3 +35,41 @@ _start:
>>      ori     3,3,address@hidden
>>      sc      1
>>      blr
>> +    . = 0x200
>> +    /*
>> +     * Trampoline saves r3 in sprg2 and issues private hcall
>> +     * to request qemu to build error log. QEMU builds the
>> +     * error log, copies to rtas-blob and returns the address.
>> +     * The initial 16 bytes in return adress consist of saved
>> +     * srr0 and srr1 which we restore and pass on the actual error
>> +     * log address to OS handled mcachine check notification
>> +     * routine
>> +     *
>> +     * All the below instructions are copied to interrupt vector
>> +     * 0x200 at the time of handling ibm,nmi-register rtas call.
>> +     */
>> +    mtsprg  2,3
>> +    li      3,0
>> +    /*
>> +     * ori r3,r3,KVMPPC_H_REPORT_MC_ERR. The KVMPPC_H_REPORT_MC_ERR
>> +     * value is patched below
>> +     */
>> +1:  ori     3,3,0
>> +    sc      1               /* Issue H_CALL */
>> +    cmpdi   cr0,3,0
>> +    beq     cr0,1b          /* retry KVMPPC_H_REPORT_MC_ERR */
> 
> Having to retry the hcall from here seems very awkward.  This is a
> private hcall, so you can define it to do whatever retries are
> necessary internally (and I don't think your current implementation
> can fail anyway).

Retrying is required in the cases when multi-processors experience
machine check at or about the same time. As per PAPR, subsequent
processors should serialize and wait for the first processor to issue
the ibm,nmi-interlock call. The second processor retries if the first
processor which received a machine check is still reading the error log
and is yet to issue ibm,nmi-interlock call.

Retrying cannot be done internally in h_report_mc_err hcall: only one
thread can succeed entering qemu upon parallel hcall and hence retrying
inside the hcall will not allow the ibm,nmi-interlock from first CPU to
succeed.

> 
>> +    mtsprg  2,4
> 
> Um.. doesn't this clobber the value of r3 you saved in SPRG2 just above.

The r3 saved in SPRG2 is moved to rtas area in the private hcall and
hence it is fine to clobber r3 here

> 
>> +    ld      4,0(3)
>> +    mtsrr0  4               /* Restore srr0 */
>> +    ld      4,8(3)
>> +    mtsrr1  4               /* Restore srr1 */
>> +    ld      4,16(3)
>> +    mtcrf   0,4             /* Restore cr */
> 
> mtcrf?  aren't you restoring the whole CR?

No. I am moving only cr0. The 0 in mtcrf 0,4 represents the cr field
mask that is replaced.

> 
>> +    addi    3,3,24
>> +    mfsprg  4,2
>> +    /*
>> +     * Branch to address registered by OS. The branch address is
>> +     * patched in the ibm,nmi-register rtas call.
>> +     */
>> +    ba      0x0
>> +    b       .
> 
> The branch to self is pointless.  Even if the instruction above is
> not patched, or patched incorrectly, it's a ba, so you're not likely
> to end up at the instruction underneath.

I added it to avoid speculative execution. Based on how it is used in
arch/powerpc/kernel/exceptions-64s.S

> 
> Actually, what would probably make more sense would be to just have a
> "b ." *instead* of the ba, and have the qemu patching replace it with
> the correct ba instruction.  That will limit the damage if it somehow
> gets executed without being patched.

good idea. Will do that.

> 

-- 
Regards,
Aravinda




reply via email to

[Prev in Thread] Current Thread [Next in Thread]