qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset


From: David Gibson
Subject: Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Date: Wed, 19 Jul 2017 14:32:50 +1000
User-agent: Mutt/1.8.3 (2017-05-23)

On Wed, Jul 19, 2017 at 09:20:52AM +0530, Nikunj A Dadhania wrote:
> David Gibson <address@hidden> writes:
> 
> > On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote:
> >> David Gibson <address@hidden> writes:
> >> 
> >> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
> >> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> >> >> 
> >> >> When reset happens, all the CPUs are in halted state. First CPU is 
> >> >> brought out
> >> >> of reset and secondary CPUs would be initialized by the guest kernel 
> >> >> using a
> >> >> rtas call start-cpu.
> >> >> 
> >> >> However, in case of TCG, decrementer interrupts keep on coming and 
> >> >> waking the
> >> >> secondary CPUs up.
> >> >> 
> >> >> These secondary CPUs would see the decrementer interrupt pending, which 
> >> >> makes
> >> >> cpu::has_work() to bring them out of wait loop and start executing
> >> >> tcg_exec_cpu().
> >> >> 
> >> >> The problem with this is all the CPUs wake up and start booting SLOF 
> >> >> image,
> >> >> causing the following exception(4 CPUs TCG VM):
> >> >
> >> > Ok, I'm still trying to understand why the behaviour on reboot is
> >> > different from the first boot.
> >> 
> >> During first boot, the cpu is in the stopped state, so
> >> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
> >> until rtas start-cpu. Therefore, we never check the cpu_has_work()
> >> 
> >> In case of reboot, all CPUs are resumed after reboot. So we check the
> >> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
> >> DECR interrupt and remove the CPU from halted state as the CPU has
> >> work.
> >
> > Ok, so it sounds like we should set stopped on all the secondary CPUs
> > on reset as well.  What's causing them to be resumed after the reset
> > at the moment?
> 
> That is part of the main loop in vl.c, when reset is requested. All the
> vcpus are paused (stopped == true) then system reset is issued, and all
> cpus are resumed (stopped == false). Which is correct.

is it?  Seems we have a different value of 'stopped' on the first boot
compared to reoboots, which doesn't seem right.

> static bool main_loop_should_exit(void)
> {
> [...]
>     request = qemu_reset_requested();
>     if (request) {
>         pause_all_vcpus();
>         qemu_system_reset(request);
>         resume_all_vcpus();
>         if (!runstate_check(RUN_STATE_RUNNING) &&
>                 !runstate_check(RUN_STATE_INMIGRATE)) {
>             runstate_set(RUN_STATE_PRELAUNCH);
>         }
>     }
> [...]
> }
> 
> The CPUs are in halted state, i.e. cpu::halted = 1. We have set
> cpu::halted = 0 for the primary CPU, aka first_cpu, in
> spapr.c::ppc_spapr_reset()
> 
> In case of TCG, we have a decr callback (per CPU) scheduled once the
> machine is started which keeps on running and interrupting irrespective
> of the state of the machine.

Right.  The thing is "halted" means waiting-for-interrupt; it's mostly
used for things like x86 hlt instruction, H_CEDE, short-term nap modes
and so forth.  We're abusing it a little for keeping the secondary
CPUs offline until they're explicitly started.

Trouble is, there isn't a clearly better option - stopped is
automatically turned off after reset as above, so it can't be used for
"stopped under firmware / hypervisor authority".

> tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_decr_cb, cpu);
> 
> Which keeps on queueing the interrupt to the CPUs. All the other CPUs
> are in a tight loop checking cpu_thread_is_idle(), which returns false
> when it sees the decrementer interrupt, the cpu starts executing:
> 
> cpu_exec()
>   -> cpu_handle_halt()
>      -> sees cpu_has_work() and sets cpu->halted = 0
> 
> And the execution resumes, when it shouldnt have. Ideally, for secondary
> CPUs, cpu::halted flag is cleared in rtas start-cpu call.
> 
> Initially, I thought we should not have interrupt during reset state.
> That was the reason of ignoring decr interrupt when msr_ee was disabled
> in my previous patch. BenH suggested that it is wrong from HW
> perspective.
> 
> Regards,
> Nikunj
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]