qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v6 wave 2 2/3] hw/isa/lpc_ich9: add broadcast SM


From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH v6 wave 2 2/3] hw/isa/lpc_ich9: add broadcast SMI feature
Date: Wed, 18 Jan 2017 13:38:44 +0100

On Wed, 18 Jan 2017 11:23:48 +0100
Laszlo Ersek <address@hidden> wrote:

> On 01/18/17 11:19, Laszlo Ersek wrote:
> > On 01/18/17 11:03, Igor Mammedov wrote:  
> >> On Tue, 17 Jan 2017 19:53:21 +0100
> >> Laszlo Ersek <address@hidden> wrote:
> >>  
> 
> [snip]
> 
> >>> This is the code I wrote (extracting a new ich9_apm_broadcast_smi()
> >>> function from ich9_apm_ctrl_changed(), due to size reasons):
> >>>  
> >>>> static void ich9_apm_broadcast_smi(void)
> >>>> {
> >>>>     CPUState *cs;
> >>>>
> >>>>     cpu_synchronize_all_states(); /* <--------- mark this */  
> >> it probably should be executed after pause_all_vcpus(),
> >> maybe it will help.
> >>  
> >>>>     CPU_FOREACH(cs) {
> >>>>         X86CPU *cpu = X86_CPU(cs);
> >>>>         CPUX86State *env = &cpu->env;
> >>>>
> >>>>         if (env->smbase == 0x30000 && env->eip == 0xfff0) {
> >>>>             CPUClass *k = CPU_GET_CLASS(cs);
> >>>>             uint64_t cpu_arch_id = k->get_arch_id(cs);
> >>>>
> >>>>             trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
> >>>>             continue;
> >>>>         }
> >>>>
> >>>>         cpu_interrupt(cs, CPU_INTERRUPT_SMI);
> >>>>     }
> >>>> }    
> >>>  
> >> [...]  
> >>> (b) If I add the cpu_synchronize_all_states() call before the loop, then
> >>> the incorrect filter matches go away; QEMU sees the KVM VCPU states
> >>> correctly, and the SMI is broad-cast.
> >>>
> >>> However, in this case, the boot slows to a *crawl*. It's unbearable. All
> >>> VCPUs are pegged at 100%, and you can easily follow the OVMF debug log
> >>> with the naked eye, almost line by line.
> >>> I didn't expect that cpu_synchronize_all_states() would be so costly,
> >>> but it makes the idea of checking VCPU state in the APM_CNT handler a
> >>> non-starter.  
> >> I wonder were bottleneck is to slow down guest so much.
> >> What is actual cost of calling cpu_synchronize_all_states()?
> >>
> >> Maybe calling pause_all_vcpus() before cpu_synchronize_all_states()
> >> would help.  
> > 
> > If I prepend just a pause_all_vcpus() function call, at the top, then
> > the VM freezes (quite understandably) when the first SMI is raised via
> > APM_CNT.
> > 
> > If I, instead, bracket the function body with pause_all_vcpus() and
> > resume_all_vcpus(), like this:
> > 
> > static void ich9_apm_broadcast_smi(void)
> > {
> >     CPUState *cs;
> > 
> >     pause_all_vcpus();
> >     cpu_synchronize_all_states();
> >     CPU_FOREACH(cs) {
> >         X86CPU *cpu = X86_CPU(cs);
> >         CPUX86State *env = &cpu->env;
> > 
> >         if (env->smbase == 0x30000 && env->eip == 0xfff0) {
> >             CPUClass *k = CPU_GET_CLASS(cs);
> >             uint64_t cpu_arch_id = k->get_arch_id(cs);
> > 
> >             trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
> >             continue;
> >         }
> > 
> >         cpu_interrupt(cs, CPU_INTERRUPT_SMI);
> >     }
> >     resume_all_vcpus();
> > }
> > 
> > then I see the following symptoms:
> > - rather than 4 VCPUs being pegged at 100%, only one VCPU is pegged at
> >   100%
> > - the boot process, as observed from the OVMF log, is just as slow
> >   (= crawling) as without the VCPU pausing/resuming (i.e., like with
> >   cpu_synchronize_all_states() only); so ultimately the
> >   pausing/resuming doesn't help.  
> 
> I also tried this variant (bracketing only the sync call with pause/resume):
> 
> static void ich9_apm_broadcast_smi(void)
> {
>     CPUState *cs;
> 
>     pause_all_vcpus();
>     cpu_synchronize_all_states();
>     resume_all_vcpus();
>     CPU_FOREACH(cs) {
>         X86CPU *cpu = X86_CPU(cs);
>         CPUX86State *env = &cpu->env;
> 
>         if (env->smbase == 0x30000 && env->eip == 0xfff0) {
>             CPUClass *k = CPU_GET_CLASS(cs);
>             uint64_t cpu_arch_id = k->get_arch_id(cs);
> 
>             trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
>             continue;
>         }
> 
>         cpu_interrupt(cs, CPU_INTERRUPT_SMI);
>     }
> }
> 
> This behaves identically to the version where pause/resume bracket the
> entire function body (i.e., 1 VCPU pegged, super-slow boot progress).
wrapping entire function into pause_all_vcpus()/resume_all_vcpus()
looks better as then cpu_interrupt() would not need to send IPIs to CPUs
since pause_all_vcpus() would have done it.

So the left question is what this 1 CPU does to make things so slow.
Does it send a lot of SMIs or something else?




reply via email to

[Prev in Thread] Current Thread [Next in Thread]