qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Windows slow boot: contractor wanted


From: Troy Benjegerdes
Subject: Re: [Qemu-devel] Windows slow boot: contractor wanted
Date: Tue, 21 Aug 2012 10:39:46 -0500
User-agent: Mutt/1.5.20 (2009-06-14)

Do you have any way to determine what CPU groups the different VMs
are running on?

If you end up in an overcommit situation where half the 'virtual'
cpus are on one AMD socket, and the other half are on a different
AMD socket, then you'll be thrashing the hypertransport link.

At Cray we were very carefull to never overcommit runnable processes
to CPUS, and generally locked processes to a single cpu.

Have a read of 
http://berrange.com/posts/2010/02/12/controlling-guest-cpu-numa-affinity-in-libvirt-with-qemu-kvm-xen/

I'm going to speculate that when things don't work very well you end up with
memory from a booting guest scattered across many different NUMA nodes/cpus,
and then it really won't matter how good the spin loop/scheduler code is
because you are bound by the additional latency and bandwidth limitations of
running on one socekt and accessing half the memory that's resident on a
different socket.

On Tue, Aug 21, 2012 at 04:21:07PM +0100, Richard Davies wrote:
> Avi Kivity wrote:
> > Richard Davies wrote:
> > > We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
> > >
> > > I hadn't though about it, but I agree this is related to cpu overcommit. 
> > > The
> > > slow boots are intermittent (and infrequent) with cpu overcommit whereas I
> > > don't think it occurs without cpu overcommit.
> > >
> > > In addition, if there is a slow boot ongoing, and you kill some other VMs 
> > > to
> > > reduce cpu overcommit then this will sometimes speed it up.
> > >
> > > I guess the question is why even with overcommit most boots are fine, but
> > > some small fraction then go slow?
> >
> > Could be a bug.  The scheduler and the spin-loop handling code fight
> > each other instead of working well.
> >
> > Please provide snapshots of 'perf top' while a slow boot is in progress.
> 
> Below are two 'perf top' snapshots during a slow boot, which appear to me to
> support your idea of a spin-lock problem.
> 
> There are a lot more "unprocessable samples recorded" messages at the end of
> each snapshot which I haven't included. I think these may be from the guest
> OS - the kernel is listed, and qemu-kvm itself is listed on some other
> traces which I did, although not these.
> 
> Richard.
> 
> 
> 
>    PerfTop:   62249 irqs/sec  kernel:96.9%  exact:  0.0% [4000Hz cycles],  
> (all, 16 CPUs)
> --------------------------------------------------------------------------------------------------------------------------------
> 
>     35.80%  [kernel]          [k] _raw_spin_lock_irqsave
>     21.64%  [kernel]          [k] isolate_freepages_block
>      5.91%  [kernel]          [k] yield_to
>      4.95%  [kernel]          [k] _raw_spin_lock
>      3.37%  [kernel]          [k] kvm_vcpu_on_spin
>      2.74%  [kernel]          [k] add_preempt_count
>      2.45%  [kernel]          [k] _raw_spin_unlock
>      2.33%  [kernel]          [k] sub_preempt_count
>      2.18%  [kernel]          [k] svm_vcpu_run
>      2.17%  [kernel]          [k] kvm_vcpu_yield_to
>      1.89%  [kernel]          [k] memcmp
>      1.50%  [kernel]          [k] get_pid_task
>      1.26%  [kernel]          [k] kvm_arch_vcpu_ioctl_run
>      1.16%  [kernel]          [k] pid_task
>      0.70%  [kernel]          [k] rcu_note_context_switch
>      0.70%  [kernel]          [k] trace_hardirqs_on
>      0.52%  [kernel]          [k] __rcu_read_unlock
>      0.51%  [kernel]          [k] trace_preempt_on
>      0.47%  [kernel]          [k] __srcu_read_lock
>      0.43%  [kernel]          [k] get_parent_ip
>      0.42%  [kernel]          [k] get_pageblock_flags_group
>      0.38%  [kernel]          [k] in_lock_functions
>      0.34%  [kernel]          [k] trace_preempt_off
>      0.34%  [kernel]          [k] trace_hardirqs_off
>      0.29%  [kernel]          [k] clear_page_c
>      0.23%  [kernel]          [k] __srcu_read_unlock
>      0.20%  [kernel]          [k] __rcu_read_lock
>      0.14%  [kernel]          [k] handle_exit
>      0.11%  libc-2.10.1.so    [.] strcmp
>      0.11%  [kernel]          [k] _raw_spin_unlock_irqrestore
>      0.11%  [kernel]          [k] _raw_spin_lock_irq
>      0.11%  [kernel]          [k] find_highest_vector
>      0.09%  [kernel]          [k] ktime_get
>      0.08%  [kernel]          [k] copy_page_c
>      0.08%  [kernel]          [k] pause_interception
>      0.08%  [kernel]          [k] kmem_cache_alloc
>      0.08%  [kernel]          [k] resched_task
>      0.08%  perf              [.] dso__find_symbol
>      0.06%  [kernel]          [k] compaction_alloc
>      0.06%  libc-2.10.1.so    [.] 0x0000000000076dab
>      0.06%  [kernel]          [k] read_tsc
>      0.06%  perf              [.] add_hist_entry
>      0.05%  [kernel]          [k] svm_read_l1_tsc
>      0.05%  [kernel]          [k] native_read_tsc
>      0.05%  perf              [.] sort__dso_cmp
>      0.05%  [kernel]          [k] copy_user_generic_string
>      0.05%  [kernel]          [k] ktime_get_update_offsets
>      0.04%  [kernel]          [k] kvm_check_async_pf_completion
>      0.04%  [kernel]          [k] __schedule
>      0.04%  [kernel]          [k] __rcu_pending
>      0.04%  [kernel]          [k] svm_complete_interrupts
>      0.04%  [kernel]          [k] perf_pmu_disable
>      0.04%  [kernel]          [k] isolate_migratepages_range
>      0.04%  [kernel]          [k] sched_clock_cpu
>      0.04%  [kernel]          [k] kvm_cpu_has_pending_timer
>      0.04%  [kernel]          [k] apic_timer_interrupt
>      0.04%  [vdso]            [.] 0x00007fff2e1ff607
>      0.04%  [kernel]          [k] apic_update_ppr
>      0.04%  [kernel]          [k] do_select
>      0.04%  [kernel]          [k] svm_scale_tsc
>      0.04%  [kernel]          [k] system_call_after_swapgs
>      0.03%  [kernel]          [k] kvm_lapic_get_cr8
>      0.03%  perf              [.] sort__sym_cmp
>      0.03%  [kernel]          [k] find_next_bit
>      0.03%  [kernel]          [k] kvm_set_cr8
>      0.03%  [kernel]          [k] rcu_check_callbacks
> 9750 unprocessable samples recorded.9751 unprocessable samples recorded.9752 
> unprocessable samples recorded.9753 unprocessable samples recorded.9754 
> unprocessable samples recorded.9755 unprocessable samples recorded.9756 
> unprocessable samples recorded.9757 u    nprocessable samples recorded.9758 
> unprocessable samples recorded.9759 unprocessable samples recorded.9760 
> unprocessable samples     recorded.9761 unprocessable samples recorded.9762 
> unprocessable samples recorded.9763 unprocessable samples recorded.
> 
> 
> 
>    PerfTop:   61584 irqs/sec  kernel:97.4%  exact:  0.0% [4000Hz cycles],  
> (all, 16 CPUs)
> --------------------------------------------------------------------------------------------------------------------------------
> 
>     36.73%  [kernel]          [k] _raw_spin_lock_irqsave
>     19.00%  [kernel]          [k] isolate_freepages_block
>      5.80%  [kernel]          [k] yield_to
>      5.23%  [kernel]          [k] _raw_spin_lock
>      3.97%  [kernel]          [k] kvm_vcpu_on_spin
>      2.98%  [kernel]          [k] add_preempt_count
>      2.45%  [kernel]          [k] sub_preempt_count
>      2.37%  [kernel]          [k] _raw_spin_unlock
>      2.22%  [kernel]          [k] svm_vcpu_run
>      2.19%  [kernel]          [k] kvm_vcpu_yield_to
>      1.90%  [kernel]          [k] memcmp
>      1.54%  [kernel]          [k] get_pid_task
>      1.39%  [kernel]          [k] kvm_arch_vcpu_ioctl_run
>      1.30%  [kernel]          [k] pid_task
>      0.75%  [kernel]          [k] rcu_note_context_switch
>      0.74%  [kernel]          [k] trace_hardirqs_on
>      0.58%  [kernel]          [k] __rcu_read_unlock
>      0.55%  [kernel]          [k] trace_preempt_on
>      0.47%  [kernel]          [k] __srcu_read_lock
>      0.44%  [kernel]          [k] get_parent_ip
>      0.41%  [kernel]          [k] clear_page_c
>      0.40%  [kernel]          [k] get_pageblock_flags_group
>      0.39%  [kernel]          [k] in_lock_functions
>      0.36%  [kernel]          [k] trace_preempt_off
>      0.35%  [kernel]          [k] trace_hardirqs_off
>      0.23%  [kernel]          [k] __srcu_read_unlock
>      0.20%  [kernel]          [k] __rcu_read_lock
>      0.15%  [kernel]          [k] _raw_spin_lock_irq
>      0.14%  [kernel]          [k] handle_exit
>      0.12%  [kernel]          [k] find_highest_vector
>      0.11%  [kernel]          [k] resched_task
>      0.10%  libc-2.10.1.so    [.] strcmp
>      0.09%  [kernel]          [k] _raw_spin_unlock_irqrestore
>      0.09%  [kernel]          [k] ktime_get
>      0.08%  [kernel]          [k] pause_interception
>      0.08%  [kernel]          [k] copy_page_c
>      0.07%  [kernel]          [k] __schedule
>      0.07%  [kernel]          [k] compact_zone
>      0.07%  perf              [.] dso__find_symbol
>      0.06%  perf              [.] add_hist_entry
>      0.06%  [kernel]          [k] read_tsc
>      0.06%  [kernel]          [k] svm_read_l1_tsc
>      0.05%  [kernel]          [k] native_read_tsc
>      0.05%  [kernel]          [k] ktime_get_update_offsets
>      0.05%  [kernel]          [k] compaction_alloc
>      0.05%  libc-2.10.1.so    [.] 0x0000000000073ae0
>      0.05%  [kernel]          [k] kmem_cache_alloc
>      0.05%  [kernel]          [k] svm_complete_interrupts
>      0.05%  [kernel]          [k] kvm_check_async_pf_completion
>      0.05%  [kernel]          [k] apic_timer_interrupt
>      0.05%  perf              [.] sort__dso_cmp
>      0.05%  [kernel]          [k] kvm_cpu_has_pending_timer
>      0.04%  [kernel]          [k] svm_scale_tsc
>      0.04%  [kernel]          [k] isolate_migratepages_range
>      0.04%  [kernel]          [k] sched_clock_cpu
>      0.04%  [kernel]          [k] __rcu_pending
>      0.04%  [kernel]          [k] apic_update_ppr
>      0.04%  [kernel]          [k] do_select
>      0.04%  [kernel]          [k] perf_pmu_disable
>      0.04%  [kernel]          [k] kvm_set_cr8
>      0.04%  [kernel]          [k] update_curr
>      0.04%  [kernel]          [k] reschedule_interrupt
>      0.03%  [kernel]          [k] kvm_lapic_get_cr8
>      0.03%  libc-2.10.1.so    [.] strstr
>      0.03%  [kernel]          [k] apic_has_pending_timer
>      0.03%  perf              [.] sort__sym_cmp
> 4963 unprocessable samples recorded.4964 unprocessable samples recorded.4965 
> unprocessable samples recorded.4966 unprocessable samples recorded.4967 
> unprocessable samples recorded.4968 unprocessable samples recorded.4969 
> unprocessable samples recorded.4970 unprocessable samples recorded.4971 
> unprocessable samples recorded.4972 unprocessable samples recorded.4973 
> unprocessable samples recorded.4974 unprocessable samples recorded.4975 
> unprocessable samples recorded.
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]