From: Zhanghaoyu (A)
Subject: Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
Date: Tue, 30 Jul 2013 09:04:56 +0000
>> >> hi all,
>> >>
>> >> I met a problem similar to these while performing live migration or
>> >> save-restore tests on the kvm platform (qemu:1.4.0, host:suse11sp2,
>> >> guest:suse11sp2), running a telecommunication software suite in the
>> >> guest:
>> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>> >>
>> >> After live migration or virsh restore [savefile], one process's CPU
>> >> utilization went up by about 30%, which degraded the throughput of
>> >> that process.
>> >>
>> >> With EPT disabled, the problem is gone.
>> >>
>> >> I suspect the kvm hypervisor is involved in this problem.
>> >> Based on that suspicion, I want to find the two adjacent versions of
>> >> kvm-kmod between which this problem appears (e.g. 2.6.39 and 3.0-rc1),
>> >> and analyze the differences between these two versions, or apply the
>> >> patches between them by bisection, to finally find the key patches.
>> >>
>> >> Any better ideas?
>> >>
>> >> Thanks,
>> >> Zhang Haoyu
>> >
>> >I've attempted to duplicate this on a number of machines that are as
>> >similar to yours as I am able to get my hands on, and so far have not been
>> >able to see any performance degradation. And from what I've read in the
>> >above links, huge pages do not seem to be part of the problem.
>> >
>> >So, if you are in a position to bisect the kernel changes, that would
>> >probably be the best avenue to pursue in my opinion.
>> >
>> >Bruce
>>
>> I found the first bad commit
>> ([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w
>> information to gup(), allow read-only memory) that triggers this problem
>> by git-bisecting the kvm kernel tree (from
>> https://git.kernel.org/pub/scm/virt/kvm/kvm.git).
>>
>> And:
>> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
>> git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
>>
>> Then I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log against
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff and concluded that all of
>> the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 are contributed by that commit
>> alone, so it is the commit that directly or indirectly causes the
>> degradation.
>>
>> Does the map_writable flag passed to mmu_set_spte() affect the PTE's PAT
>> flag, or does it increase the number of VM exits induced by the guest
>> trying to write read-only memory?
>>
>> Thanks,
>> Zhang Haoyu
>>
>
>There should be no read-only memory maps backing guest RAM.
>
>Can you confirm map_writable = false is being passed to __direct_map? (This
>should not happen for guest RAM.)
>And if it is false, please capture the associated GFN.
>
I added the check and printk below at the start of __direct_map() at the first
bad commit:
--- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c	2013-07-26 18:44:05.000000000 +0800
+++ kvm-612819/arch/x86/kvm/mmu.c	2013-07-31 00:05:48.000000000 +0800
@@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
 	int pt_write = 0;
 	gfn_t pseudo_gfn;
 
+	if (!map_writable)
+		printk(KERN_ERR "%s: %s: gfn = %llu\n", __FILE__, __func__, gfn);
+
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
 			unsigned pte_access = ACC_ALL;
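To pick up such a change, the patched modules need to be rebuilt and reloaded
before the test. A minimal sketch, assuming an Intel host and an in-tree build
of the kvm source (module paths and names may differ per setup):

# all VMs must be shut down before unloading; kvm_intel depends on kvm
rmmod kvm_intel kvm
# load the freshly built modules from the kvm source tree
insmod arch/x86/kvm/kvm.ko
insmod arch/x86/kvm/kvm-intel.ko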
I virsh-saved the VM and then virsh-restored it; so many GFNs were printed that
you could fairly describe it as flooding.
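The flood can be boiled down to distinct GFNs and their frequencies. A small
sketch, assuming the printk format from the patch above and that the messages
land in the kernel ring buffer:

# count occurrences of each reported GFN, most frequent first
dmesg | grep '__direct_map: gfn =' | awk '{print $NF}' | sort -n | uniq -c | sort -rn | head

(A ratelimited variant of the printk, e.g. printk_ratelimited if this tree has
it, would also tame the flooding.)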
>It's probably an issue with an older get_user_pages variant (either in kvm-kmod
>or the older kernel). Is there any indication of a similar issue with the
>upstream kernel?
I will test the upstream kvm host
(https://git.kernel.org/pub/scm/virt/kvm/kvm.git) later; if the problem is
still there, I will revert the first bad commit
(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) on upstream and test again.
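A minimal sketch of that plan, assuming the kvm.git tree builds and boots on
this host (the revert may not apply cleanly on top of later changes):

# build, install, and test the current upstream tree first
git clone https://git.kernel.org/pub/scm/virt/kvm/kvm.git
cd kvm
# ... build, install, reboot, re-run the save/restore test ...
# if the degradation still reproduces, revert the suspect commit and retest
git revert 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4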
I also collected the VM-exit statistics in the pre-save and post-restore
periods at the first bad commit:
pre-save:
COTS-F10S03:~ # perf stat -e "kvm:*" -a sleep 30
Performance counter stats for 'sleep 30':
1222318 kvm:kvm_entry
0 kvm:kvm_hypercall
0 kvm:kvm_hv_hypercall
351755 kvm:kvm_pio
6703 kvm:kvm_cpuid
692502 kvm:kvm_apic
1234173 kvm:kvm_exit
223956 kvm:kvm_inj_virq
0 kvm:kvm_inj_exception
16028 kvm:kvm_page_fault
59872 kvm:kvm_msr
0 kvm:kvm_cr
169596 kvm:kvm_pic_set_irq
81455 kvm:kvm_apic_ipi
245103 kvm:kvm_apic_accept_irq
0 kvm:kvm_nested_vmrun
0 kvm:kvm_nested_intercepts
0 kvm:kvm_nested_vmexit
0 kvm:kvm_nested_vmexit_inject
0 kvm:kvm_nested_intr_vmexit
0 kvm:kvm_invlpga
0 kvm:kvm_skinit
853020 kvm:kvm_emulate_insn
171140 kvm:kvm_set_irq
171534 kvm:kvm_ioapic_set_irq
0 kvm:kvm_msi_set_irq
99276 kvm:kvm_ack_irq
971166 kvm:kvm_mmio
33722 kvm:kvm_fpu
0 kvm:kvm_age_page
0 kvm:kvm_try_async_get_page
0 kvm:kvm_async_pf_not_present
0 kvm:kvm_async_pf_ready
0 kvm:kvm_async_pf_completed
0 kvm:kvm_async_pf_doublefault
30.019069018 seconds time elapsed
post-restore:
COTS-F10S03:~ # perf stat -e "kvm:*" -a sleep 30
Performance counter stats for 'sleep 30':
1327880 kvm:kvm_entry
0 kvm:kvm_hypercall
0 kvm:kvm_hv_hypercall
375189 kvm:kvm_pio
6925 kvm:kvm_cpuid
804414 kvm:kvm_apic
1339352 kvm:kvm_exit
245922 kvm:kvm_inj_virq
0 kvm:kvm_inj_exception
15856 kvm:kvm_page_fault
39500 kvm:kvm_msr
1 kvm:kvm_cr
179150 kvm:kvm_pic_set_irq
98436 kvm:kvm_apic_ipi
247430 kvm:kvm_apic_accept_irq
0 kvm:kvm_nested_vmrun
0 kvm:kvm_nested_intercepts
0 kvm:kvm_nested_vmexit
0 kvm:kvm_nested_vmexit_inject
0 kvm:kvm_nested_intr_vmexit
0 kvm:kvm_invlpga
0 kvm:kvm_skinit
955410 kvm:kvm_emulate_insn
182240 kvm:kvm_set_irq
182562 kvm:kvm_ioapic_set_irq
0 kvm:kvm_msi_set_irq
105267 kvm:kvm_ack_irq
1113999 kvm:kvm_mmio
37789 kvm:kvm_fpu
0 kvm:kvm_age_page
0 kvm:kvm_try_async_get_page
0 kvm:kvm_async_pf_not_present
0 kvm:kvm_async_pf_ready
0 kvm:kvm_async_pf_completed
0 kvm:kvm_async_pf_doublefault
30.000779718 seconds time elapsed
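For reference, comparing the two 30-second samples: kvm_exit rose from 1234173
to 1339352 (about +8.5%), kvm_mmio from 971166 to 1113999 (about +15%), and
kvm_emulate_insn from 853020 to 955410 (about +12%), while kvm_page_fault
stayed essentially flat (16028 vs. 15856). The extra exits appear to come from
MMIO emulation rather than from EPT page faults.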
Thanks,
Zhang Haoyu