From: Gleb Natapov
Subject: Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
Date: Thu, 1 Aug 2013 09:16:50 +0300

On Tue, Jul 30, 2013 at 09:04:56AM +0000, Zhanghaoyu (A) wrote:
> 
> >> >> hi all,
> >> >> 
> >> >> I ran into a problem similar to the ones reported below while 
> >> >> performing live migration and save-restore tests on the KVM platform 
> >> >> (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running a 
> >> >> telecommunication software suite in the guest:
> >> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> >> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> >> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> >> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> >> >> 
> >> >> After live migration or virsh restore [savefile], one process's CPU 
> >> >> utilization went up by about 30%, which degraded that process's 
> >> >> throughput.
> >> >> 
> >> >> With EPT disabled, the problem goes away.
> >> >> 
> >> >> I suspect the kvm hypervisor is implicated in this problem.
> >> >> Based on that suspicion, I want to find two adjacent versions of 
> >> >> kvm-kmod, one that triggers this problem and one that does not 
> >> >> (e.g. 2.6.39, 3.0-rc1), then analyze the differences between these 
> >> >> two versions, or apply the patches between them by bisection, to 
> >> >> finally find the key patches.
> >> >> 
> >> >> Any better ideas?
> >> >> 
> >> >> Thanks,
> >> >> Zhang Haoyu
> >> >
> >> >I've attempted to duplicate this on a number of machines that are as 
> >> >similar to yours as I am able to get my hands on, and so far have not 
> >> >been able to see any performance degradation. And from what I've read in 
> >> >the above links, huge pages do not seem to be part of the problem.
> >> >
> >> >So, if you are in a position to bisect the kernel changes, that would 
> >> >probably be the best avenue to pursue in my opinion.
> >> >
> >> >Bruce
> >> 
> >> I found the first bad commit 
> >> ([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w 
> >> information to gup(), allow read-only memory), which triggers this 
> >> problem, by git-bisecting the kvm kernel changes (cloned from 
> >> https://git.kernel.org/pub/scm/virt/kvm/kvm.git).
> >> 
> >> And,
> >> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
> >> git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
> >> 
> >> Then I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log against 
> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff and concluded that all 
> >> of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 
> >> and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 are contributed by that 
> >> one commit alone, so this commit is the culprit which directly or 
> >> indirectly causes the degradation.
> >> 
> >> Does the map_writable flag passed to mmu_set_spte() affect the PTE's 
> >> PAT flag, or increase the number of VM exits induced by the guest 
> >> trying to write read-only memory?
> >> 
> >> Thanks,
> >> Zhang Haoyu
> >> 
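[Editor's note: the bisection workflow Zhang describes can be illustrated with a throwaway repository. This is only a sketch: the five commits below are placeholders, not real kvm.git history, and each `git bisect good`/`git bisect bad` stands in for a full build-and-measure cycle on the host.]

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"; git init -q
git config user.email zhang@example.com; git config user.name zhang
# Five linear commits; pretend c4 introduced the regression.
for i in 1 2 3 4 5; do echo "$i" > f; git add f; git commit -qm "c$i"; done
# Mark HEAD (c5) bad and c1 good; git checks out the midpoint, c3.
git bisect start HEAD HEAD~4 >/dev/null
# c3 passes the performance test; git checks out the remaining candidate, c4.
git bisect good >/dev/null
# c4 fails the test, so git names it the first bad commit.
git bisect bad | head -1
git log -1 --format=%s bisect/bad    # subject line of the first bad commit
```

With real kvm-kmod history the same loop applies, only each good/bad verdict comes from rebuilding the module and re-running the migration test.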
> >
> >There should be no read-only memory maps backing guest RAM.
> >
> >Can you confirm map_writable = false is being passed to __direct_map? (this 
> >should not happen, for guest RAM).
> >And if it is false, please capture the associated GFN.
> >
> I added the check and printk below at the start of __direct_map(), at the 
> first bad commit:
> --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c     
> 2013-07-26 18:44:05.000000000 +0800
> +++ kvm-612819/arch/x86/kvm/mmu.c       2013-07-31 00:05:48.000000000 +0800
> @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
>         int pt_write = 0;
>         gfn_t pseudo_gfn;
> 
> +        if (!map_writable)
> +                printk(KERN_ERR "%s: %s: gfn = %llu\n", __FILE__, __func__, gfn);
> +
>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>                 if (iterator.level == level) {
>                         unsigned pte_access = ACC_ALL;
> 
> I virsh-saved the VM and then virsh-restored it; so many GFNs were 
> printed that you could absolutely describe it as flooding.
> 
The flooding you see happens during the migrate-to-file stage because of dirty
page tracking. If you clear dmesg after virsh-save, you should not see any
flooding after virsh-restore. I just checked with the latest tree, and I do not.
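[Editor's note: a sketch of the check Gleb suggests. The domain name "guest1" and the save path are placeholders, clearing dmesg requires root, and the block skips itself on hosts without libvirt.]

```shell
# Placeholders: "guest1" domain, /tmp save path. Runs only if virsh exists.
if command -v virsh >/dev/null 2>&1; then
    virsh save guest1 /tmp/guest1.sav     # migrate-to-file: printk flood expected here
    dmesg --clear                         # discard the flood logged during the save
    virsh restore /tmp/guest1.sav
    dmesg | grep __direct_map || true     # per Gleb, this should print nothing
fi
echo done
```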


--
                        Gleb.


