qemu-discuss

Re: [Qemu-discuss] Puzzling performance comparison with KVM and Hyper-V


From: Tim Bell
Subject: Re: [Qemu-discuss] Puzzling performance comparison with KVM and Hyper-V
Date: Tue, 21 Jul 2015 16:16:22 +0200
User-agent: Alpine 2.02 (LRH 1266 2009-07-14)



On Tue, 21 Jul 2015, Carlos Torres wrote:



On Jul 21, 2015 5:45 AM, Tim Bell <address@hidden> wrote:
>
>  
>
> We are running a compute intensive application on a variety of virtual
> machines at CERN (a subset of Spec 2006). We have found two puzzling results
> during this benchmarking and can’t find the root cause after significant effort.
>
>  
>
> 1.      Large virtual machines on KVM (32 cores) show much worse performance
>         than smaller ones
>
> 2.      Hyper-V overhead is significantly less compared to KVM
>
>  
>
> We have tuned the KSM configuration with EPT off and CPU pinning, but the
> overheads remain significant.
>
>  
>
> 4 VMs 8 cores:  2.5% overhead compared to bare metal
>
> 2 VMs 16 cores: 8.4% overhead compared to bare metal
>
> 1 VM 32 cores: 12.9% overhead compared to bare metal
>
>  
>
> Running the same test using Hyper-V produced
>
>  
>
> 4 VMs 8 cores: 0.8% overhead compared to bare metal
>
> 1 VM 32 cores: 3.3% overhead compared to bare metal
>
>  
>
> Can anyone suggest how to tune KVM to get equivalent performance to Hyper-V ?
>
>  
>
> Configuration
>
>  
>
> Hardware is Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, SMT enabled, 2GB/core
>
> CentOS 7 KVM hypervisor with CentOS 6 guest
>
> Windows 2012 Hyper-V hypervisor with CentOS 6 guest
>
> Benchmark is HEPSpec, the C++ subset of Spec 2006
>
> The benchmarks are run in parallel according to the number of cores. Thus,
> the 1x32 test runs 32 copies of the benchmark in a single VM on the
> hypervisor. The 4x8 test runs 4 VMs on the same hypervisor, with each VM
> running 8 copies of the benchmark simultaneously.
>
>  
>
>  
>
>  

Tim,

This is really interesting; it reminds me of an issue we found on the IBM Power
hypervisor, related to how the scheduler places work on NUMA hardware.

I'm not a KVM expert by any means, but I'll try to help.

I'm assuming power saving features are disabled, the kernel's scaling governor
is set to performance, and that you have pinned the qemu/kvm processes on the
host to distinct physical CPU cores.


We've set the governor to performance (via tuned, using the virtual guest
profile). The pinning has been done since we are not overcommitting.
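
For reference, the settings can be verified on the hypervisor along these lines
('vm1' is a placeholder domain name; the sysfs paths are the usual CentOS 7
ones):

   tuned-adm active                                   # current tuned profile
   cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c
   virsh vcpupin vm1                                  # current vCPU -> pCPU pinning
   cat /sys/kernel/mm/ksm/run                         # 1 = KSM merging enabled
   cat /sys/module/kvm_intel/parameters/ept           # N = EPT disabled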

Is this a NUMA architecture? I assume it is, since each of those processor
chips only has 8 cores, unless you are over-subscribing CPUs. What's your NUMA
topology?


Yes, it's a two-processor system, each socket with 8 physical cores (16 with SMT).
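
The topology can be confirmed with something along these lines:

   numactl --hardware     # node count, per-node CPU lists and memory sizes
   lscpu | grep -i numa   # NUMA node to CPU mapping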

If using NUMA, which NUMA policy are you using (interleave, local, other)?
Could you share the numastat output from the host (hypervisor) between runs?
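
Something like this between runs should be enough (the qemu process name may be
qemu-kvm or qemu-system-x86_64 depending on the packaging):

   numastat               # host-wide numa_hit / numa_miss / other_node counters
   numastat -c qemu       # per-node memory of the matching qemu processes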

My running hypothesis is that, while the benchmark is CPU intensive, it might
still be slowed down by fetching memory across NUMA nodes as a result of the
interleaving.


Quite likely... we had suspected this as well (the benchmark is mostly compute
intensive, i.e. CPU and memory bound).

It does seem counter-intuitive that the smaller VMs have less overhead, though,
as I would have expected larger VMs to offer more opportunity to adjust
workload placement.

You might also experiment with making your VMs NUMA aware, especially the ones
that use more CPUs than are available in a single processor chip.


We'll have another go at the NUMA configuration to be really sure.
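
A rough sketch of what we would try, keeping one 8-core guest entirely on
socket 0 ('vm1' and the core numbers are placeholders; the real core layout
comes from lscpu / numactl --hardware):

   # pin vCPU i of the guest to physical core i on socket 0
   for i in $(seq 0 7); do virsh vcpupin vm1 $i $i --live --config; done
   # keep the guest's memory allocations on the same NUMA node
   virsh numatune vm1 --mode strict --nodeset 0 --live --config

The 32-core guest necessarily spans both sockets, so there we would also look
at exposing a matching virtual NUMA topology (the <numa> cells under <cpu> in
the libvirt domain XML), which should let the guest kernel keep each benchmark
copy node-local.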

If this data doesn't correlate at all with the behavior, then you'll need to
profile with tools like Linux perf (perf_events), or some other profiler that
can read the hardware performance counters.
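
For example, attaching to one qemu process during a run should show whether
cross-node memory traffic is the problem (qemu-kvm is the usual CentOS process
name; the node-* events are the generic perf cache events and may differ on
your Xeon):

   perf stat -e cycles,instructions,node-loads,node-load-misses \
       -p $(pgrep -o qemu-kvm) -- sleep 60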

Best of luck,
Carlos Torres


