qemu-discuss

Optimized clocksource with AMD AVIC enabled for Windows guest


From: Kechen Lu
Subject: Optimized clocksource with AMD AVIC enabled for Windows guest
Date: Wed, 3 Feb 2021 06:40:15 +0000

[Resent in plain text; the previous message was not in plain-text format.]
Hi KVM & AMD folks,
 
We are trying to enable AVIC for a Windows guest on an AMD host running an
upstream 5.8+ kernel. Our experiments and vmexit metrics show that AVIC brings
a large benefit: interrupt vmexits drop by more than 80%, and vintr and
write_cr8 vmexits disappear entirely. For a Windows guest, however, it seems
we have to give up the Hyper-V PV synthetic timer (the hv-stimer feature). To
get the best of both worlds, is there a more optimized clocksource for Windows
guests that can coexist with AVIC, given that stimer currently cannot work
together with AVIC?
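
As a side note for anyone reproducing this setup, here is a minimal Python
sketch for checking the host's AVIC module parameter. It assumes the parameter
is exposed at /sys/module/kvm_amd/parameters/avic and that it reads back as
"1"/"0" or "Y"/"N" depending on the kernel version. The function name is
illustrative. It only reports the module parameter, not whether AVIC is
actually active for a given guest, since KVM can still prevent it (for example
when SynIC is activated).

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the kvm_amd "avic" module parameter.
AVIC_PARAM = Path("/sys/module/kvm_amd/parameters/avic")


def avic_enabled(param_path: Path = AVIC_PARAM) -> Optional[bool]:
    """Return True/False for the parameter value, or None if it is unreadable."""
    try:
        value = param_path.read_text().strip()
    except OSError:
        # kvm_amd is not loaded, or this is not an AMD host.
        return None
    # Integer parameters read back as "0"/"1", boolean ones as "N"/"Y".
    return value in ("1", "Y", "y")


if __name__ == "__main__":
    print("AVIC module parameter:", avic_enabled())
```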

Some detailed performance analysis below -
 
From the KVM kernel function kvm_hv_activate_synic in
https://elixir.bootlin.com/linux/v5.8/source/arch/x86/kvm/hyperv.c#L891,
enabling SynIC prevents APICv (AVIC on AMD), and SynIC is a prerequisite of
stimer. In our experiments, running without the Hyper-V stimer produces a lot
of port IO vmexits, which can hurt CPU-bound workloads: Geekbench single-core
performance regresses by around 10%. The vmexit results below are with AVIC
enabled and hypervclock and rtc as the clocksources, without stimer+synic.
 
------------------------------------------------------------------------------------------------------------
Analyze events for all VMs, all VCPUs:
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time    Avg time
                  io     575088    43.42%     1.96%      0.68us    100.62us      7.47us ( +-   0.13% )
                 msr     434530    32.81%     0.29%      0.41us    350.50us      1.45us ( +-   0.30% )
                 hlt     308635    23.30%    97.75%      0.43us   3791.74us    693.91us ( +-   0.12% )
           interrupt       4796     0.36%     0.00%      0.33us   1606.17us      1.89us ( +-  18.69% )
           write_cr4        752     0.06%     0.00%      0.53us     34.80us      1.42us ( +-   3.97% )
            read_cr4        376     0.03%     0.00%      0.40us      1.32us      0.62us ( +-   1.22% )
                 npf         85     0.01%     0.00%      1.68us     57.95us      8.33us ( +-  12.54% )
               pause         71     0.01%     0.00%      0.36us      1.44us      0.62us ( +-   3.45% )
               cpuid         50     0.00%     0.00%      0.33us      1.11us      0.45us ( +-   5.94% )
           hypercall         10     0.00%     0.00%      0.81us      1.42us      1.12us ( +-   5.87% )
                 nmi          1     0.00%     0.00%      0.67us      0.67us      0.67us ( +-   0.00% )
Total Samples:1324394, Total events handled time:219105470.74us.
-----------------------------------------------------------------------------------------------------------
This shows a dramatically high number of IO vmexits. We can break them down
further by the IO ports the Windows guest accessed.
-----------------------------------------------------
Analyze events for all VMs, all VCPUs:
 
      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time    Avg time

           0x70:POUT     287544    50.00%    13.10%      0.40us     23.48us      0.53us ( +-   0.06% )
            0x71:PIN     226154    39.33%     7.60%      0.31us     22.91us      0.39us ( +-   0.08% )
           0x71:POUT      61390    10.67%    79.31%     12.92us     69.99us     14.95us ( +-   0.09% )
 
Total Samples:575088, Total events handled time:1156983.53us.
---------------------------------------------
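
As a rough cross-check of the Time% column above, multiplying each port's
sample count by its average time gives an approximate total cost per port. A
minimal Python sketch, with the numbers copied from the table (the variable
and function names are illustrative, and the result is only an estimate
because the averages are rounded):

```python
# (samples, avg_time_us) per port access, copied from the table above.
io_exits = {
    "0x70:POUT": (287544, 0.53),
    "0x71:PIN": (226154, 0.39),
    "0x71:POUT": (61390, 14.95),
}


def estimated_time_share(rows):
    """Approximate each port's share of handling time as samples * avg time."""
    totals = {port: n * avg for port, (n, avg) in rows.items()}
    grand_total = sum(totals.values())
    return {port: t / grand_total for port, t in totals.items()}


shares = estimated_time_share(io_exits)
for port, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{port:>10}  {share:6.1%}")
```

Writes to port 0x71 are only about 11% of the samples but account for most of
the handling time, which matches the Time% column.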
Ports 0x70-0x71 belong to rtc0, which means the guest incurs heavy RTC access
overhead. With stimer + synic on and AVIC disabled, the vmexit metrics for IO
and MSR look much better, as below.
-----------------------------------------
Analyze events for all VMs, all VCPUs:
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time    Avg time
                 hlt     166815    38.30%    99.66%      0.44us   1556.67us    809.48us ( +-   0.11% )
           interrupt     146218    33.57%     0.13%      0.30us   1362.10us      1.19us ( +-   1.50% )
                 msr     105267    24.17%     0.20%      0.37us     87.47us      2.51us ( +-   0.31% )
               vintr       9285     2.13%     0.01%      0.50us      1.92us      0.78us ( +-   0.16% )
           write_cr8       7537     1.73%     0.00%      0.31us     49.14us      0.66us ( +-   1.08% )
               cpuid        174     0.04%     0.00%      0.31us      1.39us      0.46us ( +-   3.21% )
                 npf        143     0.03%     0.00%      1.49us    237.66us     21.04us ( +-  12.04% )
           write_cr4         32     0.01%     0.00%      0.93us      5.78us      2.10us ( +-  11.38% )
               pause         22     0.01%     0.00%      0.45us      1.33us      0.84us ( +-   5.46% )
            read_cr4         16     0.00%     0.00%      0.47us      0.68us      0.60us ( +-   2.19% )
                 nmi         11     0.00%     0.00%      0.35us      0.70us      0.54us ( +-   5.06% )
           write_dr7          2     0.00%     0.00%      0.43us      0.45us      0.44us ( +-   2.27% )
           hypercall          1     0.00%     0.00%      0.97us      0.97us      0.97us ( +-   0.00% )
Total Samples:435523, Total events handled time:135488497.29us.
---------------------------------
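
To put the two runs side by side, here is a small Python sketch that computes
the relative change in exit counts when moving from the stimer + synic run
(AVIC disabled) to the AVIC run (no stimer). The counts are copied from the
two VM-EXIT tables above. Two assumptions are mine: an exit reason missing
from a table is treated as 0, and the two runs are treated as comparable even
though the raw counts are not normalized by run time.

```python
# Exit counts copied from the two VM-EXIT tables above.
# Assumption: a reason that does not appear in a table is counted as 0.
avic_no_stimer = {
    "io": 575088, "msr": 434530, "hlt": 308635,
    "interrupt": 4796, "vintr": 0, "write_cr8": 0,
}
stimer_no_avic = {
    "io": 0, "msr": 105267, "hlt": 166815,
    "interrupt": 146218, "vintr": 9285, "write_cr8": 7537,
}


def relative_change(before, after):
    """Return (after - before) / before, or None when before is 0."""
    if before == 0:
        return None
    return (after - before) / before


changes = {
    reason: relative_change(stimer_no_avic[reason], avic_no_stimer[reason])
    for reason in stimer_no_avic
}

for reason, change in changes.items():
    label = "n/a (no exits before)" if change is None else f"{change:+.1%}"
    print(f"{reason:>10}  {label}")
```

Under those assumptions, interrupt exits fall by well over 80% and vintr and
write_cr8 exits go to zero with AVIC, while msr exits grow and io exits appear
only in the run without stimer.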
Given the above observations, we would like to know whether there is a way to
enable AVIC while still having the most optimized clocksource for a Windows
guest.
 
We would really appreciate your help and look forward to your response.

Best Regards,
Kechen




