qemu-discuss

Optimized clocksource with AMD AVIC enabled for Windows guest


From: Kechen Lu
Subject: Optimized clocksource with AMD AVIC enabled for Windows guest
Date: Wed, 3 Feb 2021 05:40:24 +0000

Hi KVM & AMD folks,

 

We are trying to enable AVIC for a Windows guest on an AMD host running upstream kernel 5.8+. From our experiments and vmexit metrics, AVIC brings us huge benefits: interrupt vmexits decrease by more than 80%, and vintr and write_cr8 vmexits are avoided entirely. However, for Windows guests it seems we have to give up the Hyper-V PV synthetic timer (the hv-stimer feature). To get the best of both worlds, is there a more optimized clocksource for Windows guests that can coexist with AVIC enabled (since stimer currently cannot work together with AVIC)?

 

Some detailed performance analysis below -

 

As the KVM function kvm_hv_activate_synic shows (https://elixir.bootlin.com/linux/v5.8/source/arch/x86/kvm/hyperv.c#L891), enabling SynIC deactivates APICv (AVIC on AMD), and SynIC is a prerequisite for stimer. In our experiments, without Hyper-V stimer there are many port I/O vmexits, which can drag down CPU-bound workloads such as Geekbench: single-core performance regresses by around 10%. Below are the vmexit results with AVIC enabled and the Hyper-V clock (hypervclock) plus the RTC as clocksources, without stimer+SynIC.
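The dependency chain above can be captured in a small toy model. This is an illustrative sketch only, not KVM or QEMU code: the function and table names are hypothetical, "hv-time" is assumed here to be what this mail calls the Hyper-V clock, and only two rules are encoded: stimer needs SynIC, and activating SynIC raises a VM-wide APICv inhibit reason (as we read the v5.8 source, kvm_hv_activate_synic() requests APICV_INHIBIT_REASON_HYPERV).

```python
"""Toy model (hypothetical, not kernel code) of the SynIC / APICv interaction.

Rule 1: hv-stimer depends on hv-synic.
Rule 2: an active SynIC adds a VM-wide APICv (AVIC) inhibit reason.
"""

# Rule 1: feature -> features it cannot work without.
DEPENDS_ON = {"hv-stimer": {"hv-synic"}}

# Rule 2: feature -> APICv inhibit reason it raises while active.
APICV_INHIBITS = {"hv-synic": "HYPERV"}


def apicv_state(requested, avic_enabled=True):
    """Return (apicv_active, inhibit_reasons) for a set of guest features.

    Raises ValueError if a feature is requested without its dependency.
    """
    features = set(requested)
    for feature in features:
        missing = DEPENDS_ON.get(feature, set()) - features
        if missing:
            raise ValueError(f"{feature} requires {sorted(missing)}")

    # Collect every inhibit reason raised by an active feature.
    reasons = {APICV_INHIBITS[f] for f in features if f in APICV_INHIBITS}
    if not avic_enabled:
        reasons.add("DISABLE")  # hypothetical stand-in for AVIC off on the host
    # APICv/AVIC stays active only while no inhibit reason is set.
    return (not reasons, reasons)


if __name__ == "__main__":
    # Setup of the first table below: AVIC on, no SynIC/stimer.
    print(apicv_state({"hv-time"}))
    # Adding stimer (plus its SynIC dependency) turns AVIC off for the VM.
    print(apicv_state({"hv-time", "hv-synic", "hv-stimer"}))
```

The point of the model is that there is no per-feature middle ground: as long as SynIC is active anywhere in the VM, the inhibit set is non-empty and AVIC stays off.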

 ------------------------------------------------------------------------------------------------------------

Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
                  io     575088    43.42%     1.96%      0.68us    100.62us      7.47us ( +-   0.13% )
                 msr     434530    32.81%     0.29%      0.41us    350.50us      1.45us ( +-   0.30% )
                 hlt     308635    23.30%    97.75%      0.43us   3791.74us    693.91us ( +-   0.12% )
           interrupt       4796     0.36%     0.00%      0.33us   1606.17us      1.89us ( +-  18.69% )
           write_cr4        752     0.06%     0.00%      0.53us     34.80us      1.42us ( +-   3.97% )
            read_cr4        376     0.03%     0.00%      0.40us      1.32us      0.62us ( +-   1.22% )
                 npf         85     0.01%     0.00%      1.68us     57.95us      8.33us ( +-  12.54% )
               pause         71     0.01%     0.00%      0.36us      1.44us      0.62us ( +-   3.45% )
               cpuid         50     0.00%     0.00%      0.33us      1.11us      0.45us ( +-   5.94% )
           hypercall         10     0.00%     0.00%      0.81us      1.42us      1.12us ( +-   5.87% )
                 nmi          1     0.00%     0.00%      0.67us      0.67us      0.67us ( +-   0.00% )

Total Samples:1324394, Total events handled time:219105470.74us.

-----------------------------------------------------------------------------------------------------------
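The Samples% and Time% columns can be cross-checked from the raw columns: Samples% is a row's samples over the total samples, and Time% is approximately samples × average time over the total handled time (approximately, because the average is printed rounded). The sketch below only re-derives numbers already in the table above. It also shows why io dominates by count (43.42%) yet accounts for under 2% of handling time: hlt exits take almost all of it, presumably because the measured hlt time includes the time the vCPU sits halted.

```python
# Rows copied from the table above (AVIC on, no stimer/SynIC):
# exit reason -> (samples, average handling time in microseconds).
ROWS = {
    "io": (575088, 7.47),
    "msr": (434530, 1.45),
    "hlt": (308635, 693.91),
}
TOTAL_SAMPLES = 1324394
TOTAL_TIME_US = 219105470.74


def shares(samples, avg_us):
    """Return (samples %, approximate time %) for one exit reason."""
    sample_pct = 100.0 * samples / TOTAL_SAMPLES
    # avg_us is rounded to 0.01 us in the report, so this is approximate.
    time_pct = 100.0 * samples * avg_us / TOTAL_TIME_US
    return sample_pct, time_pct


for name, (samples, avg_us) in ROWS.items():
    s_pct, t_pct = shares(samples, avg_us)
    print(f"{name:>4}: {s_pct:6.2f}% of exits, ~{t_pct:6.2f}% of handling time, "
          f"~{samples * avg_us / 1e6:8.2f} s in total")
```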

This shows a dramatically high number of I/O vmexits, and we can further break down which I/O ports the Windows guest accessed.

-----------------------------------------------------

Analyze events for all VMs, all VCPUs:

      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time
           0x70:POUT     287544    50.00%    13.10%      0.40us     23.48us      0.53us ( +-   0.06% )
            0x71:PIN     226154    39.33%     7.60%      0.31us     22.91us      0.39us ( +-   0.08% )
           0x71:POUT      61390    10.67%    79.31%     12.92us     69.99us     14.95us ( +-   0.09% )

Total Samples:575088, Total events handled time:1156983.53us.

---------------------------------------------
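Ports 0x70 and 0x71 are the legacy CMOS/RTC index and data ports: the guest first writes a register number to 0x70, then reads (IN) or writes (OUT) that register through 0x71, so every logical RTC register access costs two trapped port I/O exits. The toy model below is an illustration, not QEMU's RTC emulation; the class name and the register/value used in the demo are hypothetical. The counts in the table above fit the pattern exactly: 226154 reads + 61390 writes on 0x71 = 287544 index writes on 0x70, and 2 × 287544 = 575088, the full io count in the first table.

```python
from collections import Counter

RTC_INDEX_PORT = 0x70  # guest writes the register number here
RTC_DATA_PORT = 0x71   # guest then reads or writes the selected register here


class ToyCmos:
    """Hypothetical model of the CMOS/RTC index/data protocol.

    Every inb/outb call is counted as one trapped port I/O exit.
    """

    def __init__(self):
        self.regs = [0] * 128
        self.index = 0
        self.exits = Counter()

    def outb(self, port, value):
        self.exits[(port, "POUT")] += 1
        if port == RTC_INDEX_PORT:
            self.index = value & 0x7F  # bit 7 is the NMI-mask bit, not the index
        elif port == RTC_DATA_PORT:
            self.regs[self.index] = value & 0xFF

    def inb(self, port):
        self.exits[(port, "PIN")] += 1
        # Only the data port is modelled as readable in this sketch.
        return self.regs[self.index] if port == RTC_DATA_PORT else 0xFF

    def read_reg(self, reg):
        """One logical register read: select the index, then IN from 0x71."""
        self.outb(RTC_INDEX_PORT, reg)
        return self.inb(RTC_DATA_PORT)

    def write_reg(self, reg, value):
        """One logical register write: select the index, then OUT to 0x71."""
        self.outb(RTC_INDEX_PORT, reg)
        self.outb(RTC_DATA_PORT, value)


def matches_index_data_pattern(pout_70, pin_71, pout_71):
    """True if every 0x70 index write is paired with exactly one 0x71 access."""
    return pout_70 == pin_71 + pout_71


if __name__ == "__main__":
    cmos = ToyCmos()
    cmos.write_reg(0x0A, 0x26)  # arbitrary register/value for the demo
    assert cmos.read_reg(0x0A) == 0x26
    print(dict(cmos.exits))     # two logical accesses -> four exits
    # Counts from the IO port table above.
    print(matches_index_data_pattern(287544, 226154, 61390))
```

In this model, halving the RTC exit count would require halving the logical register accesses themselves; the two-exit cost per access is inherent to the index/data protocol.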

Ports 0x70-0x71 belong to rtc0, which means the guest's RTC accesses carry a heavy overhead. With stimer + SynIC on and AVIC disabled, the vmexit metrics look much better for I/O and MSR exits, as shown below.
-----------------------------------------

Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
                 hlt     166815    38.30%    99.66%      0.44us   1556.67us    809.48us ( +-   0.11% )
           interrupt     146218    33.57%     0.13%      0.30us   1362.10us      1.19us ( +-   1.50% )
                 msr     105267    24.17%     0.20%      0.37us     87.47us      2.51us ( +-   0.31% )
               vintr       9285     2.13%     0.01%      0.50us      1.92us      0.78us ( +-   0.16% )
           write_cr8       7537     1.73%     0.00%      0.31us     49.14us      0.66us ( +-   1.08% )
               cpuid        174     0.04%     0.00%      0.31us      1.39us      0.46us ( +-   3.21% )
                 npf        143     0.03%     0.00%      1.49us    237.66us     21.04us ( +-  12.04% )
           write_cr4         32     0.01%     0.00%      0.93us      5.78us      2.10us ( +-  11.38% )
               pause         22     0.01%     0.00%      0.45us      1.33us      0.84us ( +-   5.46% )
            read_cr4         16     0.00%     0.00%      0.47us      0.68us      0.60us ( +-   2.19% )
                 nmi         11     0.00%     0.00%      0.35us      0.70us      0.54us ( +-   5.06% )
           write_dr7          2     0.00%     0.00%      0.43us      0.45us      0.44us ( +-   2.27% )
           hypercall          1     0.00%     0.00%      0.97us      0.97us      0.97us ( +-   0.00% )

Total Samples:435523, Total events handled time:135488497.29us.

---------------------------------
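Putting the two runs side by side makes the trade-off concrete. The sketch below only recomputes ratios from the sample counts in the two tables; an exit reason missing from a table is treated as 0 samples, and it assumes the runs are comparable in length and load, which is not stated here (the total handled times, 219.1 s vs 135.5 s, differ), so the raw-count ratios are indicative only. With stimer + SynIC (AVIC off), io exits disappear and msr exits drop by roughly three quarters, while interrupt exits grow about 30-fold and vintr/write_cr8 exits reappear, which is the cost AVIC removes.

```python
# Sample counts per exit reason, copied from the two tables above.
# A reason that does not appear in a table is recorded as 0 samples.
AVIC_NO_STIMER = {"io": 575088, "msr": 434530, "hlt": 308635,
                  "interrupt": 4796, "vintr": 0, "write_cr8": 0}
STIMER_NO_AVIC = {"io": 0, "msr": 105267, "hlt": 166815,
                  "interrupt": 146218, "vintr": 9285, "write_cr8": 7537}
TOTALS = {"avic_no_stimer": 1324394, "stimer_no_avic": 435523}


def change(before, after):
    """Describe how a count moves from the AVIC run to the stimer run."""
    if before == 0:
        return "new" if after else "absent"
    return f"{after / before:.2f}x"


for reason in AVIC_NO_STIMER:
    a, b = AVIC_NO_STIMER[reason], STIMER_NO_AVIC[reason]
    print(f"{reason:>10}: {a:>7} -> {b:>7}  ({change(a, b)})")

ratio = TOTALS["stimer_no_avic"] / TOTALS["avic_no_stimer"]
print(f"total exits: {ratio:.2f}x")
```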

Given the above observations, we are looking for a way to enable AVIC while still having the most optimized clocksource for Windows guests.

 

We would really appreciate any advice and look forward to your response.

 

Best Regards,

Kechen

 

