qemu-discuss

RE: Optimized clocksource with AMD AVIC enabled for Windows guest


From: Vitaly Kuznetsov
Subject: RE: Optimized clocksource with AMD AVIC enabled for Windows guest
Date: Thu, 25 Feb 2021 11:25:29 +0100

Kechen Lu <kechenl@nvidia.com> writes:

> Hi Vitaly and Paolo,
>
> Sorry for the delayed response. I finally got a chance to access a machine
> with AVIC, and was able to test the patch and reconfirm the results with
> some benchmarks and tests today :)
>
> In summary, the patch works well and resolves the high rate of port I/O
> vmexits caused by the clocksource. With AVIC=1 && stimer/synic=1:
>
> 1.    The CPU-intensive workload CPU-Z shows a 15% improvement in the
>       SingleThread score (382.1 => 441.7).
>
> 2.    The disk-I/O-intensive workload Passmark Disk Test shows a 4%
>       improvement (12706 => 13265).
>
> 3.    The vmexit pattern from a 30s recording, taken while running the CPU
>       workload Geekbench in the guest, shows a dramatic 90.7% decrease in
>       port I/O vmexits, and HLT and NPF vmexits drop as well, once we get
>       the stimer benefit on top of AVIC. Details below:
>  
> AVIC=1 && stimer/synic=0 && vapic=0:
>  
>              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time      Avg time
>
>                   io     344654    68.29%     1.10%      0.67us   2132.72us      7.01us ( +-   0.19% )
>                  hlt     114046    22.60%    98.85%      0.42us  16666.32us   1903.26us ( +-   0.66% )
>  avic_incomplete_ipi      19679     3.90%     0.03%      0.38us     22.67us      3.66us ( +-   0.71% )
>                  npf       8186     1.62%     0.01%      0.37us    235.76us      1.46us ( +-   4.20% )
>             ........                      
>
>  
> AVIC=1 && stimer/synic=1 && vapic=0:
>  
>              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time      Avg time
>
>                   io      31995    38.61%     0.10%      2.79us     65.83us      6.70us ( +-   0.35% )
>                  hlt      22915    27.65%    99.88%      0.42us  15959.14us   9535.38us ( +-   0.50% )
>  avic_incomplete_ipi       8271     9.98%     0.01%      0.39us     79.03us      3.58us ( +-   1.23% )
>                  npf       1232     1.49%     0.00%      0.36us    100.25us      2.58us ( +-   6.98% )
>             ........
>
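
The reductions quoted above can be recomputed from the Samples columns of
the two tables. A minimal C sketch of that arithmetic follows; the function
name pct_decrease is illustrative only and is not part of perf or KVM.

```c
#include <stdint.h>

/*
 * Relative decrease from `before` to `after`, in percent.
 * Returns 0.0 when `before` is zero, to avoid dividing by zero.
 * (Illustrative helper; not taken from any existing tool.)
 */
double pct_decrease(uint64_t before, uint64_t after)
{
    if (before == 0)
        return 0.0;
    return 100.0 * ((double)before - (double)after) / (double)before;
}
```

Fed with the sample counts from the tables, pct_decrease(344654, 31995)
gives roughly 90.7 for io, pct_decrease(114046, 22915) roughly 79.9 for hlt,
and pct_decrease(8186, 1232) roughly 85 for npf.
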
> While testing, I also found that hv-vapic should be disabled as well to
> make AVIC fully functional. Otherwise there are many vmexits due to MSR
> writes, which seem to come from increased access to HV_X64_MSR_EOI
> and HV_X64_MSR_ICR. This makes sense to me, since AVIC conflicts with
> PV EOI/ICR accesses. So far I think the AVIC=1 && hv-vapic=0 &&
> stimer/synic=1 combination gives us the best performance. However,
> AVIC=1 && hv-vapic=0 && stimer/synic=1 is really unstable, and
> sometimes would lead to boot. I wanted to understand whether instabilities
> with APICv/AVIC are a known bug/issue upstream. The reproducible kernel
> warning is attached at the bottom.

Now it's my turn to apologize for the delayed reply :-)

I think it's our fault,

BIT(3) in HYPERV_CPUID_ENLIGHTMENT_INFO is

HV_X64_APIC_ACCESS_RECOMMENDED
which can be deciphered as 

"Recommend using MSRs for accessing APIC registers EOI, ICR and TPR
rather than their memory-mapped counterparts"

And we shouldn't be setting it with AVIC. The following hack is supposed
to help:

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c8f2592ccc99..66ee85a83e9a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -145,6 +145,13 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
                                           vcpu->arch.ia32_misc_enable_msr &
                                           MSR_IA32_MISC_ENABLE_MWAIT);
        }
+
+       /* Dirty hack: force HV_DEPRECATING_AEOI_RECOMMENDED. Not to be merged! */
+       best = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_ENLIGHTMENT_INFO, 0);
+       if (best) {
+               best->eax &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+               best->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
+       }
 }
 EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);

(we'll need to find a proper way to set these settings in QEMU).
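
To make the intent of the hack easier to see outside the kernel, here is a
minimal standalone C sketch of the same bit manipulation on an EAX value
from the enlightenment-info leaf. BIT(3) is the value stated above; the
BIT(9) value for HV_DEPRECATING_AEOI_RECOMMENDED is what I recall from the
kernel's Hyper-V TLFS header and should be treated as an assumption, as
should the helper name hv_enlightenment_eax_for_avic, which is made up for
this example.

```c
#include <stdint.h>

/* BIT(3): per the description above. */
#define HV_X64_APIC_ACCESS_RECOMMENDED   (1u << 3)
/* BIT(9): assumed from the kernel's Hyper-V TLFS header; verify locally. */
#define HV_DEPRECATING_AEOI_RECOMMENDED  (1u << 9)

/*
 * Hypothetical helper mirroring the hack: stop recommending MSR-based
 * APIC access (so the guest uses the memory-mapped registers that AVIC
 * accelerates) and recommend against AutoEOI.
 */
uint32_t hv_enlightenment_eax_for_avic(uint32_t eax)
{
    eax &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
    eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
    return eax;
}
```

All other recommendation bits pass through unchanged, and applying the
helper twice yields the same value as applying it once.
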
 
Could you give it a spin? ("AVIC=1 && hv-vapic=1 && stimer/synic=1" configuration)

>  
> In all, AVIC=1 && hv-vapic=1 && stimer/synic=1 can now work stably and
> still produces great benefits in vmexit optimization. Thanks to all of you
> for the help. I hope the kernel patch and the QEMU patch exposing the bit
> get upstream soon, along with fixes for the instabilities.
>  
> Best Regards,
> Kechen
>
> ---------------------------------------------------------------------------------------
> [ 7962.437584] ------------[ cut here ]------------
> [ 7962.437586] Invalid IPI target: index=2, vcpu=0, icr=0x4000000:0x82f
> [ 7962.437603] WARNING: CPU: 4 PID: 7109 at arch/x86/kvm/svm/avic.c:349 avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
> [ 7962.437604] Modules linked in: kvm_amd ccp kvm msr nf_tables nfnetlink bridge stp llc amd64_edac_mod edac_mce_amd nls_iso8859_1 amd_energy crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper snd_hda_codec_hdmi rapl snd_hda_intel snd_intel_dspcfg wmi_bmof snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_seq_midi snd_seq_midi_event snd_rawmidi efi_pstore joydev mc input_leds snd_seq snd_pcm snd_seq_device snd_timer snd soundcore k10temp mac_hid sch_fq_codel lm92 parport_pc ppdev lp parport ip_tables x_tables autofs4 iavf hid_generic usbhid hid nvme crc32_pclmul i40e ahci nvme_core xhci_pci libahci xhci_pci_renesas i2c_piix4 atlantic macsec wmi [last unloaded: ccp]
> [ 7962.437630] CPU: 4 PID: 7109 Comm: CPU 0/KVM Tainted: P        W  OE     5.8.0-41-generic #46
> [ 7962.437633] RIP: 0010:avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]

No, this is not something I'm aware of. Do you know if it reproduces on
the latest upstream?

> [ 7962.437635] Code: 9a 00 00 00 0f 85 2b ff ff ff 41 8b 56 24 8b 4d c8 45 89 e0 44 89 ee 48 c7 c7 a8 34 50 c0 c6 05 b2 9a 00 00 01 e8 d6 cc 3a fb <0f> 0b e9 04 ff ff ff 48 8b 5d c0 8b 55 c8 be 10 03 00 00 48 89 df
> [ 7962.437636] RSP: 0018:ffffa7894f9bfcc0 EFLAGS: 00010282
> [ 7962.437637] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff99347f118cd8
> [ 7962.437637] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff99347f118cd0
> [ 7962.437638] RBP: ffffa7894f9bfd18 R08: 0000000000000004 R09: 0000000000000831
> [ 7962.437638] R10: 0000000000000000 R11: 0000000000000001 R12: 040000000000082f
> [ 7962.437639] R13: 0000000000000002 R14: ffff993345653448 R15: 0000000000000002
> [ 7962.437640] FS:  0000000000000000(0053) GS:ffff99347f100000(002b) knlGS:fffff80470728000
> [ 7962.437640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7962.437641] CR2: ffff8006ace2b000 CR3: 0000000febd88000 CR4: 0000000000340ee0
> [ 7962.437641] Call Trace:
> [ 7962.437646]  handle_exit+0x134/0x420 [kvm_amd]
> [ 7962.437661]  ? kvm_set_cr8+0x22/0x40 [kvm]
> [ 7962.437674]  vcpu_enter_guest+0x862/0xd90 [kvm]
> [ 7962.437687]  vcpu_run+0x76/0x240 [kvm]
> [ 7962.437699]  kvm_arch_vcpu_ioctl_run+0x9f/0x2b0 [kvm]
> [ 7962.437711]  kvm_vcpu_ioctl+0x247/0x600 [kvm]
> [ 7962.437714]  ksys_ioctl+0x8e/0xc0
> [ 7962.437715]  __x64_sys_ioctl+0x1a/0x20
> [ 7962.437717]  do_syscall_64+0x49/0xc0
> [ 7962.437719]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 7962.437720] RIP: 0033:0x7f4c09b1131b
> [ 7962.437721] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1d 3b 0d 00 f7 d8 64 89 01 48
> [ 7962.437721] RSP: 002b:00007f4bedffa4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 7962.437722] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4c09b1131b
> [ 7962.437723] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
> [ 7962.437723] RBP: 0000563c35a94990 R08: 0000563c33b95a30 R09: 0000000000000004
> [ 7962.437724] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 7962.437724] R13: 0000563c34196d00 R14: 0000000000000000 R15: 00007f4bedffb640
> [ 7962.437726] ---[ end trace 7f0f339c3a001d7b ]---
>
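
For anyone trying to interpret the "icr=0x4000000:0x82f" value in the
warning, here is a minimal C sketch that splits an xAPIC ICR into its
fields. It assumes the message prints the high dword first and the low dword
second (R12 = 040000000000082f in the register dump is consistent with
that) and that the guest is in xAPIC mode, where the destination lives in
bits 31:24 of the high dword. The struct and function names are made up for
this example.

```c
#include <stdint.h>

/* Illustrative container for the ICR fields of interest (xAPIC layout). */
struct icr_fields {
    uint8_t vector;        /* low dword, bits 7:0 */
    uint8_t delivery_mode; /* low dword, bits 10:8 (0 = fixed) */
    uint8_t logical_dest;  /* low dword, bit 11 (1 = logical, 0 = physical) */
    uint8_t shorthand;     /* low dword, bits 19:18 (0 = no shorthand) */
    uint8_t dest;          /* high dword, bits 31:24 */
};

/* Decode an xAPIC ICR given as its high and low 32-bit halves. */
struct icr_fields decode_xapic_icr(uint32_t icr_high, uint32_t icr_low)
{
    struct icr_fields f;

    f.vector = icr_low & 0xff;
    f.delivery_mode = (icr_low >> 8) & 0x7;
    f.logical_dest = (icr_low >> 11) & 0x1;
    f.shorthand = (icr_low >> 18) & 0x3;
    f.dest = (icr_high >> 24) & 0xff;
    return f;
}
```

Under those assumptions, decode_xapic_icr(0x4000000, 0x82f) yields vector
0x2f, fixed delivery mode, logical destination mode, no shorthand, and
destination 0x04. If the guest uses the flat logical model, 0x04 has only
bit 2 set, which would match "index=2" in the warning.
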

-- 
Vitaly



