qemu-discuss

Re: [Qemu-discuss] Kernel panic in VMs with large amounts of memory (>1TB)


From: Burkhard Linke
Subject: Re: [Qemu-discuss] Kernel panic in VMs with large amounts of memory (>1TB)
Date: Wed, 6 Dec 2017 16:39:32 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

Hi,


On 11/30/2017 11:31 AM, Alberto Garcia wrote:
> On Thu, Nov 30, 2017 at 09:43:04AM +0100, Burkhard Linke wrote:
>
>> VMs are running fine with less or equal 1 TB RAM. More RAM results
>> in a kernel panic during VM boot:
>
> You need to patch QEMU:
>
> https://git.centos.org/blob/rpms!!qemu-kvm.git/34b32196890e2c41b0aee042e600ba422f29db17/SOURCES!kvm-seabios-paravirt-allow-more-than-1TB-in-x86-guest.patch
> https://git.centos.org/blob/rpms!!qemu-kvm.git/34b32196890e2c41b0aee042e600ba422f29db17/SOURCES!kvm-fix-guest-physical-bits-to-match-host-to-go-beyond-1.patch
>
> And SeaBIOS:
>
> https://git.centos.org/blob/rpms!!seabios.git/62d8d852f4675e4ab4bc3dd339050d26d397c251/SOURCES!0002-allow-1TB-of-RAM.patch
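(For anyone following along: those patches make the guest's physical address width match the host's. Newer upstream QEMU exposes the same behavior as the host-phys-bits CPU property, so on a patched or recent-enough build, a large-memory guest can be launched roughly as follows. This is a generic sketch, not the exact command used here; the disk path and sizes are placeholders.)

```shell
# Sketch only: pass the host's physical address width through to the
# guest so that >1 TB of RAM can be mapped by the guest page tables.
# Requires the patches above or a QEMU with host-phys-bits support.
qemu-system-x86_64 \
    -machine pc,accel=kvm \
    -cpu host,host-phys-bits=on \
    -m 1536G \
    -smp 32 \
    -drive file=/path/to/disk.img,format=qcow2 \
    -nographic
```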

Thanks for the patches. I've applied them, and instances are now able to start. However, once the workload is raised (memtester with >1 TB plus a kernel build with 30 parallel jobs), both the VM and the hypervisor freeze:

2017-12-06T15:29:59.988085+00:00 dl580-r2-1 kernel: [ 1531.823823] NMI watchdog: BUG: soft lockup - CPU#58 stuck for 22s! [qemu-system-x86:22992]
2017-12-06T15:29:59.988099+00:00 dl580-r2-1 kernel: [ 1531.823826] Modules linked in: vhost_net vhost macvtap macvlan ebtable_filter ebtables vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat 8021q garp mrp bridge stp llc bonding ip6table_filter ip6_tables xt_CT iptable_raw xt_comment xt_multiport xt_conntrack iptable_filter ip_tables x_tables xfs intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ipmi_ssif aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf hpilo lpc_ich ioatdma dca ipmi_si ipmi_devintf shpchp ipmi_msghandler acpi_power_meter mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_proto_gre nf_conntrack_ipv6
2017-12-06T15:29:59.988102+00:00 dl580-r2-1 kernel: [ 1531.823864] nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear i2c_algo_bit ttm drm_kms_helper bnx2x syscopyarea sysfillrect sysimgblt fb_sys_fops drm ptp hpsa pps_core mdio scsi_transport_sas libcrc32c wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
2017-12-06T15:29:59.988104+00:00 dl580-r2-1 kernel: [ 1531.823888] CPU: 58 PID: 22992 Comm: qemu-system-x86 Tainted: G             L 4.10.0-40-generic #44~16.04.1-Ubuntu
2017-12-06T15:29:59.988105+00:00 dl580-r2-1 kernel: [ 1531.823890] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 09/12/2016
2017-12-06T15:29:59.988107+00:00 dl580-r2-1 kernel: [ 1531.823891] task: ffff948c05f50000 task.stack: ffffac622683c000
2017-12-06T15:29:59.988108+00:00 dl580-r2-1 kernel: [ 1531.823895] RIP: 0010:native_queued_spin_lock_slowpath+0x118/0x1a0
2017-12-06T15:29:59.988109+00:00 dl580-r2-1 kernel: [ 1531.823898] RSP: 0018:ffffac622683fcb0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
2017-12-06T15:29:59.988111+00:00 dl580-r2-1 kernel: [ 1531.823900] RAX: 0000000000000000 RBX: 00000002520d9d00 RCX: ffff930c7fa19f00
2017-12-06T15:29:59.988112+00:00 dl580-r2-1 kernel: [ 1531.823901] RDX: ffff93cc7fad9f00 RSI: 0000000001300101 RDI: ffff948c0b388000
2017-12-06T15:29:59.988114+00:00 dl580-r2-1 kernel: [ 1531.823903] RBP: ffffac622683fcb0 R08: 0000000000ec0000 R09: 0000000000000000
2017-12-06T15:29:59.988149+00:00 dl580-r2-1 kernel: [ 1531.823904] R10: 00000000ffffffff R11: 0000000000000000 R12: ffff930c0a7b8000
2017-12-06T15:29:59.988151+00:00 dl580-r2-1 kernel: [ 1531.823905] R13: ffffac622683fcd0 R14: 0000000000000001 R15: ffff930c0213d500
2017-12-06T15:29:59.988152+00:00 dl580-r2-1 kernel: [ 1531.823906] FS:  00007f8fff7fe700(0000) GS:ffff930c7fa00000(0000) knlGS:0000000000000000
2017-12-06T15:29:59.988154+00:00 dl580-r2-1 kernel: [ 1531.823907] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2017-12-06T15:29:59.988155+00:00 dl580-r2-1 kernel: [ 1531.823909] CR2: 00007fcc46332ab0 CR3: 0000023d1851f000 CR4: 00000000003426e0
2017-12-06T15:29:59.988157+00:00 dl580-r2-1 kernel: [ 1531.823909] Call Trace:
2017-12-06T15:29:59.988158+00:00 dl580-r2-1 kernel: [ 1531.823913] _raw_spin_lock+0x20/0x30
2017-12-06T15:29:59.988160+00:00 dl580-r2-1 kernel: [ 1531.823933] mmu_free_roots+0x11c/0x170 [kvm]
2017-12-06T15:29:59.988161+00:00 dl580-r2-1 kernel: [ 1531.823949] kvm_mmu_unload+0x12/0x40 [kvm]
2017-12-06T15:29:59.988162+00:00 dl580-r2-1 kernel: [ 1531.823965] vcpu_enter_guest+0x42a/0x11b0 [kvm]
2017-12-06T15:29:59.988164+00:00 dl580-r2-1 kernel: [ 1531.823971] ? vmx_sync_pir_to_irr+0x29/0x30 [kvm_intel]
2017-12-06T15:29:59.988165+00:00 dl580-r2-1 kernel: [ 1531.823989] ? kvm_apic_has_interrupt+0x98/0xc0 [kvm]
2017-12-06T15:29:59.988167+00:00 dl580-r2-1 kernel: [ 1531.824006] kvm_arch_vcpu_ioctl_run+0xc8/0x3e0 [kvm]
2017-12-06T15:29:59.988168+00:00 dl580-r2-1 kernel: [ 1531.824021] kvm_vcpu_ioctl+0x33a/0x600 [kvm]
2017-12-06T15:29:59.988170+00:00 dl580-r2-1 kernel: [ 1531.824023] ? do_futex+0x1fb/0x540
2017-12-06T15:29:59.988171+00:00 dl580-r2-1 kernel: [ 1531.824026] do_vfs_ioctl+0xa1/0x5f0
2017-12-06T15:29:59.988173+00:00 dl580-r2-1 kernel: [ 1531.824043] ? kvm_on_user_return+0x66/0xa0 [kvm]
2017-12-06T15:29:59.988174+00:00 dl580-r2-1 kernel: [ 1531.824046] SyS_ioctl+0x79/0x90
2017-12-06T15:29:59.988176+00:00 dl580-r2-1 kernel: [ 1531.824050] entry_SYSCALL_64_fastpath+0x1e/0xad
2017-12-06T15:29:59.988177+00:00 dl580-r2-1 kernel: [ 1531.824051] RIP: 0033:0x7f916c767f07
2017-12-06T15:29:59.988178+00:00 dl580-r2-1 kernel: [ 1531.824052] RSP: 002b:00007f8fff7fd938 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
2017-12-06T15:29:59.988180+00:00 dl580-r2-1 kernel: [ 1531.824054] RAX: ffffffffffffffda RBX: 00007f914c034001 RCX: 00007f916c767f07
2017-12-06T15:29:59.988181+00:00 dl580-r2-1 kernel: [ 1531.824055] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000005f
2017-12-06T15:29:59.988183+00:00 dl580-r2-1 kernel: [ 1531.824056] RBP: 0000000000000001 R08: 000055a6393346b0 R09: 00000000000000ff
2017-12-06T15:29:59.988184+00:00 dl580-r2-1 kernel: [ 1531.824057] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
2017-12-06T15:29:59.988186+00:00 dl580-r2-1 kernel: [ 1531.824058] R13: 000055a63931f2c0 R14: 00007f914c033000 R15: 000055a63ba27ee0
2017-12-06T15:29:59.988187+00:00 dl580-r2-1 kernel: [ 1531.824059] Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 00 9f 01 00 48 03 14 c5 e0 83 14 9e 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 0d 09 eb 02 f3 90 8b


The hypervisor was running kernel linux-image-4.10.0-40-generic in this test; the stock Xenial 4.4.x kernels show similar behavior with a similar trace. The VM in question uses the current Xenial cloud image.
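One generic sanity check worth running when chasing problems like this (a diagnostic sketch of my own, not something from the report above): compare the physical address width the host CPU reports with what the guest sees, since a guest that believes it has fewer physical address bits than its RAM requires will build unusable page tables.

```shell
# Run on both the host and inside the guest and compare: a >1 TB guest
# needs more than 40 bits of physical address space.
grep -m1 "address sizes" /proc/cpuinfo
```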

Any hints on this, too?

Best regards,
Burkhard Linke

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810



