[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: qemu-system-aarch64: Failed to retrieve host CPU features
From: |
Marc Zyngier |
Subject: |
Re: qemu-system-aarch64: Failed to retrieve host CPU features |
Date: |
Sat, 13 Aug 2022 14:32:19 +0100 |
User-agent: |
Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (Gojō) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) |
On Sat, 13 Aug 2022 12:11:37 +0100,
Vitaly Chikunov <vt@altlinux.org> wrote:
>
> Marc,
>
> On Fri, Aug 12, 2022 at 04:02:37PM +0100, Marc Zyngier wrote:
> > On Fri, 12 Aug 2022 10:25:55 +0100,
> > Peter Maydell <peter.maydell@linaro.org> wrote:
> > >
> > > I've added some more relevant mailing lists to the cc.
> > >
> > > On Fri, 12 Aug 2022 at 09:45, Vitaly Chikunov <vt@altlinux.org> wrote:
> > > > On Fri, Aug 12, 2022 at 05:14:27AM +0300, Vitaly Chikunov wrote:
> > > > > I noticed that we starting to get many errors like this:
> > > > >
> > > > > qemu-system-aarch64: Failed to retrieve host CPU features
> > > > >
> > > > > Where many is 1-2% per run, depends on host, host is Kunpeng-920, and
> > > > > Linux kernel is v5.15.59, but it started to appear months before that.
> > > > >
> > > > > strace shows in erroneous case:
> > > > >
> > > > > 1152244 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted
> > > > > system call)
> > > > >
> > > > > And I see in target/arm/kvm.c:kvm_arm_create_scratch_host_vcpu:
> > > > >
> > > > > vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
> > > > > if (vmfd < 0) {
> > > > > goto err;
> > > > > }
> > > > >
> > > > > Maybe it should restart ioctl on EINTR?
> > > > >
> > > > > I don't see EINTR documented in ioctl(2) nor in Linux'
> > > > > Documentation/virt/kvm/api.rst for KVM_CREATE_VM, but for KVM_RUN it
> > > > > says "an unmasked signal is pending".
> > > >
> > > > I am suggested that almost any blocking syscall could return EINTR, so I
> > > > checked the strace log and it does not show evidence of arriving a
> > > > signal,
> > > > the log ends like this:
> > > >
> > > > 1152244 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9
> > > > 1152244 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) = 48
> > > > 1152244 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted
> > > > system call)
> > > > 1152244 close(9) = 0
> > > > 1152244 newfstatat(2, "", {st_dev=makedev(0, 0xd), st_ino=57869925,
> > > > st_mode=S_IFIFO|0600, st_nlink=1, st_uid=517, st_gid=517,
> > > > st_blksize=4096, st_blocks=0, st_size=0, st_atime=1660268019 /*
> > > > 2022-08-12T01:33:39.850436293+0000 */, st_atime_nsec=850436293,
> > > > st_mtime=1660268019 /* 2022-08-12T01:33:39.850436293+0000 */,
> > > > st_mtime_nsec=850436293, st_ctime=1660268019 /*
> > > > 2022-08-12T01:33:39.850436293+0000 */, st_ctime_nsec=850436293},
> > > > AT_EMPTY_PATH) = 0
> > > > 1152244 write(2, "qemu-system-aarch64: Failed to r"..., 58) = 58
> > > > 1152244 exit_group(1) = ?
> > > > 1152245 <... clock_nanosleep resumed> <unfinished ...>) = ?
> > > > 1152245 +++ exited with 1 +++
> > > > 1152244 +++ exited with 1 +++
> > >
> > > KVM folks: should we expect that KVM_CREATE_VM might fail EINTR
> > > and need retrying?
> >
> > In general, yes. But for this particular one, this is pretty odd.
> >
> > The only path I can so far see that would match this behaviour is if
> > mm_take_all_locks() (called from __mmu_notifier_register()) was
> > getting interrupted by a signal (I'm looking at a 5.19-ish kernel,
> > which may slightly differ from the 5.15 mentioned above).
> >
> > But as Vitaly points out, it doesn't seem to be a signal delivered
> > here.
> >
> > Vitaly: could you please share your exact test case (full qemu command
> > line), and instrument your kernel to see if mm_take_all_locks() is the
> > one failing?
>
> Full command is `qemu-system-aarch64 -M accel=kvm:tcg -m 4096M -smp
> cores=8 -nodefaults -nographic -no-reboot -fsdev
> local,id=root,path=/,security_model=none,multidevs=remap -device
> virtio-9p-pci,fsdev=root,mount_tag=/dev/root -device virtio-rng-pci
> -serial mon:stdio -kernel /boot/vmlinuz-5.18.16-un-def-alt1 -initrd
> /usr/src/tmp/initramfs-5.18.16-un-def-alt1.img -sandbox on,spawn=deny -M
> virt,gic-version=3 -cpu max -append 'console=ttyAMA0 mitigations=off
> nokaslr quiet panic=-1 SCRIPT=/usr/src/tmp/tmp.458pkF5r8d'`.
>
> But a minified reproducer is `qemu-system-aarch64 -M virt,accel=kvm -cpu max
> -kernel qwe`.
Ah, couldn't be smaller! ;-) Thanks for that!
>
> I traced with kprobe event tracer with the probes:
>
> echo 'p kvm_dev_ioctl' > kprobe_events
> echo 'r kvm_dev_ioctl $retval' >> kprobe_events
> echo 'p mm_take_all_locks' >> kprobe_events
> echo 'r mm_take_all_locks $retval' >> kprobe_events
>
> Then run reproducer in the loop
>
> for ((i=0;i<100;i++));do echo $i; strace -fvo bb.$i qemu-system-aarch64 -M
> virt,accel=kvm -cpu max -kernel qwe 2>&1 | grep Failed && break; done
>
> And on 97th iteration it's failed, strace log to check that PID is the same
> with
> the following trace:
>
> 3611463 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted system call)
>
> Event trace log:
>
> qemu-system-aar-3611463 [002] d.... 342920.535549: p_kvm_dev_ioctl_0:
> (kvm_dev_ioctl+0x0/0x910)
> qemu-system-aar-3611463 [002] d.... 342920.535635: p_mm_take_all_locks_0:
> (mm_take_all_locks+0x0/0x390)
> qemu-system-aar-3611463 [002] ..... 342920.535657: r_mm_take_all_locks_0:
> (__mmu_notifier_register+0x44/0x200 <- mm_take_all_locks) arg1=0xfffffffc
> qemu-system-aar-3611463 [002] ..... 342920.535695: r_kvm_dev_ioctl_0:
> (__arm64_sys_ioctl+0xcc/0x124 <- kvm_dev_ioctl) arg1=0xfffffffffffffffc
>
> So that's mm_take_all_locks returns EINTR.
>
> (I tried also to make C reproducer which opens /dev/kvm and call
> `ioctl(kvm, KVM_CREATE_VM, 0)` in the loop and it does not reproduce the
> problem, so it seems it's also something additional that QEMU does.)
On my fast test box, it takes several hundreds of iterations to fire,
but that's not completely impossible to trigger (strace actually helps
here). I was expecting it to fire more on a slow one, but that's not
the case.
And indeed, no signal seems to be delivered. I suspect that
FIT_NOTIFY_SIGNAL is set, but I need to investigate more. With a
reproducer, it should be easier to diagnose fully.
But we probably need to handle EINTR when creating the mini VM.
M.
--
Without deviation from the norm, progress is not possible.