qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: tests/vm infrastructure fails to notice that QEMU dying is a failure


From: John Snow
Subject: Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
Date: Fri, 17 Jul 2020 18:24:18 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 7/17/20 9:25 AM, Philippe Mathieu-Daudé wrote:
On 7/17/20 3:22 PM, Philippe Mathieu-Daudé wrote:
Cc'ing John & Cleber.

On 7/17/20 3:08 PM, Peter Maydell wrote:
If you run vm-build-openbsd, our makefile/scripting infrastructure
seems to fail to notice that death of the QEMU process that's
running the VM should be a failure, and ends up allowing make
to return a success condition.

I have a script which runs a VM build which basically does:

   #!/bin/sh -e
   make -C "build" "vm-build-openbsd" J=8 V=1
   echo "OK DONE openbsd"

It just gave me this output (tail end of logfile). We're executing tests,
and then the qemu-system-x86_64 that's running the OpenBSD VM gets
a signal 9 (sigkill), for unclear reasons (oom killer??). The python
scripting gets an exception, but doesn't exit with a failure status to
make, which then thinks all is fine, exits success itself and allows
the set -e script to proceed to print the "OK DONE" line...

The signal 9 is likely due to QEMU freezing on exit. The machine.py code waits about 3 seconds after trying to shut QEMU down gracefully before sending SIGKILL and cleaning up forcefully.



PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
{'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
DEBUG:QMP.qemu-26462:<<< {'return': {}}
WARNING:qemu.machine:qemu received signal 9; command:
"qemu-system-x86_64 -display none -vga none -chardev
socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
chardev=mon,mode=control -machine pc -chardev
socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
-serial chardev:console -nodefaults -m 4G -cpu max -netdev
user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
-drive 
file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
-device virtio-blk,drive=drive0,bootindex=0 -drive
file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
-device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
   File 
"/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 436, in _do_shutdown
     self._soft_shutdown(has_quit, timeout)
   File 
"/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 419, in _soft_shutdown
     self._popen.wait(timeout=timeout)
   File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
     raise TimeoutExpired(self.args, timeout)

This is where it times out waiting for the process to exit of its own volition. This is the "Inner" exception ...

subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
'none', '-vga', 'none', '-chardev',
'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
'-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
'8', '-enable-kvm', '-drive',
'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
'-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
'-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
timed out after 3 seconds

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
   File 
"/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 466, in shutdown
     self._do_shutdown(has_quit, timeout=timeout)
   File 
"/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 440, in _do_shutdown
     from exc
qemu.machine.AbnormalShutdown: Could not perform graceful shutdown

And this is the "Outer" exception, the machine.py cleanup handler letting you know that while it did perform cleanup, QEMU was not shut down properly.

make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
OK DONE openbsd

John it seems due to the addition of the 'finally' clause in
193bf1c061 ("python/machine.py: split shutdown into hard and
soft flavors").


That's surprising. I'll look into this Monday, but the "Finally: " should only perform some actions without suppressing the exception. You can see here it DIDN'T suppress it.

...but maybe we don't return non-zero if an exception gets raised during atexit()? Maybe that's what's happening here.

... ... Ah, guess what, that's exactly it. atexit suppresses exceptions. You just never noticed this problem before because machine.py used to suppress problems during shutdown instead of reporting them.

```
#!/usr/bin/env python3

import atexit
from qemu import machine

def handler():
    raise machine.AbnormalShutdown("oh no")

atexit.register(handler)
```

Oops, this returns a zero code.


So:

- basevm.py should likely be converted to a context manager object just like machine.py's QEMUMachine class was so cleanup can happen explicitly when the VM object goes out of scope.

(Callers would use: "with baseVM() as foo:" pattern, or, where that's not possible, call an explicit .shutdown().)

- The real problem, though: Why is QEMU hanging? It might need a longer timeout, or it might be having problems with the console socket again.

(CC Robert Foley who has been working on the Console socket draining problems. Maybe he has some insight here?)

--js




reply via email to

[Prev in Thread] Current Thread [Next in Thread]