[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PULL 10/66] tests/qtest: enable tests for virtio-scmi
From: |
Milan Zamazal |
Subject: |
Re: [PULL 10/66] tests/qtest: enable tests for virtio-scmi |
Date: |
Thu, 20 Jul 2023 10:34:47 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) |
Fabiano Rosas <farosas@suse.de> writes:
> Thomas Huth <thuth@redhat.com> writes:
>
>> On 18/07/2023 14.55, Milan Zamazal wrote:
>>> Thomas Huth <thuth@redhat.com> writes:
>>>
>>>> On 11/07/2023 01.02, Michael S. Tsirkin wrote:
>>>>> From: Milan Zamazal <mzamazal@redhat.com>
>>>>> We don't have a virtio-scmi implementation in QEMU and only support
>>>>
>>>>> a
>>>>> vhost-user backend. This is very similar to virtio-gpio and we add the
>>>>> same
>>>>> set of tests, just passing some vhost-user messages over the control
>>>>> socket.
>>>>> Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
>>>>> Acked-by: Thomas Huth <thuth@redhat.com>
>>>>> Message-Id: <20230628100524.342666-4-mzamazal@redhat.com>
>>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> ---
>>>>> tests/qtest/libqos/virtio-scmi.h | 34 ++++++
>>>>> tests/qtest/libqos/virtio-scmi.c | 174 +++++++++++++++++++++++++++++++
>>>>> tests/qtest/vhost-user-test.c | 44 ++++++++
>>>>> MAINTAINERS | 1 +
>>>>> tests/qtest/libqos/meson.build | 1 +
>>>>> 5 files changed, 254 insertions(+)
>>>>> create mode 100644 tests/qtest/libqos/virtio-scmi.h
>>>>> create mode 100644 tests/qtest/libqos/virtio-scmi.c
>>>>
>>>> Hi!
>>>>
>>>> I'm seeing some random failures with this new scmi test, so far only
>>>> on non-x86 systems, e.g.:
>>>>
>>>> https://app.travis-ci.com/github/huth/qemu/jobs/606246131#L4774
>>>>
>>>> It also reproduces on a s390x host here, but only if I run "make check
>>>> -j$(nproc)" - if I run the tests single-threaded, the qos-test passes
>>>> there. Seems like there is a race somewhere in this test?
>>>
>>> Hmm, it's basically the same as virtio-gpio.c test, so it should be OK.
>>> Is it possible that the two tests (virtio-gpio.c & virtio-scmi.c)
>>> interfere with each other in some way? Is there possibly a way to
>>> serialize them to check?
>>
>> I think within one qos-test, the sub-tests are already run
>> serialized. But there might be multiple qos-tests running in
>> parallel, e.g. one for the aarch64 target and one for the ppc64
>> target. And indeed, I can reproduce the problem on my x86 laptop by
>> running this in one terminal window:
>>
>> for ((x=0;x<1000;x++)); do \
>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
>> G_TEST_DBUS_DAEMON=.tests/dbus-vmstate-daemon.sh \
>> QTEST_QEMU_BINARY=./qemu-system-ppc64 \
>> MALLOC_PERTURB_=188 QTEST_QEMU_IMG=./qemu-img \
>> tests/qtest/qos-test -p \
>>
>> /ppc64/pseries/spapr-pci-host-bridge/pci-bus-spapr/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile
>> \
>> || break ; \
>> done
>>
>> And this in another terminal window at the same time:
>>
>> for ((x=0;x<1000;x++)); do \
>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
>> G_TEST_DBUS_DAEMON=.tests/dbus-vmstate-daemon.sh \
>> QTEST_QEMU_BINARY=./qemu-system-aarch64 \
>> MALLOC_PERTURB_=188 QTEST_QEMU_IMG=./qemu-img \
>> tests/qtest/qos-test -p \
>>
>> /aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile
>> \
>> || break ; \
>> done
>>
>> After a while, the aarch64 test broke with:
>>
>> /aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile:
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost VQ 0 ring restore failed: -22: Invalid argument
>> (22)
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost VQ 1 ring restore failed: -22: Invalid argument
>> (22)
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost_set_vring_call failed 22
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost_set_vring_call failed 22
>> qemu-system-aarch64: Failed to write msg. Wrote -1 instead of 20.
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost VQ 0 ring restore failed: -22: Invalid argument
>> (22)
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost VQ 1 ring restore failed: -22: Invalid argument
>> (22)
>> qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
>> msix_unset_vector_notifiers: Assertion
>> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier'
>> failed.
>> ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected
>> QEMU death from signal 6 (Aborted) (core dumped)
>
> If it helps,
It helps a lot, thank you!
> it looks like msix_unset_vector_notifiers is being called twice, once
> from vu_scmi_set_status() and another from vu_scmi_disconnect():
Interesting. Usually, vu_scmi_stop is called only once, which explains
why the test regularly passes. Both the vu_scmi_stop callers have a
check protecting from duplicate vu_scmi_stop calls but it's perhaps not
fully reliable. I can see vhost-user-gpio has an extra protection.
I'll post a patch adding a similar thing, hopefully it will fix the
problem.
> msix_unset_vector_notifiers
> virtio_pci_set_guest_notifiers
> vu_scmi_stop
> vu_scmi_disconnect <-
> vu_scmi_event
> chr_be_event
> qemu_chr_be_event
> tcp_chr_disconnect_locked
> tcp_chr_write
> qemu_chr_write_buffer
>
> msix_unset_vector_notifiers
> virtio_pci_set_guest_notifiers
> vu_scmi_stop
> vu_scmi_set_status <-
> virtio_set_status
> virtio_vmstate_change
> vm_state_notify
> do_vm_stop
> vm_shutdown
> qemu_cleanup
- [PULL 07/66] vhost-user-gpu: implement get_edid frontend feature, (continued)
[PULL 11/66] machine: Add helpers to get cores/threads per socket, Michael S. Tsirkin, 2023/07/10
[PULL 12/66] hw/smbios: Fix smbios_smp_sockets caculation, Michael S. Tsirkin, 2023/07/10
[PULL 14/66] hw/smbios: Fix core count in type4, Michael S. Tsirkin, 2023/07/10
[PULL 13/66] hw/smbios: Fix thread count in type4, Michael S. Tsirkin, 2023/07/10
[PULL 15/66] vhost-user: Change one_time to per_device request, Michael S. Tsirkin, 2023/07/10
[PULL 16/66] vhost-user: Make RESET_DEVICE a per device message, Michael S. Tsirkin, 2023/07/10
[PULL 17/66] hw/i386/pc_q35: Resolve redundant q35_host variable, Michael S. Tsirkin, 2023/07/10
[PULL 18/66] hw/pci-host/q35: Fix double, contradicting .endianness assignment, Michael S. Tsirkin, 2023/07/10
[PULL 19/66] hw/pci-host/q35: Initialize PCMachineState::bus in board code, Michael S. Tsirkin, 2023/07/10
[PULL 21/66] hw/pci-host/q35: Initialize PCI_HOST_BYPASS_IOMMU property from board code, Michael S. Tsirkin, 2023/07/10