Re: acceptance-system-fedora failures
From: Philippe Mathieu-Daudé
Subject: Re: acceptance-system-fedora failures
Date: Wed, 7 Oct 2020 11:57:55 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0
On 10/7/20 10:51 AM, Pavel Dovgalyuk wrote:
> On 07.10.2020 11:23, Thomas Huth wrote:
>> On 07/10/2020 09.13, Philippe Mathieu-Daudé wrote:
>>> On 10/7/20 7:20 AM, Philippe Mathieu-Daudé wrote:
>>>> On 10/7/20 1:07 AM, John Snow wrote:
>>>>> I'm seeing this gitlab test fail quite often in my Python work; I don't
>>>>> *think* this has anything to do with my patches, but maybe I need to try
>>>>> and bisect this more aggressively.
>> [...]
>>>> w.r.t. the error in your build, I told Thomas about the
>>>> test_ppc_mac99/day15/invaders.elf test timing out, but he said this is
>>>> not his area. Richard looked into it yesterday to see if it is
>>>> a TCG regression, and said the test either finished or crashed, raising
>>>> SIGCHLD, but the Avocado parent is still waiting for a timeout, so the
>>>> child becomes a zombie and the test hangs.
>>>
>>> Expected output:
>>>
>>> Quiescing Open Firmware ...
>>> Booting Linux via __start() @ 0x01000000 ...
>>>
>>> But QEMU exits in replay_char_write_event_load():
>>>
>>> Quiescing Open Firmware ...
>>> qemu-system-ppc: Missing character write event in the replay log
>>> $ echo $?
>>> 1
>>>
>>> Latest events are CHECKPOINT CHECKPOINT INTERRUPT INTERRUPT INTERRUPT.
>>>
>>> The replay file is ~22MiB. The recording was ended with
>>> "system_powerdown + quit" in HMP.
>>>
>>> I guess we have 2 bugs:
>>> - replay log
>>> - avocado doesn't catch children exit(1)
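
On the second bug, here is a minimal sketch of what "failing fast" could look like, in Python since that is what the acceptance tests are written in. This is not Avocado code: `ChildExited`, `wait_for_output` and `run_demo` are hypothetical names, and a small Python child stands in for QEMU printing one console line and then calling exit(1). It only illustrates treating EOF on the console as "the child is gone" rather than waiting for the job timeout.

```python
import subprocess
import sys
import time


class ChildExited(Exception):
    """The child exited before the expected console text appeared."""

    def __init__(self, status):
        super().__init__(f"child exited with status {status}")
        self.status = status


def wait_for_output(proc, pattern, timeout):
    """Return the first stdout line containing `pattern`.

    Raises ChildExited once the child closes stdout and exits,
    instead of sitting out the remaining timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Note: readline() blocks, so the deadline is only checked between
        # lines. A real harness would need select() or a reader thread.
        line = proc.stdout.readline()
        if line == "":
            # EOF on the pipe: no more console output can arrive, so
            # collect the exit status now and report it.
            remaining = max(0.1, deadline - time.monotonic())
            raise ChildExited(proc.wait(timeout=remaining))
        if pattern in line:
            return line
    raise TimeoutError(f"no {pattern!r} within {timeout}s")


def run_demo():
    """Hypothetical stand-in for the failing test: one line, then exit(1)."""
    child = 'import sys; print("Quiescing Open Firmware ..."); sys.exit(1)'
    proc = subprocess.Popen([sys.executable, "-c", child],
                            stdout=subprocess.PIPE, text=True)
    try:
        wait_for_output(proc, "Booting Linux", timeout=30.0)
    except ChildExited as exc:
        return exc.status
    finally:
        proc.stdout.close()
        proc.wait()
    return None
```

With a check like this, `run_demo()` reports the child's exit status well before the 30 second limit, which is the behaviour one would want from the test instead of a hang.
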
>>>
>>> Quick reproducer:
>>>
>>> $ make qemu-system-ppc check-venv
>>> $ tests/venv/bin/python -m \
>>> avocado --show=app,console,replay \
>>> run --job-timeout 300 -t machine:mac99 \
>>> tests/acceptance/replay_kernel.py
>>
>> Thanks, that was helpful. ... and the winner is:
>>
>> commit 55adb3c45620c31f29978f209e2a44a08d34e2da
>> Author: John Snow <jsnow@redhat.com>
>> Date: Fri Jul 24 01:23:00 2020 -0400
>> Subject: ide: cancel pending callbacks on SRST
>>
>> ... starting with this commit, the test starts failing. John, any idea what
>> might be causing this?
>
> This patch includes the following lines:
>
> + aio_bh_schedule_oneshot(qemu_get_aio_context(),
> + ide_bus_perform_srst, bus);
>
> replay_bh_schedule_oneshot_event should be used instead of this
> function, because it synchronizes non-deterministic BHs.
Why do we have 2 different functions? BHs are already complex
enough, and now we also need to think about the replay API...
What about the other cases, such as vhost-user (blk/net) and virtio-blk?
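
To make the concern concrete, here is a toy model of why the two paths behave differently, as far as I understand Pavel's point. It is a sketch in Python, not QEMU code: `run`, its parameters and the 10-step "guest" are all made up for illustration. `host_delay` stands in for host timing, which differs from one run to the next. A bottom half routed through the log runs at the recorded guest step during replay; one that bypasses the log runs whenever the host gets to it.

```python
def run(mode, log, host_delay, logged):
    """Run a 10-step toy guest that schedules one bottom half at step 2.

    mode       -- "record" or "replay"
    log        -- list of guest steps at which logged bottom halves ran;
                  filled in record mode, consulted in replay mode
    host_delay -- steps until the host event loop would get to the callback
                  (hypothetical stand-in for non-deterministic host timing)
    logged     -- True: route the callback through the log; False: bypass it
    Returns the guest step at which the callback actually ran.
    """
    pending_at = None
    ran_at = None
    for step in range(10):
        if step == 2:
            # The guest triggers the callback; the host will be ready later.
            pending_at = step + host_delay
        if pending_at is None or ran_at is not None:
            continue
        if logged and mode == "replay":
            # Replay ignores host timing and follows the recording.
            due = step == log[0]
        else:
            # Otherwise the callback runs whenever the host gets to it.
            due = step >= pending_at
        if due:
            ran_at = step
            if logged and mode == "record":
                log.append(step)
    return ran_at


# Logged path: replay reproduces the recorded step despite different timing.
log = []
recorded = run("record", log, host_delay=3, logged=True)
replayed = run("replay", log, host_delay=1, logged=True)

# Unlogged path: the callback lands on a different guest step in replay.
recorded_raw = run("record", [], host_delay=3, logged=False)
replayed_raw = run("replay", [], host_delay=1, logged=False)
```

In this toy, `recorded` and `replayed` agree while `recorded_raw` and `replayed_raw` do not. Any guest-visible side effect of the callback would then show up at a point the replay log does not expect.
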
>
>
>>
>> Thomas
>>
>