qemu-devel

From: Thomas Huth
Subject: Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Date: Mon, 6 Feb 2023 08:44:29 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.13.0

On 03/02/2023 22.14, Juan Quintela wrote:
> Peter Maydell <peter.maydell@linaro.org> wrote:
>> On Fri, 3 Feb 2023 at 15:44, Thomas Huth <thuth@redhat.com> wrote:
>>>
>>> On 03/02/2023 13.08, Kevin Wolf wrote:
>>>> On 03.02.2023 at 12:23, Thomas Huth wrote:
>>>>> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
>>>>>> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>>>>>>> We can get rid of the build-coroutine-sigaltstack job by moving
>>>>>>> the configure flags that should be tested here to other jobs:
>>>>>>> Move --with-coroutine=sigaltstack to the build-without-defaults job
>>>>>>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
>>>>>>
>>>>>> The biggest user of coroutines is the block layer, so we probably
>>>>>> ought to have the coroutine backend tested in a job that triggers
>>>>>> 'make check-block' for the iotests. IIUC, the without-defaults
>>>>>> job won't do that. How about, arbitrarily, using either the
>>>>>> 'check-system-debian' or the 'check-system-ubuntu' job? Those distros
>>>>>> are closely related, so getting sigaltstack vs. ucontext coverage
>>>>>> between them is a good win, and they both trigger the block jobs,
>>>>>> IIUC.
>>>>>
>>>>> I gave it a try with the ubuntu job, but this apparently trips up the iotests:
>>>>>
>>>>>    https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
>>>>>
>>>>> Does anybody have a clue what could be going wrong here?
>>>>
>>>> I'm not sure how changing the coroutine backend could cause it, but
>>>> primarily this looks like an assertion failure in migration code.
>>>>
>>>> Dave, Juan, any ideas what this assertion checks and why it could be
>>>> failing?
>>>
>>> Ah, I think it's the bug that will be fixed by:
>>>
>>>    https://lore.kernel.org/qemu-devel/20230202160640.2300-2-quintela@redhat.com/
>>>
>>> The fix hasn't hit the master branch yet (I think), and I had another patch
>>> in my CI that disables the aarch64 binary in that runner, so the iotests
>>> suddenly got executed with the alpha binary there --> migration fails.
>>>
>>> So never mind, it will be fixed as soon as Juan's pull request gets included.
>>
>> The migration tests have been flaky for a while now,
>> including setups where host and guest page sizes are the same.
>> (For instance, my x86 macOS box pretty reliably sees failures
>> when the machine is under load.)
>
> I *thought* that we had fixed all of those.
>
> But it is difficult for me to know because:
> - It only happens when one runs "make check".
> - Running ./migration-test on its own has never failed for me.
> - When it fails (and it has been a while since it last failed for me),
>   it is impossible for me to detect what is going on, and as I said, I have
>   never been able to reproduce it by running only migration-test.
>
> I will try to run several instances at the same time and see if it happens.
>
> And as Thomas said, I *think* that the fix that Peter Xu posted should
> fix this issue. Famous last words.

The patch from Peter should fix the problems that I triggered via the iotests, but I think the migration qtest is still unstable independently of that issue. See, for example, the latest staging pipeline:

 https://gitlab.com/qemu-project/qemu/-/pipelines/767961842

The migration qtest failed in both the x86-freebsd-build and the ubuntu-20.04-s390x-all jobs.

 Thomas




