qemu-devel

Re: [PATCH 0/5] QEMU Gating CI


From: Cleber Rosa
Subject: Re: [PATCH 0/5] QEMU Gating CI
Date: Mon, 27 Apr 2020 01:12:10 -0400

On Thu, 23 Apr 2020 23:28:21 +0200
Philippe Mathieu-Daudé <address@hidden> wrote:

> On 4/23/20 7:13 PM, Daniel P. Berrangé wrote:
> > On Thu, Apr 23, 2020 at 01:04:13PM -0400, Cleber Rosa wrote:
> >> ----- Original Message -----
> >>> From: "Peter Maydell" <address@hidden>
> >>> To: "Markus Armbruster" <address@hidden>
> >>> Cc: "Fam Zheng" <address@hidden>, "Thomas Huth"
> >>> <address@hidden>, "Beraldo Leal" <address@hidden>, "Erik
> >>> Skultety" <address@hidden>, "Alex Bennée"
> >>> <address@hidden>, "Wainer Moschetta"
> >>> <address@hidden>, "QEMU Developers" <address@hidden>,
> >>> "Wainer dos Santos Moschetta" <address@hidden>, "Willian
> >>> Rampazzo" <address@hidden>, "Cleber Rosa"
> >>> <address@hidden>, "Philippe Mathieu-Daudé" <address@hidden>,
> >>> "Eduardo Habkost" <address@hidden> Sent: Tuesday, April 21,
> >>> 2020 8:53:49 AM Subject: Re: [PATCH 0/5] QEMU Gating CI
> >>>
> >>> On Thu, 19 Mar 2020 at 16:33, Markus Armbruster
> >>> <address@hidden> wrote:
> >>>> Peter Maydell <address@hidden> writes:
> >>>>> I think we should start by getting the gitlab setup working
> >>>>> for the basic "x86 configs" first. Then we can try adding
> >>>>> a runner for s390 (that one's logistically easiest because
> >>>>> it is a project machine, not one owned by me personally or
> >>>>> by Linaro) once the basic framework is working, and expand
> >>>>> from there.
> >>>>
> >>>> Makes sense to me.
> >>>>
> >>>> Next steps to get this off the ground:
> >>>>
> >>>> * Red Hat provides runner(s) for x86 stuff we care about.
> >>>>
> >>>> * If that doesn't cover 'basic "x86 configs"' in your judgement,
> >>>> we fill the gaps as described below under "Expand from there".
> >>>>
> >>>> * Add an s390 runner using the project machine you mentioned.
> >>>>
> >>>> * Expand from there: identify the remaining gaps, map them to
> >>>> people / organizations interested in them, and solicit
> >>>> contributions from these guys.
> >>>>
> >>>> A note on contributions: we need both hardware and people.  By
> >>>> people I mean maintainers for the infrastructure, the tools and
> >>>> all the runners. Cleber & team are willing to serve for the
> >>>> infrastructure, the tools and the Red Hat runners.
> >>>
> >>> So, with 5.0 nearly out the door it seems like a good time to
> >>> check in on this thread again to ask where we are progress-wise
> >>> with this. My impression is that this patchset provides most of
> >>> the scripting and config side of the first step, so what we need
> >>> is for RH to provide an x86 runner machine and tell the gitlab CI
> >>> it exists. I appreciate that the whole coronavirus and
> >>> working-from-home situation will have upended everybody's plans,
> >>> especially when actual hardware might be involved, but how's it
> >>> going ?
> >>>
> >>
> >> Hi Peter,
> >>
> >> You hit the nail on the head here.  Our ability to move some
> >> machines from one lab to another (across the country) was indeed
> >> affected, but we're actively working on it.
> > 
> > For x86, do we really need to be using custom runners ?
> > 
> > With GitLab if someone forks the repo to their personal namespace,
> > they cannot use any custom runners setup by the origin project. So
> > if we use custom runners for x86, people forking won't be able to
> > run the GitLab CI jobs.
> > 
> > As a sub-system maintainer I wouldn't like this, because I ideally
> > want to be able to run the same jobs on my staging tree, that Peter
> > will run at merge time for the PULL request I send.
> > 
> > Thus my strong preference would be to use the GitLab runners in
> > every scenario where they are viable to use. Only use custom
> > runners in the cases where GitLab runners are clearly inadequate
> > for our needs.
> > 
> > Based on what we've set up in GitLab for libvirt, the shared runners
> > they have work fine for x86. Just need the environments you are
> > testing to be provided as Docker containers (you can actually build
> > and cache the container images during your CI job too).  IOW, any
> > Linux distro build and test jobs should be able to use shared
> > runners on x86, and likewise mingw builds. Custom runners should
> > only be needed if the jobs need to do *BSD / macOS builds, and/or
> > have access to specific hardware devices for some reason.
> 
> Thanks for insisting on that point, Daniel. I'd rather see every
> configuration be reproducible, so if we lose a hardware sponsor, we can
> find another one and start another runner.

I also believe that having reproducible jobs is key, but I differ when
it comes to believing that the hardware *alone* should determine
whether a job is gating or not.

My point is that even with easily accessible systems and software,
jobs can easily yield different results because of how the underlying
system is configured.  Sure, containers help with that, but again, we
have to consider non-container usage too.

In the RFC I tried to gather feedback on a plan for promoting and
demoting jobs to and from gating status.  IMO, most jobs would begin
their lives as non-gating, and would have to prove both their stability
and their maintainers' responsiveness.  Even when such jobs are already
gating, it should be trivial to demote a gating job for (but not
limited to) any of the following reasons:

 * job fails in a non-reproducible way
 * hardware is unresponsive and takes too long to produce results
 * maintainer is MIA

Some or all of the gating runners could also pick up jobs sent to
a branch other than "staging", say, a branch called "reproducer". That
branch could be writable by maintainers that need to reproduce a given
failure.
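
In .gitlab-ci.yml terms, a gating job pinned to a custom runner could
then be limited to exactly those branches.  A minimal sketch, assuming
a purely illustrative job name and runner tag (neither comes from this
series):

  # Hypothetical gating job: the tag routes it to a specific custom
  # runner, and the rules limit it to the branches that should gate.
  build-system-s390x:
    tags:
      - s390x
    rules:
      - if: '$CI_COMMIT_BRANCH == "staging"'
      - if: '$CI_COMMIT_BRANCH == "reproducer"'
    script:
      - mkdir build && cd build
      - ../configure --target-list=s390x-softmmu
      - make -j$(nproc)
      - make check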

> Also note, if it is not easy to reproduce a runner, it will be very
> hard to debug a reported build/test error.
> 

One of the goals of the patches you'll find in this series is to
propose (I would say *require*) that new jobs depending on new hardware
(even easily accessible systems such as x86) should provide
easy-to-run scripts to recreate those environments.  This is in line
with my previous point that it's not enough to just have the same
hardware.
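
For container-based jobs, such as the ones Daniel mentions, that
environment recreation can live right next to the CI configuration, by
keeping the Dockerfile in the tree and building/caching the image
within the job itself.  A minimal sketch, with an illustrative job
name, image name and Dockerfile path:

  # Hypothetical job that rebuilds the test container and pushes it to
  # the project's GitLab registry, so later runs can reuse it as a
  # build cache.  The CI_REGISTRY* variables are predefined by GitLab.
  build-container-fedora:
    image: docker:stable
    services:
      - docker:dind
    variables:
      DOCKER_TLS_CERTDIR: "/certs"
      IMAGE: $CI_REGISTRY_IMAGE/fedora-build:latest
    script:
      - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
      - docker pull "$IMAGE" || true
      - docker build --cache-from "$IMAGE" -t "$IMAGE" -f tests/docker/dockerfiles/fedora.docker .
      - docker push "$IMAGE"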

> A non-reproducible runner can not be used as gating, because if it
> fails it is not acceptable to block the project's development process.
> 

Other people may be more familiar with this, but I do remember projects
such as OpenStack deferring the testing of some hardware/software
combinations to other entities.  No single party will be able to
reproduce all configurations unless the set of configurations is
deliberately kept very small.  In my opinion, it's better to
acknowledge that fact, and to have plans that can be put into action in
the exceptional cases where the environment needed to reproduce a test
becomes unavailable.

> 
> In some cases custom runners are acceptable. These runners won't be
> "gating" but can post informative logs and status.
> 

Well, I have the feeling that some people maintaining those runners
will *not* want to have them as "informational" only.  If they invest a
good amount of time on them, I believe they'll want to reap the
benefits, such as others not breaking the code they rely on.  If their
system is not gating, they lose that, and may find breakage that CI did
not catch.  Again, I don't think "easily accessible" hardware should be
the only criterion for gating/non-gating status.

For instance, would you consider, say, a "Raspberry Pi 4 Model B"
running KVM jobs to be a reproducible runner?  Would you blame a
developer who breaks a Gating CI job on such a platform and says that
they can not reproduce it?

> [*] Specific hardware that is not easily available.
> 
> - Alistair talked at the last KVM Forum about a RISC-V board
>    (to test host TCG)
> - Aleksandar said at the last KVM Forum that Wavecomp could plug in a
>    CI20 MIPS board (to test host TCG)
> - Lemote seems interested in setting up some Loongson MIPSr6 board
>    (to test interaction with KVM)
> 
> [*] To run code requiring accepting License Agreements
> 
> [*] To run non Free / Open Source code
> 
> 
> Owners of these runners take on the responsibility of providing enough
> time/information about reported bugs, or of debugging them themselves.
> 

I do think that the owners of such runners may *not* want to have them
run Gating jobs, but I don't think the opposite should be imposed on
them, because I find it very hard to define, without some prejudice,
what an "easily available runner" means.

> 
> Now the problem is that the GitLab runner is not natively available
> on the architectures listed in this mail, so a custom setup is
> required. A dumb script running ssh to a machine also works (tested),
> but a lot of manual tuning/maintenance is to be expected.
> 

That's where I'm trying to help.  I have built and tested the
gitlab-runner for a number of non-supported environments, and I expect
to build further on that (say, by contributing code or feedback back to
GitLab so that they become official builds?).

Cheers,
- Cleber. 



