qemu-devel


From: Willian Rampazzo
Subject: Re: [RFC 1/1] docs/deve/ci-plan: define a high-level plan for the QEMU GitLab CI
Date: Wed, 15 Sep 2021 12:59:59 -0300

On Wed, Sep 15, 2021 at 11:07 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Wed, Sep 15, 2021 at 10:51:56AM -0300, Willian Rampazzo wrote:
> > On Wed, Sep 15, 2021 at 6:00 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > On Tue, Sep 14, 2021 at 03:48:30PM -0300, Willian Rampazzo wrote:
> > > > This adds a high-level plan for the QEMU GitLab CI based on use cases.
> > > > The idea is to have a base for evolving the QEMU CI. It sets high-level
> > > > characteristics for the QEMU CI use cases, which helps guide its
> > > > development.
> > > >
> > > > Signed-off-by: Willian Rampazzo <willianr@redhat.com>
> > > > ---
> > > >  docs/devel/ci-plan.rst | 77 ++++++++++++++++++++++++++++++++++++++++++
> > > >  docs/devel/ci.rst      |  1 +
> > > >  2 files changed, 78 insertions(+)
> > > >  create mode 100644 docs/devel/ci-plan.rst
> > > >
> > > > diff --git a/docs/devel/ci-plan.rst b/docs/devel/ci-plan.rst
> > > > new file mode 100644
> > > > index 0000000000..5e95b6bcea
> > > > --- /dev/null
> > > > +++ b/docs/devel/ci-plan.rst
> > > > @@ -0,0 +1,77 @@
> > > > +The GitLab CI structure
> > > > +=======================
> > > > +
> > > > +This section describes the current state of the QEMU GitLab CI and the
> > > > +high-level plan for its future.
> > > > +
> > > > +Current state
> > > > +-------------
> > > > +
> > > > > +The mainstream QEMU project considers the GitLab CI its primary CI system.
> > > > > +Currently, it runs 120+ jobs, where ~36 are container build jobs, 69 are QEMU
> > > > > +build jobs, ~22 are test jobs, 1 is a web page deploy job, and 1 is an
> > > > > +external job covering Travis jobs execution.
> > > > > +
> > > > > +In the current state, every push a user does to its fork runs most of the jobs
> > > > > +compared to the jobs running on the main repository. The exceptions are the
> > > > > +acceptance tests jobs, which run automatically on the main repository only.
> > > > > +Running most of the jobs in the user's fork or the main repository is not
> > > > > +viable. The job number tends to increase, becoming impractical to run all of
> > > > > +them on every single push.
> > > > +
> > > > +Future of QEMU GitLab CI
> > > > +------------------------
> > > > +
> > > > +Following is a proposal to establish a high-level plan and set the
> > > > > +characteristics for the QEMU GitLab CI. The idea is to organize the CI by use
> > > > > +cases, avoid wasting resources and CI minutes, anticipating the time GitLab
> > > > > +starts to enforce minutes limits soon.
> > > > +
> > > > +Use cases
> > > > +^^^^^^^^^
> > > > +
> > > > +Below is a list of the most common use cases for the QEMU GitLab CI.
> > > > +
> > > > +Gating
> > > > +""""""
> > > > +
> > > > > +The gating set of jobs runs on the maintainer's pull requests when the project
> > > > > +leader pushes them to the staging branch of the project. The gating CI pipeline
> > > > > +has the following characteristics:
> > > > +
> > > > + * Jobs tagged as gating run as part of the gating CI pipeline;
> > > > + * The gating CI pipeline consists of stable jobs;
> > > > > + * The execution duration of the gating CI pipeline should, as much as possible,
> > > > > +   have an upper bound limit of 2 hours.
> > > > +
> > > > +Developers
> > > > +""""""""""
> > > > +
> > > > > +A developer working on a new feature or fixing an issue may want to run/propose
> > > > > +a specific set of tests. Those tests may, eventually, benefit other developers.
> > > > +A developer CI pipeline has the following characteristics:
> > > > +
> > > > + * It is easy to run current tests available in the project;
> > > > + * It is easy to add new tests or remove unneeded tests;
> > > > + * It is flexible enough to allow changes in the current jobs.
> > > > +
> > > > +Maintainers
> > > > +"""""""""""
> > > > +
> > > > > +When accepting developers' patches, a maintainer may want to run a specific
> > > > > +test set. A maintainer CI pipeline has the following characteristics:
> > > > +
> > > > + * It consists of tests that are valuable for the subsystem;
> > > > + * It is easy to run a set of specific tests available in the project;
> > > > + * It is easy to add new tests or remove unneeded tests.
> > >
> > >
> > > Neither of these describe why I use CI as a developer and/or subsys
> > > maintainer.
> > >
> > > My desire with using CI is to (as close as possible) be able to
> > > > execute the exact same set of tests that will be run by gating CI
> > > on pull requests.
> >
> > I totally understand your desire and I think it is valid.
> >
> > What I'm trying with this proposal is the same strategy we used when
> > we started planning for the gating CI. The decision was to start
> > > small. Today the CI grew and we don't have a so-called gating CI yet,
> > > but a bunch of jobs that run on the staging branch, some needing
> > > reevaluation whether they should run on staging or not.
>
> Of course we have a gating CI today, it is the very thing you just
> described. Whether or not the set of CI jobs that run on staging is
> designed ground up, or evolved organically is irrelevant. It is what
> exists today and is used to test merges to master, so by definition
> > it is our gating CI. The set of jobs will never be perfect because
> we're in a changing world, so they will always need re-evaluation
> periodically to judge whether they're the right mix for our current
> needs.

Okay, let me rephrase my sentence. The CI has grown, and we now have an
opportunity to improve the gating CI: reduce the number of manual
interventions and make it fit the project better. For example, during
the release freeze window, or right before it, gating CI results were
sometimes ignored because the run took too long to execute. Another
example is the set of flaky tests we have running today. They should
not be part of the gating CI.
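
To make these two examples a bit more concrete, here is a minimal Python
sketch of how a gating set could be picked: keep only the jobs tagged as
gating that are not known to be flaky, then check the result against a
time budget. The job names, durations, and the fields of `Job` are all
hypothetical; only the 2-hour bound comes from the proposed ci-plan.rst.
Summing the durations assumes the worst case where jobs run one after
another.

```python
from dataclasses import dataclass

# Upper bound proposed in ci-plan.rst for the gating pipeline.
GATING_BUDGET_MINUTES = 120


@dataclass(frozen=True)
class Job:
    name: str      # hypothetical job name
    minutes: int   # hypothetical typical duration
    gating: bool   # tagged as gating
    flaky: bool    # known to fail intermittently


def select_gating(jobs):
    """Return the jobs that are tagged gating and are not flaky."""
    return [job for job in jobs if job.gating and not job.flaky]


def within_budget(jobs, budget=GATING_BUDGET_MINUTES):
    """Worst-case check: assume the jobs run one after another."""
    return sum(job.minutes for job in jobs) <= budget


# Hypothetical data, for illustration only.
JOBS = [
    Job("build-system-x", 40, gating=True, flaky=False),
    Job("check-system-x", 35, gating=True, flaky=False),
    Job("acceptance-y", 50, gating=True, flaky=True),
    Job("build-docs", 10, gating=False, flaky=False),
]

gating_set = select_gating(JOBS)
print([job.name for job in gating_set])
print(within_budget(gating_set))
```

In a real pipeline the jobs run in parallel across stages, so the serial
sum is only a conservative estimate.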

>
> > > My goal is to minimize (ideally eliminate) the risk that a patch
> > > series or pull request gets rejected with a need to re-spin due
> > > to CI failures. Each such rejection causes a round trip delaying
> > > merge, and this wastes my time & the maintainer/gate keepers' time.
> > > It is also hard to debug failures if I can't replicate the gating
> > > CI myself.
> >
> > Again, I totally agree with you. That would be the perfect scenario.
>
> Aside from the custom runners, it is the scenario that exists today
> and is relied on by many people. That existing usage and starting
> point has to be acknowledged in any CI plan that is proposed.

If I understood correctly, we should first find a way to let the
developers run the same jobs as the gating CI and then think about
other improvements, right? I can adjust the proposal to list that, no
problem. At least we have a plan.
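
As a rough sketch of what "run the same jobs as the gating CI" could
mean in practice, the Python below compares the jobs in the gating
pipeline with the jobs a developer's fork can run and reports the
difference. All job names are made up, and how the two lists would
actually be obtained is left open.

```python
def missing_from_fork(gating_jobs, fork_jobs):
    """Return gating jobs that a fork pipeline does not run, sorted by name."""
    return sorted(set(gating_jobs) - set(fork_jobs))


# Hypothetical job names, for illustration only.
GATING = ["build-x86", "check-x86", "check-s390x-custom", "check-aarch64-custom"]
FORK = ["build-x86", "check-x86", "build-docs"]

print(missing_from_fork(GATING, FORK))
```

An empty result would mean a contributor can replicate the gating run in
a fork; anything listed would need either shared runners or access to a
custom runner.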

>
> > The barrier I see to have it working the way you described is the
> > hardware access. The staging branch runs on two different custom
> > runners. We have two possible solutions to accomplish the scenario you
> > described: remove the custom runners from the staging branch and let
> > the jobs run on the GitLab CI shared runners, which everyone with
> > access to GitLab can use, or allow developers to access the custom
> > runners.
>
> It isn't that large of a barrier IMHO. It will be slow, but people
> can bring up custom runners for ppc/s390 using QEMU VMs if they lack
> access to hardware. The most important is the build coverage and
> that's already acquired via the cross compilers. The custom runners
> essentially only add "make check" as a benefit.

Okay, I can adjust the plan to list this too. My only concern is about
developers who do not have access to a custom runner, but we can
discuss that during the implementation.

>
> > > Today, I don't think any of those options are feasible or bring value
> > to the project. That is one of the reasons I'm not covering it now in
> > the future plan. As I mentioned before, let's take another small step
> > and organize a gating CI with some ground rules. When we reach it, the
> > future future step can be to implement merge requests, think about
> > reproducibility, and so on.
>
> Being able to replicate gating CI jobs as a contributor is the most
> critical starting point for any plan. Historically diagnosing failures
> in gating CI has been the biggest pain point in submitting code to QEMU,
> and why myself and others have spent so much time on Travis, and now
> GitLab config to let us have a well defined environment and ruleset for
> build jobs. That can't be ignored by any proposed CI plan.

Alright, I can adjust the plan to add this too.

And just a side note: I never said the work done on the CI so far is not
valuable; I'm sure all of it is. I feel we have reached a point where we
need to talk about the next steps. I personally find it difficult to
contribute to the CI because there are diverging ideas about what we
should do next, so a high-level plan would help newcomers interested in
contributing to it.

>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>



