Re: [PATCH 2/2] tests/qtest: make more migration pre-copy scenarios run non-live
From: Daniel P. Berrangé
Subject: Re: [PATCH 2/2] tests/qtest: make more migration pre-copy scenarios run non-live
Date: Thu, 20 Apr 2023 16:58:31 +0100
User-agent: Mutt/2.2.9 (2022-11-12)
On Thu, Apr 20, 2023 at 02:59:00PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > There are 27 pre-copy live migration scenarios being tested. In all of
> > these we force non-convergence and run for one iteration, then let it
> > converge and wait for completion during the second (or following)
> > iterations. At a 3 mbps bandwidth limit the first iteration takes a very
> > long time (~30 seconds).
> >
> > While it is important to test the migration passes and convergence
> > logic, it is overkill to do this for all 27 pre-copy scenarios. The
> > TLS migration scenarios in particular are merely exercising different
> > code paths during connection establishment.
> >
> > To optimize time taken, switch most of the test scenarios to run
> > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> > a massive speed up for most of the test scenarios.
> >
> > For test coverage the following scenarios are unchanged
> >
> > * Precopy with UNIX sockets
> > * Precopy with UNIX sockets and dirty ring tracking
> > * Precopy with XBZRLE
> > * Precopy with multifd
>
> Just for completeness: the other test that is still slow is
> /migration/vcpu_dirty_limit.
>
> > - migrate_ensure_non_converge(from);
> > + if (args->live) {
> > + migrate_ensure_non_converge(from);
> > + } else {
> > + migrate_ensure_converge(from);
> > + }
>
> Looks ... weird?
> But the only way that I can think of improving it is to pass args to
> migrate_ensure_*() and that is a different kind of weird.
I'm going to change this a little anyway. Currently for the non-live
case, I start the migration and then stop the CPUs. I want to reverse
that order, as we should have CPUs paused before even starting the
migration to ensure we don't have any re-dirtied pages at all.
>
> > } else {
> > - if (args->iterations) {
> > - while (args->iterations--) {
> > + if (args->live) {
> > + if (args->iterations) {
> > + while (args->iterations--) {
> > + wait_for_migration_pass(from);
> > + }
> > + } else {
> > wait_for_migration_pass(from);
> > }
> > +
> > + migrate_ensure_converge(from);
>
> I think we should change iterations to be 1 when we create args, but
> otherwise, treat 0 as 1 and change it to something along the lines of:
>
> if (args->live) {
> while (args->iterations-- >= 0) {
> wait_for_migration_pass(from);
> }
> migrate_ensure_converge(from);
>
> What do you think?
I think in retrospect 'iterations' was overkill as we only use the
values 0 (implicitly 1) or 2. IOW we could just use a 'bool multipass'
to distinguish the two cases.
> > - qtest_qmp_eventwait(to, "RESUME");
> > + if (!args->live) {
> > + qtest_qmp_discard_response(to, "{ 'execute' : 'cont'}");
> > + }
> > + if (!got_resume) {
> > + qtest_qmp_eventwait(to, "RESUME");
> > + }
> >
> > wait_for_serial("dest_serial");
> > }
>
> I was looking at the "culprit" of Lukas' problem, and it is not directly
> obvious. I see that when we expect one event, we just drop any event
> that we are not interested in. I don't know if that is the proper
> behaviour or if that is what is affecting this test.
I've not successfully reproduced it yet, nor figured out a real
scenario where it could plausibly happen. I'm looking to add more
debugging to help us out.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|