qemu-devel

From: Fabiano Rosas
Subject: Re: [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live
Date: Mon, 24 Apr 2023 18:01:36 -0300

Daniel P. Berrangé <berrange@redhat.com> writes:

> There are 27 pre-copy live migration scenarios being tested. In all of
> these we force non-convergence and run for one iteration, then let it
> converge and wait for completion during the second (or following)
> iterations. At 3 mbps bandwidth limit the first iteration takes a very
> long time (~30 seconds).
>
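
For context, these tests force non-convergence by capping the migration
bandwidth via QMP before the migration starts, so the guest dirties memory
faster than it can be transferred. A rough sketch of the idea (the helper
name below is illustrative, not the actual one used in migration-test.c):

    /* Sketch only: cap bandwidth at ~3 MB/s so pre-copy cannot
     * converge on its own.  Helper name is illustrative. */
    static void sketch_cap_migration_bandwidth(QTestState *from)
    {
        qtest_qmp_assert_success(from,
            "{ 'execute': 'migrate-set-parameters',"
            "  'arguments': { 'max-bandwidth': 3000000 } }");
    }
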
> While it is important to test the migration passes and convergence
> logic, it is overkill to do this for all 27 pre-copy scenarios. The
> TLS migration scenarios in particular are merely exercising different
> code paths during connection establishment.
>
> To optimize time taken, switch most of the test scenarios to run
> non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> a massive speed up for most of the test scenarios.
>
> For test coverage the following scenarios are unchanged
>
>  * Precopy with UNIX sockets
>  * Precopy with UNIX sockets and dirty ring tracking
>  * Precopy with XBZRLE
>  * Precopy with multifd
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  tests/qtest/migration-test.c | 60 ++++++++++++++++++++++++++++++------
>  1 file changed, 50 insertions(+), 10 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 6492ffa7fe..40d0f75480 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -568,6 +568,9 @@ typedef struct {
>          MIG_TEST_FAIL_DEST_QUIT_ERR,
>      } result;
>  
> +    /* Whether the guest CPUs should be running during migration */
> +    bool live;
> +
>      /* Postcopy specific fields */
>      void *postcopy_data;
>      bool postcopy_preempt;
> @@ -1324,8 +1327,6 @@ static void test_precopy_common(MigrateCommon *args)
>          return;
>      }
>  
> -    migrate_ensure_non_converge(from);
> -
>      if (args->start_hook) {
>          data_hook = args->start_hook(from, to);
>      }
> @@ -1335,6 +1336,31 @@ static void test_precopy_common(MigrateCommon *args)
>          wait_for_serial("src_serial");
>      }
>  
> +    if (args->live) {
> +        /*
> +         * Testing live migration, we want to ensure that some
> +         * memory is re-dirtied after being transferred, so that
> +         * we exercise logic for dirty page handling. We achieve
> +         * this with a ridiculously low bandwidth that guarantees
> +         * non-convergence.
> +         */
> +        migrate_ensure_non_converge(from);
> +    } else {
> +        /*
> +         * Testing non-live migration, we allow it to run at
> +         * full speed to ensure short test case duration.
> +         * For tests expected to fail, we don't need to
> +         * change anything.
> +         */
> +        if (args->result == MIG_TEST_SUCCEED) {
> +            qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
> +            if (!got_stop) {
> +                qtest_qmp_eventwait(from, "STOP");
> +            }
> +            migrate_ensure_converge(from);
> +        }
> +    }
> +
>      if (!args->connect_uri) {
>          g_autofree char *local_connect_uri =
>              migrate_get_socket_address(to, "socket-address");
> @@ -1352,19 +1378,29 @@ static void test_precopy_common(MigrateCommon *args)
>              qtest_set_expected_status(to, EXIT_FAILURE);
>          }
>      } else {
> -        wait_for_migration_pass(from);
> +        if (args->live) {
> +            wait_for_migration_pass(from);
>  
> -        migrate_ensure_converge(from);
> +            migrate_ensure_converge(from);
>  
> -        /* We do this first, as it has a timeout to stop us
> -         * hanging forever if migration didn't converge */
> -        wait_for_migration_complete(from);
> +            /*
> +             * We do this first, as it has a timeout to stop us
> +             * hanging forever if migration didn't converge
> +             */
> +            wait_for_migration_complete(from);
> +
> +            if (!got_stop) {
> +                qtest_qmp_eventwait(from, "STOP");
> +            }
> +        } else {
> +            wait_for_migration_complete(from);
>  
> -        if (!got_stop) {
> -            qtest_qmp_eventwait(from, "STOP");
> +            qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");

I retested and the problem still persists. The issue is with this wait +
cont sequence:

wait_for_migration_complete(from);
qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");

We wait for the source to finish, but by the time qmp_cont executes the
dst is still INMIGRATE, so autostart gets set and I never see the RESUME
event.
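
That matches what the 'cont' command does while the VM is still incoming;
roughly, the tail of qmp_cont() behaves like this (paraphrased, not a
verbatim quote):

    /* while the dst is still INMIGRATE, 'cont' only records the
     * intent to run; vm_start() -- and hence the RESUME event --
     * does not happen at this point */
    if (runstate_check(RUN_STATE_INMIGRATE)) {
        autostart = 1;
    } else {
        vm_start();
    }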

When the dst migration finishes, the VM is put in RUN_STATE_PAUSED (in
process_incoming_migration_bh):

    if (!global_state_received() ||
        global_state_get_runstate() == RUN_STATE_RUNNING) {
        if (autostart) {
            vm_start();
        } else {
            runstate_set(RUN_STATE_PAUSED);
        }
    } else if (migration_incoming_colo_enabled()) {
        migration_incoming_disable_colo();
        vm_start();
    } else {
        runstate_set(global_state_get_runstate());  <-- HERE
    }

Do we need to add something like this to that routine?

    if (autostart &&
        global_state_get_runstate() != RUN_STATE_RUNNING) {
        vm_start();
    }

Otherwise it seems we'll just ignore a 'cont' that was received while the
migration was still ongoing.
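
To make that concrete, the check could go at the end of the final else
branch above, along these lines (sketch only):

    } else {
        runstate_set(global_state_get_runstate());
        /* a 'cont' received while we were still INMIGRATE has set
         * autostart; honour it now that the source's (paused)
         * runstate has been restored */
        if (autostart &&
            global_state_get_runstate() != RUN_STATE_RUNNING) {
            vm_start();
        }
    }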

>          }
>  
> -        qtest_qmp_eventwait(to, "RESUME");
> +        if (!got_resume) {
> +            qtest_qmp_eventwait(to, "RESUME");
> +        }
>  
>          wait_for_serial("dest_serial");
>      }


