[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 2/2] migration: failover: continue to wait card unplug on err
From: |
Juan Quintela |
Subject: |
Re: [PATCH 2/2] migration: failover: continue to wait card unplug on error |
Date: |
Wed, 30 Jun 2021 11:13:55 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) |
Laurent Vivier <lvivier@redhat.com> wrote:
> On 29/06/2021 19:50, Juan Quintela wrote:
>> Laurent Vivier <lvivier@redhat.com> wrote:
>>> If the user cancels the migration in the unplug-wait state,
>>> QEMU will try to plug back the card and this fails because the card
>>> is partially unplugged.
>>> To avoid the problem, continue to wait the card unplug, but to
>>> allow the migration to be canceled if the card never finishes to unplug
>>> use a timeout.
>>>
>>> Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852
>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>>> ---
>>> migration/migration.c | 11 +++++++++++
>>> 1 file changed, 11 insertions(+)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 3e92c405a2b6..3b06d43a7f42 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -3679,6 +3679,17 @@ static void qemu_savevm_wait_unplug(MigrationState
>>> *s, int old_state,
>>> qemu_savevm_state_guest_unplug_pending()) {
>>> qemu_sem_timedwait(&s->wait_unplug_sem, 250);
>>> }
>>> + if (s->state != MIGRATION_STATUS_WAIT_UNPLUG) {
>>> + int timeout = 120; /* 30 seconds */
>>> + /*
>>> + * migration has been canceled
>>> + * but as we have started an unplug we must wait the end
>>> + * to be able to plug back the card
>>> + */
>>> + while (timeout-- && qemu_savevm_state_guest_unplug_pending()) {
>>> + qemu_sem_timedwait(&s->wait_unplug_sem, 250);
>>> + }
>>> + }
>>>
>>> migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
>>> new_state);
>>> } else {
>> I agree with the idea. But if we are getting out due to timeout == 0,
>> shouldn't we return some error, warning, whatever?
>
> In that case, we keep the current behaviour: guest kernel will report
> an error when it
> will try to plug back the card that has not been unplugged. This is a
> corner case: if it
> happens we have something really wrong with the machine. Perhaps we can
> remove the
> timeout, but I don't like to block the user, or increase it to be sure.
Oh, I whole agree that it is a corner case, and that it shouldn't
happen.
But if it happens, we don't log it anywhere. That was my complaint.
Later, Juan.