qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code when receiving less channels
Date: Wed, 19 Dec 2018 15:11:36 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Fei Li <address@hidden> writes:

> On 12/13/2018 02:17 PM, Markus Armbruster wrote:
>> Fei Li <address@hidden> writes:
>>
>>> In our current code, when multifd is used during migration, if there
>>> is an error before the destination receives all new channels, the
>>> source keeps running, however the destination does not exit but keeps
>>> waiting until the source is killed deliberately.
>>>
>>> Fix this by dumping the specific error and let users decide whether
>>> to quit from the destination side when failing to receive packet via
>>> some channel.
>>>
>>> Cc: Dr. David Alan Gilbert <address@hidden>
>>> Signed-off-by: Fei Li <address@hidden>
>>> Reviewed-by: Peter Xu <address@hidden>
[...]
>>> diff --git a/migration/migration.h b/migration/migration.h
>>> index e413d4d8b6..02b7304610 100644
>>> --- a/migration/migration.h
>>> +++ b/migration/migration.h
>>> @@ -229,7 +229,7 @@ struct MigrationState
>>>   void migrate_set_state(int *state, int old_state, int new_state);
>>>     void migration_fd_process_incoming(QEMUFile *f);
>>> -void migration_ioc_process_incoming(QIOChannel *ioc);
>>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>>>   void migration_incoming_process(void);
>>>     bool  migration_has_all_channels(void);
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index 7e7deec4d8..c7e3d6b0fd 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -1323,7 +1323,7 @@ bool multifd_recv_all_channels_created(void)
>>>   }
>>>     /* Return true if multifd is ready for the migration, otherwise
>>> false */
>>> -bool multifd_recv_new_channel(QIOChannel *ioc)
>>> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>>>   {
>>>       MultiFDRecvParams *p;
>>>       Error *local_err = NULL;
>>> @@ -1331,6 +1331,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>>         id = multifd_recv_initial_packet(ioc, &local_err);
>>>       if (id < 0) {
>>> +        error_propagate_prepend(errp, local_err,
>>> +                                "failed to receive packet"
>>> +                                " via multifd channel %d: ",
>>> +                                atomic_read(&multifd_recv_state->count));
>>>           multifd_recv_terminate_threads(local_err);
>>>           return false;
>> Here, we return false without setting an error.
> I am not sure whether I understand correctly, but here I think the above
> error_propagate_prepend() set the error to errp.

You're right, I got confused.

However, you shouldn't access @local_err after error_propagate() or
similar.  Please insert error_propagate_prepend() after
multifd_recv_terminate_threads(), lik you do in the next hunk.

>>>       }
>>> @@ -1340,6 +1344,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>>           error_setg(&local_err, "multifd: received id '%d' already setup'",
>>>                      id);
>>>           multifd_recv_terminate_threads(local_err);
>>> +        error_propagate(errp, local_err);
>>>           return false;
>> Here, we return false with setting an error.
>>
>>>       }
>>>       p->c = ioc;
>>> @@ -1351,7 +1356,8 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>>       qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
>>>                          QEMU_THREAD_JOINABLE);
>>>       atomic_inc(&multifd_recv_state->count);
>>> -    return multifd_recv_state->count == migrate_multifd_channels();
>>> +    return atomic_read(&multifd_recv_state->count) ==
>>> +           migrate_multifd_channels();
>> Here, we return either true of false without setting an error.
> yes.
>> Taken together, there are three cases:
>>
>> 1. Succeed and return true
> Yes, when all multifd channels are correctly received.
>> 2. Succeed and return false
> Yes, when the current multifd channel is received correctly, but
> have not received all the channels.

Aha.

>> 3. Fail (set an error) and return false.
> Yes. And with the propagated error, the code just returns and
> report the error in migration_channel_process_incoming().
>> Assuming that's what we want: please update the function comment to
>> spell them out.
> Ok, I will update the three cases in the comment to clarify in detail.
>
> Have a nice day, thanks :)

You're welcome!



reply via email to

[Prev in Thread] Current Thread [Next in Thread]