Subject: Re: [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource
From: Gal Shachaf
Date: Mon, 23 Jul 2018 14:54:24 +0000
On Thu, Jul 5, 2018 at 10:26 PM, 858585 jemmy <address@hidden> wrote:
> On Thu, Jun 28, 2018 at 2:59 AM, Dr. David Alan Gilbert
> <address@hidden> wrote:
>> * Lidong Chen (address@hidden) wrote:
>>> ibv_dereg_mr wait for a long time for big memory size virtual server.
>>>
>>> The test result is:
>>> 10GB 326ms
>>> 20GB 699ms
>>> 30GB 1021ms
>>> 40GB 1387ms
>>> 50GB 1712ms
>>> 60GB 2034ms
>>> 70GB 2457ms
>>> 80GB 2807ms
>>> 90GB 3107ms
>>> 100GB 3474ms
>>> 110GB 3735ms
>>> 120GB 4064ms
>>> 130GB 4567ms
>>> 140GB 4886ms
>>>
>>> this will cause the guest os hang for a while when migration finished.
>>> So create a dedicated thread to release rdma resource.
>>>
>>> Signed-off-by: Lidong Chen <address@hidden>
>>> ---
>>> migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
>>> 1 file changed, 27 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index dfa4f77..f12e8d5 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -2979,35 +2979,46 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>>> }
>>> }
>>>
>>> -static int qio_channel_rdma_close(QIOChannel *ioc,
>>> - Error **errp)
>>> +static void *qio_channel_rdma_close_thread(void *arg)
>>> {
>>> - QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>> - RDMAContext *rdmain, *rdmaout;
>>> - trace_qemu_rdma_close();
>>> + RDMAContext **rdma = arg;
>>> + RDMAContext *rdmain = rdma[0];
>>> + RDMAContext *rdmaout = rdma[1];
>>>
>>> - rdmain = rioc->rdmain;
>>> - if (rdmain) {
>>> - atomic_rcu_set(&rioc->rdmain, NULL);
>>> - }
>>> -
>>> - rdmaout = rioc->rdmaout;
>>> - if (rdmaout) {
>>> - atomic_rcu_set(&rioc->rdmaout, NULL);
>>> - }
>>> + rcu_register_thread();
>>>
>>> synchronize_rcu();
>>
>> * see below
>>
>>> -
>>> if (rdmain) {
>>> qemu_rdma_cleanup(rdmain);
>>> }
>>> -
>>> if (rdmaout) {
>>> qemu_rdma_cleanup(rdmaout);
>>> }
>>>
>>> g_free(rdmain);
>>> g_free(rdmaout);
>>> + g_free(rdma);
>>> +
>>> + rcu_unregister_thread();
>>> + return NULL;
>>> +}
>>> +
>>> +static int qio_channel_rdma_close(QIOChannel *ioc,
>>> + Error **errp) {
>>> + QemuThread t;
>>> + QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>> + RDMAContext **rdma = g_new0(RDMAContext*, 2);
>>> +
>>> + trace_qemu_rdma_close();
>>> + if (rioc->rdmain || rioc->rdmaout) {
>>> + rdma[0] = rioc->rdmain;
>>> + rdma[1] = rioc->rdmaout;
>>> + qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
>>> + rdma, QEMU_THREAD_DETACHED);
>>> + atomic_rcu_set(&rioc->rdmain, NULL);
>>> + atomic_rcu_set(&rioc->rdmaout, NULL);
>>
>> I'm not sure this pair is ordered with the synchronise_rcu above;
>> Doesn't that mean, on a bad day, that you could get:
>>
>>
>> main-thread           rdma_cleanup           another-thread
>>   qemu_thread_create
>>                         synchronize_rcu
>>                                                reads rioc->rdmain
>>                                                starts doing something
>>                                                  with rdmain
>>   atomic_rcu_set
>>                         rdma_cleanup
>>
>>
>> so the another-thread is using it during the cleanup?
>> Would just moving the atomic_rcu_sets before the qemu_thread_create
>> fix that?
> yes, I will fix it.
>
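The fix being agreed here, sketched against the quoted patch (only the two atomic_rcu_set() calls move above qemu_thread_create(); QEMU-internal names are taken from the patch as posted, not re-compiled):

```c
static int qio_channel_rdma_close(QIOChannel *ioc,
                                  Error **errp)
{
    QemuThread t;
    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
    RDMAContext **rdma = g_new0(RDMAContext *, 2);

    trace_qemu_rdma_close();
    if (rioc->rdmain || rioc->rdmaout) {
        rdma[0] = rioc->rdmain;
        rdma[1] = rioc->rdmaout;
        /* Publish NULL before spawning the thread: no new RCU reader
         * can pick the contexts up afterwards, so the cleanup thread's
         * synchronize_rcu() waits out every remaining reader. */
        atomic_rcu_set(&rioc->rdmain, NULL);
        atomic_rcu_set(&rioc->rdmaout, NULL);
        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
                           rdma, QEMU_THREAD_DETACHED);
    } else {
        g_free(rdma);
    }

    return 0;
}
```

(The else branch also frees the scratch array, which the posted version leaks when both contexts are already NULL.)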
>>
>> However, I've got other worries as well:
>> a) qemu_rdma_cleanup does:
>> migrate_get_current()->state == MIGRATION_STATUS_CANCELLING
>>
>> which worries me a little if someone immediately tries to restart
>> the migration.
Because the current qemu version does not wait for the RDMA_CM_EVENT_DISCONNECTED
event after rdma_disconnect, I think it is not necessary to send
RDMA_CONTROL_ERROR. Compared to sending RDMA_CONTROL_ERROR, I think using the cm
event to notify the peer qemu is better: the rdma connection may already be in
error_state, and then RDMA_CONTROL_ERROR cannot be sent successfully.
For the peer qemu, the qemu_rdma_wait_comp_channel function should not only
poll rdma->comp_channel->fd; it should also poll rdma->channel->fd.
[GS] I agree that we should poll on CM events in addition to CQ events.
For the source qemu, this is fixed by the patch
"migration: poll the cm event while wait RDMA work request completion",
and for the destination qemu, a new patch is needed to fix it.
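A rough sketch of what polling both fds could look like (a hypothetical helper written against the librdmacm/libibverbs API; the two channels correspond to the rdma->comp_channel and rdma->channel fields named above):

```c
#include <errno.h>
#include <poll.h>
#include <rdma/rdma_cma.h>

/* Hypothetical helper: block until either a completion-channel event
 * or a connection-manager event arrives, so a peer disconnect
 * (RDMA_CM_EVENT_DISCONNECTED) wakes the waiter too. */
static int rdma_wait_cq_or_cm(struct ibv_comp_channel *comp_channel,
                              struct rdma_event_channel *cm_channel)
{
    struct pollfd pfds[2] = {
        { .fd = comp_channel->fd, .events = POLLIN },
        { .fd = cm_channel->fd,   .events = POLLIN },
    };

    if (poll(pfds, 2, -1 /* no timeout */) < 0) {
        return -errno;
    }

    if (pfds[1].revents & POLLIN) {
        struct rdma_cm_event *ev;
        if (rdma_get_cm_event(cm_channel, &ev) == 0) {
            int disconnected = (ev->event == RDMA_CM_EVENT_DISCONNECTED);
            rdma_ack_cm_event(ev);
            if (disconnected) {
                return -1;   /* peer went away */
            }
        }
    }
    return 0;                /* a CQ event is ready to be read */
}
```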
So I prefer to remove the MIGRATION_STATUS_CANCELLING check in the qemu_rdma_cleanup function.
[GS] The intention of this patch is to move the costly ibv_dereg_mr to a
separate thread, so we do not actually need to also move the RDMA_CONTROL_ERROR
and rdma_disconnect, which come beforehand in qemu_rdma_cleanup, to a thread.
I suggest that we move them out of qemu_rdma_cleanup to a new function
"qemu_rdma_disconnect" and call it on each RDMAContext before the
qemu_thread_create.
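One possible shape for that split (a guess only: the body approximates the start of qemu_rdma_cleanup, and the field and helper names are assumptions based on this thread, not a tested patch):

```c
/* Hypothetical sketch: keep the peer-visible teardown synchronous in
 * qio_channel_rdma_close(), before the cleanup thread is spawned;
 * qemu_rdma_cleanup() then keeps only the slow resource release. */
static void qemu_rdma_disconnect(RDMAContext *rdma)
{
    if (rdma->cm_id && rdma->connected) {
        if (rdma->error_state) {
            RDMAControlHeader head = { .len = 0,
                                       .type = RDMA_CONTROL_ERROR,
                                       .repeat = 1 };
            error_report("Early error. Sending error.");
            qemu_rdma_post_send_control(rdma, NULL, &head);
        }
        rdma_disconnect(rdma->cm_id);
        rdma->connected = false;
    }
}
```

qio_channel_rdma_close() would then call this on rioc->rdmain and rioc->rdmaout before the qemu_thread_create, leaving ibv_dereg_mr and friends to the detached thread.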
>>
>> b) I don't understand what happens if someone does try and restart
>> the migration after that, but in the ~5s it takes the ibv cleanup
>> to happen.
I prefer to add a new variable to current_migration: if the rdma cleanup
thread has not exited, a new migration should not be started.
[GS] I support this suggestion.
>
> yes, I will try to fix it.
>
>>
>> Dave
>>
>>
>>> + }
>>>
>>> return 0;
>>> }
>>> --
>>> 1.8.3.1
>>>
>> --
>> Dr. David Alan Gilbert / address@hidden / Manchester, UK