[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wai
From: |
858585 jemmy |
Subject: |
Re: [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion |
Date: |
Thu, 31 May 2018 15:36:08 +0800 |
On Thu, May 31, 2018 at 1:33 AM, Dr. David Alan Gilbert
<address@hidden> wrote:
> * Lidong Chen (address@hidden) wrote:
>> If the peer qemu is crashed, the qemu_rdma_wait_comp_channel function
>> maybe loop forever. so we should also poll the cm event fd, and when
>> receive any cm event, we consider some error happened.
>>
>> Signed-off-by: Lidong Chen <address@hidden>
>
> I don't understand enough about the way the infiniband fd's work to
> fully review this; so I'd appreciate if some one who does could
> comment/add their review.
Hi Avaid:
we need your help. I also not find any document about the cq
channel event fd and
cm channel event f.
Should we set the events to G_IO_IN | G_IO_HUP | G_IO_ERR? or
G_IO_IN is enough?
pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
Thanks.
>
>> ---
>> migration/rdma.c | 35 ++++++++++++++++++++++++-----------
>> 1 file changed, 24 insertions(+), 11 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 1b9e261..d611a06 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma,
>> uint64_t *wr_id_out,
>> */
>> static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>> {
>> + struct rdma_cm_event *cm_event;
>> + int ret = -1;
>> +
>> /*
>> * Coroutine doesn't start until migration_fd_process_incoming()
>> * so don't yield unless we know we're running inside of a coroutine.
>> @@ -1504,25 +1507,35 @@ static int qemu_rdma_wait_comp_channel(RDMAContext
>> *rdma)
>> * But we need to be able to handle 'cancel' or an error
>> * without hanging forever.
>> */
>> - while (!rdma->error_state && !rdma->received_error) {
>> - GPollFD pfds[1];
>> + while (!rdma->error_state && !rdma->received_error) {
>> + GPollFD pfds[2];
>> pfds[0].fd = rdma->comp_channel->fd;
>> pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>> + pfds[0].revents = 0;
>> +
>> + pfds[1].fd = rdma->channel->fd;
>> + pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>> + pfds[1].revents = 0;
>> +
>> /* 0.1s timeout, should be fine for a 'cancel' */
>> - switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
>> - case 1: /* fd active */
>> - return 0;
>> + qemu_poll_ns(pfds, 2, 100 * 1000 * 1000);
>
> Shouldn't we still check the return value of this; if it's negative
> something has gone wrong.
I will fix this.
Thanks.
>
> Dave
>
>> - case 0: /* Timeout, go around again */
>> - break;
>> + if (pfds[1].revents) {
>> + ret = rdma_get_cm_event(rdma->channel, &cm_event);
>> + if (!ret) {
>> + rdma_ack_cm_event(cm_event);
>> + }
>> + error_report("receive cm event while wait comp channel,"
>> + "cm event is %d", cm_event->event);
>>
>> - default: /* Error of some type -
>> - * I don't trust errno from qemu_poll_ns
>> - */
>> - error_report("%s: poll failed", __func__);
>> + /* consider any rdma communication event as an error */
>> return -EPIPE;
>> }
>>
>> + if (pfds[0].revents) {
>> + return 0;
>> + }
>> +
>> if (migrate_get_current()->state ==
>> MIGRATION_STATUS_CANCELLING) {
>> /* Bail out and let the cancellation happen */
>> return -EPIPE;
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK
- Re: [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads, (continued)
[Qemu-devel] [PATCH v4 06/12] migration: Stop rdma yielding during incoming postcopy, Lidong Chen, 2018/05/30
[Qemu-devel] [PATCH v4 05/12] migration: implement bi-directional RDMA QIOChannel, Lidong Chen, 2018/05/30
[Qemu-devel] [PATCH v4 07/12] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect, Lidong Chen, 2018/05/30
[Qemu-devel] [PATCH v4 08/12] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel, Lidong Chen, 2018/05/30
[Qemu-devel] [PATCH v4 09/12] migration: invoke qio_channel_yield only when qemu_in_coroutine(), Lidong Chen, 2018/05/30
[Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion, Lidong Chen, 2018/05/30
[Qemu-devel] [PATCH v4 10/12] migration: create a dedicated thread to release rdma resource, Lidong Chen, 2018/05/30
[Qemu-devel] [PATCH v4 12/12] migration: implement the shutdown for RDMA QIOChannel, Lidong Chen, 2018/05/30