qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] rdma: Fix cleanup in error paths


From: Padmanabh Ratnakar
Subject: Re: [Qemu-devel] [PATCH] rdma: Fix cleanup in error paths
Date: Wed, 25 Mar 2015 13:50:41 +0000

> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:address@hidden
> Sent: Wednesday, March 25, 2015 4:10 PM
> To: Padmanabh Ratnakar
> Cc: address@hidden; address@hidden; address@hidden;
> Padmanabh Ratnakar; Meghana Cheripady; address@hidden
> Subject: Re: [PATCH] rdma: Fix cleanup in error paths
> 
> > As part of commit e325b49a320b493cc5d69e263751ff716dc458fe,
> > order in which resources are destroyed was changed for fixing a seg
> > fault. Due to this change, CQ will never get destroyed as CQ should be
> > destroyed after QP destruction. Seg fault is caused improper cleanup
> > when connection fails. Fixing cleanup after connection failure and
> > order in which resources are destroyed in qemu_rdma_cleanup() routine.
> 
> Do you have a test case that triggers the seg fault?

We reverted the change of destroy CQ before destroy QP as it was preventing CQ
from getting destroyed.
Then we connected RoCE adapter to a regular NIC adapter.
We tried to establish connection using qemu  from RoCE adapter to other side.
Connection establishment times out and qemu hits seg fault while QP is getting 
destroyed. 

> 
> > Signed-off-by: Meghana Cheripady <address@hidden>
> > Signed-off-by: Padmanabh Ratnakar <address@hidden>
> > ---
> >  migration/rdma.c |   22 ++++++++--------------
> >  1 files changed, 8 insertions(+), 14 deletions(-)
> >
> > diff --git a/migration/rdma.c b/migration/rdma.c index
> > e6c3a67..77e3444 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -2194,6 +2194,10 @@ static void qemu_rdma_cleanup(RDMAContext
> *rdma)
> >          }
> >      }
> >
> > +    if (rdma->qp) {
> > +        rdma_destroy_qp(rdma->cm_id);
> > +        rdma->qp = NULL;
> > +    }
> 
> OK, that makes sense, the manpage for rdma_destroy_qp says 'Users must
> destroy any QP associated with an rdma_cm_id before destroying the ID.'
> so it's probably good to have this early.
> 
> >      if (rdma->cq) {
> >          ibv_destroy_cq(rdma->cq);
> >          rdma->cq = NULL;
> > @@ -2206,18 +2210,14 @@ static void qemu_rdma_cleanup(RDMAContext
> *rdma)
> >          ibv_dealloc_pd(rdma->pd);
> >          rdma->pd = NULL;
> >      }
> > -    if (rdma->listen_id) {
> > -        rdma_destroy_id(rdma->listen_id);
> > -        rdma->listen_id = NULL;
> > -    }
> >      if (rdma->cm_id) {
> > -        if (rdma->qp) {
> > -            rdma_destroy_qp(rdma->cm_id);
> > -            rdma->qp = NULL;
> > -        }
> >          rdma_destroy_id(rdma->cm_id);
> >          rdma->cm_id = NULL;
> >      }
> > +    if (rdma->listen_id) {
> > +        rdma_destroy_id(rdma->listen_id);
> > +        rdma->listen_id = NULL;
> > +    }
> 
> Can you explain this reorder - why is the order of listen_id and cm_id
> important?
> is that a failure on the destination?

listen_id is used only in server side.
Saw that listen_id gets created before cm_id.
So for symmetry destroyed cm_id before listen_id.
Checked the status of rdma_destroy_id() and both connection
identifier were getting destroyed successfully.
Please let me know if I should revert this change.  
 
> 
> >      if (rdma->channel) {
> >          rdma_destroy_event_channel(rdma->channel);
> >          rdma->channel = NULL;
> > @@ -2309,8 +2309,6 @@ static int qemu_rdma_connect(RDMAContext
> *rdma, Error **errp)
> >      if (ret) {
> >          perror("rdma_connect");
> >          ERROR(errp, "connecting to destination!");
> > -        rdma_destroy_id(rdma->cm_id);
> > -        rdma->cm_id = NULL;
> >          goto err_rdma_source_connect;
> >      }
> >
> > @@ -2319,8 +2317,6 @@ static int qemu_rdma_connect(RDMAContext
> *rdma, Error **errp)
> >          perror("rdma_get_cm_event after rdma_connect");
> >          ERROR(errp, "connecting to destination!");
> >          rdma_ack_cm_event(cm_event);
> > -        rdma_destroy_id(rdma->cm_id);
> > -        rdma->cm_id = NULL;
> >          goto err_rdma_source_connect;
> >      }
> >
> > @@ -2328,8 +2324,6 @@ static int qemu_rdma_connect(RDMAContext
> *rdma, Error **errp)
> >          perror("rdma_get_cm_event != EVENT_ESTABLISHED after
> rdma_connect");
> >          ERROR(errp, "connecting to destination!");
> >          rdma_ack_cm_event(cm_event);
> > -        rdma_destroy_id(rdma->cm_id);
> > -        rdma->cm_id = NULL;
> >          goto err_rdma_source_connect;
> >      }
> >      rdma->connected = true;
> > --
> > 1.7.1
> 
> Dave
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK

Thanks,
Padmanabh



reply via email to

[Prev in Thread] Current Thread [Next in Thread]