qemu-devel
Re: [Qemu-devel] An RDMA race?


From: Michael R. Hines
Subject: Re: [Qemu-devel] An RDMA race?
Date: Sun, 20 Dec 2015 15:08:19 +0800

Adding such a control message would defeat the benefits of RDMA, as there shouldn't be any signalling in the actual DMA path, or RDMA latency would be too high. If you're sending control messages for individual writes, then you need to change up your design. It's OK to design ACKs for groups of writes, depending on the requirements.
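A minimal sketch of acknowledging groups of writes rather than individual ones, assuming a fixed group size. The names `ack_batcher`, `ack_batcher_init`, and `ack_batcher_note_write` are hypothetical illustrations and are not part of QEMU's RDMA code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: counts RDMA writes and reports when a whole
 * group has been posted, so that only one ACK is exchanged per group. */
struct ack_batcher {
    size_t batch_size;  /* number of writes covered by one ACK */
    size_t pending;     /* writes posted since the last ACK point */
};

static void ack_batcher_init(struct ack_batcher *b, size_t batch_size)
{
    /* A batch size of 0 is treated as 1, i.e. an ACK for every write. */
    b->batch_size = batch_size ? batch_size : 1;
    b->pending = 0;
}

/* Call once per posted write. Returns true when the caller should
 * request an ACK for the group just completed, false otherwise. */
static bool ack_batcher_note_write(struct ack_batcher *b)
{
    b->pending++;
    if (b->pending >= b->batch_size) {
        b->pending = 0;
        return true;
    }
    return false;
}
```

With a group size of 3, the third call returns true and the counter starts again, so the control path is touched once per three writes instead of on every write.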

So, the out-of-order issue you're seeing is only with your new message, not the original messages?

Can you describe/document it in more detail so I can help advise?

- Michael

On Mon, Dec 14, 2015 at 6:53 PM, Dr. David Alan Gilbert <address@hidden> wrote:
* Michael R. Hines (address@hidden) wrote:
> David,
>
> Thanks for including my email directly. It helps a lot.
>
> Below, I'm going to assume that only "dest" is calling
> qemu_rdma_exchange_recv() and only src is calling
> qemu_rdma_exchange_send(), since you didn't specify who is sending
> and who is receiving.
>
> If that assumption is wrong, please respond again.

That's correct.

> Comments inline.....
>
> On Sat, Dec 12, 2015 at 1:48 AM, Dr. David Alan Gilbert <address@hidden> wrote:
>
> > Hi Michael,
> >    I think I've got an RDMA race condition, but I'm being a little
> > cautious at the moment and wondered if you agree with the following
> > diagnosis.
> >
> > It's showing up in a world of mine that's sending more control messages
> > from the destination->source and I'm seeing the following.
> >
> > We normally expect:
> >
> >    src                        dest
> >      ----------->control ready->
> >
>
> If src is sending, this is not correct. Dest should send the ready message
> if it is receiving, not src, which breaks the above assumption. So I'll
> reverse my earlier assumption, continue with your observation, and
> assume that src is receiving instead of dest, which should instead look
> like:

Gah! Yes, I got the label the wrong way around; it's dest sending control ready.

> src  (receiving)                      dest (sending)
>      ----------->control ready->
>
>
>
> >    Sees SEND_CONTROL signal to ack that it has been sent
> >
>
> I'll assume here that you meant that dest sees the ready message and
> then later sends something.
>
>
> >          <-----control message--
> >    Sees RECV_CONTROL message from dest
> >
> >
> Similar assumption for the receiver (src).
>
>
> > but what I'm seeing is:
> >    src                        dest
> >      ----------->control ready->
> >          <-----control message--
> >    Sees RECV_CONTROL message from dest
> >
>
> hmmmmm....
>
>
> >    Sees SEND_CONTROL signal to ack that it has been sent
> >
> >
> There's not enough information here....... do you have a multi-threaded
> send or receive or something?

No, I've been trying to wire RDMA into the COLO fault-tolerant setup;
so the change which got me to trigger this bug was that I'd
added a new control message 'notify write' which explicitly
told the destination it had a page written to; at the RDMA level
that was the only change.

> Do the work request IDs match up?

Yes I think so; I also added a sequence number to the 'ready' messages
to check I wasn't losing one.
I had a chat to one of our RDMA guys (Doug Ledford) and he said
it's perfectly legal for RDMA to take longer to return the signal
from the send than for the round trip of the destination responding;
the 'signal' doesn't happen until an ack has been received from the
destination card anyway, so the ack can get delayed or retried.
So I think we do need to fix this; the question then is how do we fix
it for all control messages without breaking anything else.   Are there
any cases that rely on having received the signal from the send before
continuing, or could I just do what I'm doing for all control messages?
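
One way to picture an order-independent wait, as a sketch only: record each completion as it arrives and treat the exchange as finished once both have been seen, whichever came first. The type and function names below are hypothetical and only loosely modelled on the SEND_CONTROL / RECV_CONTROL events described above; this is not the existing QEMU code:

```c
#include <stdbool.h>

/* Hypothetical completion kinds for one control exchange. */
enum wr_kind {
    WR_SEND_CONTROL,  /* local signal that our send has been acked */
    WR_RECV_CONTROL   /* the peer's control message has arrived */
};

/* Tracks which of the two completions have been observed so far. */
struct exchange_state {
    bool send_done;
    bool recv_done;
};

static void exchange_init(struct exchange_state *s)
{
    s->send_done = false;
    s->recv_done = false;
}

/* Record one completion. Returns true once both the send signal and
 * the peer's message have been seen, regardless of arrival order. */
static bool exchange_complete(struct exchange_state *s, enum wr_kind kind)
{
    if (kind == WR_SEND_CONTROL) {
        s->send_done = true;
    } else {
        s->recv_done = true;
    }
    return s->send_done && s->recv_done;
}
```

Feeding it RECV_CONTROL and then SEND_CONTROL gives the same final result as the expected order, which is the property needed if the send signal can legally lag behind the reply.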

Dave

> - Michael
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



--
/*
 * Michael R. Hines
 * https://michael.hinespot.com
 */
