qemu-devel


From: Michael R. Hines
Subject: Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition
Date: Wed, 26 Jun 2013 10:09:07 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130329 Thunderbird/17.0.5

On 06/26/2013 08:39 AM, Paolo Bonzini wrote:
On 26/06/2013 14:37, Michael R. Hines wrote:
On 06/26/2013 02:37 AM, Paolo Bonzini wrote:
On 26/06/2013 02:31, Michael R. Hines wrote:
On 06/25/2013 05:06 PM, Paolo Bonzini wrote:
On 25/06/2013 22:56, Michael R. Hines wrote:
I was wrong - this does require a protocol extension.

This is because the RDMA transfers are asynchronous, and thus
we cannot know in advance that it is safe to unregister the memory
associated with each individual transfer before the transfer actually
completes.

While the destination currently uses the protocol to participate in
*registering* the page, the destination does not participate in the
RDMA transfers themselves, only the source does. Unpinning on the
destination would therefore require a new exchange of messages to
block and instruct the destination to unpin the memory.
Yes, that's what I recalled too (really what mst told me :)).  Does it
need to be blocking though?  As long as the pinning is blocking, and
messages are processed in order, the source can proceed immediately
after sending an unpin message.  This assumes of course that the chunk
is not being transmitted, and I am not sure how easily the source can
determine that.
No, they're not processed in order. In fact, not only does the device
write out of order, but also the PCI bus writes out of order.
This was such a problem, in fact, that I fixed several bugs as a result
a few weeks ago (see v7 of the patch, which has an in-depth description).

The destination simply cannot assume anything about the ordering
of the writes. That's really the whole point of using RDMA in the
first place: the software gets out of the way of the transfer
process to lower the latency of each transfer.
The memory is processed out of order, but what about the messages?
Those must be in order.

Note that I wrote above "This assumes of course that the chunk is not
being transmitted".  Can the source know when an asynchronous transfer
has finished, and delay the unpinning until that time?
Yes, the source does know. There's no problem unpinning on the source.

But both sides must do the unpinning, not just the source.

Did I misunderstand you? Are you suggesting *only* unpinning on the source?
I'm suggesting (if possible) that the source only tells the destination
to unpin once it knows it is safe for the destination to do it.  As long
as unpin and pin messages are not reordered, it should be possible to do
it with an asynchronous message from the source to the destination.
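
The idea above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the QEMU RDMA code: the `Source` and `Destination` classes, the per-chunk in-flight counter, and the `("pin", chunk)` / `("unpin", chunk)` message tuples are all hypothetical, and the control channel is modeled as a plain FIFO queue. It shows the source holding back the unpin message until every outstanding write to that chunk has completed, after which the destination can process the message without replying.

```python
from collections import deque


class Source:
    """Hypothetical source side: counts in-flight RDMA writes per chunk."""

    def __init__(self, channel):
        self.channel = channel      # in-order control channel (a FIFO here)
        self.in_flight = {}         # chunk id -> number of outstanding writes
        self.unpin_wanted = set()   # chunks whose unpin is deferred

    def post_write(self, chunk):
        # An asynchronous write to this chunk has been posted.
        self.in_flight[chunk] = self.in_flight.get(chunk, 0) + 1

    def request_unpin(self, chunk):
        # Send the unpin message only if nothing is in flight for the chunk.
        if self.in_flight.get(chunk, 0) == 0:
            self.channel.append(("unpin", chunk))
        else:
            self.unpin_wanted.add(chunk)

    def on_write_complete(self, chunk):
        # Completion of one write; release a deferred unpin once drained.
        self.in_flight[chunk] -= 1
        if self.in_flight[chunk] == 0 and chunk in self.unpin_wanted:
            self.unpin_wanted.discard(chunk)
            self.channel.append(("unpin", chunk))


class Destination:
    """Hypothetical destination side: applies messages in arrival order."""

    def __init__(self):
        self.pinned = set()

    def handle(self, msg):
        kind, chunk = msg
        if kind == "pin":
            self.pinned.add(chunk)
        elif kind == "unpin":
            self.pinned.discard(chunk)


channel = deque()
src = Source(channel)
dst = Destination()

channel.append(("pin", 7))       # chunk 7 gets pinned on the destination
src.post_write(7)                # one write to chunk 7 is in flight
src.request_unpin(7)             # deferred: the write has not completed
queued_before = list(channel)    # only the pin message is queued so far
src.on_write_complete(7)         # now the unpin message is sent

while channel:                   # destination drains the channel in order
    dst.handle(channel.popleft())
```

The sketch relies on the control messages (not the memory writes) arriving in order, which is the property being asked about here.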

Paolo


Oh, certainly. I agree. That's not a trivial patch, though (as we were
originally shooting for).

(I'll list the steps below on the QEMU wiki, for the record).

This requires some steps:
1. First, maintain a new data structure: something like
    "These memory ranges are 'being unpinned'" - block all potential writes
    to these addresses until the unpinning completes.
2. Once the source unpin completes, send the asynchronous control channel message
    to the other side for unpinning.
3. Mark the data structure and return and allow the migration to continue
    with the next RDMA write.
4. Upon completion of the unpinning on the destination,
    respond to the source that it has finished.
5. Source then clears the data structure for the successfully unpinned memory ranges.
6. At this point, one or more writes may (or may not) be blocking on the
    unpinned memory areas and will poll the data structure and find that
    the unpinning has completed.
7. Then issue the new writes and proceed as normal.
8. Repeat from step 1.
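
For illustration, the bookkeeping in the steps above might look roughly like the following. This is a minimal sketch under several assumptions, not an actual patch: the `UnpinTracker` class and all of its method names are hypothetical, memory ranges are treated as exact `(start, length)` keys with no overlap handling, the control channel is a plain list, and blocked writes are retried when the acknowledgement arrives instead of polling. Re-registering a range before writing to it again is not modeled.

```python
class UnpinTracker:
    """Hypothetical source-side record of ranges that are 'being unpinned'."""

    def __init__(self):
        self.being_unpinned = set()   # ranges with an unpin in progress
        self.blocked_writes = []      # writes held back by an unpin
        self.control_out = []         # messages sent to the destination
        self.issued = []              # RDMA writes actually posted

    def start_unpin(self, rng):
        # Mark the range, then send the asynchronous unpin message.
        # The local (source) unpin is assumed to have completed already.
        self.being_unpinned.add(rng)
        self.control_out.append(("unpin", rng))

    def write(self, rng, data):
        # Writes to a range that is being unpinned must wait;
        # writes to any other range proceed immediately.
        if rng in self.being_unpinned:
            self.blocked_writes.append((rng, data))
            return False
        self.issued.append((rng, data))
        return True

    def on_unpin_ack(self, rng):
        # The destination reported completion: clear the mark and
        # issue any writes that were waiting on this range.
        self.being_unpinned.discard(rng)
        still_blocked = []
        for r, d in self.blocked_writes:
            if r in self.being_unpinned:
                still_blocked.append((r, d))
            else:
                self.issued.append((r, d))
        self.blocked_writes = still_blocked


tracker = UnpinTracker()
busy = (0x1000, 4096)
other = (0x2000, 4096)

tracker.start_unpin(busy)
tracker.write(busy, b"a")     # held back until the destination acknowledges
tracker.write(other, b"b")    # unrelated range, migration continues
tracker.on_unpin_ack(busy)    # blocked write is issued now
```

The point of the structure is that only writes touching a range mid-unpin are delayed, so the rest of the migration keeps moving while the destination finishes.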

