From: Michael R. Hines
Subject: Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition
Date: Wed, 26 Jun 2013 10:09:07 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130329 Thunderbird/17.0.5

On 06/26/2013 08:39 AM, Paolo Bonzini wrote:
> On 26/06/2013 14:37, Michael R. Hines wrote:
>> On 06/26/2013 02:37 AM, Paolo Bonzini wrote:
>>> On 26/06/2013 02:31, Michael R. Hines wrote:
>>>> On 06/25/2013 05:06 PM, Paolo Bonzini wrote:
>>>>> On 25/06/2013 22:56, Michael R. Hines wrote:
>>>>>> I was wrong - this does require a protocol extension.
>>>>>>
>>>>>> This is because the RDMA transfers are asynchronous, and thus we
>>>>>> cannot know in advance that it is safe to unregister the memory
>>>>>> associated with each individual transfer before the transfer
>>>>>> actually completes.
>>>>>>
>>>>>> While the destination currently uses the protocol to participate
>>>>>> in *registering* the page, the destination does not participate
>>>>>> in the RDMA transfers themselves, only the source does, and thus
>>>>>> would require a new exchange of messages to block and instruct
>>>>>> the destination to unpin the memory.
>>>>>
>>>>> Yes, that's what I recalled too (really what mst told me :)).
>>>>>
>>>>> Does it need to be blocking though? As long as the pinning is
>>>>> blocking, and messages are processed in order, the source can
>>>>> proceed immediately after sending an unpin message. This assumes
>>>>> of course that the chunk is not being transmitted, and I am not
>>>>> sure how easily the source can determine that.
>>>>
>>>> No, they're not processed in order. In fact, not only does the
>>>> device write out of order, but also the PCI bus writes out of
>>>> order. This was such a problem in fact, that I fixed several bugs
>>>> as a result a few weeks ago (v7 of the patch with an in-depth
>>>> description).
>>>>
>>>> The destination simply cannot assume anything about the ordering
>>>> of the writes - that's really the whole point of using RDMA in the
>>>> first place, so that the software can get out of the way of the
>>>> transfer process to lower the latency of each transfer.
>>>
>>> The memory is processed out of order, but what about the messages?
>>> Those must be in order.
>>>
>>> Note that I wrote above "This assumes of course that the chunk is
>>> not being transmitted". Can the source know when an asynchronous
>>> transfer finished, and delay the unpinning until that time?
>>
>> Yes, the source does know.
>>
>> There's no problem unpinning on the source. But both sides must do
>> the unpinning, not just the source.
>>
>> Did I misunderstand you? Are you suggesting *only* unpinning on the
>> source?
>
> I'm suggesting (if possible) that the source only tells the
> destination to unpin once it knows it is safe for the destination to
> do it. As long as unpin and pin messages are not reordered, it should
> be possible to do it with an asynchronous message from the source to
> the destination.
>
> Paolo

Oh, certainly. I agree. That's not a trivial patch, though (as we were
originally shooting for).

(I'll list the steps below on the QEMU wiki, for the record).

This requires some steps:

1. First, maintain a new data structure, something like "these memory
   ranges are 'being unpinned'", and block all potential writes to
   these addresses until the unpinning completes.
2. Once the source unpin completes, send the asynchronous control
   channel message to the other side for unpinning.
3. Mark the data structure, return, and allow the migration to
   continue with the next RDMA write.
4. Upon completion of the unpinning on the destination, respond to the
   source that it has finished.
5. The source then clears the data structure for the successfully
   unpinned memory ranges.
6. At this point, one or more writes may (or may not) be blocking on
   the unpinned memory areas; they will poll the data structure and
   find that the unpinning has completed.
7. Then issue the new writes and proceed as normal.
8. Repeat from step 1.
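
For the record, the bookkeeping in the steps above could be sketched roughly as follows. This is only an illustration under assumptions: every name here (`UnpinTracker`, `unpin_begin`, `write_allowed`, `unpin_ack`, `MAX_PENDING`) is hypothetical and not taken from the patch series, and the control-channel messages themselves are left out. It only shows the "being unpinned" table that writes would poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch; none of these names exist in the QEMU RDMA code. */
#define MAX_PENDING 16

typedef struct {
    uint64_t start;   /* first byte of the range being unpinned */
    uint64_t len;     /* length of the range in bytes */
    bool in_use;      /* true until the destination acknowledges */
} UnpinRange;

typedef struct {
    UnpinRange pending[MAX_PENDING];
} UnpinTracker;

/* Half-open interval overlap test: [a, a+alen) vs [b, b+blen). */
static bool ranges_overlap(uint64_t a, uint64_t alen,
                           uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/*
 * Record a range as "being unpinned" after the source-side unpin is done
 * and the asynchronous unpin message has been sent to the destination.
 * Returns a slot id to match against the later acknowledgement, or -1 if
 * the table is full.
 */
static int unpin_begin(UnpinTracker *t, uint64_t start, uint64_t len)
{
    assert(len > 0);
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!t->pending[i].in_use) {
            t->pending[i].start = start;
            t->pending[i].len = len;
            t->pending[i].in_use = true;
            return i;
        }
    }
    return -1;
}

/* A new RDMA write polls this; it must wait while any pending range overlaps. */
static bool write_allowed(const UnpinTracker *t, uint64_t start, uint64_t len)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        const UnpinRange *r = &t->pending[i];
        if (r->in_use && ranges_overlap(start, len, r->start, r->len)) {
            return false;
        }
    }
    return true;
}

/* The destination reported that its unpin finished: clear the entry. */
static void unpin_ack(UnpinTracker *t, int slot)
{
    assert(slot >= 0 && slot < MAX_PENDING);
    t->pending[slot].in_use = false;
}
```

A write to an address range that does not overlap any pending entry proceeds immediately, so only writes that hit a chunk still waiting on the destination would stall. A fixed-size table is assumed purely to keep the sketch short.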