From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH v3 resend/cleanup 1/8] rdma: update documentation to reflect new unpin support
Date: Fri, 12 Jul 2013 11:09:02 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7
On 07/12/2013 08:40 AM, address@hidden wrote:
> From: "Michael R. Hines" <address@hidden>
>
> As requested, the protocol now includes memory unpinning support.
> This has been implemented in a non-optimized manner, in such a way
> that one could devise an LRU or other workload-specific information
> on top of the basic mechanism to influence the way unpinning happens
> during runtime.
>
> The feature is not yet user-facing, and thus can only be enabled
> at compile-time.
>
> Reviewed-by: Eric Blake <address@hidden>
> Signed-off-by: Michael R. Hines <address@hidden>
> ---
> docs/rdma.txt | 51 ++++++++++++++++++++++++++++++---------------------
> 1 file changed, 30 insertions(+), 21 deletions(-)
I suggest splitting this patch into two, and cc-ing qemu-trivial on the
first of the two (since formatting cleanups can be applied now, even
while still waiting for a comprehensive review of the algorithm in the
rest of the series).
>
> diff --git a/docs/rdma.txt b/docs/rdma.txt
> index 45a4b1d..45d1c8a 100644
> --- a/docs/rdma.txt
> +++ b/docs/rdma.txt
> @@ -35,7 +35,7 @@ memory tracked during each live migration iteration round cannot keep pace
> with the rate of dirty memory produced by the workload.
>
> RDMA currently comes in two flavors: both Ethernet based (RoCE, or RDMA
> -over Convered Ethernet) as well as Infiniband-based. This implementation of
> +over Converged Ethernet) as well as Infiniband-based. This implementation of
Trivial
> migration using RDMA is capable of using both technologies because of
> the use of the OpenFabrics OFED software stack that abstracts out the
> programming model irrespective of the underlying hardware.
> @@ -188,9 +188,9 @@ header portion and a data portion (but together are transmitted
> as a single SEND message).
>
> Header:
> - * Length (of the data portion, uint32, network byte order)
> - * Type (what command to perform, uint32, network byte order)
> - * Repeat (Number of commands in data portion, same type only)
> + * Length (of the data portion, uint32, network byte order)
> + * Type (what command to perform, uint32, network byte order)
> + * Repeat (Number of commands in data portion, same type only)
trivial
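For anyone reading along, the header layout quoted above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names (ctrl_header, ctrl_header_pack, ctrl_header_unpack, CTRL_MAX_REPEAT) are hypothetical and not taken from the series, Repeat is assumed to be a uint32 like the other two fields, and the wire order is assumed to match the listed order. The 4096 limit is the one mentioned in the context of the next hunk.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <string.h>      /* memcpy */

/* Hypothetical name; 4096 is the hard-coded repeat limit stated in the doc. */
#define CTRL_MAX_REPEAT 4096u

/* Hypothetical struct; field order is assumed to follow the documented list. */
typedef struct {
    uint32_t len;     /* length of the data portion */
    uint32_t type;    /* which command to perform */
    uint32_t repeat;  /* number of same-type commands in the data portion */
} ctrl_header;

/* Write the header into buf (at least 12 bytes) in network byte order. */
static void ctrl_header_pack(const ctrl_header *h, uint8_t *buf)
{
    uint32_t wire[3] = { htonl(h->len), htonl(h->type), htonl(h->repeat) };
    memcpy(buf, wire, sizeof(wire));
}

/* Read a header from buf; returns 0 on success, -1 if repeat is over the limit. */
static int ctrl_header_unpack(const uint8_t *buf, ctrl_header *h)
{
    uint32_t wire[3];
    memcpy(wire, buf, sizeof(wire));
    h->len = ntohl(wire[0]);
    h->type = ntohl(wire[1]);
    h->repeat = ntohl(wire[2]);
    return h->repeat > CTRL_MAX_REPEAT ? -1 : 0;
}
```

Going through memcpy rather than casting the buffer avoids alignment and padding assumptions about the receive buffer.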
>
> The 'Repeat' field is here to support future multiple page registrations
> in a single message without any need to change the protocol itself
> @@ -202,17 +202,19 @@ The maximum number of repeats is hard-coded to 4096. This is a conservative
> limit based on the maximum size of a SEND message along with emperical
> observations on the maximum future benefit of simultaneous page registrations.
>
> -The 'type' field has 10 different command values:
> - 1. Unused
> - 2. Error (sent to the source during bad things)
> - 3. Ready (control-channel is available)
> - 4. QEMU File (for sending non-live device state)
> - 5. RAM Blocks request (used right after connection setup)
> - 6. RAM Blocks result (used right after connection setup)
> - 7. Compress page (zap zero page and skip registration)
> - 8. Register request (dynamic chunk registration)
> - 9. Register result ('rkey' to be used by sender)
> - 10. Register finished (registration for current iteration finished)
> +The 'type' field has 12 different command values:
> + 1. Unused
> + 2. Error (sent to the source during bad things)
> + 3. Ready (control-channel is available)
> + 4. QEMU File (for sending non-live device state)
> + 5. RAM Blocks request (used right after connection setup)
> + 6. RAM Blocks result (used right after connection setup)
> + 7. Compress page (zap zero page and skip registration)
> + 8. Register request (dynamic chunk registration)
> + 9. Register result ('rkey' to be used by sender)
> + 10. Register finished (registration for current iteration finished)
reformatting is trivial,
> + 11. Unregister request (unpin previously registered memory)
> + 12. Unregister finished (confirmation that unpin completed)
addition belongs in the second patch (so that we don't have to wade
through that much trivial stuff to find the real changes).
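To make the size of the real change concrete, the twelve command values could map onto an enum along these lines. This is a sketch only: the identifiers are hypothetical, and the numeric values (list position minus one) are an assumption, since the quoted text numbers the list but does not state the wire values.

```c
#include <stddef.h>   /* NULL */

/* Hypothetical names; order mirrors the list in the patch.
 * Numeric values are assumed, not taken from the code. */
enum ctrl_type {
    CTRL_UNUSED = 0,
    CTRL_ERROR,
    CTRL_READY,
    CTRL_QEMU_FILE,
    CTRL_RAM_BLOCKS_REQUEST,
    CTRL_RAM_BLOCKS_RESULT,
    CTRL_COMPRESS,
    CTRL_REGISTER_REQUEST,
    CTRL_REGISTER_RESULT,
    CTRL_REGISTER_FINISHED,
    CTRL_UNREGISTER_REQUEST,   /* new: unpin previously registered memory */
    CTRL_UNREGISTER_FINISHED,  /* new: confirmation that unpin completed */
    CTRL_TYPE_COUNT
};

/* Human-readable labels, indexed by enum ctrl_type. */
static const char *const ctrl_type_desc[CTRL_TYPE_COUNT] = {
    "Unused", "Error", "Ready", "QEMU File",
    "RAM Blocks request", "RAM Blocks result", "Compress page",
    "Register request", "Register result", "Register finished",
    "Unregister request", "Unregister finished",
};

/* Returns the label for a command value, or NULL if it is out of range. */
static const char *ctrl_type_name(unsigned type)
{
    return type < CTRL_TYPE_COUNT ? ctrl_type_desc[type] : NULL;
}
```

Only the last two entries are new; the first ten are unchanged apart from whitespace.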
>
> A single control message, as hinted above, can contain within the data
> portion an array of many commands of the same type. If there is more than
> @@ -243,7 +245,7 @@ qemu_rdma_exchange_send(header, data, optional response header & data):
> from the receiver to tell us that the receiver
> is *ready* for us to transmit some new bytes.
> 2. Optionally: if we are expecting a response from the command
> - (that we have no yet transmitted), let's post an RQ
> + (that we have not yet transmitted), let's post an RQ
trivial
> work request to receive that data a few moments later.
> 3. When the READY arrives, librdmacm will
> unblock us and we immediately post a RQ work request
> @@ -293,8 +295,10 @@ librdmacm provides the user with a 'private data' area to be exchanged
> at connection-setup time before any infiniband traffic is generated.
>
> Header:
> - * Version (protocol version validated before send/recv occurs), uint32, network byte order
> - * Flags (bitwise OR of each capability), uint32, network byte order
> + * Version (protocol version validated before send/recv occurs),
> + uint32, network byte order
> + * Flags (bitwise OR of each capability),
> + uint32, network byte order
trivial
>
> There is no data portion of this header right now, so there is
> no length field. The maximum size of the 'private data' section
> @@ -313,7 +317,7 @@ If the version is invalid, we throw an error.
> If the version is new, we only negotiate the capabilities that the
> requested version is able to perform and ignore the rest.
>
> -Currently there is only *one* capability in Version #1: dynamic page registration
> +Currently there is only one capability in Version #1: dynamic page registration
trivial
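As an aside, the Version/Flags exchange described in this part of the document might look something like the sketch below. All names here (cap_header, cap_negotiate, CAP_FLAG_DYNAMIC_REG and its bit value) are hypothetical. Two simplifications are assumptions of mine rather than anything the quoted hunks state: any version other than 1 is rejected, and the destination answers with the intersection of the requested and supported flags.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>

#define CAP_VERSION          1u     /* the only version the doc mentions */
#define CAP_FLAG_DYNAMIC_REG 0x01u  /* hypothetical bit: dynamic page registration */

/* Hypothetical layout of the 'private data' header: two uint32 fields. */
typedef struct {
    uint32_t version;
    uint32_t flags;   /* bitwise OR of each capability */
} cap_header;

/* Convert both fields to network byte order before sending. */
static void cap_to_network(cap_header *c)
{
    c->version = htonl(c->version);
    c->flags = htonl(c->flags);
}

/* Convert both fields back to host byte order after receiving. */
static void cap_to_host(cap_header *c)
{
    c->version = ntohl(c->version);
    c->flags = ntohl(c->flags);
}

/* Destination side (assumed behavior): reject unknown versions, otherwise
 * keep only the requested capabilities that are also supported locally. */
static int cap_negotiate(const cap_header *req, uint32_t supported,
                         cap_header *resp)
{
    if (req->version != CAP_VERSION) {
        return -1;
    }
    resp->version = CAP_VERSION;
    resp->flags = req->flags & supported;
    return 0;
}
```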
>
> Finally: Negotiation happens with the Flags field: If the primary-VM
> sets a flag, but the destination does not support this capability, it
> @@ -326,8 +330,8 @@ QEMUFileRDMA Interface:
>
> QEMUFileRDMA introduces a couple of new functions:
>
> -1. qemu_rdma_get_buffer() (QEMUFileOps rdma_read_ops)
> -2. qemu_rdma_put_buffer() (QEMUFileOps rdma_write_ops)
> +1. qemu_rdma_get_buffer() (QEMUFileOps rdma_read_ops)
> +2. qemu_rdma_put_buffer() (QEMUFileOps rdma_write_ops)
trivial
>
> These two functions are very short and simply use the protocol
> describe above to deliver bytes without changing the upper-level
> @@ -413,3 +417,8 @@ TODO:
> the use of KSM and ballooning while using RDMA.
> 4. Also, some form of balloon-device usage tracking would also
> help alleviate some issues.
> +5. Move UNREGISTER requests to a separate thread.
> +6. Use LRU to provide more fine-grained direction of UNREGISTER
> + requests for unpinning memory in an overcommitted environment.
> +7. Expose UNREGISTER support to the user by way of workload-specific
> + hints about application behavior.
>
new content
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org