Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API
From: Jinpu Wang
Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API
Date: Fri, 7 Jun 2024 07:53:34 +0200
Hi Gonglei, hi folks on the list,
On Tue, Jun 4, 2024 at 2:14 PM Gonglei <arei.gonglei@huawei.com> wrote:
>
> From: Jialin Wang <wangjialin23@huawei.com>
>
> Hi,
>
> This patch series attempts to refactor RDMA live migration by
> introducing a new QIOChannelRDMA class based on the rsocket API.
>
> The /usr/include/rdma/rsocket.h provides a higher level rsocket API
> that is a 1-1 match of the normal kernel 'sockets' API, which hides the
> detail of rdma protocol into rsocket and allows us to add support for
> some modern features like multifd more easily.
>
> Here is the previous discussion on refactoring RDMA live migration using
> the rsocket API:
>
> https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linaro.org/
>
> We have encountered some bugs when using rsocket and plan to submit them to
> the rdma-core community.
>
> In addition, the use of rsocket makes our programming more convenient,
> but it must be noted that this method introduces multiple memory copies,
> which can be imagined that there will be a certain performance degradation,
> hoping that friends with RDMA network cards can help verify, thank you!
First, thanks for the effort. We are running migration tests on our IB
fabric with different generations of Mellanox HCAs. Migration mostly
works, but there are a few failures; Yu will share the results
separately.
The one blocker for this change is that the old implementation and the
new rsocket implementation cannot talk to each other, because they use
different wire protocols during connection establishment. For example,
the old RDMA migration uses its own special control messages during the
migration flow, while rsocket uses different control messages. As a
result, there is no way to migrate a VM over the RDMA transport from a
version that predates the rsocket patchset to a new version with the
rsocket implementation.
We should probably keep both implementations for a while, mark the old
one as deprecated, promote the new one, and highlight in the docs that
they are not compatible.
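To make the incompatibility concrete, here is a toy model in C. It is
only a sketch: the magic values, the ToyHello struct and the toy_*
helpers are all hypothetical and do not reflect the real wire format of
either the old migration/rdma.c control messages or rsocket. It just
shows that a receiver expecting one handshake family rejects the other.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical magic values, NOT the real QEMU or rsocket wire formats. */
#define LEGACY_MAGIC  0x4C454759u  /* stands in for the old control message */
#define RSOCKET_MAGIC 0x52534F43u  /* stands in for the rsocket handshake   */

/* Hypothetical first message exchanged at connection establishment. */
typedef struct {
    uint32_t magic;    /* identifies which protocol family the sender speaks */
    uint32_t version;  /* protocol revision within that family */
} ToyHello;

/* Build the first message a sender of the given family would emit. */
ToyHello toy_hello(uint32_t magic)
{
    ToyHello h = { .magic = magic, .version = 1 };
    return h;
}

/* A receiver accepts only the family (and revision) it was built for. */
bool toy_accept(uint32_t expected_magic, const ToyHello *h)
{
    return h->magic == expected_magic && h->version == 1;
}
```

With this model, toy_accept(RSOCKET_MAGIC, &hello) is false for a hello
built with LEGACY_MAGIC, and the reverse direction fails in the same
way. That is the situation a source running the old code and a
destination running the rsocket code would be in.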
Regards!
Jinpu
>
> Jialin Wang (6):
> migration: remove RDMA live migration temporarily
> io: add QIOChannelRDMA class
> io/channel-rdma: support working in coroutine
> tests/unit: add test-io-channel-rdma.c
> migration: introduce new RDMA live migration
> migration/rdma: support multifd for RDMA migration
>
> docs/rdma.txt | 420 ---
> include/io/channel-rdma.h | 165 ++
> io/channel-rdma.c | 798 ++++++
> io/meson.build | 1 +
> io/trace-events | 14 +
> meson.build | 6 -
> migration/meson.build | 3 +-
> migration/migration-stats.c | 5 +-
> migration/migration-stats.h | 4 -
> migration/migration.c | 13 +-
> migration/migration.h | 9 -
> migration/multifd.c | 10 +
> migration/options.c | 16 -
> migration/options.h | 2 -
> migration/qemu-file.c | 1 -
> migration/ram.c | 90 +-
> migration/rdma.c | 4205 +----------------------------
> migration/rdma.h | 67 +-
> migration/savevm.c | 2 +-
> migration/trace-events | 68 +-
> qapi/migration.json | 13 +-
> scripts/analyze-migration.py | 3 -
> tests/unit/meson.build | 1 +
> tests/unit/test-io-channel-rdma.c | 276 ++
> 24 files changed, 1360 insertions(+), 4832 deletions(-)
> delete mode 100644 docs/rdma.txt
> create mode 100644 include/io/channel-rdma.h
> create mode 100644 io/channel-rdma.c
> create mode 100644 tests/unit/test-io-channel-rdma.c
>
> --
> 2.43.0
>