Re: [PATCH v10 09/12] vfio/migration: Implement VFIO migration protocol v2

From:	Avihai Horon
Subject:	Re: [PATCH v10 09/12] vfio/migration: Implement VFIO migration protocol v2
Date:	Wed, 15 Feb 2023 20:23:12 +0200
User-agent:	Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.6.1


On 15/02/2023 15:01, Juan Quintela wrote:


Avihai Horon <avihaih@nvidia.com> wrote:

Implement the basic mandatory part of VFIO migration protocol v2.
This includes all functionality that is necessary to support
VFIO_MIGRATION_STOP_COPY part of the v2 protocol.

The two protocols, v1 and v2, will co-exist and in the following patches
v1 protocol code will be removed.

There are several main differences between v1 and v2 protocols:
- VFIO device state is now represented as a finite state machine instead
   of a bitmap.

- Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
   ioctl and normal read() and write() instead of the migration region.

- Pre-copy is made optional in v2 protocol. Support for pre-copy will be
   added later on.
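
To illustrate the first difference in the list above, here is a minimal sketch contrasting a bitmap-style device state with a finite state machine. All names carry a `DEMO_` prefix and are hypothetical; the real definitions live in the kernel's VFIO UAPI header and include more states than shown. The arc table is a simplified assumption for illustration, not a copy of the kernel's table.

```c
#include <stdbool.h>

/* v1 style (illustrative): independent bits, so combinations that may
 * not be meaningful are still representable. */
#define DEMO_V1_RUNNING  (1u << 0)
#define DEMO_V1_SAVING   (1u << 1)
#define DEMO_V1_RESUMING (1u << 2)

/* v2 style (illustrative): the device is in exactly one state at a time. */
enum demo_mig_state {
    DEMO_STATE_STOP,
    DEMO_STATE_RUNNING,
    DEMO_STATE_STOP_COPY,
    DEMO_STATE_RESUMING,
    DEMO_STATE_COUNT
};

/* demo_arc[from][to] is true when a direct transition is allowed.
 * Entries not listed default to false. */
static const bool demo_arc[DEMO_STATE_COUNT][DEMO_STATE_COUNT] = {
    [DEMO_STATE_STOP] = {
        [DEMO_STATE_RUNNING]   = true,
        [DEMO_STATE_STOP_COPY] = true,
        [DEMO_STATE_RESUMING]  = true,
    },
    [DEMO_STATE_RUNNING]   = { [DEMO_STATE_STOP] = true },
    [DEMO_STATE_STOP_COPY] = { [DEMO_STATE_STOP] = true },
    [DEMO_STATE_RESUMING]  = { [DEMO_STATE_STOP] = true },
};

/* Returns true if the sketch's table allows going directly from -> to. */
static bool demo_can_transition(enum demo_mig_state from,
                                enum demo_mig_state to)
{
    return demo_arc[from][to];
}
```

With a table like this, an invalid request (for example RUNNING directly to STOP_COPY in this sketch) can be rejected up front, whereas a bitmap lets any combination of bits be written.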

Detailed information about VFIO migration protocol v2 and its difference
compared to v1 protocol can be found here [1].

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
+/*
+ * Migration size of VFIO devices can be as little as a few KBs or as big as
+ * many GBs. This value should be big enough to cover the worst case.
+ */
+#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)

Wow O:-)

+
+/*
+ * Only exact function is implemented and not estimate function. The reason is
+ * that during pre-copy phase of migration the estimate function is called
+ * repeatedly while pending RAM size is over the threshold, thus migration
+ * can't converge and querying the VFIO device pending data size is useless.
+ */

You can do it after this is merged, but I think you can do better than
this.  Something along the lines of:


// I put it in a global variable, but it really needs to be in VFIODevice
// to be able to support several devices.  You get the idea O:-)

static uint64_t cached_size = -1;

static void vfio_state_pending_exact(void *opaque, uint64_t *res_precopy_only,
                                      uint64_t *res_compatible,
                                      uint64_t *res_postcopy_only)
{
     VFIODevice *vbasedev = opaque;
     uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;

     /*
      * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
      * reported so downtime limit won't be violated.
      */
     vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
     *res_precopy_only += stop_copy_size;
     cached_size = stop_copy_size;

     trace_vfio_state_pending_exact(vbasedev->name, *res_precopy_only,
                                    *res_postcopy_only, *res_compatible,
                                    stop_copy_size);
}


static void vfio_state_pending_estimate(void *opaque,
                                         uint64_t *res_precopy_only,
                                         uint64_t *res_compatible,
                                         uint64_t *res_postcopy_only)
{
     if (cached_size == -1) {
         /* Scratch accumulators, zeroed because the exact handler uses "+=". */
         uint64_t precopy = 0;
         uint64_t compatible = 0;
         uint64_t postcopy = 0;

         /* The first call populates cached_size as a side effect. */
         vfio_state_pending_exact(opaque, &precopy, &compatible, &postcopy);
     }
     *res_precopy_only += cached_size;
}
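
As the comment in the suggestion says, the cache would need to live in the per-device structure to support several devices. A minimal standalone sketch of that idea follows. `DemoDevice`, its `query` hook and the `demo_*` functions are hypothetical stand-ins, not QEMU's `VFIODevice` or its handlers; only the 100 GiB fallback value is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_STOP_COPY_FALLBACK (100ULL << 30)  /* 100 GiB, as in the patch */

/* Hypothetical per-device state; not QEMU's VFIODevice. */
typedef struct DemoDevice {
    uint64_t cached_size;   /* result of the last exact query */
    bool     cache_valid;   /* false until the first exact query */
    unsigned query_calls;   /* counts exact queries, for demonstration */
    /* Hypothetical hook: returns 0 and fills *size on success. */
    int (*query)(struct DemoDevice *dev, uint64_t *size);
} DemoDevice;

/* Exact path: always asks the device and refreshes the cache. */
static uint64_t demo_pending_exact(DemoDevice *dev)
{
    uint64_t size = DEMO_STOP_COPY_FALLBACK;

    dev->query_calls++;
    dev->query(dev, &size);     /* on failure, size keeps the fallback */
    dev->cached_size = size;
    dev->cache_valid = true;
    return size;
}

/* Estimate path: reuses the cached value, querying only the first time. */
static uint64_t demo_pending_estimate(DemoDevice *dev)
{
    if (!dev->cache_valid) {
        demo_pending_exact(dev);
    }
    return dev->cached_size;
}

/* Example hooks used to exercise the sketch. */
static int demo_query_ok(DemoDevice *dev, uint64_t *size)
{
    (void)dev;
    *size = 4096;
    return 0;
}

static int demo_query_fail(DemoDevice *dev, uint64_t *size)
{
    (void)dev;
    (void)size;
    return -1;
}
```

Keeping a separate validity flag avoids reserving a sentinel size value such as -1, and each device carries its own cache, so two devices never overwrite each other's result.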

In the next series, which will add pre-copy support to VFIO migration (v1 was sent [1] but isn't rebased on your pull reqs yet), I am going to do something similar to what you suggested. It will be like what you did here, but with the pre-copy data size (data which can be transferred during the pre-copy phase) instead of the stop_copy_size.

Plus, I don't think caching the stop_copy_size and reporting the cached value in the estimate handler is the best fit here, because stop_copy_size doesn't decrease with pre-copy iterations, as opposed to RAM pre-copy data, for example.
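
The convergence concern can be shown with a toy model. The sketch below assumes that pending RAM halves on every pre-copy iteration while the device term stays constant. The halving rate, the threshold comparison and the function name are illustrative assumptions, not QEMU's actual migration loop.

```c
#include <stdint.h>

/*
 * Toy model: returns the number of iterations needed until the reported
 * total (RAM + device estimate) is at or below the threshold, or -1 if
 * that does not happen within max_iters iterations.
 */
static int demo_iterations_to_converge(uint64_t ram_pending,
                                       uint64_t device_estimate,
                                       uint64_t threshold,
                                       int max_iters)
{
    for (int i = 0; i < max_iters; i++) {
        if (ram_pending + device_estimate <= threshold) {
            return i;
        }
        ram_pending /= 2;   /* RAM shrinks; the device term does not */
    }
    return -1;
}
```

In this model, a constant device term larger than the threshold keeps the total above it no matter how many iterations run, which is why a value that does not shrink during pre-copy is a poor input to the estimate.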

So I would rather keep things as they are and add something similar to your suggestion in the pre-copy series.


Thanks!

[1] https://lore.kernel.org/qemu-devel/20230126184948.10478-2-avihaih@nvidia.com/
