Re: [PATCH v3 12/17] vfio/migration: Implement VFIO migration protocol v2

From: Avihai Horon
Subject: Re: [PATCH v3 12/17] vfio/migration: Implement VFIO migration protocol v2
Date: Thu, 24 Nov 2022 14:41:00 +0200
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0

On 20/11/2022 11:34, Avihai Horon wrote:

On 17/11/2022 19:38, Alex Williamson wrote:

On Thu, 17 Nov 2022 19:07:10 +0200
Avihai Horon <avihaih@nvidia.com> wrote:
On 16/11/2022 20:29, Alex Williamson wrote:
On Thu, 3 Nov 2022 18:16:15 +0200
Avihai Horon <avihaih@nvidia.com> wrote:
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e784374453..62afc23a8c 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -44,8 +44,84 @@
   #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL)
   #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL)

+#define VFIO_MIG_DATA_BUFFER_SIZE (1024 * 1024)
Add comment explaining heuristic of this size.
This is an arbitrary size we picked with mlx5 state size in mind.
Increasing this size to higher values (128M, 1G) didn't improve
performance in our testing.

How about this comment:
This is an initial value that doesn't consume much memory and provides
good performance.

Do you have other suggestion?
I'd lean more towards your description above, ex:

  * This is an arbitrary size based on migration of mlx5 devices, where
  * the worst case total device migration size is on the order of 100s
  * of MB.  Testing with larger values, ex. 128MB and 1GB, did not show
  * a performance improvement.

I think that provides sufficient information for someone who might come
later to have an understanding of the basis if they want to try to
optimize further.

OK, sounds good, I will add a comment like this.

@@ -804,34 +1090,51 @@ static int vfio_migration_init(VFIODevice *vbasedev)
           return -EINVAL;

-    ret = vfio_get_dev_region_info(vbasedev,
-                                   &info);
-    if (ret) {
-        return ret;
-    }
+    ret = vfio_migration_query_flags(vbasedev, &mig_flags);
+    if (!ret) {
+        /* Migration v2 */
+        /* Basic migration functionality must be supported */
+        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
+            return -EOPNOTSUPP;
+        }
+        vbasedev->migration = g_new0(VFIOMigration, 1);
+        vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+        vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
+        vbasedev->migration->data_buffer =
+            g_malloc0(vbasedev->migration->data_buffer_size);
So VFIO_MIG_DATA_BUFFER_SIZE is our chunk size, but why doesn't the
later addition of estimated device data size make any changes here?
I'd think we'd want to scale the buffer to the minimum of the reported
data size and some well documented heuristic for an upper bound.
As I wrote above, increasing this size to higher values (128M, 1G)
didn't improve performance in our testing.
We can always change it later on if some other heuristics are proven to
improve performance.
Note that I'm asking about a minimum buffer size, for example if
hisi_acc reports only 10s of KB for an estimated device size, why would
we still allocate VFIO_MIG_DATA_BUFFER_SIZE here?  Thanks,

This buffer is rather small and has little memory footprint.
Do you think it is worth the extra complexity of resizing the buffer?

Alex, WDYT?
Note that the reported estimated size is dynamic and might change from one query to the next, potentially leaving us with a buffer smaller than needed.

Also, as part of v4 I moved this allocation to vfio_save_setup(), so it will be allocated only during migration (when it's actually used) and only on the source side.

