[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 12/17] vfio/migration: Implement VFIO migration protocol v

From: Avihai Horon
Subject: Re: [PATCH v3 12/17] vfio/migration: Implement VFIO migration protocol v2
Date: Sun, 20 Nov 2022 10:46:13 +0200
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.4.1

On 17/11/2022 19:24, Jason Gunthorpe wrote:
On Thu, Nov 17, 2022 at 07:07:10PM +0200, Avihai Horon wrote:
+    }
+    if (mig_state->data_fd != -1) {
+        if (migration->data_fd != -1) {
+            /*
+             * This can happen if the device is asynchronously reset and
+             * terminates a data transfer.
+             */
+            error_report("%s: data_fd out of sync", vbasedev->name);
+            close(mig_state->data_fd);
+            return -1;
Should we go to recover_state here?  Is migration->device_state
invalid?  -EBADF?
Yes, we should.
Although VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE ioctl above succeeded, setting
the device state didn't *really* succeed, as the data_fd went out of sync.
So we should go to recover_state and return -EBADF.
The state did succeed and it is now "new_state". Getting an
unexpected data_fd means it did something like RUNNING->PRE_COPY_P2P
when the code was expecting PRE_COPY->PRE_COPY_P2P.

It is actually in PRE_COPY_P2P but the in-progress migration must be
stopped and the kernel would have made the migration->data_fd
permanently return some error when it went async to RUNNING.

The recovery is to resart the migration (of this device?) from the

Yes, and restart is what's happening here - the -EBADF that we return here will cause the migration to be aborted. I didn't mean that we should go to recover_state *instead* of returning an error.

But rethinking about it, I think you are right - recover_state should be used only if we can't set the device in new_state (i.e., there is some error in device functionality). In the "data_fd out of sync" case, we did set the device in new_state (no error in device functionality), but data_fd got mixed up, so we should just abort migration and restart it again.

So bottom line, I think we should just return -EBADF here to abort migration.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]