qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 03/20] vfio/migration: Add VFIO migration pre-copy support


From: Alex Williamson
Subject: Re: [PATCH v2 03/20] vfio/migration: Add VFIO migration pre-copy support
Date: Mon, 27 Feb 2023 09:14:44 -0700

On Sun, 26 Feb 2023 18:43:50 +0200
Avihai Horon <avihaih@nvidia.com> wrote:

> On 23/02/2023 23:16, Alex Williamson wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Thu, 23 Feb 2023 17:25:12 +0200
> > Avihai Horon <avihaih@nvidia.com> wrote:
> >  
> >> On 22/02/2023 22:58, Alex Williamson wrote:  
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Wed, 22 Feb 2023 19:48:58 +0200
> >>> Avihai Horon <avihaih@nvidia.com> wrote:
> >>>  
> >>>> @@ -302,23 +380,44 @@ static void vfio_save_cleanup(void *opaque)
> >>>>        trace_vfio_save_cleanup(vbasedev->name);
> >>>>    }
> >>>>
> >>>> +static void vfio_state_pending_estimate(void *opaque, uint64_t 
> >>>> threshold_size,
> >>>> +                                        uint64_t *must_precopy,
> >>>> +                                        uint64_t *can_postcopy)
> >>>> +{
> >>>> +    VFIODevice *vbasedev = opaque;
> >>>> +    VFIOMigration *migration = vbasedev->migration;
> >>>> +
> >>>> +    if (migration->device_state != VFIO_DEVICE_STATE_PRE_COPY) {
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    /*
> >>>> +     * Initial size should be transferred during pre-copy phase so 
> >>>> stop-copy
> >>>> +     * phase will not be slowed down. Report threshold_size to force 
> >>>> another
> >>>> +     * pre-copy iteration.
> >>>> +     */
> >>>> +    *must_precopy += migration->precopy_init_size ?
> >>>> +                         threshold_size :
> >>>> +                         migration->precopy_dirty_size;  
> >>> This sure feels like we're feeding false data back to the iterator to
> >>> spoof it to run another iteration, when the vfio migration protocol
> >>> only recommends that initial_bytes reaches zero before proceeding to
> >>> stop-copy, it's not a requirement.  What benefit is actually observed
> >>> from this?  Why is this required for initial pre-copy support?  It
> >>> seems devious.  
> >> As previously discussed in the thread that added the pre-copy uAPI [1],
> >> the init_bytes can be used by drivers to reduce the downtime.
> >> For example, mlx5 transfers some metadata to the target so it will be
> >> able to pre-allocate resources etc.
> >>
> >> [1]
> >> https://lore.kernel.org/kvm/ae4a6259-349d-0131-896c-7a6ea775cc9e@nvidia.com/
> >>   
> > Yes, but how does that become a requirement to QEMU that it must
> > iterate until the initial segment is complete?  Especially when we need
> > to trigger that behavior via such nefarious means.  AIUI, QEMU should
> > be allowed to move to stop-copy at any point.  We should make efforts
> > that QEMU would never decide on its own to move from pre-copy to
> > stop-copy without completing the init_bytes (which sounds suspiciously
> > like the purpose of @must_precopy),  
> 
> @must_precopy represents the pending bytes that must be transferred 
> during pre-copy or stop-copy. If it's under the threshold, then 
> migration will move to stop-copy and be completed.
> So simply adding init_bytes to @must_precopy will not guarantee that we 
> send all init_bytes before moving to stop-copy, since the transition to 
> stop-copy can happen when @must_precopy != 0.

But we have no requirement to send all init_bytes before stop-copy.
This is a hack to achieve a theoretical benefit that a driver might be
able to improve the latency on the target by completing another
iteration.  If drivers are filling in a "must_precopy" arg, it sounds
like even if migration moves to stop-copy, that data should be migrated
first and deferring stop-copy could potentially extend the migration in
other areas.

> >   but if, for instance a user forces a
> > transition to stop-copy, I don't see that we have any business to
> > impose a policy to delay that until the init_bytes is complete.  
> 
> Is there a way a user can force the migration to move to stop-copy?
> Looking at migration code, it seems that the only way to move to 
> stop-copy is if @must_precopy is below the threshold.
> If so, then this is our effort to make QEMU send all init_bytes before 
> moving to stop_copy and we can only benefit from it.

But we have no requirement to send all init_bytes before stop-copy.
This is a hack to achieve a theoretical benefit that a driver might be
able to improve the latency on the target by completing another
iteration.  If drivers are filling in a "must_precopy" arg, it sounds
like even if migration moves to stop-copy, that data should be migrated
first and deferring stop-copy could potentially extend the migration in
other areas.
 
> Regarding how to do it -- maybe instead of spoofing @must_precopy we can 
> introduce a new parameter in upper migration layer (e.g., @init_precopy) 
> and add another condition in migration layer that it must be zero to 
> move to stop-copy.

Why not just move to stop-copy but transfer all must_precopy data
first?  That would seem to align with the naming to me.  I don't think
the device actually cares if the transfer happens while the device is
running or stopped, it just wants it at the target device early enough
to start configuration, right?

I'd drop this for an initial implementation, the uAPI does not require
that QEMU complete init_bytes before transitioning to stop-copy and
this is clearly not a very clean or well justified means to try to
achieve that as a non-requirement.  Thanks,

Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]