From: Eugenio Perez Martin
Subject: Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
Date: Thu, 2 Feb 2023 19:32:20 +0100

On Thu, Feb 2, 2023 at 4:41 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> > On 2023/2/1 15:49, Eugenio Perez Martin wrote:
> > On Wed, Feb 1, 2023 at 4:29 AM Jason Wang <jasowang@redhat.com> wrote:
> >> On Wed, Feb 1, 2023 at 3:11 AM Eugenio Perez Martin <eperezma@redhat.com> 
> >> wrote:
> >>> On Tue, Jan 31, 2023 at 8:10 PM Eugenio Perez Martin
> >>> <eperezma@redhat.com> wrote:
> >>>> Hi,
> >>>>
> >>>> The current approach of offering an emulated CVQ to the guest and mapping
> >>>> the commands to vhost-user does not scale well:
> >>>> * Some devices already offer it, so the transformation is redundant.
> >>>> * There is no support for commands with variable length (RSS?)
> >>>>
> >>>> We can solve both of them by offering it through vhost-user the same
> >>>> way as vhost-vdpa does. With this approach qemu needs to track the
> >>>> commands, for similar reasons as with vhost-vdpa: qemu needs to track
> >>>> the device status for live migration. vhost-user should use the same
> >>>> SVQ code for this, so we avoid duplication.
> >>>>
> >>>> One of the challenges here is to know what virtqueue to shadow /
> >>>> isolate. The vhost-user device may not have the same queues as the
> >>>> device frontend:
> >>>> * The first depends on the actual vhost-user device, and qemu fetches
> >>>> it with VHOST_USER_GET_QUEUE_NUM at the moment.
> >>>> * The qemu device frontend's is set by netdev queues= cmdline parameter 
> >>>> in qemu
> >>>>
> >>>> For the device, the CVQ is the last one it offers, but for the guest
> >>>> it is the last one offered in config space.
> >>>>
> >>>> Creating a new vhost-user command to decrease that maximum number of
> >>>> queues may be an option. But we can do it without adding more
> >>>> commands, by remapping the CVQ index at virtqueue setup. I think it
> >>>> should be doable using (struct vhost_dev).vq_index and maybe a few
> >>>> adjustments here and there.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Thanks!
> >>>
> >>> (Starting a separate thread for the vhost-vdpa related use case)
> >>>
> >>> This could also work for vhost-vdpa if we ever decide to honor the
> >>> netdev queues= argument. It is totally ignored now, as opposed to the
> >>> rest of the backends:
> >>> * vhost-kernel, whose tap device has the requested number of queues.
> >>> * vhost-user, which errors out ("you are asking more queues than
> >>> supported") if the vhost-user parent device has fewer queues than
> >>> requested (as reported by the vhost-user msg VHOST_USER_GET_QUEUE_NUM).
> >>>
> >>> One of the reasons for this is that the device configuration space is
> >>> totally passthrough, with the values for mtu, rss conditions, etc.
> >>> This is not ideal, as qemu cannot check source and destination
> >>> equivalence, and they can change under the feet of the guest in the
> >>> event of a migration.
> >> This looks like the responsibility not of qemu but of the upper layer
> >> (to provision the same config/features in src/dst).
> > I think both share it. Or, at least, that it is inconsistent that QEMU
> > is in charge of checking / providing consistency for virtio features,
> > but not virtio-net config space.
> >
> > If we follow that to the extreme, we could simply delete the feature
> > checks, right?
>
>
> Just to make sure we are on the same page.
>
> If you mean deleting the feature checks in Qemu, then I think we can't
> do that.
>

So my point is: is the user expected to trust that qemu will migrate
features like packed=off/on or tx/rx_queue_size=N, but that it will
not migrate mtu=N or queues=N?

I know the difference is whether the field lives in the virtio common
config or in the virtio-net config space. At a slightly higher level,
whether the feature is common to all virtio devices or specific to
virtio-net. But since I work with qemu, it's hard for me to think from
the user's POV; the number of queues is a gray area here: would it
migrate or wouldn't it? :).

> What I meant is.
>
> Consider a vDPA device provisioned (via netlink or some other way) with
> featureX and configY. It would be sufficient to validate that the
> emulated device features and configs match exactly what the vDPA device
> had.
>
> Technically, it should be possible to do any mediation in the middle,
> but it may cause a lot of trouble for the management layer and others.
> Consider:
>
> If featureX is not provisioned but emulated by Qemu, then it's almost
> impossible for the management layer to check migration compatibility. If
> featureX can be easily emulated, it should be done in the layer of the
> vDPA parent, not Qemu, so it can be recognized by the management layer.
>

I kind of buy this, although I think it would be solvable by asking
qemu what features it emulates and then adding those to the feature
bits mix.

But what I'm actually proposing (here :) ) is not to emulate features,
but to treat the device config space the same way we treat
virtio_pci_common_cfg and emulate it all the time, effectively
homogenizing it the same way vhost-user etc. is homogenized.
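
Roughly something like this (just a sketch from memory, not a patch;
the real virtio_net_get_config() handles more fields and cases than
this, and the backend values would only be used to validate or clamp
what QEMU exposes, not forwarded as-is):

static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
{
    VirtIONet *n = VIRTIO_NET(vdev);
    struct virtio_net_config netcfg = {};

    /*
     * Build the config from QEMU's own (migratable) state, never from
     * the vhost-vdpa / vhost-user backend bytes directly, so source
     * and destination guests always read the same values.
     */
    memcpy(netcfg.mac, n->mac, ETH_ALEN);
    virtio_stw_p(vdev, &netcfg.status, n->status);
    virtio_stw_p(vdev, &netcfg.max_virtqueue_pairs, n->max_queue_pairs);
    virtio_stw_p(vdev, &netcfg.mtu, n->net_conf.mtu);

    memcpy(config, &netcfg, n->config_size);
}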

I understand the provisioning tool is one way to do it, maybe even a
more convenient one. But do all devices support it? Is it reasonable to
expect that all devices that will be migrated (into) will support it?

>
> >
> >>> External tools are needed for this, duplicating
> >>> part of the effort.
> >>>
> >>> Starting to intercept config space accesses and offering an emulated
> >>> one to the guest, with this kind of adjustment, is beneficial, as it
> >>> makes vhost-vdpa more similar to the rest of the backends, making a
> >>> change much less of a surprise.
> >> This probably needs more thought, since vDPA already provides a kind
> >> of emulation in the kernel. My understanding is that it would be
> >> sufficient to add checks to make sure the config that guests see is
> >> consistent with what the host provisioned?
> >>
> > By "host provisioned", do you mean with the "vdpa" tool or with qemu?
>
>
> Make sure the features and config of the emulated device provided by
> Qemu match those of the vDPA device provisioned via netlink or another
> mgmt API.
>

Yes, that is doable for sure. It should be enough to fetch the config
with the VHOST_VDPA_GET_CONFIG ioctl, shouldn't it?
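
Something along these lines, I mean (standalone sketch only: struct
vhost_vdpa_config is the uapi one with off/len/buf[], and inside QEMU
this would of course go through the existing vhost get_config path and
compare field by field with proper error reporting):

#include <sys/ioctl.h>
#include <string.h>
#include <errno.h>
#include <stdint.h>
#include <linux/vhost.h>
#include <linux/virtio_net.h>

/* Fetch the parent's virtio-net config through the vhost-vdpa fd and
 * compare it against what QEMU intends to expose to the guest. */
static int vdpa_check_net_config(int vdpa_fd,
                                 const struct virtio_net_config *expected)
{
    uint8_t buf[sizeof(struct vhost_vdpa_config) +
                sizeof(struct virtio_net_config)] = {};
    struct vhost_vdpa_config *hdr = (struct vhost_vdpa_config *)buf;

    hdr->off = 0;
    hdr->len = sizeof(struct virtio_net_config);

    if (ioctl(vdpa_fd, VHOST_VDPA_GET_CONFIG, hdr) < 0) {
        return -errno;
    }

    return memcmp(hdr->buf, expected, hdr->len) == 0 ? 0 : -EINVAL;
}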

>
> > Also, we
> > need a way to communicate the guest values to it if those checks are
> > added in the kernel.
> >
> > The reasoning here is the same as above: QEMU already filters features
> > with its own emulated layer, so the operator can specify a feature
> > that will never appear to the guest.
>
>
> This needs to be done at the time of vDPA device provisioning. Otherwise
> we will end up with a lot of corner cases. E.g. if 8 queue pairs are
> provisioned, do we allow starting a guest with 4 queue pairs?
>

In my proposal, qemu would adjust the number of queue pairs the guest
sees to 4 qps.
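
To be more explicit (illustrative sketch only, every name except
max_virtqueue_pairs is made up): when filling the guest-visible config,
the value would be the minimum of what the parent was provisioned with
and what queues= asked for, instead of whichever of the two happens to
win today:

/* Guest-visible max_virtqueue_pairs: never more than the cmdline asked
 * for, and never more than the parent device actually provides. */
static uint16_t guest_visible_qps(uint16_t provisioned_qps,
                                  uint16_t cmdline_qps)
{
    return provisioned_qps < cmdline_qps ? provisioned_qps : cmdline_qps;
}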

>
> >   It has other uses (abstracting
> > between transports, for example), but feature filtering is definitely a
> > thing there.
> >
> > A feature set to off in a VM (or that does not exist in that
> > particular qemu version) will never appear as on even in the case of
> > migration to modern qemu versions.
> >
> > We don't have the equivalent protection for the device config space.
> > QEMU could ensure a consistent MTU, number of queues, etc. for the guest
> > in virtio_net_get_config (and the equivalent for other kinds of devices).
> > QEMU already has some transformations there. It shouldn't take a lot
> > of code.
> >
> > Having said that:
> > * I'm ok with starting just with checks there instead of
> > transformations like the queues remap proposed here.
>
>
> I think we need to keep things simple. Technically, we could do any kind
> of mediation/emulation via Qemu, but we only need to implement the ones
> that are really needed.
>

I agree with this point, but I think we are just moving complexity.

Let's go back to the original vdpa question / problems:

1) There are parameters totally ignored by qemu's vhost-vdpa (mtu,
queues, etc), and a naive operator is surprised by this behavior. In
addition, the guest will see the config space change suddenly on
migration if device provisioning does not supply each and every device
config space value.

Should qemu forbid these cmdline parameters when the vhost-vdpa backend
is used? Like:
* The backend queues= parameter (unused since its inclusion)
* Setting mtu, speed, duplex, etc. on the qemu cmdline as long as the
backend is vdpa (see the rough sketch of such a check below).
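
Something like this is what I have in mind (completely hypothetical
helper name and placement; only error_setg() and the net client type
check exist as such, and the exact set of properties to refuse is up
for discussion):

/* Hypothetical check, e.g. from virtio_net_device_realize(): refuse
 * properties that a vhost-vdpa backend would silently ignore today. */
static bool validate_vdpa_cmdline(VirtIONet *n, NetClientState *peer,
                                  Error **errp)
{
    if (!peer || peer->info->type != NET_CLIENT_DRIVER_VHOST_VDPA) {
        return true; /* not a vdpa backend, nothing to do */
    }

    if (n->net_conf.mtu || n->net_conf.speed >= 0 || n->net_conf.duplex_str) {
        error_setg(errp,
                   "host_mtu/speed/duplex cannot be set on the cmdline "
                   "with a vhost-vdpa backend; provision them in the device");
        return false;
    }

    return true;
}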

2) For the second problem, maybe a spurious config interrupt is what is
missing? Is the device allowed to change all of them, like reducing the
rss max table length? Or should the provisioning tool be able to fetch
all of these values from the original device and then send them to the
provisioning tool at the destination?
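
If the answer is the interrupt, then on the destination it could be as
simple as this after loading the device state (sketch only;
virtio_notify_config() is the existing helper that raises the
config-changed interrupt):

/* If the destination cannot reproduce the source's device config
 * byte for byte, at least make the guest re-read it. */
static void maybe_notify_config_change(VirtIODevice *vdev,
                                       const struct virtio_net_config *src,
                                       const struct virtio_net_config *dst)
{
    if (memcmp(src, dst, sizeof(*src)) != 0) {
        virtio_notify_config(vdev);
    }
}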

Thanks!

> Queue remapping might complicate a lot of stuff, like notification area
> mapping etc.
>
> Thanks
>
>
> > * If we choose not to implement it, I'm not proposing to actually
> > delete the feature checks, as I find them useful :).
> >
> > Thanks!
> >
>



