Re: Out-of-Process Device Emulation session at KVM Forum 2020

qemu-devel



From:	Jason Wang
Subject:	Re: Out-of-Process Device Emulation session at KVM Forum 2020
Date:	Tue, 3 Nov 2020 15:52:50 +0800
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0


On 2020/11/2 6:13 PM, Stefan Hajnoczi wrote:

On Mon, Nov 02, 2020 at 10:51:18AM +0800, Jason Wang wrote:

On 2020/10/30 9:15 PM, Stefan Hajnoczi wrote:

On Fri, Oct 30, 2020 at 12:08 PM Jason Wang <jasowang@redhat.com> wrote:

On 2020/10/30 7:13 PM, Stefan Hajnoczi wrote:

On Fri, Oct 30, 2020 at 9:46 AM Jason Wang <jasowang@redhat.com> wrote:

On 2020/10/30 2:21 PM, Stefan Hajnoczi wrote:

On Fri, Oct 30, 2020 at 3:04 AM Alex Williamson
<alex.williamson@redhat.com> wrote:

It's great to revisit ideas, but proclaiming a uAPI is bad solely
because the data transfer is opaque, without defining why that's bad,
evaluating the feasibility and implementation of defining a well
specified data format rather than protocol, including cross-vendor
support, or proposing any sort of alternative is not so helpful imo.

The migration approaches in VFIO and vDPA/vhost were designed for
different requirements and I think this is why there are different
perspectives on this. Here is a comparison and how VFIO could be
extended in the future. I see 3 levels of device state compatibility:

1. The device cannot save/load state blobs, instead userspace fetches
and restores specific values of the device's runtime state (e.g. last
processed ring index). This is the vhost approach.

2. The device can save/load state in a standard format. This is
similar to #1 except that there is a single read/write blob interface
instead of fine-grained get_FOO()/set_FOO() interfaces. This approach
pushes the migration state parsing into the device so that userspace
doesn't need knowledge of every device type. With this approach it is
possible for a device from vendor A to migrate to a device from vendor
B, as long as they both implement the same standard migration format.
The limitation of this approach is that vendor-specific state cannot
be transferred.

3. The device can save/load opaque blobs. This is the initial VFIO
approach.
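
As an illustration only, the three levels above can be sketched as three
toy device classes. Every name here (FineGrainedDevice,
StandardFormatDevice, OpaqueBlobDevice, last_ring_index) is hypothetical,
and JSON merely stands in for a standard migration format. None of this
is the vhost or VFIO interface.

```python
import json


class FineGrainedDevice:
    """Level 1: userspace gets/sets individual runtime values (vhost style)."""

    def __init__(self):
        self.last_ring_index = 0

    def get_last_ring_index(self):
        return self.last_ring_index

    def set_last_ring_index(self, value):
        self.last_ring_index = value


class StandardFormatDevice:
    """Level 2: one save/load blob in a format shared by all vendors."""

    def __init__(self, vendor):
        self.vendor = vendor
        self.last_ring_index = 0

    def save(self):
        # JSON stands in for a hypothetical standard migration format.
        return json.dumps({"last_ring_index": self.last_ring_index}).encode()

    def load(self, blob):
        self.last_ring_index = json.loads(blob.decode())["last_ring_index"]


class OpaqueBlobDevice:
    """Level 3: a vendor-private blob that userspace only transports."""

    def __init__(self, vendor):
        self.vendor = vendor
        self.last_ring_index = 0

    def save(self):
        # The vendor prefix stands in for a private, undocumented layout.
        return self.vendor.encode() + b":" + str(self.last_ring_index).encode()

    def load(self, blob):
        vendor, _, index = blob.partition(b":")
        if vendor != self.vendor.encode():
            raise ValueError("incompatible vendor-specific state")
        self.last_ring_index = int(index)
```

In this sketch a StandardFormatDevice from vendor "A" can be loaded into
one from vendor "B", while an OpaqueBlobDevice refuses a blob saved by a
different vendor.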

I still don't get why it must be opaque.

If the device state format needs to be in the VMM then each device
needs explicit enablement in each VMM (QEMU, cloud-hypervisor, etc).

Let's invert the question: why does the VMM need to understand the
device state of a _passthrough_ device?

For better manageability, compatibility and debuggability. If we depend
on an opaque structure, are we encouraging each device to implement its
own migration protocol? That would be very challenging.

For VFIO in the kernel, I suspect that a uAPI that may result in opaque
data being read or written from the guest violates the Linux uAPI
principle. It will be very hard, or even impossible, to maintain the
uABI. It looks to me like VFIO is the first subsystem that is trying to
do this.

I think our concepts of uAPI are different. The uAPI of read(2) and
write(2) does not define the structure of the data buffers. VFIO
device regions are exactly the same, the structure of the data is not
defined by the kernel uAPI.
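
A minimal sketch of the read(2)/write(2) point, using a pipe rather than
a VFIO device region (the helper name roundtrip is made up for this
example): the kernel moves the bytes without attaching any structure to
them.

```python
import os


def roundtrip(payload):
    """Write bytes into a pipe and read them back unchanged."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, payload)   # the kernel does not inspect the payload
        os.close(write_fd)
        write_fd = -1
        return os.read(read_fd, len(payload))
    finally:
        if write_fd != -1:
            os.close(write_fd)
        os.close(read_fd)


# Arbitrary bytes with no layout declared anywhere in the uAPI.
received = roundtrip(bytes([0xDE, 0xAD, 0xBE, 0xEF]))
```

Whether the reader can interpret those bytes is decided entirely outside
the system call interface.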


I think we're talking about different things. It's not about the data
structure, it's about whether the data that is read from the kernel can
be understood by userspace.

Maybe microcode and firmware loading is an example we agree on?


I think not. They are bytecodes that

1) have strict ABI definitions
2) are understood by userspace

No, they can be proprietary formats that neither the Linux kernel nor
userspace can parse. For example, look at linux-firmware
(https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/about/)
it's just a collection of binary blobs. The format is not necessarily
public. The only restriction on that repo is that the binary blob must
be redistributable and users must be allowed to run them (i.e.
proprietary licenses can be used).

I think not. Obviously each firmware should have its own ABI no matter
whether it's public or proprietary. For proprietary firmware, it should
be understood by the proprietary userspace counterpart.


Or look at other passthrough device interfaces like /dev/i2c or libusb.
They expose data to userspace without requiring a defined format. It's
the same as VFIO.

Again, there should be an ABI (either device or spec) no matter
whether or not it's a transport layer. And there will be an endpoint in
userspace that knows the whole format.


In addition, look at kernel uAPIs where userspace acts simply as a data
transport for opaque data (e.g. where a userspace helper facilitates
communication but has no visibility of the data). I imagine that memory
encryption relies on this because the host kernel and userspace do not
have access to encrypted memory or associated state - but they need to
help migrate them to other hosts.



Which uAPI do you mean here?


I hope these examples show that such APIs don't pose a problem for the
Linux uAPI and are already in use. VFIO device state isn't doing
anything new here.

I feel that you tried to explain "why it can be" but not "why it must
be". Trying to find one or two subsystems that have an opaque uAPI
without an ABI (though I suspect there will be one) may not be
convincing here.


Thanks

A device from vendor A cannot migrate to a device from
vendor B because the format is incompatible. This approach works well
when devices have unique guest-visible hardware interfaces so the
guest wouldn't be able to handle migrating a device from vendor A to a
device from vendor B anyway.

For VFIO I guess cross-vendor live migration can't succeed unless we
cheat with the device/vendor ID.

Yes. I haven't looked into the details of PCI (Sub-)Device/Vendor IDs
and how to best enable migration but I hope that can be solved. The
simplest approach is to override the IDs and make them part of the
guest configuration.

That would be very tricky (or require a whitelist). E.g. the opaque data
of the src may match the opaque data of the dst by chance.

Luckily identifying things based on magic constants has been solved
many times in the past.

A central identifier registry prevents all collisions but is a pain to
manage. Or use a 128-bit UUID and self-allocate the identifier with an
extremely low chance of collision:
https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions
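
A minimal sketch of the self-allocated identifier idea, assuming a
hypothetical layout in which each migration blob starts with the 16-byte
UUID of its format. The names FORMAT_A, FORMAT_B, wrap_state and
unwrap_state are invented for this example and are not part of any
existing VFIO interface.

```python
import uuid

# Hypothetical: each migration data format self-allocates one UUID, once.
FORMAT_A = uuid.uuid4()
FORMAT_B = uuid.uuid4()


def wrap_state(format_id, payload):
    """Prefix the opaque payload with its 16-byte format identifier."""
    return format_id.bytes + payload


def unwrap_state(expected_format_id, blob):
    """Return the payload, or refuse a blob written in another format."""
    found = uuid.UUID(bytes=blob[:16])
    if found != expected_format_id:
        raise ValueError(f"unsupported migration format {found}")
    return blob[16:]
```

Here a destination expecting FORMAT_A accepts wrap_state(FORMAT_A, ...)
and rejects wrap_state(FORMAT_B, ...). The check only guards against
misinterpreting a blob; it does not say which formats are compatible
with each other.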


I may be missing something. I think we're talking about cross-vendor
live migration.

Would you want src and dest to have the same UUID or not?

If they have different UUIDs, how could we know we can live migrate
between them?

If they have the same UUID, what's the rule forcing the vendors to
choose the same UUID (a spec)?

I will send a separate email that describes how VFIO live migration can
work in more detail. I think it's possible to do it with the existing
ioctl interface that Kirti has proposed and still prevent the risk of
incorrectly interpreting data that you have pointed out.

The document that I'm sending will allow us to discuss in more detail
and make the approach clearer.

Stefan
