John G Johnson
Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
Mon, 13 Jan 2020 17:56:25 -0800
> On Jan 3, 2020, at 7:59 AM, Stefan Hajnoczi <address@hidden> wrote:
> On Thu, Jan 02, 2020 at 11:03:22AM +0000, Felipe Franciosi wrote:
>>> On Jan 2, 2020, at 10:42 AM, Stefan Hajnoczi <address@hidden> wrote:
>>> On Fri, Dec 20, 2019 at 10:22:37AM +0000, Daniel P. Berrangé wrote:
>>>> On Fri, Dec 20, 2019 at 09:47:12AM +0000, Stefan Hajnoczi wrote:
>>>>> On Thu, Dec 19, 2019 at 12:55:04PM +0000, Daniel P. Berrangé wrote:
>>>>>> On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
>>>>>>>> On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <address@hidden> wrote:
>>>>>>>> On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
>>>>>>>>>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <address@hidden> wrote:
>>>>>>>>>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
>>>>>>>>>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <address@hidden> wrote:
>>>>>>>>>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
>>>>>>> To be clear: I'm very happy to have a userspace-only option for this,
>>>>>>> I just don't want to ditch the kernel module (yet, anyway). :)
>>>>>> If it doesn't create too large of a burden to support both, then I think
>>>>>> it is very desirable. IIUC, this means treating a kernel-based solution
>>>>>> as the optimized/optimal path, and the userspace UNIX socket option as
>>>>>> the generic "works everywhere" fallback.
>>>>> I'm slightly in favor of the kernel implementation because it keeps us
>>>>> better aligned with VFIO. That means solving problems in one place only
>>>>> and less reinventing the wheel.
>>>>> Knowing that a userspace implementation is possible is a plus though.
>>>>> Maybe that option will become attractive in the future and someone will
>>>>> develop it. In fact, a userspace implementation may be a cool Google
>>>>> Summer of Code project idea that I'd like to co-mentor.
>>>> If it is technically viable as an approach, then I think we should be
>>>> treating a fully unprivileged muser-over-UNIX socket as a higher priority
>>>> than just "maybe a GSoC student will want to do it".
>>>> Libvirt is getting strong message from KubeVirt project that they want to
>>>> be running both libvirtd and QEMU fully unprivileged. This allows their
>>>> containers to be unprivileged. Anything that requires privileges requires
>>>> jumping through extra hoops writing custom code in KubeVirt to do things
>>>> outside libvirt in side-loaded privileged containers, and this limits
>>>> where those features can be used.
>>> Okay this makes sense.
>>> There needs to be a consensus on whether to go with a qdev-over-socket
>>> approach that is QEMU-specific and strongly discourages third-party
>>> device distribution or a muser-over-socket approach that offers a stable
>>> API for VMM interoperability and third-party device distribution.
>> The reason I dislike yet another offloading protocol (ie. there is
>> vhost, there is vfio, and then there would be qdev-over-socket) is
>> that we keep reinventing the wheel. I very much prefer picking
>> something solid (eg. VFIO) and continuing to invest in it.
> I like the idea of sticking close to VFIO too. The first step is
> figuring out whether VFIO can be mapped to a UNIX domain socket protocol
> and how many non-VFIO protocol messages are required. Hopefully that extra
> non-VFIO stuff isn't too large.
I looked at this and think we could map VFIO commands over a
UNIX socket without a lot of difficulty. We'd have to use SCM
messages to pass file descriptors from the QEMU process to the
emulation process for certain operations, but that shouldn't be
a big problem. Here are the mission mode operations:
VFIO defines a number of configuration ioctl()s that we could
turn into messages, but if we make the protocol specific to PCI, then
all of the information they transmit (e.g., device regions and
interrupts) can be discovered by parsing the device's PCI config
space. A lot of the current VFIO code that parses config space could
be re-used to do this.
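As a sketch of that discovery step, the BAR registers in config space can be decoded directly. The constants below follow the PCI specification's BAR layout; `decode_bar` is a hypothetical helper for illustration, not existing QEMU or VFIO code:

```c
#include <assert.h>
#include <stdint.h>

#define PCI_BASE_ADDRESS_SPACE        0x01 /* bit 0: 0 = memory, 1 = I/O */
#define PCI_BASE_ADDRESS_MEM_TYPE_64  0x04 /* bits 2:1 == 10b -> 64-bit BAR */

struct bar_info {
    int is_io;       /* I/O port BAR rather than memory BAR */
    int is_64bit;    /* memory BAR consuming two BAR slots */
    uint64_t addr;   /* base address with flag bits masked off */
};

/* Decode one BAR from its low (and, for 64-bit BARs, high) register. */
static struct bar_info decode_bar(uint32_t lo, uint32_t hi)
{
    struct bar_info b = {0};
    if (lo & PCI_BASE_ADDRESS_SPACE) {
        b.is_io = 1;
        b.addr = lo & ~0x3u;            /* I/O BARs reserve bits 1:0 */
    } else {
        b.is_64bit = (lo & 0x6) == PCI_BASE_ADDRESS_MEM_TYPE_64;
        b.addr = lo & ~0xfu;            /* memory BARs reserve bits 3:0 */
        if (b.is_64bit)
            b.addr |= (uint64_t)hi << 32;
    }
    return b;
}
```

The same walk over config space would also pick up the interrupt pin and MSI/MSI-X capabilities, so the protocol would not need separate discovery messages for them.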
VFIO uses reads and writes on the VFIO file descriptor to
perform MMIOs to the device. The read/write offset encodes the VFIO
region and the offset of the MMIO within it (the VFIO regions
correspond to PCI BARs). These would have to be changed to send
messages that include the VFIO region and offset (and data for
writes) to the emulation process.
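For illustration, the kernel's vfio-pci driver packs the region index into the high bits of the file offset (`VFIO_PCI_OFFSET_SHIFT` in the driver); a socket message could carry the same information explicitly. The `region_access_msg` layout below is hypothetical, not from any published spec:

```c
#include <assert.h>
#include <stdint.h>

/* vfio-pci encodes the region index in bits 63:40 of the file offset. */
#define VFIO_PCI_OFFSET_SHIFT 40

static uint64_t region_to_offset(uint32_t region, uint64_t off)
{
    return ((uint64_t)region << VFIO_PCI_OFFSET_SHIFT) | off;
}

static uint32_t offset_to_region(uint64_t file_off)
{
    return (uint32_t)(file_off >> VFIO_PCI_OFFSET_SHIFT);
}

/* Hypothetical wire message for a device region access over the socket;
 * field names are illustrative only. */
struct region_access_msg {
    uint16_t op;       /* 1 = region read, 2 = region write */
    uint16_t region;   /* VFIO region index (maps to a PCI BAR) */
    uint32_t count;    /* access size in bytes */
    uint64_t offset;   /* offset within the region */
    uint8_t  data[8];  /* payload for writes / read replies */
} __attribute__((packed));
```

Keeping the region/offset split explicit in the message avoids baking the kernel driver's offset encoding into the protocol.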
VFIO creates eventfds that are sent to the kernel driver so it
can inject interrupts into a guest. We would have to send these
eventfds over the socket to the emulation process using SCM messages.
The emulation process could then trigger interrupts by writing to the
eventfds.
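The eventfd hand-off can be sketched with standard SCM_RIGHTS passing. This is a minimal Linux example; in the real design QEMU would send and the emulation process would receive, whereas here a single process plays both roles over a socketpair:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Pass a file descriptor over an AF_UNIX socket using SCM_RIGHTS. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a file descriptor sent with send_fd(); returns -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

Once received, a `write()` of a 64-bit count to the eventfd on the emulation side raises the interrupt on the QEMU/KVM side, exactly as the kernel VFIO driver does today.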
This is one place where I might diverge from VFIO. It uses an
ioctl to tell the kernel driver what areas of guest memory the device
can address. The driver then pins that memory so it can be programmed
into a HW IOMMU. We could avoid pinning of guest memory by adopting
the vhost-user idea of sending the file descriptors used by QEMU to
create guest memory to the emulation process, and having it mmap() the
guest memory itself. IOMMUs are handled by having the emulation process
request device-DMA-to-guest-PA translations from QEMU.
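A minimal sketch of that vhost-user-style memory sharing, using `memfd_create` as the backing fd (Linux-specific, glibc 2.27+). Both mappings are made in one process purely for illustration; in the real protocol the fd would travel to the emulation process over the socket via SCM_RIGHTS:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Back "guest RAM" with an fd so a second mapping (standing in for the
 * emulation process) sees guest writes without any memory pinning. */
static int shared_guest_ram_demo(void)
{
    const size_t len = 4096;
    int fd = memfd_create("guest-ram", 0);
    if (fd < 0 || ftruncate(fd, len) < 0)
        return -1;

    /* "QEMU side" mapping of guest RAM */
    uint8_t *qemu_view = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    /* "emulation process" mapping of the same fd */
    uint8_t *emul_view = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (qemu_view == MAP_FAILED || emul_view == MAP_FAILED)
        return -1;

    qemu_view[42] = 0xab;                 /* guest writes ...            */
    int ok = (emul_view[42] == 0xab);     /* ... the device model sees it */

    munmap(qemu_view, len);
    munmap(emul_view, len);
    close(fd);
    return ok ? 0 : -1;
}
```

Because the emulation process maps the memory itself, DMA becomes plain loads and stores into the shared mapping, and only the IOMMU translation requests need round trips to QEMU.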
> If implementations can use the kernel uapi vfio header files then we're
> on track for compatibility with VFIO.
>>> This is just a more elaborate explanation for the "the cat is out of the
>>> bag" comments that have already been made on licensing. Does anyone
>>> still disagree or want to discuss further?
>>> If there is agreement that a stable API is okay then I think the
>>> practical way to do this is to first merge a cleaned-up version of
>>> multi-process QEMU as an unstable experimental API. Once it's being
>>> tested and used we can write a protocol specification and publish it as
>>> a stable interface when the spec has addressed most use cases.
>>> Does this sound good?
>> In that case, wouldn't it be preferable to revive our proposal from
>> Edinburgh (KVM Forum 2018)? Our prototypes moved more of the Qemu VFIO
>> code to "common" and added a "user" backend underneath it, similar to
>> how vhost-user-scsi moved some of vhost-scsi to vhost-scsi-common and
>> added vhost-user-scsi. It was PCI-centric, but it doesn't have to be.
>> The other side can be implemented in libmuser to facilitate things.
> That sounds good.
The emulation program API could be based on the current
libmuser API or the libvfio-user API. The protocol itself wouldn’t
care which is chosen. Our multi-process QEMU project would have to
change how devices are specified from the QEMU command line to the
emulation process command line.