Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM comm
From: Avi Cohen (A)
Subject: Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
Date: Thu, 7 Dec 2017 07:54:48 +0000
There is already a virtio mechanism in which two VMs, each assigned a virtio
device, communicate via a veth pair on the host.
KVM just passes a pointer to the writer VM's page to the reader VM,
resulting in excellent performance (no vSwitch in the middle).
**Question**: What is the advantage of vhost-pci compared to this?
Best Regards
Avi
> -----Original Message-----
> From: Stefan Hajnoczi [mailto:address@hidden]
> Sent: Thursday, 07 December, 2017 8:31 AM
> To: Wei Wang
> Cc: Stefan Hajnoczi; address@hidden; address@hidden; Yang,
> Zhiyong; address@hidden; address@hidden; Avi Cohen (A);
> address@hidden; address@hidden;
> address@hidden
> Subject: Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM
> communication
>
> On Thu, Dec 7, 2017 at 3:57 AM, Wei Wang <address@hidden> wrote:
> > On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote:
> >>
> >> On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W <address@hidden>
> wrote:
> >>>
> >>> On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote:
> >>>>
> >>>> On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote:
> >>>>>
> >>>>> Vhost-pci is a point-to-point based inter-VM communication solution.
> >>>>> This patch series implements the vhost-pci-net device setup and
> >>>>> emulation. The device is implemented as a virtio device, and it is
> >>>>> set up via the vhost-user protocol to get the necessary info (e.g.
> >>>>> the memory info of the remote VM, vring info).
> >>>>>
> >>>>> Currently, only the fundamental functions are implemented. More
> >>>>> features, such as MQ and live migration, will be updated in the future.
> >>>>>
> >>>>> The DPDK PMD of vhost-pci has been posted to the dpdk mailing list here:
> >>>>> http://dpdk.org/ml/archives/dev/2017-November/082615.html
> >>>>
> >>>> I have asked questions about the scope of this feature. In
> >>>> particular, I think it's best to support all device types rather
> >>>> than just virtio-net. Here is a design document that shows how
> >>>> this can be achieved.
> >>>>
> >>>> What I'm proposing is different from the current approach:
> >>>> 1. It's a PCI adapter (see below for justification)
> >>>> 2. The vhost-user protocol is exposed by the device (not handled 100%
> >>>> in QEMU). Ultimately I think your approach would also need to do this.
> >>>>
> >>>> I'm not implementing this and not asking you to implement it.
> >>>> Let's just use this for discussion so we can figure out what the
> >>>> final vhost-pci will look like.
> >>>>
> >>>> Please let me know what you think, Wei, Michael, and others.
> >>>>
> >>> Thanks for sharing the thoughts. If I understand it correctly, the
> >>> key difference is that this approach tries to relay every vhost-user
> >>> msg to the guest. I'm not sure about the benefits of doing this.
> >>> To make data plane (i.e. driver to send/receive packets) work, I
> >>> think, mostly, the memory info and vring info are enough. Other
> >>> things like callfd, kickfd don't need to be sent to the guest, they
> >>> are needed by QEMU only for the eventfd and irqfd setup.
> >>
> >> Handling the vhost-user protocol inside QEMU and exposing a different
> >> interface to the guest makes the interface device-specific. This
> >> will cause extra work to support new devices (vhost-user-scsi,
> >> vhost-user-blk). It also makes development harder because you might
> >> have to learn 3 separate specifications to debug the system (virtio,
> >> vhost-user, vhost-pci-net).
> >>
> >> If vhost-user is mapped to a PCI device then these issues are solved.
> >
> >
> > I intend to have a different opinion about this:
> >
> > 1) Even when relaying the msgs to the guest, QEMU still needs to handle the
> > msg first, for example, it needs to decode the msg to see if it is the
> > ones (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that should
> > be used for the device setup (e.g. mmap the memory given via
> > SET_MEM_TABLE). In this case, we will be likely to have 2 slave
> > handlers - one in the guest, another in QEMU device.
>
> In theory the vhost-pci PCI adapter could decide not to relay certain
> messages.
> As explained in the document, I think it's better to relay everything because
> some messages that only carry an fd still have a meaning. They are a signal
> that the master has entered a new state.
>
> The approach in this patch series doesn't really solve the 2-handler problem:
> it still needs to notify the guest when certain vhost-user messages are
> received from the master. The difference is just that it's non-trivial in
> this patch series because each message is handled on a case-by-case basis
> and has a custom interface (it does not simply relay a vhost-user protocol
> message).
>
> A 1:1 model is simple and consistent. I think it will avoid bugs and design
> mistakes.
>
> > 2) If people already understand the vhost-user protocol, it would be
> > natural for them to understand the vhost-pci metadata - just the
> > obtained memory and vring info are put into the metadata area (nothing
> > new).
>
> This is debatable. It's like saying if you understand QEMU command-line
> options you will understand libvirt domain XML. They map to each other but
> how obvious that mapping is depends on the details.
> I'm saying a 1:1 mapping (reusing the vhost-user protocol message
> layout) is the cleanest option.
>
> > Inspired by your sharing, how about the following:
> > we can actually factor out a common vhost-pci layer, which handles all
> > the features that are common to all the vhost-pci series of devices
> > (vhost-pci-net, vhost-pci-blk,...) Coming to the implementation, we
> > can have a VhostpciDeviceClass (similar to VirtioDeviceClass), the
> > device realize sequence will be
> > virtio_device_realize() --> vhost_pci_device_realize() -->
> > vhost_pci_net_device_realize()
>
> Why have individual device types (vhost-pci-net, vhost-pci-blk, etc) instead
> of
> just a vhost-pci device?
>
> >>>> vhost-pci is a PCI adapter instead of a virtio device to allow
> >>>> doorbells and interrupts to be connected to the virtio device in
> >>>> the master VM in the most efficient way possible. This means the
> >>>> Vring call doorbell can be an ioeventfd that signals an irqfd
> >>>> inside the host kernel without host userspace involvement. The
> >>>> Vring kick interrupt can be an irqfd that is signalled by the
> >>>> master VM's virtqueue ioeventfd.
> >>>>
> >>>
> >>> This looks the same as the implementation of inter-VM notification in v2:
> >>> https://www.mail-archive.com/address@hidden/msg450005.html
> >>> which is fig. 4 here:
> >>> https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost
> >>> -pci-rfc2.0.pdf
> >>>
> >>> When the vhost-pci driver kicks its tx, the host signals the irqfd
> >>> of virtio-net's rx. I think this has already bypassed the host
> >>> userspace (thanks to the fast mmio implementation)
> >>
> >> Yes, I think the irqfd <-> ioeventfd mapping is good. Perhaps it
> >> even makes sense to implement a special fused_irq_ioevent_fd in the
> >> host kernel to bypass the need for a kernel thread to read the
> >> eventfd so that an interrupt can be injected (i.e. to make the
> >> operation synchronous).
> >>
> >> Is the tx virtqueue in your inter-VM notification v2 series a real
> >> virtqueue that gets used? Or is it just a dummy virtqueue that
> >> you're using for the ioeventfd doorbell? It looks like
> >> vpnet_handle_vq() is empty so it's really just a dummy. The actual
> >> virtqueue is in the vhost-user master guest memory.
> >
> >
> >
> > Yes, that tx is a dummy actually, just created to use its doorbell.
> > Currently, with virtio_device, I think ioeventfd comes with virtqueue only.
> > Actually, I think those issues could be solved with vhost-pci. For
> > example, reserve a piece of the BAR area for ioeventfd. The bar layout can
> be:
> > BAR 2:
> > 0~4k: vhost-pci device specific usages (ioeventfd etc)
> > 4k~8k: metadata (memory info and vring info)
> > 8k~64GB: remote guest memory
> > (we can make the BAR size configurable via the QEMU cmdline; 64GB is
> > the default)
>
> Why use a virtio device? The doorbell and shared memory don't fit the virtio
> architecture. There are no real virtqueues. This makes it a strange virtio
> device.
>
> Stefan