
Re: [Qemu-devel] [PATCH RFC] hw/pvrdma: Proposal of a new pvrdma device


From: Leon Romanovsky
Subject: Re: [Qemu-devel] [PATCH RFC] hw/pvrdma: Proposal of a new pvrdma device
Date: Tue, 4 Apr 2017 20:33:49 +0300
User-agent: Mutt/1.8.0 (2017-02-23)

On Tue, Apr 04, 2017 at 04:38:40PM +0300, Marcel Apfelbaum wrote:
> On 04/03/2017 09:23 AM, Leon Romanovsky wrote:
> > On Fri, Mar 31, 2017 at 06:45:43PM +0300, Marcel Apfelbaum wrote:
> > > On 03/30/2017 11:28 PM, Doug Ledford wrote:
> > > > On 3/30/17 9:13 AM, Leon Romanovsky wrote:
> > > > > On Thu, Mar 30, 2017 at 02:12:21PM +0300, Marcel Apfelbaum wrote:
> > > > > > From: Yuval Shaia <address@hidden>
> > > > > >
> > > > > >  Hi,
> > > > > >
> > > > > >  General description
> > > > > >  ===================
> > > > > >  This is a very early RFC of a new RoCE emulated device
> > > > > >  that enables guests to use the RDMA stack without having
> > > > > >  real hardware in the host.
> > > > > >
> > > > > >  The current implementation supports only VM to VM communication
> > > > > >  on the same host.
> > > > > >  Down the road we plan to make it possible to support
> > > > > >  inter-machine communication by utilizing physical RoCE devices
> > > > > >  or Soft RoCE.
> > > > > >
> > > > > >  The goals are:
> > > > > >  - Reach fast, secure and loss-less inter-VM data exchange.
> > > > > >  - Support remote VMs or bare metal machines.
> > > > > >  - Allow VM migration.
> > > > > >  - Do not require pinning all of the VM memory.
> > > > > >
> > > > > >
> > > > > >  Objective
> > > > > >  =========
> > > > > >  Have a QEMU implementation of the PVRDMA device. We aim to do so
> > > > > >  without any change in the PVRDMA guest driver which is already
> > > > > >  merged into the upstream kernel.
> > > > > >
> > > > > >
> > > > > >  RFC status
> > > > > >  ===========
> > > > > >  The project is in early development stages and supports
> > > > > >  only basic send/receive operations.
> > > > > >
> > > > > >  We present it now so we can get feedback on the design and on
> > > > > >  feature demands, and to receive comments from the community
> > > > > >  pointing us in the "right" direction.
> > > > >
> > > > > Judging by the feedback you got from the RDMA community on the
> > > > > kernel proposal [1], this community failed to understand:
> > > > > 1. Why do you need a new module?
> > > >
> > > > In this case, this is a qemu module to allow qemu to provide a virt
> > > > rdma device to guests that is compatible with the device provided by
> > > > VMWare's ESX product.  Right now, the vmware_pvrdma driver works only
> > > > when the guest is running on a VMWare ESX server product; this would
> > > > change that.  Marcel mentioned that they are currently making it
> > > > compatible because that's the easiest/quickest thing to do, but in
> > > > the future they might extend beyond what VMWare's virt rdma driver
> > > > provides/uses and might then need to either modify it to work with
> > > > their extensions or fork and create their own virt client driver.
> > > >
> > > > > 2. Why are the existing solutions not enough, and why can't they be extended?
> > > >
> > > > This patch is against the qemu source code, not the kernel.  There is 
> > > > no other solution in the qemu source code, so there is no existing 
> > > > solution to extend.
> > > >
> > > > > 3. Why can't RXE (SoftRoCE) be extended to perform this inter-VM
> > > > >    communication via a virtual NIC?
> > > >
> > > > Eventually they want this to work on real hardware, and to be more or
> > > > less transparent to the guest.  They will need to make it independent
> > > > of the kernel hardware/driver in use.  That means their own virt
> > > > driver; the virt driver will eventually hook into whatever hardware is
> > > > present on the system or, failing that, fall back to Soft RoCE or soft
> > > > iWARP if that ever makes it into the kernel.
> > > >
> > > >
> > >
> > > Hi Leon and Doug,
> > > Your feedback is much appreciated!
> > >
> > > As Doug mentioned, the RFC is a QEMU implementation of a pvrdma device,
> > > so SoftRoCE can't help here (we are emulating a PCI device).
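> > >
> > > To give a rough idea of what emulating such a PCI device means on the
> > > QEMU side, here is a minimal, purely illustrative skeleton of how the
> > > device type could be registered. The type name, the struct layout and
> > > the 0x0820 device ID are assumptions made for this sketch and are not
> > > taken from the patch; only the QOM/PCI boilerplate is shown, none of
> > > the actual device logic:
> > >
> > > #include "qemu/osdep.h"
> > > #include "hw/pci/pci.h"
> > > #include "hw/pci/pci_ids.h"
> > >
> > > #define TYPE_PVRDMA "pvrdma"            /* assumed type name */
> > >
> > > typedef struct PVRDMADev {
> > >     PCIDevice parent_obj;
> > >     /* BARs, MSI-X state and RDMA resource tables would live here */
> > > } PVRDMADev;
> > >
> > > static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > > {
> > >     /* map BARs, set up MSI-X, initialize the device state */
> > > }
> > >
> > > static void pvrdma_class_init(ObjectClass *klass, void *data)
> > > {
> > >     DeviceClass *dc = DEVICE_CLASS(klass);
> > >     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > >
> > >     k->realize   = pvrdma_realize;
> > >     /* the guest vmw_pvrdma driver binds by VMware vendor/device IDs,
> > >      * so the emulated device must advertise the same identity */
> > >     k->vendor_id = PCI_VENDOR_ID_VMWARE;
> > >     k->device_id = 0x0820;              /* assumed PVRDMA device ID */
> > >     k->class_id  = PCI_CLASS_NETWORK_OTHER;
> > >     dc->desc     = "PVRDMA device (illustrative skeleton)";
> > > }
> > >
> > > static const TypeInfo pvrdma_info = {
> > >     .name          = TYPE_PVRDMA,
> > >     .parent        = TYPE_PCI_DEVICE,
> > >     .instance_size = sizeof(PVRDMADev),
> > >     .class_init    = pvrdma_class_init,
> > > };
> > >
> > > static void pvrdma_register_types(void)
> > > {
> > >     type_register_static(&pvrdma_info);
> > > }
> > >
> > > type_init(pvrdma_register_types)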
> >
> > I just responded to the latest email, but as you understood from my
> > question, it was related to your KDBR module.
> >
> > >
> > > Regarding the new KDBR module (Kernel Data Bridge): as the name
> > > suggests, it is a bridge between different VMs, or between a VM and a
> > > hardware/software device, and does not replace it.
> > >
> > > Leon, utilizing Soft RoCE has definitely been part of our roadmap from
> > > the start; we find the project a must since most of our systems don't
> > > even have real RDMA hardware. The question is how to best integrate
> > > with it.
> >
> > This is exactly the question: you chose, as the implementation path, to
> > do it with a new module over a char device. I'm not against your
> > approach, but I would like to see a list of pros and cons for the other
> > possible solutions, if any. Does it make sense to do a special ULP to
> > share the data between different drivers over shared memory?
>
> Hi Leon,
>
> Here are some thoughts regarding the Soft RoCE usage in our project.
> We thought about using it as a backend for the QEMU pvrdma device,
> but we didn't see how it would support our requirements.
>
> 1. Does Soft RoCE support an inter-process (VM) fast path? KDBR
>    removes the need for hw resources, emulated or not, concentrating
>    on a single copy from one VM to another.
>
> 2. We needed to support migration, meaning the PVRDMA device must preserve
>    the RDMA resources between different hosts. Our solution includes a clear
>    separation between the guest resource namespace and the actual hw/sw
>    device. This is why KDBR is intended to run outside the scope of Soft
>    RoCE, so it can open/close hw connections independently of the VM.
>
> 3. Our intention is for KDBR to be used in other contexts as well, whenever
>    we need inter-VM data exchange, e.g. as a backend for virtio devices. We
>    didn't see how this kind of requirement could be implemented inside Soft
>    RoCE, as we don't see any connection between the two.
>
> 4. We don't want all the VM memory to be pinned, since that disables
>    memory over-commit, which in turn would make the pvrdma device useless.
>    We weren't sure how nicely Soft RoCE would play with memory pinning, and
>    we wanted more control over memory management. It may be a solvable
>    issue, but combined with the others it led us to our decision to come up
>    with our own kernel bridge (char device or not, we went for it since it
>    was the easiest to implement for a POC).
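>
> To make the char-device direction a bit more concrete, here is a purely
> illustrative user-space sketch of the kind of interaction we have in mind.
> The /dev/kdbr node name, the structs and the ioctl numbers below are
> invented for the example and are not the actual KDBR API:
>
> /* illustrative only: open a bridge port, then one ioctl hands the data
>  * to the bridge, which copies it once into the peer VM's buffer */
> #include <fcntl.h>
> #include <stdio.h>
> #include <stddef.h>
> #include <sys/ioctl.h>
> #include <linux/ioctl.h>
> #include <unistd.h>
>
> struct kdbr_port_req {
>     unsigned int vm_id;       /* identifies the local VM endpoint */
>     unsigned int port_id;     /* filled in by the bridge */
> };
>
> struct kdbr_send_req {
>     unsigned int port_id;     /* local endpoint */
>     unsigned int peer_port;   /* endpoint of the peer VM */
>     void        *buf;         /* buffer already translated by QEMU */
>     size_t       len;
> };
>
> #define KDBR_IOC_MAGIC     'k'
> #define KDBR_IOC_OPEN_PORT _IOWR(KDBR_IOC_MAGIC, 1, struct kdbr_port_req)
> #define KDBR_IOC_SEND      _IOW(KDBR_IOC_MAGIC, 2, struct kdbr_send_req)
>
> int main(void)
> {
>     char payload[] = "hello from VM A";
>     int fd = open("/dev/kdbr", O_RDWR);   /* device node name is assumed */
>     if (fd < 0) {
>         perror("open /dev/kdbr");
>         return 1;
>     }
>
>     struct kdbr_port_req port = { .vm_id = 1 };
>     if (ioctl(fd, KDBR_IOC_OPEN_PORT, &port) < 0) {
>         perror("KDBR_IOC_OPEN_PORT");
>         close(fd);
>         return 1;
>     }
>
>     struct kdbr_send_req snd = {
>         .port_id   = port.port_id,
>         .peer_port = 2,
>         .buf       = payload,
>         .len       = sizeof(payload),
>     };
>     /* one call, one copy: the bridge moves the data directly into the
>      * buffer posted by the receiving VM */
>     if (ioctl(fd, KDBR_IOC_SEND, &snd) < 0) {
>         perror("KDBR_IOC_SEND");
>     }
>
>     close(fd);
>     return 0;
> }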

I'm not going to repeat Jason's answer; I completely agree with him.

Just to add my 2 cents: you didn't answer my question about other possible
implementations. These could be SoftRoCE loopback optimizations, a special
ULP, an RDMA transport, or a virtual driver with multiple VFs and a single PF.

>
>
> Thanks,
> Marcel & Yuval
>
> >
> > Thanks
> >
> > >
> > > Thanks,
> > > Marcel & Yuval
> > >
> > >
> > > > >
> > > > > Can you please help us to fill this knowledge gap?
> > > > >
> > > > > [1] http://marc.info/?l=linux-rdma&m=149063626907175&w=2
> > > >
> > >
