Re: [Qemu-devel] [PATCH RFC] hw/pvrdma: Proposal of a new pvrdma device
From: Leon Romanovsky
Subject: Re: [Qemu-devel] [PATCH RFC] hw/pvrdma: Proposal of a new pvrdma device
Date: Tue, 4 Apr 2017 20:33:49 +0300
User-agent: Mutt/1.8.0 (2017-02-23)
On Tue, Apr 04, 2017 at 04:38:40PM +0300, Marcel Apfelbaum wrote:
> On 04/03/2017 09:23 AM, Leon Romanovsky wrote:
> > On Fri, Mar 31, 2017 at 06:45:43PM +0300, Marcel Apfelbaum wrote:
> > > On 03/30/2017 11:28 PM, Doug Ledford wrote:
> > > > On 3/30/17 9:13 AM, Leon Romanovsky wrote:
> > > > > On Thu, Mar 30, 2017 at 02:12:21PM +0300, Marcel Apfelbaum wrote:
> > > > > > From: Yuval Shaia <address@hidden>
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > General description
> > > > > > ===================
> > > > > > This is a very early RFC of a new RoCE emulated device
> > > > > > that enables guests to use the RDMA stack without having
> > > > > > a real hardware in the host.
> > > > > >
> > > > > > The current implementation supports only VM to VM communication
> > > > > > on the same host.
> > > > > > Down the road, we plan to support inter-machine communication
> > > > > > by utilizing physical RoCE devices or Soft RoCE.
> > > > > >
> > > > > > The goals are:
> > > > > > - Achieve fast, secure, loss-less inter-VM data exchange.
> > > > > > - Support remote VMs or bare-metal machines.
> > > > > > - Allow VM migration.
> > > > > > - Avoid pinning all of the VM's memory.
> > > > > >
> > > > > >
> > > > > > Objective
> > > > > > =========
> > > > > > Have a QEMU implementation of the PVRDMA device. We aim to do so
> > > > > > without any change in the PVRDMA guest driver, which is already
> > > > > > merged into the upstream kernel.
> > > > > >
> > > > > >
> > > > > > RFC status
> > > > > > ==========
> > > > > > The project is in early development stages and supports
> > > > > > only basic send/receive operations.
> > > > > >
> > > > > > We present it so we can get feedback on the design and on
> > > > > > feature demands, and to receive comments from the community
> > > > > > pointing us in the "right" direction.
> > > > >
> > > > > Judging by the feedback you got from the RDMA community
> > > > > for the kernel proposal [1], this community failed to understand:
> > > > > 1. Why do you need a new module?
> > > >
> > > > In this case, this is a qemu module to allow qemu to provide a virt
> > > > rdma device to guests that is compatible with the device provided by
> > > > VMWare's ESX product. Right now, the vmware_pvrdma driver
> > > > works only when the guest is running on a VMWare ESX server product,
> > > > this would change that. Marcel mentioned that they are currently
> > > > making it compatible because that's the easiest/quickest thing to
> > > > do, but in the future they might extend beyond what VMWare's virt rdma
> > > > driver provides/uses and might then need to either modify it to work
> > > > with their extensions or fork and create their own virt
> > > > client driver.
> > > >
> > > > > 2. Why are existing solutions not enough, and why can't they be
> > > > > extended?
> > > >
> > > > This patch is against the qemu source code, not the kernel. There is
> > > > no other solution in the qemu source code, so there is no existing
> > > > solution to extend.
> > > >
> > > > > 3. Why can't RXE (Soft RoCE) be extended to perform this inter-VM
> > > > > communication via a virtual NIC?
> > > >
> > > > Eventually they want this to work on real hardware, and to be more or
> > > > less transparent to the guest. They will need to make it independent
> > > > of the kernel hardware/driver in use. That means their own
> > > > virt driver, then the virt driver will eventually hook into whatever
> > > > hardware is present on the system, or failing that, fall back to soft
> > > > RoCE or soft iWARP if that ever makes it in the kernel.
> > > >
> > > >
> > >
> > > Hi Leon and Doug,
> > > Your feedback is much appreciated!
> > >
> > > As Doug mentioned, the RFC is a QEMU implementation of a pvrdma device,
> > > so SoftRoCE can't help here (we are emulating a PCI device).
> >
> > I just responded to the latest email, but as you understood from my
> > question,
> > it was related to your KDBR module.
> >
> > >
> > > Regarding the new KDBR (Kernel Data Bridge) module: as the name
> > > suggests, it is a bridge between different VMs, or between a VM and a
> > > hardware/software device, and it does not replace that device.
> > >
> > > Leon, utilizing Soft RoCE has been part of our roadmap from the start;
> > > we find the project a must, since most of our systems don't even have
> > > real RDMA hardware. The question is how to best integrate with it.
> >
> > This is exactly the question. You chose, as an implementation path, to
> > do it with a new module exposed as a char device. I'm not against your
> > approach, but I would like to see a list of pros and cons for the other
> > possible solutions, if any. Does it make sense to build a special ULP to
> > share the data between different drivers over shared memory?
>
> Hi Leon,
>
> Here are some thoughts regarding the Soft RoCE usage in our project.
> We thought about using it as a backend for the QEMU pvrdma device,
> but we didn't see how it would support our requirements.
>
> 1. Does Soft RoCE support an inter-process (VM) fast path? KDBR removes
> the need for hw resources, emulated or not, concentrating on one copy
> from one VM to another.
>
> 2. We need to support migration, meaning the PVRDMA device must preserve
> the RDMA resources across different hosts. Our solution includes a clear
> separation between the guest resources namespace and the actual hw/sw
> device. This is why KDBR is intended to run outside the scope of Soft
> RoCE, so it can open/close hw connections independently of the VM.
>
> 3. Our intention is for KDBR to be used in other contexts as well,
> wherever we need inter-VM data exchange, e.g. as a backend for virtio
> devices. We didn't see how this kind of requirement could be implemented
> inside Soft RoCE, as we don't see any connection between the two.
>
> 4. We don't want all the VM memory to be pinned, since that disables
> memory over-commit, which in turn would make the pvrdma device useless.
> We weren't sure how well Soft RoCE would play with memory pinning, and
> we wanted more control over memory management. It may be a solvable
> issue, but combined with the others it led us to our decision to come up
> with our kernel bridge (char device or not; we went for it since it was
> the easiest to implement for a POC).
I'm not going to repeat Jason's answer; I completely agree with him.
Just to add my 2 cents: you didn't answer my question about other possible
implementations. They could include Soft RoCE loopback optimizations, a
special ULP, an RDMA transport, or a virtual driver with multiple VFs and
a single PF.
>
>
> Thanks,
> Marcel & Yuval
>
> >
> > Thanks
> >
> > >
> > > Thanks,
> > > Marcel & Yuval
> > >
> > >
> > > > >
> > > > > Can you please help us to fill this knowledge gap?
> > > > >
> > > > > [1] http://marc.info/?l=linux-rdma&m=149063626907175&w=2
> > > >
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > the body of a message to address@hidden
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
>