Re: [Qemu-devel] [PATCH RFC] hw/pvrdma: Proposal of a new pvrdma device


From: Jason Gunthorpe
Subject: Re: [Qemu-devel] [PATCH RFC] hw/pvrdma: Proposal of a new pvrdma device
Date: Thu, 6 Apr 2017 14:38:23 -0600
User-agent: Mutt/1.5.24 (2015-08-30)

On Thu, Apr 06, 2017 at 10:42:20PM +0300, Yuval Shaia wrote:

> > I'd rather see someone optimize the loopback path of soft roce than
> > see KDBR :)
> 
> Can we assume that the optimized loopback path will be as fast as direct
> copy between one VM address space to another VM address space?

Well, you'd optimize it until it was a direct memory copy, so I think
that is a reasonable starting assumption.
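
To illustrate what "optimize it until it was a direct memory copy" could
mean, here is a toy sketch, not Soft RoCE code. All names (toy_qp,
toy_send_packetized, toy_send_direct, TOY_MTU) are hypothetical, and it
assumes both buffers are already mapped into the address space doing the
copy, which a real VM-to-VM path would have to arrange first. The
packetized path stages every chunk in a packet buffer the way a
network-style loopback would; the direct path delivers the same bytes
with a single memcpy.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MTU 64  /* hypothetical packet size used by the slow path */

/* Hypothetical receive side of a toy queue pair. */
struct toy_qp {
    unsigned char *recv_buf;  /* posted receive buffer */
    size_t recv_len;          /* capacity of recv_buf in bytes */
};

/* Slow path: stage each chunk in a packet buffer, then copy it out.
 * Every payload byte is copied twice. Returns bytes delivered or -1. */
static long toy_send_packetized(struct toy_qp *dst, const void *src, size_t len)
{
    unsigned char pkt[TOY_MTU];
    size_t off = 0;

    if (len > dst->recv_len)
        return -1;
    while (off < len) {
        size_t n = (len - off < TOY_MTU) ? (len - off) : TOY_MTU;

        memcpy(pkt, (const unsigned char *)src + off, n);
        memcpy(dst->recv_buf + off, pkt, n);
        off += n;
    }
    return (long)len;
}

/* Fast path: the peer is on the same host, so copy straight into its
 * receive buffer. Every payload byte is copied once. */
static long toy_send_direct(struct toy_qp *dst, const void *src, size_t len)
{
    if (len > dst->recv_len)
        return -1;
    memcpy(dst->recv_buf, src, len);
    return (long)len;
}
```

Both functions leave identical bytes in the receive buffer; only the
number of copies differs, which is the gap a loopback optimization would
be closing.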

> > > 3. Our intention is for KDBR to be used in other contexts as well when we need
> > >    inter VM data exchange, e.g. backend for virtio devices. We didn't see how this
> > >    kind of requirement can be implemented inside SoftRoce as we don't see any
> > >    connection between them.
> > 
> > KDBR looks like weak RDMA to me, so it is reasonable question why not
> > use full RDMA with loopback optimization instead of creating something
> > unique.
> 
> True, KDBR exposes RDMA-like API because its sole user is currently
> pvrdma device.  But, by design it can be expanded to support other
> clients, for example virtio device which might have other attributes,
> can we expect the same from SoftRoCE?

RDMA handles all sorts of complex virtio-like protocols just
fine. It is unclear what 'other attributes' would be. Sounds like
over-designing?

> > IMHO, it also makes more sense for something like KDBR to live as a
> > RDMA transport, not as a unique char device, it is obviously very
> > RDMA-like.
> 
> Can you elaborate more on this?
> What exactly will it solve?
> How will it be better than kdbr?

If you are going to do RDMA, then the uAPI for it from the kernel
should be the RDMA subsystem, don't invent unique cdevs that overlap
established kernel functionality without a very, very good reason.

> > .. and the char dev really can't be used when implementing user space
> > RDMA, that would just make a big mess..
> 
> The position of kdbr is not to be a layer *between* user space and device -
> it is *the device* from point of view of the process.

Any RDMA device built on top of kdbr certainly needs to support
/dev/uverbs0 and all the usual RDMA stuff, so again, I fail to see the
point of the special cdev.. Trying to mix /dev/uverbs0 and /dev/kdbr
in your provider would be too goofy and weird.

> > But obviously if you connect pvrdma to real hardware then the page pin
> > comes back.
> 
> The fact that page pin is not needed with Soft RoCE device but is needed
> with real RoCE device is exactly where kdbr can help as it isolates this
> fact from user space process.

I don't see how KDBR helps at all.

To do virtual RDMA you must transfer RDMA objects and commands
unmodified from VM to HV and implement a fairly complicated SW stack
inside the HV.

Once you do that, micro-optimizing for same-machine VM-to-VM copy is
not really such a big deal, IMHO.

The big challenge is keeping the real HW (or softroce) RDMA objects
in sync with the VM ones and implementing some kind of RDMA-in-RDMA
tunnel to enable migration when using today's HW offload.

I see nothing in kdbr that helps with any of this. All it seems to do
is obfuscate the transfer of RDMA objects and commands to the
hypervisor, and make the transition of an RDMA channel from loopback to
network far, far more complicated.

> Sorry, we didn't mean "easy" but "simple", and simplest solutions
> are always preferred.  IMHO, currently there is no good solution to
> do data copy between two VMs.

Don't confuse 'simple' with 'under-featured'. :)

> Can you comment on the second point - migration? Please note that we need
> it to work both with Soft RoCE and with real device.

I don't see how kdbr helps with migration; you still have to set up the
HW NIC and that needs sharing all the RDMA centric objects from VM to
HV.

Jason


