From: Ketan Nilangekar
Subject: Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support
Date: Fri, 25 Nov 2016 08:27:26 +0000
User-agent: Microsoft-MacOutlook/f.1c.1.161117


On 11/24/16, 9:38 PM, "Stefan Hajnoczi" <address@hidden> wrote:

    On Thu, Nov 24, 2016 at 11:31:14AM +0000, Ketan Nilangekar wrote:
    > 
    > 
    > On 11/24/16, 4:41 PM, "Stefan Hajnoczi" <address@hidden> wrote:
    > 
    >     On Thu, Nov 24, 2016 at 05:44:37AM +0000, Ketan Nilangekar wrote:
    >     > On 11/24/16, 4:07 AM, "Paolo Bonzini" <address@hidden> wrote:
    >     > >On 23/11/2016 23:09, ashish mittal wrote:
    >     > >> On the topic of protocol security -
    >     > >> 
    >     > >> Would it be enough for the first patch to implement only
    >     > >> authentication and not encryption?
    >     > >
    >     > >Yes, of course.  However, as we introduce more and more
    >     > >QEMU-specific characteristics to a protocol that is already
    >     > >QEMU-specific (it doesn't do failover, etc.), I am still not sure
    >     > >of the actual benefit of using libqnio versus having an NBD
    >     > >server or FUSE driver.
    >     > >
    >     > >You have already mentioned performance, but the design has
    >     > >changed so much that I think one of the two things has to change:
    >     > >either failover moves back to QEMU and there is no (closed
    >     > >source) translator running on the node, or the translator needs
    >     > >to speak a well-known and already-supported protocol.
    >     > 
    >     > IMO the design has not changed. The implementation has changed
    >     > significantly. I would propose that we keep resiliency/failover
    >     > code out of the QEMU driver and implement it entirely in libqnio,
    >     > as planned, in a subsequent revision. The VxHS server does not
    >     > need to understand/handle failover at all.
    >     > 
    >     > Today libqnio gives us significantly better performance than any
    >     > NBD/FUSE implementation. We know because we have prototyped with
    >     > both. Significant improvements to libqnio are also in the
    >     > pipeline, which will use cross memory attach calls to further
    >     > boost performance. Of course, a big reason for the performance is
    >     > also the HyperScale storage backend, but we believe this method
    >     > of IO tapping/redirecting can be leveraged by other solutions as
    >     > well.
    >     
    >     By "cross memory attach" do you mean
    >     process_vm_readv(2)/process_vm_writev(2)?
    >   
    > Ketan> Yes.
    >   
    >     That puts us back to square one in terms of security.  You have
    >     (untrusted) QEMU + (untrusted) libqnio directly accessing the
    >     memory of another process on the same machine.  That process is
    >     therefore also untrusted and may only process data for one guest so
    >     that guests stay isolated from each other.
    >     
    > Ketan> Understood, but this will be no worse than the current
    > network-based communication between qnio and the vxhs server. And
    > although we have questions around QEMU trust/vulnerability issues, we
    > are looking to implement a basic authentication scheme between libqnio
    > and the vxhs server.
    
    This is incorrect.
    
    Cross memory attach is equivalent to ptrace(2) (i.e. debugger) access.
    It means process A reads/writes directly from/to process B memory.  Both
    processes must have the same uid/gid.  There is no trust boundary
    between them.
    
Ketan> Not if the vxhs server is running as root and initiating the cross
memory attach, which is also why we are proposing a basic authentication
mechanism between qemu and vxhs. In any case, the cross memory attach is
planned for a near-future implementation.
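
For reference, here is a minimal sketch of the cross memory attach path being
discussed. The pid, remote address, and length are placeholders, and it
assumes the caller (e.g. a vxhs server running as root) has ptrace-equivalent
access to the QEMU process; an unprivileged caller with a different uid gets
EPERM.

/* Minimal sketch of the cross memory attach path under discussion.
 * qemu_pid, remote_addr and len are placeholders. */
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

static ssize_t read_guest_buffer(pid_t qemu_pid, void *remote_addr,
                                 void *local_buf, size_t len)
{
    struct iovec local  = { .iov_base = local_buf,   .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

    /* Fails with EPERM unless the caller passes the same
     * PTRACE_MODE_ATTACH check as ptrace(2): same uid/gid as the
     * target, or CAP_SYS_PTRACE (e.g. running as root). */
    ssize_t n = process_vm_readv(qemu_pid, &local, 1, &remote, 1, 0);
    if (n < 0) {
        fprintf(stderr, "process_vm_readv: %s\n", strerror(errno));
    }
    return n;
}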

    Network communication does not require both processes to have the same
    uid/gid.  If you want multiple QEMU processes talking to a single server
    there must be a trust boundary between client and server.  The server
    can validate the input from the client and reject undesired operations.

Ketan> This is what we are trying to propose. With the addition of
authentication between qemu and the vxhs server, we should be able to achieve
this. The question is, would that be acceptable?
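
For illustration, here is a minimal sketch of the kind of per-request
validation such a server-side trust boundary implies. The request layout,
limits, and permission flag below are hypothetical and not the actual
libqnio/vxhs wire format.

/* Illustrative server-side request validation; not the vxhs protocol. */
#include <stdbool.h>
#include <stdint.h>

struct io_request {
    uint32_t opcode;      /* e.g. READ or WRITE */
    uint64_t offset;      /* byte offset into the virtual disk */
    uint64_t length;      /* transfer length in bytes */
};

enum { OP_READ = 0, OP_WRITE = 1 };
#define MAX_XFER_BYTES (4ULL << 20)   /* example per-request cap */

static bool request_is_valid(const struct io_request *req,
                             uint64_t vdisk_size, bool client_writable)
{
    if (req->opcode != OP_READ && req->opcode != OP_WRITE) {
        return false;                 /* unknown operation */
    }
    if (req->opcode == OP_WRITE && !client_writable) {
        return false;                 /* client not allowed to write */
    }
    if (req->length == 0 || req->length > MAX_XFER_BYTES) {
        return false;                 /* zero-length or oversized I/O */
    }
    if (req->offset > vdisk_size ||
        req->length > vdisk_size - req->offset) {
        return false;                 /* out of bounds (overflow-safe) */
    }
    return true;
}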
    
    Hope this makes sense now.
    
    Two architectures that implement the QEMU trust model correctly are:
    
    1. Cross memory attach: each QEMU process has a dedicated vxhs server
       process to prevent guests from attacking each other.  This is where I
       said you might as well put the code inside QEMU since there is no
       isolation anyway.  From what you've said it sounds like the vxhs
       server needs a host-wide view and is responsible for all guests
       running on the host, so I guess we have to rule out this
       architecture.
    
    2. Network communication: one vxhs server process and multiple guests.
       Here you might as well use NBD or iSCSI because it already exists and
       the vxhs driver doesn't add any unique functionality over existing
       protocols.

Ketan> NBD does not give us the performance we are trying to achieve.
Besides, NBD does not have any authentication support.
There is a hybrid approach (2a) which uses both 1 and 2, but I'd keep that
for a later discussion.
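
Since the basic authentication scheme has come up a few times but is not
specified yet, here is a purely illustrative sketch of what a shared-secret
challenge-response between libqnio and the vxhs server could look like. The
helper names, nonce handling, and key provisioning are assumptions, not the
actual protocol.

/* Illustrative challenge-response sketch; secret and nonce exchange are
 * assumed to be provisioned out of band. Not the libqnio/vxhs protocol. */
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <openssl/crypto.h>
#include <stdbool.h>
#include <stddef.h>

#define NONCE_LEN  32
#define DIGEST_LEN 32          /* SHA-256 output size */

/* Client side: prove knowledge of the shared secret for this nonce. */
static void auth_response(const unsigned char *secret, size_t secret_len,
                          const unsigned char nonce[NONCE_LEN],
                          unsigned char out[DIGEST_LEN])
{
    unsigned int len = DIGEST_LEN;
    HMAC(EVP_sha256(), secret, (int)secret_len, nonce, NONCE_LEN, out, &len);
}

/* Server side: recompute and compare in constant time. */
static bool auth_verify(const unsigned char *secret, size_t secret_len,
                        const unsigned char nonce[NONCE_LEN],
                        const unsigned char resp[DIGEST_LEN])
{
    unsigned char expect[DIGEST_LEN];
    auth_response(secret, secret_len, nonce, expect);
    return CRYPTO_memcmp(expect, resp, DIGEST_LEN) == 0;
}

This only authenticates the client; it does not protect the data path, which
matches the authentication-first, encryption-later split discussed earlier in
the thread.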

    >     There's an easier way to get even better performance: get rid of
    >     libqnio and the external process.  Move the code from the external
    >     process into QEMU to eliminate the
    >     process_vm_readv(2)/process_vm_writev(2) and context switching.
    >     
    >     Can you remind me why there needs to be an external process?
    >  
    > Ketan> Apart from virtualizing the available direct-attached storage on
    > the compute node, the vxhs storage backend (the external process)
    > provides features such as storage QoS, resiliency, efficient use of
    > direct-attached storage, and automatic storage recovery points
    > (snapshots). Implementing this in QEMU is not practical and is not the
    > purpose of proposing this driver.
    
    This sounds similar to what QEMU and Linux (file systems, LVM, RAID,
    etc) already do.  It brings to mind a third architecture:
    
    3. A Linux driver or file system.  Then QEMU opens a raw block device.
       This is what the Ceph rbd block driver in Linux does.  This
       architecture has a kernel-userspace boundary so vxhs does not have to
       trust QEMU.
    
    I suggest Architecture #2.  You'll be able to deploy on existing systems
    because QEMU already supports NBD or iSCSI.  Use the time you gain from
    switching to this architecture on benchmarking and optimizing NBD or
    iSCSI so performance is closer to your goal.
    
Ketan> We have made a choice to go with the QEMU driver approach after
serious evaluation of most, if not all, standard IO tapping mechanisms,
including NFS, NBD, and FUSE. None of these has been able to deliver the
performance that we have set ourselves to achieve. Hence the effort to
propose this new IO tap, which we believe will provide an alternative to the
existing mechanisms and hopefully benefit the community.

    Stefan
    

