Re: [Qemu-devel] Secure KVM

From:	Anthony Liguori
Subject:	Re: [Qemu-devel] Secure KVM
Date:	Mon, 07 Nov 2011 12:03:38 -0600
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.21) Gecko/20110831 Lightning/1.0b2 Thunderbird/3.1.13

On 11/07/2011 11:52 AM, Sasha Levin wrote:

Hi Anthony,

Thank you for your comments!

On Mon, 2011-11-07 at 11:37 -0600, Anthony Liguori wrote:

On 11/06/2011 02:40 PM, Sasha Levin wrote:

Hi all,

I'm planning on doing a small fork of the KVM tool to turn it into a
'Secure KVM' enabled hypervisor. Now you probably ask yourself, Huh?

The idea was discussed briefly a couple of months ago, but never got off
the ground - which is a shame IMO.

It's easy to explain the problem: If an attacker finds a security hole
in any of the devices which are exposed to the guest, the attacker would
be able to either crash the guest, or possibly run code on the host
itself.

The solution is also simple to explain: Split the devices into different
processes and use seccomp to sandbox each device into the exact set of
resources it needs to operate, nothing more and nothing less.

Since I'll be basing it on the KVM tool, which doesn't really emulate
that many legacy devices, I'll focus first on the virtio family for the
sake of simplicity (and covering 90% of the options).

This is my basic overview of how I'm planning on implementing the
initial POC:

1. First I'll focus on the simple virtio-rng device, it's simple enough
to allow us to focus on the aspects which are important for the POC
while still covering most bases (i.e. sandbox to single file
- /dev/urandom and such).

2. Do it on a one process per device concept, where for each device
(notice - not device *type*) requested, a new process which handles it
will be spawned.

3. That process will be limited exactly to the resources it needs to
operate, for example - if we run a virtio-blk device, it would be able
to access only the image file which it should be using.

4. Connection between hypervisor and devices will be based on unix
sockets, this should allow for better separation compared to other
approaches such as shared memory.

5. While performance is an aspect, complete isolation is more important.
Security is primary, performance is secondary.

6. Share as much code as possible with current implementation of virtio
devices, make it possible to run virtio devices either like it's being
done now, or by spawning them as separate processes - the amount of
specific code for the separate process case should be minimal.


That's all I have for now, comments are *very* welcome.


I thought about this a bit and have some ideas that may or may not help.

1) If you add device save/load support, then it's something you can potentially
use to give yourself quite a bit of flexibility in changing the sandbox.  At any
point in run time, you can save the device model's state in the sandbox, destroy
the sandbox, and then build a new sandbox and restore the device to its former
state.

This might turn out to be very useful in supporting things like device hotplug
and/or memory hot plug.

2) I think it's largely possible to implement all device emulation without doing
any dynamic memory allocation.  Since memory allocation DoS is something you
have to deal with anyway, I suspect most device emulation already uses a fixed
amount of memory per device.   This can potentially dramatically simplify 
things.

3) I think virtio can/should be used as a generic "backend to frontend"
transport between the device model and the tool.


virtio requires server and client to have shared memory, so if we
already go with shared memory we can just let the device manage the
actual virtio driver directly, no?

Let's say you're implementing an IDE device model in the sandbox.  You can try
to implement the block layer in the sandbox but I think that quickly will become
too difficult.

You can do as Avi suggested and do all DMA accesses from the IDE device model as
RPCs, or you can map guest memory as shared memory and utilize (1) in order to
change that mapping as you need to.

At some point, you end up with a struct iovec and an offset that you want to
read/write to the virtual disk.  You need a way to send that to the "frontend"
that will then handle that as a raw/qcow2 request.

Well, virtio is great at doing exactly that :-)  So if you increase your shared
memory to have a little bit extra to stick another vring, you can use that for
device model -> front end communication without paying an extra memcpy.
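
To make the "iovec plus offset" request concrete, here is a minimal sketch of
what such a request might look like once it is placed in shared memory.
Everything in it is an assumption for illustration: the names disk_request,
shm_segment, REQ_MAX_SEGMENTS and disk_request_bytes are hypothetical, and
segments are expressed as offsets into the shared region rather than as
process-local pointers, since the sandbox and the frontend are separate
processes.  The frontend validates every segment before touching the disk
image, because it should not trust what the sandbox wrote.

```c
#include <assert.h>
#include <stdint.h>

#define REQ_MAX_SEGMENTS 16  /* hypothetical fixed limit, no dynamic allocation */

/* One data segment, addressed as an offset into the shared memory region. */
struct shm_segment {
    uint64_t shm_offset;
    uint64_t len;
};

/* Hypothetical request the sandboxed device model hands to the frontend. */
struct disk_request {
    uint64_t disk_offset;   /* byte offset into the virtual disk */
    uint32_t is_write;      /* 0 = read, nonzero = write */
    uint32_t nr_segments;
    struct shm_segment seg[REQ_MAX_SEGMENTS];
};

/*
 * Frontend-side check: return the total number of bytes the request covers,
 * or -1 if any segment falls outside the shared region of size shm_size.
 */
static int64_t disk_request_bytes(const struct disk_request *req,
                                  uint64_t shm_size)
{
    uint64_t total = 0;

    if (req->nr_segments == 0 || req->nr_segments > REQ_MAX_SEGMENTS)
        return -1;

    for (uint32_t i = 0; i < req->nr_segments; i++) {
        const struct shm_segment *s = &req->seg[i];

        /* Overflow-safe bounds check: offset + len must not exceed shm_size. */
        if (s->shm_offset > shm_size || s->len > shm_size - s->shm_offset)
            return -1;
        /* Keep the running total representable as int64_t. */
        if (s->len > (uint64_t)INT64_MAX - total)
            return -1;
        total += s->len;
    }
    return (int64_t)total;
}
```

A real vring would carry this information in its descriptor chain instead of a
separate struct; the sketch only shows the shape of the data and the kind of
bounds check the frontend would want before issuing the raw/qcow2 request.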

For notifications, the easiest thing to do is setup an "event channel" bitmap
and use a single eventfd to multiplex that event channel bitmap.  This is pretty
much how Xen works btw.  A single interrupt is reserved and a bitmap is used to
dispatch the actual events.


So the sandbox loop would look like:

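int main(void)
{
    int i;

    setup_devices();

    for (;;) {
        /* Block until the eventfd behind the event channel fires. */
        read_from_event_channel(main_channel);

        for (i = 0; i < nr_vrings; i++)
            check_vring_notification(&vrings[i]);
    }
}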

One vring would be used for dispatching PIO/MMIO.  The remaining vrings could
be used for anything really.
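
The event channel bitmap mentioned above can be sketched roughly as follows.
This is only an illustration under stated assumptions: event_channels,
channel_raise, channels_dispatch, count_event and MAX_CHANNELS are hypothetical
names, one bit is assumed per vring, and the bitmap is accessed from a single
thread.  A bitmap living in memory shared between two processes would need an
atomic exchange instead of the plain read-and-clear shown here, and the sender
would follow channel_raise() with a write to the shared eventfd.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CHANNELS 64  /* hypothetical: one bit per vring */

typedef void (*channel_handler)(unsigned channel, void *opaque);

struct event_channels {
    uint64_t pending;                       /* bit i set => channel i has work */
    channel_handler handler[MAX_CHANNELS];
    void *opaque[MAX_CHANNELS];
};

/* Sender side: mark a channel as pending. */
static void channel_raise(struct event_channels *ec, unsigned channel)
{
    assert(channel < MAX_CHANNELS);
    ec->pending |= UINT64_C(1) << channel;
}

/*
 * Receiver side: run after the single notification fires.  Clears the bitmap
 * and calls the handler of every pending channel that has one registered.
 * Returns the number of handlers that were called.
 */
static unsigned channels_dispatch(struct event_channels *ec)
{
    uint64_t pending = ec->pending;
    unsigned handled = 0;

    ec->pending = 0;
    for (unsigned ch = 0; pending != 0; ch++, pending >>= 1) {
        if ((pending & 1) && ec->handler[ch]) {
            ec->handler[ch](ch, ec->opaque[ch]);
            handled++;
        }
    }
    return handled;
}

/* Example handler: counts notifications in the counter passed as opaque. */
static void count_event(unsigned channel, void *opaque)
{
    (void)channel;
    (*(unsigned *)opaque)++;
}
```

With this layout, channel 0 could be the PIO/MMIO vring and the remaining bits
could map to whatever other vrings the device model uses.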

Like I mentioned elsewhere, just think of the sandbox as an extension of the
guest's firmware.  The purpose of the sandbox is to reduce a very complicated,
legacy device model into a very simple and easy to audit, purely virtio based
model.


Also, things like interrupts would also require some sort of a different
IPC, which would complicate things a bit.

4) Lack of select() is really challenging.  I understand why it's not there
since it can technically be emulated but it seems like a no-risk syscall to
whitelist and it would make programming in a sandbox so much easier.  Maybe
Andrea has some comments here?  I might be missing something here.


There are several of these which would be nice to have, and if we can
get seccomp filters we have good flexibility with which APIs we allow
for each device.

Yeah, filters are nice but I fear that you lose some of the PR benefits of
sandboxing.  Once the first application that claims to use sandboxing
whitelists a syscall it shouldn't, you'll start getting Slashdot articles about
"Linux sandbox broken, Linux security hopelessly broken".  Then what's the
point of all of this?


Regards,

Anthony Liguori

