From: Philippe Mathieu-Daudé
Subject: Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
Date: Fri, 3 May 2019 12:10:04 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 5/3/19 11:04 AM, Alex Bennée wrote:
>
> Stefan Hajnoczi <address@hidden> writes:
>
>> At KVM Forum 2018 I gave a presentation on security in QEMU:
>> https://www.youtube.com/watch?v=YAdRf_hwxU8 (video)
>> https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf (slides)
>>
>> This patch adds a security guide to the developer docs. This document
>> covers things that developers should know about security in QEMU. It is
>> just a starting point that we can expand on later. I hope it will be
>> useful as a resource for new contributors and will save code reviewers
>> from explaining the same concepts many times.
>>
>> Signed-off-by: Stefan Hajnoczi <address@hidden>
>> ---
>> v2:
>> * Added mention of passthrough USB and PCI devices [philmd]
>> * Reworded resource limits [philmd]
>> * Added qemu_log_mask(LOG_GUEST_ERROR) [philmd]
>> ---
>> docs/devel/index.rst | 1 +
>> docs/devel/security.rst | 225 ++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 226 insertions(+)
>> create mode 100644 docs/devel/security.rst
>>
>> diff --git a/docs/devel/index.rst b/docs/devel/index.rst
>> index ebbab636ce..fd0b5fa387 100644
>> --- a/docs/devel/index.rst
>> +++ b/docs/devel/index.rst
>> @@ -20,3 +20,4 @@ Contents:
>>     stable-process
>>     testing
>>     decodetree
>> +   security
>> diff --git a/docs/devel/security.rst b/docs/devel/security.rst
>> new file mode 100644
>> index 0000000000..83c6fb2231
>> --- /dev/null
>> +++ b/docs/devel/security.rst
>> @@ -0,0 +1,225 @@
>> +==============
>> +Security Guide
>> +==============
>> +Overview
>> +--------
>> +This guide covers security topics relevant to developers working on QEMU. It
>> +includes an explanation of the security requirements that QEMU gives its users,
>> +the architecture of the code, and secure coding practices.
>> +
>> +Security Requirements
>> +---------------------
>> +QEMU supports many different use cases, some of which have stricter security
>> +requirements than others. The community has agreed on the overall security
>> +requirements that users may depend on. These requirements define what is
>> +considered supported from a security perspective.
>> +
>> +Virtualization Use Case
>> +~~~~~~~~~~~~~~~~~~~~~~~
>> +The virtualization use case covers cloud and virtual private server (VPS)
>> +hosting, as well as traditional data center and desktop virtualization. These
>> +use cases rely on hardware virtualization extensions to execute guest code
>> +safely on the physical CPU at close-to-native speed.
>> +
>> +The following entities are **untrusted**, meaning that they may be buggy or
>> +malicious:
>> +
>> +* Guest
>> +* User-facing interfaces (e.g. VNC, SPICE, WebSocket)
>> +* Network protocols (e.g. NBD, live migration)
>> +* User-supplied files (e.g. disk images, kernels, device trees)
>> +* Passthrough devices (e.g. PCI, USB)
Thanks.
>> +
>> +Bugs affecting these entities are evaluated on whether they can cause damage in
>> +real-world use cases and treated as security bugs if this is the case.
>> +
>> +Non-virtualization Use Case
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The non-virtualization use case covers emulation using the Tiny Code Generator
>> +(TCG). In principle the TCG and device emulation code used in conjunction with
>> +the non-virtualization use case should meet the same security requirements as
>> +the virtualization use case. However, for historical reasons much of the
>> +non-virtualization use case code was not written with these security
>> +requirements in mind.
>> +
>> +Bugs affecting the non-virtualization use case are not considered security
>> +bugs at this time. Users with non-virtualization use cases must not rely on
>> +QEMU to provide guest isolation or any security guarantees.
>> +
>> +Architecture
>> +------------
>> +This section describes the design principles that ensure the security
>> +requirements are met.
>> +
>> +Guest Isolation
>> +~~~~~~~~~~~~~~~
>> +Guest isolation is the confinement of guest code to the virtual machine. When
>> +guest code gains control of execution on the host this is called escaping the
>> +virtual machine. Isolation also includes resource limits such as throttling of
>> +CPU, memory, disk, or network. Guests must be unable to exceed their resource
>> +limits.
>> +
>> +QEMU presents an attack surface to the guest in the form of emulated devices.
>> +The guest must not be able to gain control of QEMU. Bugs in emulated devices
>> +could allow malicious guests to gain code execution in QEMU. At this point the
>> +guest has escaped the virtual machine and is able to act in the context of the
>> +QEMU process on the host.
>> +
>> +Guests often interact with other guests and share resources with them. A
>> +malicious guest must not gain control of other guests or access their data.
>> +Disk image files and network traffic must be protected from other guests unless
>> +explicitly shared between them by the user.
>> +
>> +Principle of Least Privilege
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The principle of least privilege states that each component only has access to
>> +the privileges necessary for its function. In the case of QEMU this means that
>> +each process only has access to resources belonging to the guest.
>> +
>> +The QEMU process should not have access to any resources that are inaccessible
>> +to the guest. This way the guest does not gain anything by escaping into the
>> +QEMU process since it already has access to those same resources from within
>> +the guest.
>> +
>> +Following the principle of least privilege immediately fulfills guest isolation
>> +requirements. For example, guest A only has access to its own disk image file
>> +``a.img`` and not guest B's disk image file ``b.img``.
>> +
>> +In reality certain resources are inaccessible to the guest but must be
>> +available to QEMU to perform its function. For example, host system calls are
>> +necessary for QEMU but are not exposed to guests. A guest that escapes into
>> +the QEMU process can then begin invoking host system calls.
>> +
>> +New features must be designed to follow the principle of least privilege.
>> +Should this not be possible for technical reasons, the security risk must be
>> +clearly documented so users are aware of the trade-off of enabling the feature.
>> +
>> +Isolation mechanisms
>> +~~~~~~~~~~~~~~~~~~~~
>> +Several isolation mechanisms are available to realize this architecture of
>> +guest isolation and the principle of least privilege. With the exception of
>> +Linux seccomp, these mechanisms are all deployed by management tools that
>> +launch QEMU, such as libvirt. They are also platform-specific so they are only
>> +described briefly for Linux here.
>> +
>> +The fundamental isolation mechanism is that QEMU processes must run as
>> +**unprivileged users**. Sometimes it seems more convenient to launch QEMU as
>> +root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
>> +huge security risk. File descriptor passing can be used to give an otherwise
>> +unprivileged QEMU process access to host devices without running QEMU as root.
>
> Should we mention that you can still maintain running as a user and just
> make the devices you need available to the user/group rather than
> becoming root? For example I generally make /dev/kvm group accessible to
> my user account.
Good suggestion.
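
On the file descriptor passing mentioned in the quoted paragraph: a minimal
sketch of the underlying mechanism, assuming a privileged launcher that opens
the device and hands the descriptor to an unprivileged process over an AF_UNIX
socket with SCM_RIGHTS. The helper names send_fd() and recv_fd() are
hypothetical and not taken from QEMU or libvirt; in practice a management tool
would normally do this for you.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
typedef union {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} FdControl;

/* Hypothetical helper: send fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char dummy = 'F';                      /* at least one data byte is sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    FdControl u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* payload is a file descriptor */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd. Returns the new fd, or -1 on error. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    FdControl u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;                         /* no descriptor attached */
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver ends up with its own descriptor for the same open file, so it
never needs permission to open the device node itself.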
>> +
>> +**SELinux** and **AppArmor** make it possible to confine processes beyond the
>> +traditional UNIX process and file permissions model. They restrict the QEMU
>> +process from accessing processes and files on the host system that are not
>> +needed by QEMU.
>> +
>> +**Resource limits** and **cgroup controllers** provide throughput and utilization
>> +limits on key resources such as CPU time, memory, and I/O bandwidth.
>> +
>> +**Linux namespaces** can be used to make process, file system, and other system
>> +resources unavailable to QEMU. A namespaced QEMU process is restricted to only
>> +those resources that were granted to it.
>> +
>> +**Linux seccomp** is available via the QEMU ``--sandbox`` option. It disables
>> +system calls that are not needed by QEMU, thereby reducing the host kernel
>> +attack surface.
>> +
>> +Secure coding practices
>> +-----------------------
>> +At the source code level there are several points to keep in mind. Both
>> +developers and security researchers must be aware of them so that they can
>> +develop safe code and audit existing code properly.
>> +
>> +General Secure C Coding Practices
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +Most CVEs (security bugs) reported against QEMU are not specific to
>> +virtualization or emulation. They are simply C programming bugs. Therefore
>> +it's critical to be aware of common classes of security bugs.
>> +
>> +There is a wide selection of resources available covering secure C coding. For
>> +example, the `CERT C Coding Standard
>> +<https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard>`_
>> +covers the most important classes of security bugs.
>> +
>> +Instead of describing them in detail here, only the names of the most important
>> +classes of security bugs are mentioned:
>> +
>> +* Buffer overflows
>> +* Use-after-free and double-free
>> +* Integer overflows
>> +* Format string vulnerabilities
>> +
>> +Some of these classes of bugs can be detected by analyzers. Static analysis is
>> +performed regularly by Coverity and the most obvious of these bugs are even
>> +reported by compilers. Dynamic analysis is possible with valgrind, tsan, and
>> +asan.
>> +
>> +Input Validation
>> +~~~~~~~~~~~~~~~~
>> +Inputs from the guest or external sources (e.g. network, files) cannot be
>> +trusted and may be invalid. Inputs must be checked before using them in a way
>> +that could crash the program, expose host memory to the guest, or otherwise be
>> +exploitable by an attacker.
>> +
>> +The most sensitive attack surface is device emulation. All hardware register
>> +accesses and data read from guest memory must be validated. A typical example
>> +is a device that contains multiple units that are selectable by the guest via
>> +an index register::
>> +
>> +  typedef struct {
>> +      ProcessingUnit unit[2];
>> +      ...
>> +  } MyDeviceState;
>> +
>> +  static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
>> +  {
>> +      MyDeviceState *mydev = opaque;
>> +      ProcessingUnit *unit;
>> +
>> +      switch (addr) {
>> +      case MYDEV_SELECT_UNIT:
>> +          unit = &mydev->unit[val];   <-- this input wasn't validated!
>> +          ...
>> +      }
>> +  }
>> +
>> +If ``val`` is not in range [0, 1] then an out-of-bounds memory access will take
>> +place when ``unit`` is dereferenced. The code must check that ``val`` is 0 or
>> +1 and handle the case where it is invalid.
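
A minimal sketch of what the validated variant of the quoted example might
look like. The register offset, the stub ProcessingUnit type, the "selected"
field and the invalid_writes counter are assumptions made so the snippet
compiles on its own; the counter merely stands in for a call to
qemu_log_mask(LOG_GUEST_ERROR, ...).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#define MYDEV_SELECT_UNIT 0x0          /* hypothetical register offset */

typedef struct {
    uint32_t ctrl;                     /* stub, real contents elided in the patch */
} ProcessingUnit;

typedef struct {
    ProcessingUnit unit[2];
    ProcessingUnit *selected;          /* always points into unit[] */
    unsigned invalid_writes;           /* stand-in for LOG_GUEST_ERROR logging */
} MyDeviceState;

static void mydev_init(MyDeviceState *mydev)
{
    memset(mydev, 0, sizeof(*mydev));
    mydev->selected = &mydev->unit[0];
}

static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
{
    MyDeviceState *mydev = opaque;

    switch (addr) {
    case MYDEV_SELECT_UNIT:
        /* Reject out-of-range indices before they are used as an array index. */
        if (val >= ARRAY_SIZE(mydev->unit)) {
            mydev->invalid_writes++;
            return;
        }
        mydev->selected = &mydev->unit[val];
        break;
    default:
        /* Unknown register: ignore the write but record it. */
        mydev->invalid_writes++;
        break;
    }
}
```

Bounding against ARRAY_SIZE() rather than a literal 2 keeps the check correct
if the array is later resized.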
>> +
>> +Unexpected Device Accesses
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The guest may access device registers in unusual orders or at unexpected
>> +moments. Device emulation code must not assume that the guest follows the
>> +typical "theory of operation" presented in driver writer manuals. The guest
>> +may make nonsense accesses to device registers such as starting operations
>> +before the device has been fully initialized.
>> +
>> +A related issue is that device emulation code must be prepared for unexpected
>> +device register accesses while asynchronous operations are in progress. A
>> +well-behaved guest might wait for a completion interrupt before accessing
>> +certain device registers. Device emulation code must handle the case where the
>> +guest overwrites registers or submits further requests before an ongoing
>> +request completes. Unexpected accesses must not cause memory corruption or
>> +leaks in QEMU.
>> +
>> +Invalid device register accesses can be reported with
>> +``qemu_log_mask(LOG_GUEST_ERROR, ...)``. The ``-d guest_errors`` command-line
>> +option enables these log messages.
Thanks for adding this section!
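
One way the "Unexpected Device Accesses" rules could be sketched, assuming a
made-up device with a length register and a start register. All names here
(AsyncDevState, ASYNCDEV_REG_*, asyncdev_write) are hypothetical, and the
guest_errors counter again stands in for qemu_log_mask(LOG_GUEST_ERROR, ...).
Ignoring writes while busy is only one possible policy; real hardware may
define something else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    ASYNCDEV_REG_LEN   = 0x0,   /* hypothetical: transfer length */
    ASYNCDEV_REG_START = 0x4,   /* hypothetical: kick off a request */
};

typedef struct {
    uint32_t len;
    bool busy;                  /* true while a request is in flight */
    unsigned guest_errors;      /* stand-in for LOG_GUEST_ERROR logging */
} AsyncDevState;

/* Called when the asynchronous request finishes. */
static void asyncdev_complete(AsyncDevState *s)
{
    s->busy = false;
}

static void asyncdev_write(AsyncDevState *s, uint32_t addr, uint32_t val)
{
    (void)val;                  /* only the LEN register consumes the value */

    if (s->busy) {
        /* Request in flight: do not let the guest change state under it. */
        s->guest_errors++;
        return;
    }

    switch (addr) {
    case ASYNCDEV_REG_LEN:
        s->len = val;
        break;
    case ASYNCDEV_REG_START:
        if (s->len == 0) {
            /* Guest started the device before configuring it. */
            s->guest_errors++;
            break;
        }
        s->busy = true;
        break;
    default:
        s->guest_errors++;
        break;
    }
}
```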
>> +
>> +Live migration
>> +~~~~~~~~~~~~~~
>> +Device state can be saved to disk image files and shared with other users.
>> +Live migration code must validate inputs when loading device state so an
>> +attacker cannot gain control by crafting invalid device states. Device state
>> +is therefore considered untrusted even though it is typically generated by QEMU
>> +itself.
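
A rough sketch of validating loaded device state, assuming a check that runs
after the fields have been read in and rejects the state with a negative
errno. The signature is modelled on a post-load style hook; MigDevState, its
fields and MIGDEV_FIFO_SIZE are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MIGDEV_FIFO_SIZE 16            /* hypothetical FIFO capacity */
#define MIGDEV_NUM_UNITS 2             /* hypothetical number of units */

typedef struct {
    uint32_t selected_unit;            /* index into a 2-entry unit array */
    uint32_t fifo_len;                 /* number of valid bytes in fifo[] */
    uint8_t fifo[MIGDEV_FIFO_SIZE];
} MigDevState;

/* Returns 0 if the loaded state is consistent, -EINVAL otherwise. */
static int migdev_post_load(void *opaque, int version_id)
{
    MigDevState *s = opaque;

    (void)version_id;                  /* no per-version handling in this sketch */

    /* Every loaded index and length is checked like any other guest input. */
    if (s->selected_unit >= MIGDEV_NUM_UNITS) {
        return -EINVAL;
    }
    if (s->fifo_len > MIGDEV_FIFO_SIZE) {
        return -EINVAL;
    }
    return 0;
}
```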
>> +
>> +Guest Memory Access Races
>> +~~~~~~~~~~~~~~~~~~~~~~~~~
>> +Guests with multiple vCPUs may modify guest RAM while device emulation code is
>> +running. Device emulation code must copy in descriptors and other guest RAM
>> +structures and only process the local copy. This prevents
>> +time-of-check-to-time-of-use (TOCTOU) race conditions that could cause QEMU to
>> +crash when a vCPU thread modifies guest RAM while device emulation is
>> +processing it.
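
A minimal sketch of the copy-in pattern described in the quoted section,
assuming guest RAM is modelled as a plain byte array. Desc, DESC_BUF_SIZE and
dev_process() are hypothetical; in QEMU proper the copy-in would go through a
DMA helper such as dma_memory_read() rather than a bare memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DESC_BUF_SIZE 64               /* hypothetical device buffer size */

typedef struct {
    uint32_t offset;                   /* where the payload sits in guest RAM */
    uint32_t len;                      /* payload length in bytes */
} Desc;

/* Copies the payload described by the descriptor at desc_addr into out.
 * Returns the number of bytes copied, or -1 if the descriptor is invalid. */
static int dev_process(const uint8_t *guest_ram, size_t ram_size,
                       size_t desc_addr, uint8_t out[DESC_BUF_SIZE])
{
    Desc d;

    if (desc_addr > ram_size || ram_size - desc_addr < sizeof(d)) {
        return -1;
    }

    /* Copy in once; from here on only the local copy is consulted. */
    memcpy(&d, guest_ram + desc_addr, sizeof(d));

    /* The checks and the later use read the same local values, so a vCPU
     * rewriting the descriptor in guest RAM cannot invalidate them. */
    if (d.len > DESC_BUF_SIZE) {
        return -1;
    }
    if (d.offset > ram_size || ram_size - d.offset < d.len) {
        return -1;
    }

    memcpy(out, guest_ram + d.offset, d.len);
    return (int)d.len;
}
```

The buggy alternative would check the length in guest RAM and then read it
from guest RAM a second time for the copy, leaving a window in which the
guest can change it.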
>
> Anyway:
>
> Reviewed-by: Alex Bennée <address@hidden>
Reviewed-by: Philippe Mathieu-Daudé <address@hidden>