qemu-s390x

From: Tony Krowiak
Subject: Re: [qemu-s390x] [RFC 00/19] KVM: s390/crypto/vfio: guest dedicated crypto adapters
Date: Tue, 31 Oct 2017 15:39:09 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0

On 10/13/2017 01:38 PM, Tony Krowiak wrote:
Ping
Overview:
--------
An adjunct processor (AP) facility is an IBM Z cryptographic facility. The
AP facility comprises three AP instructions and from 1 to 256 AP adapter
cards. The design takes advantage of the interpretive execution mode
provided by the SIE architecture. With interpretive execution mode, the AP
instructions executed on the guest are interpreted by the hardware. This
allows guests direct access to AP adapter cards. The first goal of this
patch series is to provide direct access by a KVM guest to an AP as a
pass-through device. The second goal is to provide administrators with the
means to configure KVM guests to grant direct access to AP facilities
assigned to the LPAR in which the host Linux system is running.

To aid understanding of the design, let's start with an overview of the AP
architecture.

AP Architectural Overview
-------------------------
Let's start with some definitions:

* AP adapter

   An AP adapter is an IBM Z adapter card that performs cryptographic
   functions. There can be from 0 to 256 adapters assigned to an LPAR.
   Each adapter is identified by a number from 0 to 255. When installed,
   an AP adapter is accessed by AP instructions executed by any CPU.

* AP domain

   An adapter can be partitioned into domains. An adapter can hold up to 256
   domains. Each domain is identified by a number from 0 to 255. Domains can
   be further classified into two types:

   * Usage domains are domains that can be accessed directly to process AP
     commands.

   * Control domains are domains that are accessed indirectly by AP
     commands sent to a usage domain to control or change the domain.

* AP Queue

   An AP queue is the means by which an AP command is sent to an
   AP usage domain inside a specific AP. An AP queue is identified by a tuple
   comprised of an AP adapter ID and a usage domain index corresponding
   to a given usage domain within the adapter. This tuple forms an AP Queue
   Number (APQN) uniquely identifying an AP queue. AP instructions include
   a field containing the APQN to identify the AP queue to which the AP
   command is targeted.

* AP Instructions:

   There are three AP instructions:

   * NQAP: to enqueue an AP command-request message to a queue
   * DQAP: to dequeue an AP command-reply message from a queue
   * PQAP: to administer the queues
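As an illustration of the APQN tuple, the sketch below packs an adapter ID and a usage domain index into one 16-bit value. The packing (adapter ID in the high byte, domain index in the low byte) and all helper names are assumptions made for this example, not the interface of the AP bus driver. The formatting helper reproduces the card.domain naming (for example 05.0004) used by the AP queue devices in the sysfs listings of this note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: pack an (adapter ID, usage domain index) tuple into
 * a 16-bit APQN. Assumed layout: adapter ID in the high byte, domain index
 * in the low byte. */
static uint16_t apqn_make(unsigned int apid, unsigned int apqi)
{
	assert(apid <= 0xff && apqi <= 0xff);
	return (uint16_t)((apid << 8) | apqi);
}

/* Extract the adapter ID (APID) from an APQN. */
static unsigned int apqn_apid(uint16_t apqn)
{
	return (apqn >> 8) & 0xff;
}

/* Extract the usage domain index (APQI) from an APQN. */
static unsigned int apqn_apqi(uint16_t apqn)
{
	return apqn & 0xff;
}

/* Format an APQN like an AP queue device name in sysfs, e.g. "05.0004". */
static void apqn_name(uint16_t apqn, char *buf, size_t len)
{
	snprintf(buf, len, "%02x.%04x", apqn_apid(apqn), apqn_apqi(apqn));
}
```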

Let's now see how AP instructions are interpreted by the hardware.

Start Interpretive Execution (SIE) Instruction
----------------------------------------------
A KVM guest is started by executing the Start Interpretive Execution (SIE)
instruction. The SIE state description is a control block that contains the
state information for a KVM guest and is supplied as input to the SIE
instruction. The SIE state description contains a field that references
a Crypto Control Block (CRYCB). The CRYCB contains three bitmask fields
identifying the adapters, usage domains and control domains assigned to the
KVM guest:

* The AP Mask (APM) field specifies the AP adapters assigned to the
   KVM guest. The APM controls which adapters are valid for the KVM guest.
   The bits in the mask, from left to right, correspond to APIDs
   0 up to the number of adapters that can be assigned to the LPAR. If a bit
   is set, the corresponding adapter is valid for use by the KVM guest.

* The AP Queue Mask (AQM) field specifies the AP usage domains assigned
   to the KVM guest. The bits in the mask, from left to right, correspond
   to the usage domains, from 0 up to the number of domains that can be
   assigned to the LPAR. If a bit is set, the corresponding usage domain is
   valid for use by the KVM guest.

* The AP Domain Mask (ADM) field specifies the AP control domains assigned
   to the KVM guest. The ADM bitmask controls which domains can be changed
   by an AP command-request message sent to a usage domain from the guest.
   The bits in the mask, from left to right, correspond to domain 0 up to
   the number of domains that can be assigned to the LPAR. If a bit is set,
   the corresponding domain can be modified by an AP command-request message
   sent to a usage domain configured for the KVM guest.
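The left-to-right bit numbering shared by all three masks can be modelled as below. This is a minimal sketch that assumes 256-bit masks held as four 64-bit words, with bit 0 being the most significant bit of the first word; the struct and function names are hypothetical and not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 256-bit CRYCB mask (APM, AQM or ADM). Bit 0 is the leftmost
 * (most significant) bit of w[0]; bit 255 is the rightmost bit of w[3]. */
struct ap_mask {
	uint64_t w[4];
};

/* Set bit nr, counting from the left starting at 0. */
static void mask_set(struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	m->w[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
}

/* Clear bit nr, counting from the left starting at 0. */
static void mask_clear(struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	m->w[nr / 64] &= ~(UINT64_C(1) << (63 - nr % 64));
}

/* Return true if bit nr is set, i.e. the adapter or domain is assigned. */
static bool mask_test(const struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	return (m->w[nr / 64] >> (63 - nr % 64)) & 1;
}
```

With this layout, assigning adapter 5 sets the sixth bit from the left of the first word of the APM.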

Recall from the description of an AP queue that AP instructions include
an APQN to identify the AP adapter and the specific usage domain within
the adapter to which an AP command-request message is to be sent (NQAP
and PQAP instructions), or from which a command-reply message is to be
received (DQAP instruction). The set of valid APQNs is defined by the
matrix calculated from the APM and AQM: it consists of every combination
of an assigned adapter number (APM) with an assigned usage domain number
(AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are
assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid
for the guest.
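The matrix rule can be expressed directly in code: every APID whose bit is set in the APM is paired with every APQI whose bit is set in the AQM. The sketch below assumes the same hypothetical 256-bit mask layout (bit 0 leftmost) and enumerates the resulting APQNs in adapter-major order; all names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 256-bit mask, bit 0 = leftmost bit of w[0]. */
struct ap_mask {
	uint64_t w[4];
};

static void mask_set(struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	m->w[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
}

static bool mask_test(const struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	return (m->w[nr / 64] >> (63 - nr % 64)) & 1;
}

/* Store up to max (apid, apqi) pairs in out and return the total number
 * of APQNs in the APM x AQM matrix (which may exceed max). */
static unsigned int matrix_apqns(const struct ap_mask *apm,
				 const struct ap_mask *aqm,
				 unsigned int (*out)[2], unsigned int max)
{
	unsigned int n = 0;

	for (unsigned int apid = 0; apid < 256; apid++) {
		if (!mask_test(apm, apid))
			continue;
		for (unsigned int apqi = 0; apqi < 256; apqi++) {
			if (!mask_test(aqm, apqi))
				continue;
			if (n < max) {
				out[n][0] = apid;
				out[n][1] = apqi;
			}
			n++;
		}
	}
	return n;
}
```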

The APQNs provide secure key functionality (i.e., the key is stored on the
adapter card), so when the adapter card is not virtualized (i.e., the
adapter is accessed directly by the guest), each APQN must be assigned to
at most one guest.

    Example 1: Valid configuration:
    ------------------------------
    Guest1: adapters 1,2  domains 5,6
    Guest2: adapters 1,2  domain 7

    This is valid because the two guests' APQN sets do not overlap: Guest1
    has APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7).

    Example 2: Invalid configuration:
    --------------------------------
    Guest1: adapters 1,2  domains 5,6
    Guest2: adapter  1    domains 6,7

    This is an invalid configuration because both guests have access to
    APQN (1,6).
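Because each guest's APQNs form a full APM x AQM matrix, two guests share an APQN exactly when their APMs share at least one adapter and their AQMs share at least one usage domain, so a conflict check does not need to enumerate APQNs. A minimal sketch, using hypothetical names and the same assumed 256-bit mask layout; applied to the examples above, it reports no conflict for Example 1 and a conflict for Example 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-bit mask, bit 0 = leftmost bit of w[0]. */
struct ap_mask {
	uint64_t w[4];
};

/* Hypothetical per-guest configuration: assigned adapters and domains. */
struct guest_cfg {
	struct ap_mask apm, aqm;
};

static void mask_set(struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	m->w[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
}

/* Build a mask from a list of adapter or domain numbers. */
static struct ap_mask mask_of(const unsigned int *nrs, size_t n)
{
	struct ap_mask m = {{0}};

	for (size_t i = 0; i < n; i++)
		mask_set(&m, nrs[i]);
	return m;
}

/* True if the two masks have at least one bit in common. */
static bool masks_intersect(const struct ap_mask *a, const struct ap_mask *b)
{
	for (int i = 0; i < 4; i++)
		if (a->w[i] & b->w[i])
			return true;
	return false;
}

/* The APM x AQM matrices of two guests overlap exactly when the guests
 * share an adapter and a usage domain. */
static bool guests_conflict(const struct guest_cfg *g1,
			    const struct guest_cfg *g2)
{
	return masks_intersect(&g1->apm, &g2->apm) &&
	       masks_intersect(&g1->aqm, &g2->aqm);
}
```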

Interruption architecture:

Depending on its configuration, the AP interruption architecture may or may
not generate an interruption to signal to the CPU the end of an AP
transaction. Likewise, depending upon its configuration, the SIE
interruption architecture may or may not redirect AP interruptions directly
to a guest when the associated queue is valid for that guest, and may or may
not report the interruption to the host.

Effective masking for guest levels 1 and 2:

A Linux host running in the LPAR operates at guest-level 1 and has its own
SIE state description. When operating at guest-level 1, the masks from the
host's state description are used directly. A Linux guest running in the
host operates at guest-level 2. When operating at guest-level 2, the masks
from the guest-level 1 (host) and guest-level 2 (guest) state descriptions
are combined into a single effective mask by performing a logical AND of
the corresponding masks from the two state descriptions.

The effective mask algorithm is used for the APM, AQM and ADM to create
an EAPM, EAQM and EADM respectively. Use of the EAPM, EAQM and EADM
precludes a guest-level 1 host program from passing to a guest-level 2
program APQNs to which it does not have access.
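The effective mask computation is a bitwise AND, applied separately to each of the APM, AQM and ADM. The sketch below (hypothetical names, assumed 256-bit mask layout with bit 0 leftmost) shows why a guest-level 2 mask cannot grant an adapter that the guest-level 1 mask lacks: if the host has adapters 5 and 6 and the guest asks for 5 and 7, only adapter 5 is effective.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 256-bit mask, bit 0 = leftmost bit of w[0]. */
struct ap_mask {
	uint64_t w[4];
};

static void mask_set(struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	m->w[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
}

static bool mask_test(const struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	return (m->w[nr / 64] >> (63 - nr % 64)) & 1;
}

/* Effective mask: a bit is set only if it is set in both the guest-level 1
 * (host) mask and the guest-level 2 (guest) mask. */
static struct ap_mask effective_mask(const struct ap_mask *host,
				     const struct ap_mask *guest)
{
	struct ap_mask e;

	for (int i = 0; i < 4; i++)
		e.w[i] = host->w[i] & guest->w[i];
	return e;
}
```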

Linux cryptographic bus driver:

Linux already has a cryptographic bus driver that provides one AP device per
AP adapter and one device per AP queue. There is a device driver for each
type of AP adapter device and each type of AP queue device. This design
utilizes some of the interfaces and functionality provided by the AP bus
driver.

Design Origin:
-------------

The original design was based on modelling AP Queue devices. The design
utilized the VFIO mediated device framework whereby a mediated AP queue
device would be created for each AP Queue bound to the VFIO AP Queue device
driver. This at first seemed like the most logical design choice for the
following reasons:

* Securing access to an AP Queue device by unbinding it from its default
   device driver and binding it to the VFIO device driver would not preclude
   the host from having access to the other usage domains contained within
   the same adapter card connected to the AP queue.

* An AP command is sent to a usage domain within a specific AP adapter via
   an AP queue.

It became readily apparent that modelling the design on an AP queue was very
convoluted for a number of reasons:

   * There is no convenient way to tell the VFIO device driver which guest
     will have access to a given mediated AP queue device until the mediated
     device's file descriptor is opened by the guest. Recall that the APQNs
     configured for the guest are every combination of the bits set in the
     APM with the bits set in the AQM, so the guest's APQNs cannot be
     validated, nor can its SIE state description be configured, until all
     of the guest's mediated AP queue device file descriptors have been
     opened.

     For example, suppose a guest opens file descriptors for mediated AP
     queue devices representing APQNs (3,5) and (4,6). Bits 3 and 4 must then
     be set in the guest's APM and bits 5 and 6 in the guest's AQM, so APQNs
     (3,5), (3,6), (4,5) and (4,6) will be valid for the guest, even though
     mediated AP queue devices have been created only for APQNs (3,5) and
     (4,6). In this case, APQNs (3,6) and (4,5), which may still be assigned
     to the host, would also be available to the guest, which is a potential
     security breach.

   * Control domains are not devices and are not logically modelled as
     mediated devices. In our original design, they were modelled as
     attributes of a mediated AP queue device, but this was a clumsy use of
     the VFIO mediated device model.

   * The SIE state description models the assignment of AP resources as a
     matrix via the APM, AQM and ADM.

The design we ultimately settled upon was modelled on the AP matrix as
defined by the SIE state description. Supplying the complete AP matrix
to SIE using bitmasks when starting a guest simplifies the code, is far
easier to secure, and more closely matches the model employed by SIE. This
is the design model implemented via this patch set.
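The first problem listed above, where per-queue devices imply a larger matrix than the queues actually opened, can be checked mechanically. This sketch (hypothetical names, assumed 256-bit mask layout with bit 0 leftmost) derives the APM and AQM from a list of requested APQNs and counts the APQNs the resulting matrix would grant without having been requested. For the (3,5) and (4,6) example it finds two such APQNs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-bit mask, bit 0 = leftmost bit of w[0]. */
struct ap_mask {
	uint64_t w[4];
};

/* Hypothetical (adapter, usage domain) pair. */
struct apqn {
	unsigned int apid, apqi;
};

static void mask_set(struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	m->w[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
}

static bool mask_test(const struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	return (m->w[nr / 64] >> (63 - nr % 64)) & 1;
}

/* Count APQNs in the APM x AQM matrix implied by req that are not in req. */
static unsigned int unrequested_apqns(const struct apqn *req, size_t n)
{
	struct ap_mask apm = {{0}}, aqm = {{0}};
	unsigned int extra = 0;

	for (size_t i = 0; i < n; i++) {
		mask_set(&apm, req[i].apid);
		mask_set(&aqm, req[i].apqi);
	}
	for (unsigned int apid = 0; apid < 256; apid++) {
		if (!mask_test(&apm, apid))
			continue;
		for (unsigned int apqi = 0; apqi < 256; apqi++) {
			bool found = false;

			if (!mask_test(&aqm, apqi))
				continue;
			for (size_t i = 0; i < n; i++)
				if (req[i].apid == apid && req[i].apqi == apqi)
					found = true;
			if (!found)
				extra++;
		}
	}
	return extra;
}
```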

The Design
----------
This design introduces four new objects:

1. AP matrix bus

    The sysfs location of the AP matrix bus is /sys/bus/ap_matrix. This
    bus will create a single AP matrix device (see below).

2. AP matrix device

    The AP matrix device is a singleton that hangs off of the AP matrix bus.
    This device holds the AP Queues that have been reserved for use by
    KVM guests. The sysfs location of the AP matrix device is
    /sys/devices/ap_matrix/matrix. It is also linked from the AP matrix
    bus at /sys/bus/ap_matrix/devices/matrix.

3. VFIO AP matrix driver

    This driver is based on the VFIO mediated device framework. When the
    driver is initialized, it will:

    * Get the AP matrix device created by the AP matrix bus

    * Register with the AP bus to indicate that it can control AP Queue
      devices. This allows AP Queue devices unbound from AP device drivers
      to be bound to the VFIO AP matrix driver. The AP Queues bound to the
      VFIO AP matrix driver will be stored by the driver in the AP matrix
      device.

    * Register the AP matrix device with the VFIO mediated device
      framework (MDEV). Registration with MDEV will create the sysfs
      structures needed to create mediated matrix devices. Each MDEV matrix
      device is used to configure the AP matrix for a KVM guest. The MDEV
      matrix device's file descriptor can be used by QEMU to communicate
      with the VFIO AP matrix device driver.

    The VFIO AP matrix driver:

    * Provides the interfaces the administrator can use to secure AP Queues
      for use by KVM guests. This is accomplished by unbinding the AP Queues
      needed by each KVM guest from their AP device drivers and binding them
      to the VFIO AP matrix driver. This prevents the host Linux system from
      using these Queues.

    * Provides an ioctl that can be used by QEMU to configure the
      CRYCB referenced by the KVM guest's SIE state description. The ioctl
      will

      * Create an EAPM, EAQM and EADM by performing a logical AND of the
        APM, AQM and ADM configured via the MDEV matrix device's sysfs
        attribute files (see below) with the APM, AQM and ADM of the host's
        SIE state description, respectively.

      * Configure the SIE state description for the KVM guest using the
        effective masks created in the previous step.

4. VFIO MDEV matrix passthrough device

    An MDEV matrix passthrough device must be created for each KVM guest that
    will need access to AP facilities. An MDEV matrix passthrough device is
    used by QEMU to configure the APM, AQM and ADM fields of the CRYCB
    referenced by the KVM guest's SIE state description. The file descriptor
    for the MDEV matrix passthrough device provides the communication pathway
    between QEMU and the VFIO AP matrix device driver.

    The MDEV matrix passthrough device, like the CRYCB, contains three
    bitmasks - an APM, AQM and ADM - for specifying the AP matrix for the
    KVM guest. Three sets of attribute files will be provided to allow an
    administrator to set the bits in the MDEV matrix device's APM, AQM and
    ADM:

    * A file to assign an AP adapter
    * A file to unassign an AP adapter
    * A file to display the adapters assigned

    * A file to assign an AP domain
    * A file to unassign an AP domain
    * A file to display the domains assigned

    * A file to assign an AP control domain
    * A file to unassign an AP control domain
    * A file to display the control domains assigned
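Each assign/unassign attribute file amounts to a small store handler. The sketch below is hypothetical: it assumes the written value is a hexadecimal number (as in the echo ab > assign_domain example later in this note), rejects values outside 0 to 255, and sets or clears the corresponding bit in an assumed 256-bit mask with bit 0 leftmost. Any further validation performed by the patches is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 256-bit mask, bit 0 = leftmost bit of w[0]. */
struct ap_mask {
	uint64_t w[4];
};

static bool mask_test(const struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	return (m->w[nr / 64] >> (63 - nr % 64)) & 1;
}

/* Hypothetical store handler shared by assign_xxx (assign = true) and
 * unassign_xxx (assign = false). Returns 0 or a negative errno value. */
static int mdev_store_bit(struct ap_mask *m, const char *buf, bool assign)
{
	char *end;
	unsigned long nr;
	uint64_t bit;

	errno = 0;
	nr = strtoul(buf, &end, 16);
	if (end == buf || errno != 0)
		return -EINVAL;         /* no hex digits, or overflow */
	if (*end == '\n')
		end++;                  /* echo appends a newline */
	if (*end != '\0')
		return -EINVAL;         /* trailing garbage */
	if (nr > 255)
		return -ERANGE;         /* adapters/domains are 0..255 */

	bit = UINT64_C(1) << (63 - nr % 64);
	if (assign)
		m->w[nr / 64] |= bit;
	else
		m->w[nr / 64] &= ~bit;
	return 0;
}
```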

Example:
-------
Let's now provide an example to illustrate how KVM guests may be given
access to AP facilities. For this example, we will show how to configure
two guests such that executing the lszcrypt command on the guests would
look like this:

Guest1
------
CARD.DOMAIN TYPE  MODE
------------------------------
05          CEX5C CCA-Coproc
05.0004     CEX5C CCA-Coproc
05.00ab     CEX5C CCA-Coproc
06          CEX5A Accelerator
06.0004     CEX5A Accelerator
06.00ab     CEX5A Accelerator

Guest2
------
CARD.DOMAIN TYPE  MODE
------------------------------
05          CEX5A Accelerator
05.0047     CEX5A Accelerator
05.00ff     CEX5A Accelerator

One thing to notice in this example is that the AP queue sets for each
adapter are identical. For example, the AP queue sets for both adapters
assigned to Guest1 contain APQIs 0004 and 00ab. The configuration would be
invalid if the queue sets did not contain the same domains. We could not,
for example, give Guest1 access to AP queue 05.00ff alone: assigning domain
ff to Guest1 would also grant it AP queue 06.00ff, which is not in the
queue set reserved for adapter 06. The point is, one must be careful to
reserve a valid set of AP queues for a given guest.
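The rule that every adapter's queue set must contain the same domains can be checked before any queue is reserved. The sketch below is hypothetical and not part of the patch set: it parses queue names in the card.domain hex form used above and compares the per-adapter domain masks, again assuming a 256-bit mask layout. Guest1's four queues pass; adding 05.00ff makes the check fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 256-bit mask, bit 0 = leftmost bit of w[0]. */
struct ap_mask {
	uint64_t w[4];
};

static void mask_set(struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	m->w[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
}

static bool mask_equal(const struct ap_mask *a, const struct ap_mask *b)
{
	for (int i = 0; i < 4; i++)
		if (a->w[i] != b->w[i])
			return false;
	return true;
}

/* Return true if every adapter that appears in names (e.g. "05.00ab") has
 * exactly the same set of domains. */
static bool queue_sets_identical(const char *const *names, size_t n)
{
	struct ap_mask domains[256] = {{{0}}};
	bool seen[256] = {false};
	const struct ap_mask *first = NULL;

	for (size_t i = 0; i < n; i++) {
		unsigned int apid, apqi;

		if (sscanf(names[i], "%x.%x", &apid, &apqi) != 2 ||
		    apid > 0xff || apqi > 0xff)
			return false;   /* malformed queue name */
		seen[apid] = true;
		mask_set(&domains[apid], apqi);
	}
	for (unsigned int apid = 0; apid < 256; apid++) {
		if (!seen[apid])
			continue;
		if (!first)
			first = &domains[apid];
		else if (!mask_equal(first, &domains[apid]))
			return false;
	}
	return true;
}
```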

These are the steps for configuring Guest1 and Guest2:

1. The first thing that needs to be done is to secure the AP queues to be
    used by the two guests so that the host cannot access them. This is done
    by unbinding each AP Queue device from its respective AP driver. In our
    example, these queues are bound to the cex4queue driver. This would be
    the sysfs location of these devices:

    /sys/bus/ap
    --- [drivers]
    ------ [cex4queue]
    --------- [05.0004]
    --------- [05.0047]
    --------- [05.00ab]
    --------- [05.00ff]
    --------- [06.0004]
    --------- [06.00ab]
    --------- unbind

    To unbind AP queue 05.0004 from the cex4queue device driver:

        echo 05.0004 > unbind

    This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
    and 06.00ab.

2. The next step is to reserve the queues for use by the two KVM guests.
    This is accomplished by binding them to the VFIO AP matrix device driver.
    This is the sysfs location of the VFIO AP matrix device driver:

    /sys/bus/ap
    --- [drivers]
    ------ [vfio_ap_matrix]
    --------- bind

    To bind queue 05.0004 to the vfio_ap_matrix driver:

        echo 05.0004 > bind

    This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
    and 06.00ab.

3. Create the mediated devices needed to configure the AP matrices for the
    two guests and to provide an interface to the vfio_ap_matrix driver for
    use by the guests:

    /sys/devices/
    --- [ap_matrix]
    ------ [matrix] (this is the matrix device)
    --------- [mdev_supported_types]
    ------------ [ap_matrix-passthrough] (passthrough mediated device type)
    --------------- create
    --------------- [devices]

    To create the mediated devices for the two guests:

        uuidgen > create
        uuidgen > create

    This will create two mediated devices in the [devices] subdirectory, each
    named with the UUID written to the create attribute file. We call them
    $uuid1 and $uuid2:

    /sys/devices/
    --- [ap_matrix]
    ------ [matrix]
    --------- [mdev_supported_types]
    ------------ [ap_matrix-passthrough]
    --------------- [devices]
    ------------------ [$uuid1]
    --------------------- adapters
    --------------------- assign_adapter
    --------------------- assign_control_domain
    --------------------- assign_domain
    --------------------- control_domains
    --------------------- domains
    --------------------- unassign_adapter
    --------------------- unassign_control_domain
    --------------------- unassign_domain
    ------------------ [$uuid2]
    --------------------- adapters
    --------------------- assign_adapter
    --------------------- assign_control_domain
    --------------------- assign_domain
    --------------------- control_domains
    --------------------- domains
    --------------------- unassign_adapter
    --------------------- unassign_control_domain
    --------------------- unassign_domain

4. The administrator now needs to configure the matrices for mediated
    devices $uuid1 (for Guest1) and $uuid2 (for Guest2).

    This is how the matrix is configured for Guest1, from within the [$uuid1]
    directory:

    echo 5 > assign_adapter
    echo 6 > assign_adapter
    echo 4 > assign_domain
    echo ab > assign_domain

    When an assign_xxx file is written, the corresponding bit in the
    respective MDEV matrix device's bitmask will be set. For example, when
    adapter 5 is assigned, bit 5 - numbered from left to right starting with
    bit 0 - will be set in the MDEV matrix device's APM.

    By architectural convention, all usage domains - i.e., domains assigned
    via the assign_domain attribute file - will also be configured in the ADM
    field of the KVM guest's CRYCB, so there is no need to assign control
    domains here unless you want to assign control domains that are not
    assigned as usage domains.

    If a mistake is made configuring an adapter, domain or control domain,
    you can use the unassign_xxx files to unassign the adapter, domain or
    control domain.

    To display the matrix configuration for Guest1:

    cat adapters
    cat domains
    cat control_domains

    This is how the matrix is configured for Guest2, from within the [$uuid2]
    directory:

    echo 5 > assign_adapter
    echo 47 > assign_domain
    echo ff > assign_domain
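The echo commands in the steps above can also be issued programmatically. This is a minimal sketch: write_attr(), reserve_queue() and configure_guest1() are hypothetical helpers, the driver names and sysfs paths are those given in this RFC, and the mdev directory passed in must contain the real UUID in place of $uuid1. Like echo, each write opens the attribute file, writes the value and closes it; because stdio buffers the data, an error from a sysfs store handler is only reported when fclose() flushes it, so that return value is checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Convert the current errno into a negative return code (EIO if unset). */
static int neg_errno(void)
{
	return errno ? -errno : -EIO;
}

/* Hypothetical helper: write value to dir/attr, like "echo value > attr".
 * Returns 0 on success or a negative errno value. */
static int write_attr(const char *dir, const char *attr, const char *value)
{
	char path[512];
	int len;
	FILE *f;

	assert(dir && attr && value);
	len = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "w");
	if (!f)
		return neg_errno();
	if (fputs(value, f) == EOF) {
		int rc = neg_errno();

		fclose(f);
		return rc;
	}
	if (fclose(f) != 0)     /* store errors surface on the final flush */
		return neg_errno();
	return 0;
}

/* Hypothetical helper for steps 1 and 2: move one AP queue (e.g. "05.0004")
 * from the cex4queue driver to the vfio_ap_matrix driver. */
static int reserve_queue(const char *queue)
{
	int rc = write_attr("/sys/bus/ap/drivers/cex4queue", "unbind", queue);

	if (rc)
		return rc;
	return write_attr("/sys/bus/ap/drivers/vfio_ap_matrix", "bind", queue);
}

/* Hypothetical helper for step 4: assign Guest1's adapters and domains
 * (hex values, as in the echo commands) to the mdev at mdev_dir. */
static int configure_guest1(const char *mdev_dir)
{
	static const char *const adapters[] = { "5", "6" };
	static const char *const domains[] = { "4", "ab" };
	int rc;

	for (size_t i = 0; i < 2; i++)
		if ((rc = write_attr(mdev_dir, "assign_adapter", adapters[i])))
			return rc;
	for (size_t i = 0; i < 2; i++)
		if ((rc = write_attr(mdev_dir, "assign_domain", domains[i])))
			return rc;
	return 0;
}
```

For Guest1, reserve_queue() would be called for each of 05.0004, 05.00ab, 06.0004 and 06.00ab (and for Guest2's queues likewise), followed by configure_guest1() with the path of the [$uuid1] directory.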

When a KVM guest is started, QEMU will open its MDEV matrix device. The
VFIO AP matrix device driver will be notified and will store a reference
to the KVM guest's SIE state description.
QEMU will then call the VFIO AP matrix ioctl requesting that the
KVM guest's matrix be configured. The matrix driver will set the bits in the
APM, AQM and ADM fields of the CRYCB referenced by the guest's SIE state
description from the EAPM, EAQM and EADM created by performing a logical AND
of the AP masks configured in the MDEV matrix device and the masks
configured in the host's SIE state description. When the guest comes up, it
will have access to the APQNs identified in the AP matrix specified in the
KVM guest's SIE state description. Programs running on the guest will then
be able to use the cryptographic functions provided by the AP facilities
configured for the guest.
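The mask computation described above can be sketched as follows. This is an illustration under stated assumptions, not the code in the patches: the names are hypothetical, masks are assumed to be 256 bits with bit 0 leftmost, and the usage domains are assumed to be folded into the ADM (the architectural convention noted in step 4) before the AND with the host's ADM. With Guest2's mdev masks, only adapter 05 and domains 47 and ff survive, provided the host has them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 256-bit mask, bit 0 = leftmost bit of w[0]. */
struct ap_mask {
	uint64_t w[4];
};

/* Hypothetical container for the three masks of one AP matrix. */
struct ap_matrix {
	struct ap_mask apm, aqm, adm;
};

static void mask_set(struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	m->w[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
}

static bool mask_test(const struct ap_mask *m, unsigned int nr)
{
	assert(nr < 256);
	return (m->w[nr / 64] >> (63 - nr % 64)) & 1;
}

/* Compute the EAPM, EAQM and EADM to be stored in the guest's CRYCB from
 * the masks configured on the mdev and the host's own masks. */
static struct ap_matrix crycb_masks(const struct ap_matrix *mdev,
				    const struct ap_matrix *host)
{
	struct ap_matrix eff;

	for (int i = 0; i < 4; i++) {
		eff.apm.w[i] = mdev->apm.w[i] & host->apm.w[i];
		eff.aqm.w[i] = mdev->aqm.w[i] & host->aqm.w[i];
		/* assumed: every usage domain is also a control domain */
		eff.adm.w[i] = (mdev->adm.w[i] | mdev->aqm.w[i]) &
			       host->adm.w[i];
	}
	return eff;
}
```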

Tony Krowiak (19):
   KVM: s390: SIE considerations for AP Queue virtualization
   KVM: s390: refactor crypto initialization
   s390/zcrypt: new AP matrix bus
   s390/zcrypt: create an AP matrix device on the AP matrix bus
   s390/zcrypt: base implementation of AP matrix device driver
   s390/zcrypt: register matrix device with VFIO mediated device
     framework
   KVM: s390: introduce AP matrix configuration interface
   s390/zcrypt: support for assigning adapters to matrix mdev
   s390/zcrypt: validate adapter assignment
   s390/zcrypt: sysfs interfaces supporting AP domain assignment
   s390/zcrypt: validate domain assignment
   s390/zcrypt: sysfs support for control domain assignment
   s390/zcrypt: validate control domain assignment
   KVM: s390: Connect the AP mediated matrix device to KVM
   s390/zcrypt: introduce ioctl access to VFIO AP Matrix driver
   KVM: s390: interface to configure KVM guest's AP matrix
   KVM: s390: validate input to AP matrix config interface
   KVM: s390: New ioctl to configure KVM guest's AP matrix
   s390/facilities: enable AP facilities needed by guest

  MAINTAINERS                                  |   13 +
  arch/s390/Kconfig                            |   13 +
  arch/s390/configs/default_defconfig          |    1 +
  arch/s390/configs/gcov_defconfig             |    1 +
  arch/s390/configs/performance_defconfig      |    1 +
  arch/s390/defconfig                          |    1 +
  arch/s390/include/asm/ap-config.h            |   32 +
  arch/s390/include/asm/kvm_host.h             |   26 +-
  arch/s390/kvm/Makefile                       |    2 +-
  arch/s390/kvm/ap-config.c                    |  224 ++++++++
  arch/s390/kvm/kvm-s390.c                     |   17 +-
  arch/s390/tools/gen_facilities.c             |    2 +
  drivers/s390/crypto/Makefile                 |    6 +-
  drivers/s390/crypto/ap_matrix_bus.c          |  115 ++++
  drivers/s390/crypto/ap_matrix_bus.h          |   25 +
  drivers/s390/crypto/vfio_ap_matrix_drv.c     |  107 ++++
  drivers/s390/crypto/vfio_ap_matrix_ops.c     |  790 ++++++++++++++++++++++++++
  drivers/s390/crypto/vfio_ap_matrix_private.h |   50 ++
  include/uapi/linux/vfio.h                    |   22 +
  19 files changed, 1438 insertions(+), 10 deletions(-)
  create mode 100644 arch/s390/include/asm/ap-config.h
  create mode 100644 arch/s390/kvm/ap-config.c
  create mode 100644 drivers/s390/crypto/ap_matrix_bus.c
  create mode 100644 drivers/s390/crypto/ap_matrix_bus.h
  create mode 100644 drivers/s390/crypto/vfio_ap_matrix_drv.c
  create mode 100644 drivers/s390/crypto/vfio_ap_matrix_ops.c
  create mode 100644 drivers/s390/crypto/vfio_ap_matrix_private.h




