
Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU


From: Peter Crosthwaite
Subject: Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
Date: Thu, 1 Oct 2015 09:26:42 -0700

On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
<address@hidden> wrote:
> Hi all,
>
> This RFC patch series introduces the set of changes needed to model the
> architecture presented in a previous RFC:
> "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>
> To recap the goal of that RFC:
>
> The idea is to enhance the current architecture of QEMU to enable the modeling
> of a state-of-the-art SoC with an AMP processing style, where different
> processing units share the same system memory and communicate through shared
> memory and inter-processor interrupts.

This might have a lot in common with a similar inter-QEMU
communication effort at Xilinx. Edgar talks about it at KVM
Forum:

https://www.youtube.com/watch?v=L5zG5Aukfek

Around the 18:30 mark. I think it might be lower level than your proposal:
remote-port is designed to export the raw hardware interfaces (busses
and pins) between QEMU and some other system, another QEMU being the
common use case.

> An example is a multi-core ARM CPU
> working alongside two Cortex-M microcontrollers.
>

Marcin is doing something with A9+M3. It sounds like he already has a
lot working (latest emails were on some finer points). What is the
board/SoC in question here (if you are able to share)?

> From the user's point of view there is usually an operating system (e.g. Linux)
> booting on the Master processor at platform startup, while the other
> processors are used to offload computation from the Master or to deal
> with real-time interfaces.

I feel like this is architecting hardware based on common software use
cases, rather than directly modelling the SoC in question. Can we
model the hardware (e.g. the devices used for rpmsg, IPIs, etc.)
as regular devices, as they exist in the SoC? That way AMP is just
another guest?

> It is the Master OS that triggers the boot of the
> Slave processors, and also provides them the binary code to execute (e.g.
> an RTOS or binary firmware) by placing it into a pre-defined memory area that is
> accessible to the Slaves. Usually the memory for the Slaves is carved out from
> the Master OS during boot. Once a Slave is booted, the two processors can
> communicate through queues in shared memory and inter-processor interrupts
> (IPIs). In Linux, it is the remoteproc/rpmsg framework that enables the
> control (boot/shutdown) of Slave processors and establishes a
> communication channel based on virtio queues.
>
> Currently, QEMU is not able to model such an architecture mainly because only
> a single processor can be emulated at one time,

SMP does work already. MTTCG will remove the one-run-at-a-time
limitation. Multi-arch will allow you to mix multiple CPU
architectures (e.g. PPC + ARM in the same QEMU). But multiple
heterogeneous ARMs should already just work, and there is already an
in-tree precedent with the xlnx-zynqmp SoC. That SoC has 4xA53 and
2xR5 (all ARM).

Multiple system address spaces, with CPUs having different views of the
address space, are another common snag in this effort; this is discussed
in a recent thread between myself and Marcin.

> and the OS binary image needs
> to be placed in memory at model startup.
>

I don't see what this limitation is exactly. Can you explain more? I
do see a need to work on the ARM bootloader for AMP flows; it is a
pure SMP bootloader that assumes total control.

Can this effort be a bootloader overhaul? Two things:

1: The bootloader needs to be repeatable
2: The bootloaders need to be targetable (to certain CPUs or clusters)
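
For illustration, a hypothetical shape such an overhaul could take (the
names below are invented, not the existing hw/arm/boot.c API):

#include <stdint.h>

/* Hypothetical sketch only. It illustrates the two requirements above:
 * the loader keeps no global "already booted" state, so it can be
 * re-run on every slave (re)boot ("repeatable"), and it is
 * parameterised by the CPUs it applies to ("targetable"). */
typedef struct AMPBootInfo {
    uint64_t entry;      /* entry point for the targeted cluster     */
    uint64_t dtb_addr;   /* DTB address as seen by the targeted CPUs */
    uint64_t cpu_mask;   /* bit n set => this bootloader owns CPU n  */
} AMPBootInfo;

/* Callable on each slave reset, independent of the SMP boot path. */
void amp_boot_cluster(const AMPBootInfo *info);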

> This patch series adds a set of modules and introduces minimal changes to the
> current QEMU code-base to implement what is described above, with master and
> slave implemented as two different instances of QEMU. The aim of this work is to
> enable application and runtime programmers to test their AMP applications, or
> their new inter-SoC communication protocol.
>
> The main changes are depicted in the following diagram and involve:
>     - A new multi-client socket implementation that allows multiple instances of
>       QEMU to attach to the same socket, with only one acting as a master.
>     - A new memory backend, the shared memory backend, based on
>       the file memory backend. This new backend makes it possible, on the master
>       side, to allocate the whole memory as shareable (e.g. /dev/shm, or
>       hugetlbfs). On the slave side it enables the startup of QEMU without any
>       main memory allocated. The slave then goes into a waiting state, the same
>       used in the case of an incoming migration, and a callback is registered
>       on a multi-client socket shared with the master.
>       The waiting state ends when the master sends the slave the file
>       descriptor and offset to mmap and use as memory.

This is useful in its own right and came up in the Xilinx implementation.
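
For concreteness, a minimal sketch of the fd-plus-offset handover the
backend describes, using plain POSIX SCM_RIGHTS fd passing (illustrative
only, not the code from this series):

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/socket.h>
#include <sys/mman.h>

/* Master side: pass the backing-memory fd and the slave's offset into
 * it across an already-connected AF_UNIX socket. */
static int send_mem_fd(int sock, int memfd, uint64_t offset)
{
    struct iovec iov = { .iov_base = &offset, .iov_len = sizeof(offset) };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;           /* the fd rides as ancillary data */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &memfd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Slave side: block until fd + offset arrive, then map the carve-out
 * for use as guest RAM. */
static void *map_slave_mem(int sock, size_t size)
{
    uint64_t offset;
    struct iovec iov = { .iov_base = &offset, .iov_len = sizeof(offset) };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = { 0 };
    int memfd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return MAP_FAILED;
    }
    memcpy(&memfd, CMSG_DATA(CMSG_FIRSTHDR(&msg)), sizeof(int));

    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                memfd, (off_t)offset);
}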

>     - A new inter-processor interrupt hardware distribution module, which is
>       also used to trigger the boot of slave processors. This module uses a pair
>       of eventfds for each master-slave couple to trigger interrupts between the
>       instances. No slave-to-slave interrupts are envisioned by the current
>       implementation.

Wouldn't that just be a software interrupt in the local QEMU instance?

> The multi-client socket is used by the master to trigger
>       the boot of a slave, and also by each master-slave couple to exchange the
>       eventfd file descriptors. The IDM device can be instantiated either as a
>       PCI or sysbus device.
>

So if everything is in one QEMU, IPIs can be implemented with just a
regular interrupt controller (which has a software set).
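
(For reference, the eventfd doorbell between two instances boils down to
something like the sketch below; the names are invented, not the series'
IDM code. The fds themselves would be exchanged over the multi-client
socket, e.g. with SCM_RIGHTS as sketched earlier.)

#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Raise an IPI towards the peer instance: bump its eventfd counter,
 * waking any poll()/read() the peer has parked on the fd. */
static void ipi_raise(int efd)
{
    uint64_t one = 1;
    (void)write(efd, &one, sizeof(one));
}

/* Peer side: acknowledge pending IPIs. A plain (non-semaphore) eventfd
 * read returns the accumulated count and resets it to zero. */
static uint64_t ipi_ack(int efd)
{
    uint64_t n = 0;
    (void)read(efd, &n, sizeof(n));
    return n;
}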

>
>                            Memory
>                            (e.g. hugetlbfs)
>
> +------------------+       +--------------+            +------------------+
> |                  |       |              |            |                  |
> |   QEMU MASTER    |       |   Master     |            |   QEMU SLAVE     |
> |                  |       |   Memory     |            |                  |
> | +------+  +------+-+     |              |          +-+------+  +------+ |
> | |      |  |SHMEM   |     |              |          |SHMEM   |  |      | |
> | | VCPU |  |Backend +----->              |    +----->Backend |  | VCPU | |
> | |      |  |        |     |              |    | +--->        |  |      | |
> | +--^---+  +------+-+     |              |    | |   +-+------+  +--^---+ |
> |    |             |       |              |    | |     |            |     |
> |    +--+          |       |              |    | |     |        +---+     |
> |       | IRQ      |       | +----------+ |    | |     |    IRQ |         |
> |       |          |       | |          | |    | |     |        |         |
> |  +----+----+     |       | | Slave    <------+ |     |   +----+---+     |
> +--+  IDM    +-----+       | | Memory   | |      |     +---+ IDM    +-----+
>    +-^----^--+             | |          | |      |         +-^---^--+
>      |    |                | +----------+ |      |           |   |
>      |    |                +--------------+      |           |   |
>      |    |                                      |           |   |
>      |    +--------------------------------------+-----------+   |
>      |   UNIX Domain Socket(send mem fd + offset, trigger boot)  |
>      |                                                           |
>      +-----------------------------------------------------------+
>                               eventfd
>

So the slave can only see a subset of the master's memory? Is the
master's memory just the full system memory, with the master doing
IOMMU setup for the slave pre-boot? Or is it a hard feature of the
physical SoC?

>
> The whole code can be checked out from:
> https://git.virtualopensystems.com/dev/qemu-het.git
> branch:
> qemu-het-rfc-v1
>
> Patches apply to the current QEMU master branch
>
> =========
> Demo
> =========
>
> This patch series comes with a demo to better show how the
> changes introduced can be exploited.
> At its current state, the demo can be executed using an ARM target for both
> master and slave.
>
> The demo shows how a master QEMU instance carves out the memory for a slave,
> copies in the Linux kernel image and device tree blob, and finally triggers
> the boot.
>

These processes must have an underlying hardware implementation; is the
master using a system controller to implement the slave boot (setting
reset and entry points via registers)? How hard are they to model as
regular devices?

>
> How to reproduce the demo:
>
> In order to reproduce the demo, a couple of extra elements need to be
> downloaded and compiled.
>
> Binary loader
> Loads the slave firmware (kernel) binary into memory and triggers the boot
> https://git.virtualopensystems.com/dev/qemu-het-tools.git
> branch:
> load-bin-boot
> To compile: just type "make"
>
> Slave kernel
> Compile a Linux kernel image (zImage) for the virt machine model.
>
> IDM test kernel module
> Needed to trigger the boot of a slave
> https://git.virtualopensystems.com/dev/qemu-het-tools.git
> branch:
> IDM-kernel-module
> To compile: KDIR=kernel_path ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make
>
> Slave DTB
> https://git.virtualopensystems.com/dev/qemu-het-tools.git
> branch:
> slave-dtb
>
> Copy the binary loader, IDM kernel module, zImage, and dtb into the disk
> image or ramdisk of the master instance.
>
> Run the demo:
>
> run the master instance
>
> ./arm-softmmu/qemu-system-arm \
>     -kernel zImage \
>     -M virt -cpu cortex-a15 \
>     -drive if=none,file=disk.img,cache=writeback,id=foo1 \
>     -device virtio-blk-device,drive=foo1 \
>     -object multi-socket-backend,id=foo,listen,path=ms_socket \
>     -object memory-backend-shared,id=mem,size=1G,mem-path=/mnt/hugetlbfs,chardev=foo,master=on,prealloc=on \
>     -device idm_ipi,master=true,memdev=mem,socket=foo \
>     -numa node,memdev=mem -m 1G \
>     -append "root=/dev/vda rw console=ttyAMA0 mem=512M memmap=512M$0x60000000" \
>     -nographic
>
> run the slave instance
>
> ./arm-softmmu/qemu-system-arm \
>     -M virt -cpu cortex-a15 -machine slave=on \
>     -drive if=none,file=disk.img,cache=writeback,id=foo1 \
>     -device virtio-blk-device,drive=foo1 \
>     -object multi-socket-backend,id=foo,path=ms_socket \
>     -object memory-backend-shared,id=mem,size=512M,mem-path=/mnt/hugetlbfs,chardev=foo,master=off \
>     -device idm_ipi,master=false,memdev=mem,socket=foo \
>     -incoming "shared:mem" -numa node,memdev=mem -m 512M \
>     -nographic
>
>
> For simplicity, use a disk image for the slave instead of a ramdisk.
>
> As visible from the kernel boot arguments, the master is booted with mem=512M
> so that half of the allocated memory is not used by the master and is instead
> reserved for the slave. On the virt platform this memory starts at
> address 0x60000000.
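
(For reference, assuming the virt machine's RAM base of 0x40000000, the
resulting guest-physical layout is:

0x40000000 +--------------------+
           | Master RAM, 512M   |  <- Linux, limited by mem=512M
0x60000000 +--------------------+
           | Slave carve-out,   |  <- reserved by memmap=512M$0x60000000
           | 512M               |
0x80000000 +--------------------+

since 0x40000000 + 512M = 0x60000000.)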
>
> Once the master is booted the image of the kernel and DTB can be copied in the
> memory carved out for the slave.
>
> In the master console
>
> probe the IDM kernel module:
>
> $ insmod idm_test_mod.ko
>
> run the application that copies the binaries into memory and triggers the boot:
>
> $ ./load_bin_app 1 ./zImage ./slave.dtb
>
>
> On the slave console the Linux kernel boot should be visible.
>
> The present demo is intended only as a demonstration to see the patch-set at
> work. In the near future, boot triggering, memory carveout and binary copy
> might be implemented in a remoteproc driver coupled with an RPMSG driver for
> communication between the master and slave instances.
>

So are these drivers the same ones that run on the real hardware? Is
there value in the fact that the real IPI mechanisms are replaced with
virtual ones?

Regards,
Peter

>
> This work has been sponsored by Huawei Technologies Duesseldorf GmbH.
>
> Baptiste Reynal (3):
>   backend: multi-socket
>   backend: shared memory backend
>   migration: add shared migration type
>
> Christian Pinto (5):
>   hw/misc: IDM Device
>   hw/arm: sysbus-fdt
>   qemu: slave machine flag
>   hw/arm: boot
>   qemu: numa
>
>  backends/Makefile.objs             |   4 +-
>  backends/hostmem-shared.c          | 203 ++++++++++++++++++
>  backends/multi-socket.c            | 353 +++++++++++++++++++++++++++++++
>  default-configs/arm-softmmu.mak    |   1 +
>  default-configs/i386-softmmu.mak   |   1 +
>  default-configs/x86_64-softmmu.mak |   1 +
>  hw/arm/boot.c                      |  13 ++
>  hw/arm/sysbus-fdt.c                |  60 ++++++
>  hw/core/machine.c                  |  27 +++
>  hw/misc/Makefile.objs              |   2 +
>  hw/misc/idm.c                      | 416 +++++++++++++++++++++++++++++++++++++
>  include/hw/boards.h                |   2 +
>  include/hw/misc/idm.h              | 119 +++++++++++
>  include/migration/migration.h      |   2 +
>  include/qemu/multi-socket.h        | 124 +++++++++++
>  include/sysemu/hostmem-shared.h    |  61 ++++++
>  migration/Makefile.objs            |   2 +-
>  migration/migration.c              |   2 +
>  migration/shared.c                 |  32 +++
>  numa.c                             |  17 +-
>  qemu-options.hx                    |   5 +-
>  util/qemu-config.c                 |   5 +
>  22 files changed, 1448 insertions(+), 4 deletions(-)
>  create mode 100644 backends/hostmem-shared.c
>  create mode 100644 backends/multi-socket.c
>  create mode 100644 hw/misc/idm.c
>  create mode 100644 include/hw/misc/idm.h
>  create mode 100644 include/qemu/multi-socket.h
>  create mode 100644 include/sysemu/hostmem-shared.h
>  create mode 100644 migration/shared.c
>
> --
> 1.9.1
>
>


