
Re: [Qemu-devel] [RFC] Towards an Heterogeneous QEMU


From: Christian Pinto
Subject: Re: [Qemu-devel] [RFC] Towards an Heterogeneous QEMU
Date: Fri, 31 Jul 2015 18:23:17 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.8.0

Hello Christopher,

On 31/07/2015 14:03, Christopher Covington wrote:
Hi Christian,

On 07/27/2015 09:54 AM, Christian Pinto wrote:
Hi all,

this message is to present, and get feedback on, a QEMU enhancement we are
working on. Most state-of-the-art SoCs follow a heterogeneous paradigm, in
which a Master processor is surrounded by multiple Slave co-processors (other
CPUs, MCUs, hardware accelerators, etc.) that usually share the very same
physical memory. An example is a multi-core ARM CPU working alongside two
Cortex-M microcontrollers.

From the user's point of view, an operating system (e.g. Linux) usually boots
on the Master processor at platform startup, while the other processors are
used to offload computation from the Master or to handle real-time interfaces.
It is the Master OS that triggers the boot of the Slave processors, and it
also provides them the binary code to execute (e.g. an RTOS or binary
firmware) by placing it into a pre-defined memory area accessible to the
Slaves. Usually the memory for the Slaves is carved out from the Master OS
during boot. Once a Slave is booted, the two processors can communicate
through queues in shared memory and inter-processor interrupts (IPIs). In
Linux, the remoteproc/rpmsg framework controls (boots/shuts down) the Slave
processors and establishes a communication channel based on virtio queues.

Currently, QEMU is not able to model such an architecture, mainly because only
a single processor can be emulated at a time, and the OS binary image needs to
be placed in memory at model startup.

We are working on extensions to QEMU that enable the modeling of heterogeneous
SoCs. In our proposal, each processor of the target heterogeneous SoC is
represented by a separate QEMU process, one of which acts as the Master of the
target platform. The physical shared memory abstraction is created by
leveraging POSIX shared memory. At model boot the Master QEMU allocates the
whole memory of the target platform as a POSIX shared memory segment, using
the hostmem-file backend. The Slave QEMU instances, instead, do not allocate
any memory but wait, on a Unix domain socket, to receive the file descriptor
of the POSIX shared memory segment allocated by the Master, together with an
offset. Once received, the file descriptor is mmap-ed starting from the
received offset and used as the memory backend for the Slave instance. For the
Slave QEMU instances a new memory backend will be defined that receives the
file descriptor from a socket instead of allocating the model's RAM from a
file or regular memory.

To resemble the behavior of a real platform, a Slave QEMU instance will not
jump into target code until the information on the memory to be used is
received from the Master. This happens only when, at some point during
execution, an application running on the Master OS needs one of the co-
processors and triggers its boot. The initialization and boot phase of a Slave
QEMU instance differ from the regular ones in the following ways:

- No RAM memory is allocated for the model.
- No binary image is copied into memory.
- After the model initialization is complete, QEMU will jump into a wait state
   in which no code is executed (since the memory is not yet available).

When the Slave receives the file descriptor and the offset of its portion of
the platform memory, it will find in that memory the binary image to be
executed and any other information needed to complete the boot process. Since
the Slave QEMU instances mmap the shared memory segment only from that offset
onward, they have no way to corrupt the Master's memory: it is simply not
visible to the target Slave OS.

Finally, a new QEMU device, the Interrupt Distribution Module (IDM), will be
implemented to model a hardware mailbox/inter-processor interrupt module, used
to send interrupts across all the QEMU instances involved in the heterogeneous
model. The module will be based on eventfd, whose file descriptors are
exchanged with the Master over a Unix domain socket. Each QEMU instance
participating in the heterogeneous model will embed this new hardware module
in its memory map. In real rpmsg applications, for example, such hardware
mailbox and IPI modules are used to signal the kick of a virtio queue to a
remote processor with an interrupt.

The proposed changes are to be considered the minimal building blocks for
emulating a heterogeneous SoC, allowing programmers to experiment with various
intra-SoC communication frameworks (e.g. remoteproc/rpmsg) and to perform
functional validation of their drivers and software targeting a heterogeneous
SoC.

How does this multiprocess architecture compare to current efforts for
multithreaded TCG?
The multi-threaded TCG work is orthogonal to what we propose here: if one
wants to model, through our extensions, a heterogeneous system with a
multi-core Master plus a multi-core Slave, it will still be possible to
exploit multi-threaded TCG in both QEMU instances to obtain higher
performance. Neither of the two works excludes the other.

Do you anticipate needing a mechanism to keep processes roughly in sync with
each other, so that one doesn't unrealistically get way far ahead of the rest?

For the use case we are looking at, a remoteproc/rpmsg type of communication,
we do not see the need for synchronization between the processes. In this type
of interaction, two (or more) processors exchange messages using explicit
synchronization points (e.g. virtio queue kicks through inter-processor
interrupts), and do not rely on global timers or shared time-based resources.

Do you see any use-case where the two processes might need to be synchronized?


Christian

Thanks,
Christopher Covington




