
From: Jakob Bohm
Subject: Re: [Qemu-discuss] Getting qemu-system-i386 to use more than one core on Cortex A7 host
Date: Mon, 4 Jan 2016 14:24:20 +0100
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

On 04/01/2016 13:21, Peter Maydell wrote:
> On 3 January 2016 at 20:57, David Durham <address@hidden> wrote:
>> Any suggestions or comments on how to do this are very welcome
>> ... I built qemu with --target-list i386-softmmu and when I run
>> qemu, top only shows one qemu-system-i386 using 100% of one core
> This is expected. Our current emulation is single-threaded
> even when emulating multiple target CPUs, so we'll only
> use one host core. (We do have some helper threads for a
> few IO tasks etc., but those are not CPU-bound.)
>
> There is some development work in progress to try to
> make better use of multi-core hosts, but it's not very
> far advanced yet. (Also, emulating x86 guests on ARM hosts
> with multiple CPUs might never be supported, because
> the x86 memory model would require barriers everywhere
> and it's not clear it would improve overall performance.
> ARM-on-x86 is the primary initial use case.)
>
> -- PMM

For your information, the x86 memory model only requires
barriers in the following cases.  (The reordering it permits
is mostly observable on machines with multiple physical x86
CPU sockets, as opposed to multicore chips; it may also be
observed when using any kind of DMA/bus-master hardware
such as GPUs.)

1. Instructions with an explicit "LOCK" prefix.  These
  require a memory barrier, then a locked read-modify-write
  on a single address, then another memory barrier.

2. Explicit memory barrier instructions (a few have been
  added over the years, e.g. LFENCE, SFENCE and MFENCE).

3. Some of the XCHG-family instructions implicitly behave
  as though a LOCK prefix were present.

4. On modern CPUs, the floating-point ("ESC") instructions
  are treated as normal instructions, and the related
  historic "WAIT" opcode is now a NOP (optionally raising
  an "FPU disabled" exception).  On the 386 and older,
  floating-point instructions might postpone their memory
  writes to any point up to and including the next same-CPU
  WAIT, but this was never a multi-CPU barrier, just
  synchronization between the CPU and FPU chips within
  each two-chip CPU.

5. Some specific operations (see the architecture manuals)
  typically associated with cache management, system calls
  and/or thread switching also act as barriers.

6. Only a minority of instructions flush the instruction
  decode (and hence TCG translation) buffers; for full
  consistency, however, any actual write to a memory page
  containing code should cause the cached translation of
  that code to be discarded.

7. If doing cycle-accurate bug-for-bug emulation of
  specific CPU models, it might be necessary to exactly
  model the implicit size limitations of their various
  caches, such as how many page table entries are cached
  by the on-CPU TLB or how many bytes ahead the
  instruction decoder may look.  But I don't think that
  is a qemu feature anyway.

This still leaves the majority of code not doing memory barriers.


Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded 
