Re: Using All Cores of CPU on Snapdragon Processor during x86-to-ARM Use

From: Jakob Bohm
Subject: Re: Using All Cores of CPU on Snapdragon Processor during x86-to-ARM User Space Emulation
Date: Thu, 14 May 2020 12:31:15 +0200
User-agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0

On 13/05/2020 12:02, Alex Bennée wrote:
Vijay Daita <address@hidden> writes:


It is my understanding that one would be unable to do x86-to-ARM user space
emulation while utilizing all cores because of x86 barriers.
Actually the utilisation of multiple cores (often referred to as MTTCG)
is a function of system emulation, and you are correct that for
x86-on-ARM we don't enable MTTCG because we don't currently add barrier
instructions to fully emulate the x86 memory model. For linux-user,
however, we have always followed the guest threading model because the
guest clone() is passed down to the host. Because the memory modelling
isn't perfect, you can still run into problems caused by the mismatch.
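
To make the mismatch concrete, here is a minimal message-passing sketch
in portable C11 (not QEMU code; the names payload, ready and
run_message_passing are made up for illustration). On an x86 host the
release store and acquire load cost nothing extra, because the hardware
already keeps that ordering. On a weakly ordered host such as ARM the
same ordering has to be requested explicitly, and that is the part an
emulator has to reconstruct for guest code that never asked for it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;       /* plain data, written before the flag */
static atomic_int ready;  /* flag that publishes the payload */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release store: the write to payload may not be reordered after it. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    /* Acquire load: the read of payload may not be reordered before it. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    *out = payload;
    return NULL;
}

/* Runs one producer/consumer pair and returns what the consumer read. */
int run_message_passing(void)
{
    pthread_t p, c;
    int seen = -1;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);
    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

With the release/acquire pair in place the consumer is guaranteed to
read 42. If both accesses to ready were relaxed, the C11 model would
permit a stale read of payload, and a weakly ordered host could actually
produce one.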

I wanted to
know if there is a difference between what QEMU aims to do and using an
interpreter of sorts to convert x86 instructions directly to ARM
instructions so that when run on the system directly, the system can
decide, itself, how to apportion the task.
This is what the TCG does - it translates guest instructions into groups
of host instructions. We could insert the extra barriers for all loads
and stores but the effect would be to cripple performance. In an ideal
world we would only do these for the load/store instructions involved in
inter-thread synchronisation operations but that's a fairly tricky
problem to solve.
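
As a rough illustration of that cost, the toy sketch below counts the
host instructions a naive translator would emit for a guest block. The
enum, the function name and the "one barrier per memory access" recipe
are all assumptions made for this example; they are not TCG opcodes or
QEMU's actual barrier placement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guest operation kinds; not QEMU's TCG opcodes. */
enum guest_op { GUEST_LOAD, GUEST_STORE, GUEST_ALU };

/*
 * Counts the host instructions emitted for a guest block.
 * Assumption: with strict_order set, every load and every store gets
 * exactly one extra barrier instruction; other operations get none.
 */
size_t host_op_count(const enum guest_op *ops, size_t n, int strict_order)
{
    size_t count = 0;

    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        count++;                    /* the translated instruction itself */
        if (strict_order && ops[i] != GUEST_ALU)
            count++;                /* extra barrier for a memory access */
    }
    return count;
}
```

For a block of load, ALU, store, load this gives 4 host instructions
without barriers and 7 with them. The count understates the real
penalty, since a barrier also stalls the pipeline rather than just
adding an instruction.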
Especially because the x86 memory model traditionally has
barrier/synchronization instructions automatically push through their
ordering to all other cores/CPUs, and as a result, "barrier load" wasn't
really a thing until CMPXCHG was introduced in a later CPU generation
than the basic sync instructions (8086) and cache coherency mechanisms
(80486).  In fact, the LOCK barrier prefix triggers an #UD exception
with most load instructions.
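
For readers who have not met CMPXCHG directly: it is a
read-modify-write, which is why it can carry LOCK where a plain load
cannot. A minimal sketch of the same operation in portable C11 follows.
Compilers targeting x86 typically implement
atomic_compare_exchange_strong with LOCK CMPXCHG, but that is a
property of the compiler and not something the code below depends on.
The names lock_word, try_lock and unlock are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int lock_word; /* 0 = free, 1 = held */

/* Takes the lock if it is free; returns true on success. */
bool try_lock(void)
{
    int expected = 0;
    /* Atomically: if lock_word == expected, set it to 1. */
    return atomic_compare_exchange_strong(&lock_word, &expected, 1);
}

/* Releases a lock previously taken with try_lock(). */
void unlock(void)
{
    assert(atomic_load(&lock_word) == 1);
    atomic_store(&lock_word, 0);
}
```

The first try_lock() succeeds, a second one fails until unlock() is
called, and both the read and the write happen as one indivisible step.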

The one exception to this lack was instruction decoding, where certain
commonly used branch instructions were defined as implicitly picking up
any changes in instruction memory.  This of course corresponds to the
TCG checking for needed retranslation of buffers at those points.

Additionally, x86 barriers generally guarantee total ordering relative
to the barrier operation of all memory accesses that occur before or
after in program order, which some other CPU families do not.
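
The classic store-buffering test shows what such a full barrier buys.
This is a sketch in portable C11, with made-up names (x, y,
count_both_zero). atomic_thread_fence(memory_order_seq_cst) is the
portable full barrier; compilers typically lower it to MFENCE or a
locked instruction on x86 and to DMB ISH on AArch64.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y;
static int r0, r1;   /* each written by one thread, read after join */

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* full barrier */
    r0 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* full barrier */
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the test repeatedly; returns how often both loads saw 0. */
int count_both_zero(int iterations)
{
    int both_zero = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;
        int rc;

        atomic_store(&x, 0);
        atomic_store(&y, 0);
        rc = pthread_create(&a, NULL, thread0, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread1, NULL);
        assert(rc == 0);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r0 == 0 && r1 == 0)
            both_zero++;
    }
    return both_zero;
}
```

With both fences present the result r0 == 0 && r1 == 0 is forbidden, so
the function returns 0. Without the fences that outcome is allowed even
on x86, because a store may sit in the store buffer while the following
load completes.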

I am new to this, so sorry if
this doesn't make very much sense.

Thank you


Jakob Bohm, CIO, Partner, WiseMo A/S.  http://www.wisemo.com
Transformervej 29, 2860 Soborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
