Re: Using All Cores of CPU on Snapdragon Processor during x86-to-ARM User Space Emulation
Thu, 14 May 2020 12:31:15 +0200
On 13/05/2020 12:02, Alex Bennée wrote:
> Vijay Daita <address@hidden> writes:
>
>> It is my understanding that one would be unable to do x86-to-ARM user space
>> emulation while utilizing all cores because of x86 barriers.
>
> Actually the utilisation of multiple cores (often referred to as MTTCG)
> is a function of system emulation, and you are correct that for x86-on-ARM
> we don't enable MTTCG because we don't currently add barrier instructions
> to fully emulate the x86 memory model. However, for linux-user we have
> always followed the guest threading model because the guest clone() is
> passed down to the host. Because the memory modelling isn't perfect,
> you can still run into problems because of the mismatch.
>
>> I wanted to
>> know if there is a difference between what QEMU aims to do and using an
>> interpreter of sorts to convert x86 instructions directly to ARM
>> instructions so that when run on the system directly, the system can
>> decide, itself, how to apportion the task.
>
> This is what the TCG does - it translates guest instructions into groups
> of host instructions. We could insert the extra barriers for all loads
> and stores but the effect would be to cripple performance. In an ideal
> world we would only do these for the load/store instructions involved in
> inter-thread synchronisation operations but that's a fairly tricky
> problem to solve.

Especially because the x86 memory model traditionally has
instructions automatically push through their ordering to all other
cores, and as a result, "barrier load" wasn't really a thing until CMPXCHG
was introduced in a later CPU generation than the basic sync instructions
(8086) and cache coherency mechanisms (80486). In fact, the LOCK prefix
triggers a #UD exception with most load instructions.

The one exception to this lack was instruction decoding, where certain
branch instructions were defined as implicitly picking up any changes to
instruction memory. This of course corresponds to TCG checking for
retranslation of buffers at those points.

Additionally, x86 barriers generally guarantee a total ordering, relative
to the barrier operation, of all memory accesses that occur before or
after it in program order, which some other CPU families do not.
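
To make the ordering difference concrete, here is a toy sketch (not QEMU
code, and not a faithful model of any real CPU) that enumerates every
outcome of a two-thread message-passing test under two simplified
reordering rules. The names may_swap_tso, may_swap_weak and outcomes are
made up for illustration, and both rules are assumptions: the TSO-like rule
only lets a later load pass an earlier store to a different address, while
the weak rule lets any two accesses to different addresses reorder, which
is roughly the freedom a weakly ordered host has when no barriers are
emitted.

```python
from itertools import permutations

# Each operation is (kind, variable): "st" stores 1, "ld" loads the current value.
PRODUCER = [("st", "data"), ("st", "flag")]   # publish data, then raise the flag
CONSUMER = [("ld", "flag"), ("ld", "data")]   # check the flag, then read data


def may_swap_tso(first, second):
    """Assumed TSO-like rule: only a later load may pass an earlier store."""
    return first[0] == "st" and second[0] == "ld" and first[1] != second[1]


def may_swap_weak(first, second):
    """Assumed weak rule: accesses to different addresses may reorder freely."""
    return first[1] != second[1]


def thread_orders(program, may_swap):
    """All execution orders of one thread that the given rule permits."""
    orders = []
    for perm in permutations(range(len(program))):
        # Every pair executed out of program order must be allowed to swap.
        ok = all(
            may_swap(program[perm[j]], program[perm[i]])
            for i in range(len(perm))
            for j in range(i + 1, len(perm))
            if perm[j] < perm[i]
        )
        if ok:
            orders.append([program[k] for k in perm])
    return orders


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(may_swap):
    """Set of (flag_seen, data_seen) pairs the consumer can observe."""
    seen = set()
    for prod in thread_orders(PRODUCER, may_swap):
        for cons in thread_orders(CONSUMER, may_swap):
            for schedule in interleavings(prod, cons):
                memory = {"data": 0, "flag": 0}
                loaded = {}
                for kind, var in schedule:
                    if kind == "st":
                        memory[var] = 1
                    else:
                        loaded[var] = memory[var]
                seen.add((loaded["flag"], loaded["data"]))
    return seen


if __name__ == "__main__":
    print("TSO-like:", sorted(outcomes(may_swap_tso)))
    print("weak:    ", sorted(outcomes(may_swap_weak)))
```

Under the TSO-like rule the consumer can never see the flag set while the
data is still 0; under the weak rule it can, and that is the kind of
outcome extra host barriers would have to rule out.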

>> I am new to this, so sorry if
>> this doesn't make very much sense.

Jakob Bohm, CIO, Partner, WiseMo A/S. http://www.wisemo.com
Transformervej 29, 2860 Soborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded