[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lo
Re: [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section
Mon, 21 Mar 2016 22:08:06 +0000
On 21 March 2016 at 21:50, Emilio G. Cota <address@hidden> wrote:
> This function, as is, doesn't really just "find"; two concurrent "finders"
> could race here by *writing* to the head of the list at the same time.
> The fix is to get rid of this write entirely; moving the just-found TB to
> the head of the list is not really that necessary thanks to the CPU's
> tb_jmp_cache table. This fix would make the function read-only, which
> is what the function's name implies.
It is not _necessary_, but it is a performance optimization to
speed up the "missed in the TLB" case. (A TLB flush will wipe
the tb_jmp_cache table.) From the thread where the move-to-front-of-list
behaviour was added in 2010, benefits cited:
# The exact numbers depend on complexity of guest system.
# - For basic Debian system (no X-server) on versatilepb we observed
# 25% decrease of boot time.
# - For to-be released Samsung LIMO platform on S5PC110 board we
# observed 2x (for older version) and 3x (for newer version)
# decrease of boot time.
# - Small CPU-intensive benchmarks are not affected because they are
# completely handled by 'tb_find_fast'.
# We also noticed better response time for heavyweight GUI applications,
# but I do not know how to measure it accurately.
I think what's happening here is that for guest CPUs where TLB
invalidation happens fairly frequently (notably ARM, because
we don't model ASIDs in the QEMU TLB and thus have to flush
the TLB on any context switch) the case of "we didn't hit in
the TLB but we do have this TB and it was used really recently"
happens often enough to make it worthwhile for the
tb_find_physical() code to keep its hash buckets in LRU order.
Obviously that's all five year old data now, so a pinch of
salt may be indicated, but I'd rather we didn't just remove
the optimisation without some benchmarking to check that it's
not significant. A 2x difference is huge.
[Qemu-devel] [RFC v1 04/11] tcg: protect TBContext with tb_lock., Alex Bennée, 2016/03/18
[Qemu-devel] [RFC v1 08/11] tcg: add kick timer for single-threaded vCPU emulation, Alex Bennée, 2016/03/18
[Qemu-devel] [RFC v1 07/11] tcg: add options for enabling MTTCG, Alex Bennée, 2016/03/18
[Qemu-devel] [RFC v1 10/11] tcg: grab iothread lock in cpu-exec interrupt handling, Alex Bennée, 2016/03/18