Re: [Qemu-devel] [PATCH 00/10] TCG optimizations for 2.10
From: Alex Bennée
Subject: Re: [Qemu-devel] [PATCH 00/10] TCG optimizations for 2.10
Date: Wed, 12 Apr 2017 11:03:25 +0100
User-agent: mu4e 0.9.19; emacs 25.2.15
Emilio G. Cota <address@hidden> writes:
> Hi all,
>
> This series is aimed at 2.10 or beyond. Its goal is to improve
> TCG performance by optimizing:
>
> 1- Cross-page direct jumps (softmmu only, obviously). Patches 1-4.
> 2- Indirect branches (softmmu and user-mode). Patches 5-9.
> 3- tb_jmp_cache hashing in user-mode. Patch 10.
>
> I decided to work on this after reading this paper [1] (code at [2]),
> which, among other optimizations, proposes solutions for 1 and 2.
> I followed the same overall scheme they follow, that is to use helpers
> to check whether the target vaddr is valid, and if so, jump to its
> corresponding translated code (host address) without having to go back
> to the exec loop. My implementation differs from that in the paper
> in that it uses tb_jmp_cache instead of adding more caches,
> which is simpler and probably more resilient in environments
> where TLB invalidations are frequent (in the paper they acknowledge
> that they limited background processes to a minimum, which isn't
> realistic).
Hi Emilio,
If you want to get some numbers on TLB invalidations, please have a look
at my WIP branch:
https://github.com/stsquad/qemu/tree/misc/tlb-flush-stats
It's mainly an experiment in how easy it is to extract numeric data using
QEMU's trace subsystem (pretty easy, it turns out). I had started looking
at the execution trace but got a little bogged down re-implementing the
hashes in Python. It would be nice if we could just load the C
implementation as a shared library via ctypes, or maybe save the computed
hashes in another trace point rather than inferring them via exec_tb.
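For reference, the Python side of the hash is small. Below is a minimal sketch of the softmmu tb_jmp_cache hash as I remember it from include/exec/tb-hash.h. The constants (TARGET_PAGE_BITS = 12, TB_JMP_CACHE_BITS = 12) are assumptions and should be checked against the tree that produced the trace:

```python
# Sketch only: constants and layout are assumptions, not copied from a
# specific QEMU tree. Verify against include/exec/tb-hash.h before use.
TARGET_PAGE_BITS = 12                       # assumed 4 KiB guest pages
TB_JMP_CACHE_BITS = 12                      # assumed 4096-entry cache
TB_JMP_CACHE_SIZE = 1 << TB_JMP_CACHE_BITS
TB_JMP_PAGE_BITS = TB_JMP_CACHE_BITS // 2
TB_JMP_PAGE_SIZE = 1 << TB_JMP_PAGE_BITS
TB_JMP_ADDR_MASK = TB_JMP_PAGE_SIZE - 1
TB_JMP_PAGE_MASK = TB_JMP_CACHE_SIZE - TB_JMP_PAGE_SIZE
SHIFT = TARGET_PAGE_BITS - TB_JMP_PAGE_BITS


def tb_jmp_cache_hash_page(pc):
    """Index of the first slot in the window used by pc's guest page."""
    tmp = pc ^ (pc >> SHIFT)
    return (tmp >> SHIFT) & TB_JMP_PAGE_MASK


def tb_jmp_cache_hash_func(pc):
    """Slot index for pc: page window in the high bits, offset mix in the low bits."""
    tmp = pc ^ (pc >> SHIFT)
    return ((tmp >> SHIFT) & TB_JMP_PAGE_MASK) | (tmp & TB_JMP_ADDR_MASK)


if __name__ == "__main__":
    # Print the slot index for a few guest PCs taken from a trace.
    for pc in (0x400123, 0x400FFF, 0x401000):
        print(hex(pc), hex(tb_jmp_cache_hash_func(pc)))
```

Python integers do not wrap, but since the function only shifts right and XORs non-negative values, no extra masking to target_ulong width should be needed.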
>
> These changes require modifications on the targets and, for optimization
> number 2, a new TCG opcode to jump to a host address contained in a register.
>
> For now I only implemented this for the i386 and arm targets, and
> the i386 TCG backend. Other targets/backends can easily opt-in.
>
> The 3rd optimization is implemented in the last patch: it improves
> tb_jmp_cache hashing for user-mode by removing the requirement of
> being able to clear parts of the cache given a page number, since this
> requirement only applies to softmmu.
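To illustrate the trade-off in the paragraph above, here is a rough sketch comparing a page-structured hash, where every PC of one guest page lands in the same 64-slot window so the window can be cleared by page number, with a flat hash that ignores pages. The flat hash is a hypothetical stand-in of mine and not necessarily what patch 10 uses; the page and cache sizes are assumed.

```python
# Sketch only: PAGE_BITS and CACHE_BITS are assumed values, and flat_hash is
# a hypothetical stand-in, not the function from patch 10.
PAGE_BITS = 12
CACHE_BITS = 12
CACHE_SIZE = 1 << CACHE_BITS
WINDOW_BITS = CACHE_BITS // 2
WINDOW_SIZE = 1 << WINDOW_BITS
SHIFT = PAGE_BITS - WINDOW_BITS


def paged_hash(pc):
    """High bits select a per-page window, low bits select a slot inside it."""
    tmp = pc ^ (pc >> SHIFT)
    window = (tmp >> SHIFT) & (CACHE_SIZE - WINDOW_SIZE)
    return window | (tmp & (WINDOW_SIZE - 1))


def flat_hash(pc):
    """No page structure: any PC may map to any slot."""
    return (pc ^ (pc >> CACHE_BITS)) & (CACHE_SIZE - 1)


def slots_used(hash_fn, page):
    """Set of cache slots reachable from every byte offset within one guest page."""
    base = page << PAGE_BITS
    return {hash_fn(base + off) for off in range(1 << PAGE_BITS)}


if __name__ == "__main__":
    page = 0x400
    print("paged:", len(slots_used(paged_hash, page)), "slots")
    print("flat: ", len(slots_used(flat_hash, page)), "slots")
```

This only shows how many slots a single page can reach under each scheme; it says nothing about hit rates on real workloads, which is what the measurements in the commit logs are for.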
>
> The series applies cleanly on top of 95b31d709ba34.
>
> The commit logs include many measurements, performed using SPECint06 and
> NBench from dbt-bench[3].
>
> Feedback welcome! Thanks,
Given my notes above, I think it would be worthwhile to add some trace
points to the helpers and hash lookups, so we can analyse their behaviour
as well as looking at the performance improvement in benchmarks.
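As a starting point for that kind of analysis, here is a minimal sketch that tallies lookup hits and misses from a text trace. The event names tb_lookup_hit and tb_lookup_miss and the "event_name args" line layout are hypothetical; no such trace points exist in the series as posted.

```python
import collections
import io


def count_events(lines, wanted=("tb_lookup_hit", "tb_lookup_miss")):
    """Count occurrences of the wanted (hypothetical) event names."""
    counts = collections.Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # Tolerate an optional prefix ending in ':' before the event name.
        name = fields[0].rsplit(":", 1)[-1]
        if name in wanted:
            counts[name] += 1
    return counts


def hit_rate(counts):
    """Fraction of lookups that hit, or 0.0 if no lookups were seen."""
    total = counts["tb_lookup_hit"] + counts["tb_lookup_miss"]
    return counts["tb_lookup_hit"] / total if total else 0.0


if __name__ == "__main__":
    sample = io.StringIO(
        "tb_lookup_hit pc=0x400123\n"
        "tb_lookup_miss pc=0x401000\n"
        "tb_lookup_hit pc=0x400123\n"
        "tlb_flush cpu=0\n"
    )
    counts = count_events(sample)
    print(dict(counts), "hit rate: %.2f" % hit_rate(counts))
```

Counting in the consumer keeps the trace points themselves trivial, which matters if they sit on the fast path.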
>
> Emilio
>
> [1] "Optimizing Control Transfer and Memory Virtualization
> in Full System Emulators", Ding-Yong Hong, Chun-Chen Hsu, Cheng-Yi Chou,
> Wei-Chung Hsu, Pangfeng Liu, Jan-Jan Wu. ACM TACO, Jan. 2016.
> http://www.iis.sinica.edu.tw/page/library/TechReport/tr2015/tr15002.pdf
>
> [2] https://github.com/tkhsu/quick-android-emulator/tree/quick-qemu
>
> [3] https://github.com/cota/dbt-bench
--
Alex Bennée