qemu-devel

[Qemu-devel] outlined TLB lookup on x86


From: Xin Tong
Subject: [Qemu-devel] outlined TLB lookup on x86
Date: Wed, 27 Nov 2013 16:41:27 +0900

I am trying to implement an out-of-line TLB lookup for QEMU softmmu-x86-64 on an x86-64 host machine, potentially for better instruction cache performance. I have a few questions.

1. I see that tcg_out_qemu_ld_slow_path/tcg_out_qemu_st_slow_path are generated when tcg_out_tb_finalize is called. When a TLB lookup misses, the code jumps to the generated slow path, which refills the TLB, performs the load/store, and jumps to the next emulated instruction. Is it easy to outline the code for the slow path? My idea is that when the TLB misses, the outlined TLB lookup code would call out to qemu_ld/st_helpers[opc & ~MO_SIGN] and rewalk the TLB after it has been refilled. This code is off the critical path, so it is not as important as the code for a TLB hit.
2. Why not use a bigger TLB? Currently the TLB has 1<<8 (256) entries. The TLB lookup is 10 x86 instructions, but every miss costs ~450 instructions (I measured this using Intel PIN). So even if the miss rate is low (say 3%), the overall time spent in cpu_x86_handle_mmu_fault is still significant. I am thinking the TLB may need to be organized in a set-associative fashion to reduce conflict misses, e.g. 2-way set-associative, to lower the miss rate. Alternatively, we could add a 4-way associative victim TLB and use x86 SIMD instructions to do the lookup once the direct-mapped TLB misses. Has anybody done any work on this front?
3. What are some of the drawbacks of using a very large TLB, i.e. a TLB with 4K entries?
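A minimal sketch of the outlined lookup from item 1, written as plain C rather than TCG-generated host code: one shared lookup routine that, on a miss, calls a refill routine and then rewalks the TLB. Everything here (the TLBEntry layout, tlb_refill standing in for the helper call and page walk, the 4-page fake guest RAM) is a hypothetical model, not QEMU's actual data structures or helper API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a direct-mapped softmmu TLB; not QEMU's real layout. */
#define TLB_BITS  8
#define TLB_SIZE  (1u << TLB_BITS)
#define PAGE_BITS 12
#define PAGE_MASK (~(uint64_t)((1u << PAGE_BITS) - 1))

typedef struct {
    uint64_t tag;     /* guest page address, UINT64_MAX when invalid */
    uint64_t addend;  /* host address = guest vaddr + addend (mod 2^64) */
} TLBEntry;

static TLBEntry tlb[TLB_SIZE];
static unsigned refill_count;
static uint8_t guest_ram[4u << PAGE_BITS];  /* 4 pages of fake guest RAM */

static void tlb_flush_all(void) {
    for (unsigned i = 0; i < TLB_SIZE; i++)
        tlb[i].tag = UINT64_MAX;
    refill_count = 0;
}

/* Hypothetical stand-in for the miss helper: "walks" a fake page table that
 * maps guest page N to guest_ram page N % 4, then fills the TLB slot. */
static void tlb_refill(uint64_t vaddr) {
    uint64_t page = vaddr & PAGE_MASK;
    unsigned idx = (unsigned)(page >> PAGE_BITS) & (TLB_SIZE - 1);
    uint64_t host_page = (uint64_t)(uintptr_t)guest_ram +
                         (((page >> PAGE_BITS) % 4) << PAGE_BITS);
    tlb[idx].tag = page;
    tlb[idx].addend = host_page - page;  /* unsigned wraparound is intended */
    refill_count++;
}

/* The outlined lookup: one shared copy instead of one inline copy per access. */
static uint8_t *tlb_lookup(uint64_t vaddr) {
    unsigned idx = (unsigned)(vaddr >> PAGE_BITS) & (TLB_SIZE - 1);
    if (tlb[idx].tag != (vaddr & PAGE_MASK)) {
        tlb_refill(vaddr);  /* slow path, off the critical path */
        /* rewalk: the refilled entry must now hit */
        assert(tlb[idx].tag == (vaddr & PAGE_MASK));
    }
    return (uint8_t *)(uintptr_t)(vaddr + tlb[idx].addend);
}

static uint8_t guest_ldub(uint64_t vaddr) { return *tlb_lookup(vaddr); }
static void guest_stb(uint64_t vaddr, uint8_t v) { *tlb_lookup(vaddr) = v; }
```

For example, after tlb_flush_all(), a store to guest address 0x1005 costs one refill and a following load from the same page hits; a load from 0x101005 maps to the same TLB index and forces another refill, which is the conflict-miss pattern discussed in item 2.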
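A back-of-the-envelope model using the numbers from item 2: with a 10-instruction hit path, a ~450-instruction miss and a 3% miss rate, the expected cost is 10 + 0.03 × 450 ≈ 23.5 instructions per access, so misses account for more than half of it. The same sketch computes the size of one TLB array for item 3. The simple additive cost model and the 32-byte entry size are assumptions (the real entry size depends on the host and QEMU version).

```c
#include <assert.h>

/* Assumed cost model: every access pays the inline hit path, and a
 * fraction miss_rate of accesses additionally pays the miss path. */
static double expected_cost(double hit_cost, double miss_cost, double miss_rate) {
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_cost + miss_rate * miss_cost;
}

/* Fraction of the expected per-access cost that is spent on misses. */
static double miss_share(double hit_cost, double miss_cost, double miss_rate) {
    return miss_rate * miss_cost / expected_cost(hit_cost, miss_cost, miss_rate);
}

/* Bytes occupied by one TLB array; entry_size is an assumed per-entry size. */
static unsigned long tlb_bytes(unsigned long entries, unsigned long entry_size) {
    return entries * entry_size;
}
```

Under the assumed 32-byte entry, going from 256 to 4096 entries grows each TLB array from 8 KiB to 128 KiB, which is larger than the L1 data cache of many x86 cores (often 32 KiB); if a full flush resets every entry, each flush also touches 16 times as many entries.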
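The victim-TLB idea from item 2 can be sketched as follows, assuming a 256-entry direct-mapped main TLB backed by a 4-entry fully associative victim TLB with round-robin replacement. The Entry layout, page_walk and the counters are hypothetical; the victim search is a plain loop where a real implementation might compare all four tags at once with SIMD.

```c
#include <assert.h>
#include <stdint.h>

#define MAIN_BITS   8
#define MAIN_SIZE   (1u << MAIN_BITS)
#define VICTIM_SIZE 4
#define PAGE_BITS   12
#define INVALID_TAG UINT64_MAX

/* Hypothetical entry: tag is the guest page number, data stands in for
 * whatever the real TLB would cache (e.g. a host addend). */
typedef struct { uint64_t tag; uint64_t data; } Entry;

static Entry main_tlb[MAIN_SIZE];
static Entry victim[VICTIM_SIZE];
static unsigned victim_next;         /* round-robin replacement pointer */
static unsigned walks, victim_hits;  /* counters for illustration */

static void tlb_reset(void) {
    for (unsigned i = 0; i < MAIN_SIZE; i++) main_tlb[i].tag = INVALID_TAG;
    for (unsigned i = 0; i < VICTIM_SIZE; i++) victim[i].tag = INVALID_TAG;
    victim_next = walks = victim_hits = 0;
}

/* Hypothetical page walk: an arbitrary fixed translation for illustration. */
static uint64_t page_walk(uint64_t page) {
    walks++;
    return page ^ 0x80000000u;
}

static uint64_t translate(uint64_t vaddr) {
    uint64_t page = vaddr >> PAGE_BITS;
    Entry *slot = &main_tlb[page & (MAIN_SIZE - 1)];
    if (slot->tag == page)
        return slot->data;  /* fast path: direct-mapped hit, unchanged */

    /* Main TLB miss: search the victim TLB (candidate for a SIMD compare). */
    for (unsigned i = 0; i < VICTIM_SIZE; i++) {
        if (victim[i].tag == page) {
            Entry tmp = *slot;  /* swap the hit entry into the main slot */
            *slot = victim[i];
            victim[i] = tmp;
            victim_hits++;
            return slot->data;
        }
    }

    /* Miss in both: move the evicted main entry into the victim TLB. */
    if (slot->tag != INVALID_TAG) {
        victim[victim_next] = *slot;
        victim_next = (victim_next + 1) % VICTIM_SIZE;
    }
    slot->tag = page;
    slot->data = page_walk(page);
    return slot->data;
}
```

With two pages that conflict in the direct-mapped array (e.g. guest pages 1 and 257), alternating accesses cost two page walks in total and are then served from the victim TLB, whereas the direct-mapped TLB alone would walk on every access. A 2-way set-associative main TLB would address the same conflict but adds a second tag compare to the hit path, while the victim design leaves the hit path unchanged.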

Xin
