Re: [Qemu-devel] outlined TLB lookup on x86

From:	Lluís Vilanova
Subject:	Re: [Qemu-devel] outlined TLB lookup on x86
Date:	Wed, 27 Nov 2013 14:12:58 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Xin Tong writes:

> I am trying to implement an out-of-line TLB lookup for QEMU softmmu-x86-64 on
> an x86-64 machine, potentially for better instruction cache performance. I
> have a few questions.

> 1. I see that tcg_out_qemu_ld_slow_path/tcg_out_qemu_st_slow_path are
> generated when tcg_out_tb_finalize is called. When a TLB lookup misses, it
> jumps to the generated slow path; the slow path refills the TLB, performs the
> load/store, and jumps to the next emulated instruction. Is it easy to outline
> the code for the slow path? I am thinking that when the TLB misses, the
> outlined TLB lookup code should generate a call out to
> qemu_ld/st_helpers[opc & ~MO_SIGN] and rewalk the TLB after it is refilled.
> This code is off the critical path, so it is not as important as the code for
> a TLB hit.
> 2. Why not use a TLB of bigger size? Currently the TLB has 1<<8 entries. The
> TLB lookup is 10 x86 instructions, but every miss needs ~450 instructions (I
> measured this using Intel PIN). So even if the miss rate is low (say 3%), the
> overall time spent in cpu_x86_handle_mmu_fault is still significant. I am
> thinking the TLB may need to be organized in a set-associative fashion to
> reduce conflict misses, e.g. 2-way set associative. Alternatively, there could
> be a 4-way associative victim TLB that uses x86 SIMD instructions to do the
> lookup once the direct-mapped TLB misses. Has anybody done any work on this
> front?
> 3. What are some of the drawbacks of using a super-large TLB, i.e. a TLB with
> 4K entries?
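
To make the 2-way set-associative idea from question 2 concrete, here is a
minimal sketch in plain C. It is not QEMU's softmmu code: TlbSet, tlb2_flush,
tlb2_lookup and tlb2_fill are hypothetical names, the 4 KiB page size and the
"evict the way that was not used most recently" policy are assumptions, and
128 sets x 2 ways is chosen only to keep the 1<<8 entry count mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS 12   /* assumption: 4 KiB guest pages */
#define SETS      128  /* 128 sets x 2 ways = 256 entries */
#define WAYS      2

typedef struct {
    uint64_t tag[WAYS];   /* virtual page number + 1; 0 marks an empty way */
    uint64_t phys[WAYS];  /* hypothetical translated page address */
    uint8_t  lru;         /* index of the way to evict on the next fill */
} TlbSet;

static TlbSet tlb[SETS];

/* Invalidate every entry. */
static void tlb2_flush(void)
{
    memset(tlb, 0, sizeof(tlb));
}

/* Look up vaddr; on a hit store the translated address and return true. */
static bool tlb2_lookup(uint64_t vaddr, uint64_t *phys)
{
    uint64_t vpn = vaddr >> PAGE_BITS;
    TlbSet *s = &tlb[vpn % SETS];

    for (int w = 0; w < WAYS; w++) {
        if (s->tag[w] == vpn + 1) {
            *phys = s->phys[w] | (vaddr & ((1u << PAGE_BITS) - 1));
            s->lru = (uint8_t)(w ^ 1);  /* the other way becomes the victim */
            return true;
        }
    }
    return false;
}

/* Install a translation; meant to be called only after a miss. */
static void tlb2_fill(uint64_t vaddr, uint64_t phys_page)
{
    uint64_t vpn = vaddr >> PAGE_BITS;
    TlbSet *s = &tlb[vpn % SETS];
    int w = s->lru;

    assert((phys_page & ((1u << PAGE_BITS) - 1)) == 0);  /* page aligned */
    s->tag[w] = vpn + 1;
    s->phys[w] = phys_page;
    s->lru = (uint8_t)(w ^ 1);
}
```

With this layout, two pages whose page numbers differ by a multiple of SETS
can be resident at the same time, whereas in a direct-mapped table with the
same index function they would keep evicting each other. The price is a
second tag compare on the hit path.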

Using vector intrinsics for the TLB lookup will probably make the code less
portable. I don't know how compatible the GCC and LLVM vector intrinsics are
with each other (this matters because there have been some efforts to make QEMU
compile with LLVM as well).
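
One way to sidestep the intrinsics question would be to write the 4-way
compare as plain, branch-free C and leave any vectorization to the compiler.
The sketch below only illustrates that idea: victim_lookup and VICTIM_WAYS are
hypothetical names, and I have not checked whether GCC or LLVM actually turn
this loop into SIMD code.

```c
#include <assert.h>
#include <stdint.h>

#define VICTIM_WAYS 4  /* assumption: the 4-way victim TLB from the question */

/*
 * Return the index of the way whose tag equals vpn_tag, or -1 if none does.
 * The loop body has no data-dependent branch, so every way is always
 * compared; if several ways match, the highest index wins.
 */
static int victim_lookup(const uint64_t tags[VICTIM_WAYS], uint64_t vpn_tag)
{
    int hit = -1;

    assert(tags != 0);  /* precondition only; compiled out under NDEBUG */
    for (int w = 0; w < VICTIM_WAYS; w++) {
        /* mask is all ones when this way matches, zero otherwise */
        int mask = -(int)(tags[w] == vpn_tag);
        hit = (hit & ~mask) | (w & mask);
    }
    return hit;
}
```

For example, with tags {11, 22, 33, 44} a lookup of 33 returns 2 and a lookup
of 55 returns -1. Since this is only standard C, it builds with either
compiler; whether it is fast enough compared to hand-written SIMD would have
to be measured.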

A larger TLB will make some operations slower (for example, see the uses of
CPU_TLB_SIZE in cputlb.c), but the higher hit ratio could pay off. I don't know
how the current size was chosen.
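
A rough way to reason about that trade-off, using the numbers quoted above
(10 instructions per lookup, ~450 per miss, 3% miss rate), is sketched below.
The cost model is my own simplification: it assumes every access pays for the
lookup and a miss pays for the refill on top of it. The function names are
hypothetical, and the 32-byte entry size in the note after the code is an
assumption rather than something checked against the current CPUTLBEntry.

```c
#include <assert.h>
#include <stddef.h>

/* Expected instructions per guest memory access under the model above. */
static double expected_cost(double hit_insns, double miss_insns,
                            double miss_rate)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_insns + miss_rate * miss_insns;
}

/* Fraction of the expected cost that is spent handling misses. */
static double miss_share(double hit_insns, double miss_insns,
                         double miss_rate)
{
    return (miss_rate * miss_insns) /
           expected_cost(hit_insns, miss_insns, miss_rate);
}

/* Bytes that a full flush has to rewrite if it touches every entry. */
static size_t flush_bytes(size_t entries, size_t entry_size, size_t mmu_modes)
{
    return entries * entry_size * mmu_modes;
}
```

Under these assumptions, expected_cost(10, 450, 0.03) is about 23.5
instructions per access, and miss handling accounts for roughly 57% of that,
so a lower miss rate has a lot of room to pay off. On the other side, with
32-byte entries a full flush rewrites 8 KiB per MMU mode for 256 entries and
128 KiB for 4K entries, i.e. 16 times more work on every flush.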


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth
