Re: [Qemu-devel] [PATCH] cpu: implementing victim TLB for QEMU system em

qemu-devel

From:	Xin Tong
Subject:	Re: [Qemu-devel] [PATCH] cpu: implementing victim TLB for QEMU system emulated TLB
Date:	Thu, 23 Jan 2014 07:50:08 -0600

On Thu, Jan 23, 2014 at 5:23 AM, Alex Bennée <address@hidden> wrote:
>
> address@hidden writes:
>
>> This patch adds a victim TLB to the QEMU system mode TLB.
>>
>> QEMU system mode page table walks are expensive. Measured by running
>> qemu-system-x86_64 in system mode under Intel PIN, a TLB miss and the
>> walk of a 4-level page table in a guest Linux OS take ~450 x86
>> instructions on average.
> <snip>
>>
>> Attached are some performance results taken on the SPECINT2006 train
>> dataset and an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine. In
>> summary, the victim TLB improves the performance of qemu-system-x86_64
>> by 11% on average on SPECINT2006, with the highest improvement of 254%
>> in 464.h264ref. The victim TLB does not result in any performance
>> degradation in any of the measured benchmarks. Furthermore, the
>> implemented victim TLB is architecture independent and is expected to
>> benefit other architectures in QEMU as well.
>>
>> Although there are measurement fluctuations, the performance
>> improvements are very significant and well outside the range of
>> noise.
> <snip>
>
> I'm curious as the implication seems to be that entries are evicted from
> initial TLB lookup before they are "done". What would the impact be of
> simply growing the size of the main TLB cache?

Growing the size of the TLB gives a significant performance improvement
as well. I only have an incomplete set of numbers, but the ones I have
show a significant improvement. That said, a victim TLB is still a nice
addition: no matter how big you make the TLB, there will always be
conflict misses due to the low associativity of the direct-mapped TLB
table.
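
To illustrate the conflict-miss argument, here is a minimal sketch of a
direct-mapped TLB backed by a small fully associative victim TLB. It is
not the code from the patch: every name, the sizes (TLB_SIZE, VTLB_SIZE),
and the round-robin replacement policy are assumptions made for the
example. Two pages that map to the same main-TLB slot evict each other,
but the evicted entry can still be found in the victim TLB and swapped
back without a page table walk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes and names, chosen for illustration only. */
#define PAGE_BITS   12
#define TLB_SIZE    256         /* direct-mapped main TLB entries */
#define VTLB_SIZE   8           /* fully associative victim entries */
#define INVALID_TAG UINT64_MAX  /* marks an empty entry */

typedef struct {
    uint64_t vpage;   /* virtual page number, INVALID_TAG if empty */
    uint64_t ppage;   /* physical page number */
} TLBEntry;

typedef struct {
    TLBEntry main_tlb[TLB_SIZE];
    TLBEntry victim[VTLB_SIZE];
    size_t   victim_next;       /* round-robin replacement cursor */
} SoftTLB;

static void tlb_reset(SoftTLB *t)
{
    for (size_t i = 0; i < TLB_SIZE; i++) {
        t->main_tlb[i].vpage = INVALID_TAG;
    }
    for (size_t i = 0; i < VTLB_SIZE; i++) {
        t->victim[i].vpage = INVALID_TAG;
    }
    t->victim_next = 0;
}

/* Install a translation; a displaced valid entry moves to the victim TLB. */
static void tlb_fill(SoftTLB *t, uint64_t vpage, uint64_t ppage)
{
    TLBEntry *e = &t->main_tlb[vpage % TLB_SIZE];

    if (e->vpage != INVALID_TAG && e->vpage != vpage) {
        t->victim[t->victim_next] = *e;
        t->victim_next = (t->victim_next + 1) % VTLB_SIZE;
    }
    e->vpage = vpage;
    e->ppage = ppage;
}

/*
 * Look up vaddr. Returns true on a main or victim hit and stores the
 * physical address in *paddr. On a victim hit the entry is swapped with
 * the conflicting main-TLB entry. On a miss the caller is expected to
 * walk the page tables and call tlb_fill().
 */
static bool tlb_lookup(SoftTLB *t, uint64_t vaddr, uint64_t *paddr)
{
    uint64_t vpage  = vaddr >> PAGE_BITS;
    uint64_t offset = vaddr & (((uint64_t)1 << PAGE_BITS) - 1);
    TLBEntry *e = &t->main_tlb[vpage % TLB_SIZE];

    if (e->vpage == vpage) {
        *paddr = (e->ppage << PAGE_BITS) | offset;
        return true;
    }
    for (size_t i = 0; i < VTLB_SIZE; i++) {
        if (t->victim[i].vpage == vpage) {
            TLBEntry tmp = *e;
            *e = t->victim[i];
            t->victim[i] = tmp;
            *paddr = (e->ppage << PAGE_BITS) | offset;
            return true;
        }
    }
    return false;
}
```

With these assumed sizes, virtual pages 1 and 1 + TLB_SIZE share a main
slot. Filling both and then looking up the first one misses in the main
TLB but hits in the victim TLB, which is exactly the case a larger
direct-mapped table cannot eliminate.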

>
> What's the current state of instrumentation around the system TLB
> handling? Can we trace the hit rates of the various caches with
> perf/oprofile/whatever (Stefan?)?
>

We do not have any TLB hit/miss tracking in the QEMU mainline code
right now. I think perf/oprofile can tell us how much time we spend in
TLB lookup and TLB refill, but we would need TCG-generated
instrumentation to get the TLB hit/miss rate.
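
As a rough sketch of what such tracking would have to record, the
counters below separate main-TLB hits, victim-TLB hits, and misses and
derive a hit rate from them. The type and function names are
hypothetical and nothing like this exists in mainline. The hard part is
not shown here: the increments for the hit path would have to be emitted
by TCG into the generated code, since that is where the lookup happens.

```c
#include <stdint.h>

/* Hypothetical counters; not part of QEMU. */
typedef struct {
    uint64_t main_hits;    /* hit in the direct-mapped TLB */
    uint64_t victim_hits;  /* missed main TLB, hit in victim TLB */
    uint64_t misses;       /* missed both, page table walk needed */
} TLBStats;

typedef enum { TLB_MAIN_HIT, TLB_VICTIM_HIT, TLB_MISS } TLBOutcome;

/* Record the outcome of one lookup. */
static void tlb_stats_record(TLBStats *s, TLBOutcome outcome)
{
    switch (outcome) {
    case TLB_MAIN_HIT:
        s->main_hits++;
        break;
    case TLB_VICTIM_HIT:
        s->victim_hits++;
        break;
    case TLB_MISS:
        s->misses++;
        break;
    }
}

/* Fraction of lookups that avoided a page table walk, 0.0 if none seen. */
static double tlb_stats_hit_rate(const TLBStats *s)
{
    uint64_t hits  = s->main_hits + s->victim_hits;
    uint64_t total = hits + s->misses;

    return total ? (double)hits / (double)total : 0.0;
}
```

Keeping victim hits in their own counter would also show directly how
many walks the victim TLB saves compared with the main TLB alone.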
> --
> Alex Bennée
>
