qemu-devel


From: Xin Tong
Subject: Re: [Qemu-devel] [PATCH] cpu: implementing victim TLB for QEMU system emulated TLB
Date: Thu, 23 Jan 2014 07:50:08 -0600

On Thu, Jan 23, 2014 at 5:23 AM, Alex Bennée <address@hidden> wrote:
>
> address@hidden writes:
>
>> This patch adds a victim TLB to the QEMU system mode TLB.
>>
>> QEMU system mode page table walks are expensive. Measured by running
>> qemu-system-x86_64 in system mode under Intel PIN, a TLB miss followed
>> by a walk of the 4-level page table in a guest Linux OS takes ~450 x86
>> instructions on average.
> <snip>
>>
>> Attached are some performance results taken on the SPECINT2006 train
>> dataset on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine. In
>> summary, the victim TLB improves the performance of qemu-system-x86_64
>> by 11% on average on SPECINT2006, with the highest improvement of 254%
>> in 464.h264ref. The victim TLB does not result in any performance
>> degradation in any of the measured benchmarks. Furthermore, the
>> implemented victim TLB is architecture independent and is expected to
>> benefit other architectures in QEMU as well.
>>
>> Although there are measurement fluctuations, the performance
>> improvements are very significant and by no means within the range of
>> noise.
> <snip>
>
> I'm curious as the implication seems to be that entries are evicted from
> initial TLB lookup before they are "done". What would the impact be of
> simply growing the size of the main TLB cache?

Growing the size of the TLB gives a significant performance improvement
as well. I only have an incomplete set of numbers, but the numbers I do
have show a significant improvement. That said, a victim TLB is still a
nice addition: no matter how big you make the TLB, there will always be
conflict misses due to the low associativity of the direct-mapped TLB
table.
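
To illustrate the conflict-miss argument, here is a minimal toy model in
Python. It is only a sketch and not the code from the patch: the class
name ToyTLB, the 4 KiB page size, the table sizes and the FIFO
replacement of the victim entries are all assumptions made for the
example. Two pages that index the same slot of a direct-mapped table
evict each other on every access, whatever the table size, while a small
fully associative victim table catches the evicted entry.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumption: 4 KiB pages


class ToyTLB:
    """Toy direct-mapped TLB backed by a small fully associative victim TLB."""

    def __init__(self, main_entries=256, victim_entries=8):
        self.main_entries = main_entries
        self.victim_entries = victim_entries
        self.main = [None] * main_entries  # slot index -> virtual page number
        self.victim = OrderedDict()        # vpn -> True, oldest entry first
        self.main_hits = 0
        self.victim_hits = 0
        self.walks = 0                     # simulated page table walks

    def access(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        idx = vpn % self.main_entries
        if self.main[idx] == vpn:
            self.main_hits += 1
            return "main"
        evicted = self.main[idx]
        if vpn in self.victim:
            # Victim hit: move the entry back into the main table.
            del self.victim[vpn]
            self.victim_hits += 1
            result = "victim"
        else:
            # Miss in both tables: this is where the expensive walk happens.
            self.walks += 1
            result = "walk"
        self.main[idx] = vpn
        if evicted is not None and self.victim_entries > 0:
            # Keep the entry we just displaced; drop the oldest if full.
            self.victim[evicted] = True
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)
        return result


def run_conflict_pattern(tlb, rounds=10):
    """Alternate between two pages that map to the same main-table slot."""
    a = 0
    b = tlb.main_entries << PAGE_BITS  # same slot index as page 0
    for _ in range(rounds):
        tlb.access(a)
        tlb.access(b)
    return tlb.walks


if __name__ == "__main__":
    print("walks without victim TLB:", run_conflict_pattern(ToyTLB(256, 0)))
    print("walks, 16x larger table: ", run_conflict_pattern(ToyTLB(4096, 0)))
    print("walks with victim TLB:   ", run_conflict_pattern(ToyTLB(256, 8)))
```

In this model the conflicting pair still walks on every access after
the main table is grown, because the two pages are chosen to collide in
whatever table size is used. With the victim table only the first access
to each page walks.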

>
> What's the current state of instrumentation around the system TLB
> handling? Can we trace the hit rates of the various caches with
> perf/oprofile/whatever (Stefan?)?
>

We do not have any TLB hit/miss tracking in the QEMU mainline code
right now. I think perf/oprofile can tell us how much time we spend in
TLB lookup and TLB refill. We would need TCG-generated instrumentation
to get the TLB hit/miss rate, though.
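
As a rough sketch of what such counters would let us compute, here is a
small Python model. Nothing in it exists in QEMU: TLBStats and its
methods are hypothetical names, and the default cost of 450 instructions
per miss is simply the average quoted in the patch description for an
x86_64 guest.

```python
from dataclasses import dataclass


@dataclass
class TLBStats:
    """Hypothetical hit/miss counters, as instrumentation might collect them."""

    hits: int = 0
    misses: int = 0

    def record(self, hit):
        # Called once per guest memory access in this model.
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0

    def estimated_walk_instructions(self, cost_per_miss=450):
        # cost_per_miss is an assumption taken from the ~450 figure above.
        return self.misses * cost_per_miss


if __name__ == "__main__":
    stats = TLBStats()
    for hit in [True] * 9 + [False]:
        stats.record(hit)
    print("miss rate:", stats.miss_rate())
    print("estimated walk instructions:", stats.estimated_walk_instructions())
```

Time-based profiles from perf/oprofile would not give these counts
directly, which is why the counters would have to be updated from the
generated code or from the refill path.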
> --
> Alex Bennée
>


