qemu-devel


Re: [Qemu-devel] save compiled qemu traces.


From: Laurent Desnogues
Subject: Re: [Qemu-devel] save compiled qemu traces.
Date: Thu, 12 Dec 2013 14:37:32 +0100

On Thu, Dec 12, 2013 at 5:07 AM, Xin Tong <address@hidden> wrote:
> see questions below.
>
> On Tue, Dec 10, 2013 at 12:25 AM, Alex Bennée <address@hidden> wrote:
>>
>> address@hidden writes:
>>
>>> Does anyone have profiles of how much time QEMU spends translating
>>> instructions? QEMU does not have a baseline interpreter, nor does it
>>> translate at trace granularity, so I imagine QEMU must spend quite a
>>> bit of time translating instructions.
>>
>> Not as much as you'd think. The translation stage isn't very complex and
>> blocks only get translated once (modulo exceptions and self-modifying
>> code). If you run perf on your task you should see that most of the time
>> is spent in the generated code; if not, please send the test case to the
>> list.
>
> I took a profile running SPEC CPU2006 403.gcc with the test input on an
> Intel Xeon machine. We spent only 44.76% of the time in the code cache
> (i.e. 13M ticks in the code cache), while 40.97% of the time was spent
> in qemu-system-x86_64. Some of the hot functions in
> qemu-system-x86_64 are listed below.
>
> *You are right*: we do not spend much time in translation routines.
> Instead, we spend a significant amount of time in address translation
> code.
>
> CPU_CLK_UNHALTED       %  Symbol/Functions
> 1340512           100.00  anon (tgid:7106 range:0x7f97815ca000-0x7f979a692000)
>
> CPU_CLK_UNHALTED       %  Symbol/Functions
> 314655             25.64  address_space_translate_internal
> 308942             25.18  cpu_x86_exec
> 128922             10.51  ldq_phys
> 92345               7.53  cpu_x86_handle_mmu_fault
> 62456               5.09  tlb_set_page
> 49332               4.02  memory_region_is_ram
> 31055               2.53  helper_le_ldq_mmu
> 22048               1.80  memory_region_get_ram_addr
> 19223               1.57  memory_region_section_get_iotlb
> 15873               1.29  tcg_optimize
> 14526               1.18  get_page_addr_code
> 12601               1.03  memory_region_get_ram_ptr
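The "%" column in the second table appears to be relative to samples inside qemu-system-x86_64 only, not to the whole run. The short script below is a sketch under that assumption: it re-derives both denominators from the numbers quoted above (1340512 code-cache samples being 44.76% of the run) and rescales the functions to whole-run shares. Which functions count as "address translation" is my own grouping (everything except cpu_x86_exec and tcg_optimize), not something stated in the thread.

```python
# Re-derive whole-run shares from the oprofile numbers quoted above.
# Assumptions: the second table's "%" column is relative to samples inside
# qemu-system-x86_64 only, and 44.76% is the code cache's share of the run.

CODE_CACHE_SAMPLES = 1340512
CODE_CACHE_FRACTION = 0.4476

# (samples, percent of qemu-system-x86_64 samples), copied from the report
PROFILE = {
    "address_space_translate_internal": (314655, 25.64),
    "cpu_x86_exec": (308942, 25.18),
    "ldq_phys": (128922, 10.51),
    "cpu_x86_handle_mmu_fault": (92345, 7.53),
    "tlb_set_page": (62456, 5.09),
    "memory_region_is_ram": (49332, 4.02),
    "helper_le_ldq_mmu": (31055, 2.53),
    "memory_region_get_ram_addr": (22048, 1.80),
    "memory_region_section_get_iotlb": (19223, 1.57),
    "tcg_optimize": (15873, 1.29),
    "get_page_addr_code": (14526, 1.18),
    "memory_region_get_ram_ptr": (12601, 1.03),
}

# Assumed grouping: every listed function except the main loop and the
# TCG optimizer is treated as address-translation (MMU/TLB/memory map) work.
NOT_ADDRESS_TRANSLATION = {"cpu_x86_exec", "tcg_optimize"}


def whole_run_samples() -> float:
    """Total samples for the run, implied by the code-cache count and share."""
    return CODE_CACHE_SAMPLES / CODE_CACHE_FRACTION


def qemu_binary_samples() -> float:
    """Samples inside qemu-system-x86_64, implied by the largest row."""
    samples, percent = PROFILE["address_space_translate_internal"]
    return samples / (percent / 100.0)


def address_translation_share() -> float:
    """Fraction of the whole run spent in the assumed address-translation group."""
    samples = sum(
        s for name, (s, _) in PROFILE.items() if name not in NOT_ADDRESS_TRANSLATION
    )
    return samples / whole_run_samples()


if __name__ == "__main__":
    total = whole_run_samples()
    qemu = qemu_binary_samples()
    print(f"whole run           ~{total:,.0f} samples")
    print(f"qemu-system-x86_64  ~{qemu:,.0f} samples ({qemu / total:.2%} of run)")
    print(f"address translation ~{address_translation_share():.1%} of run")
```

Under these assumptions the implied qemu-system-x86_64 share comes out at about 41%, consistent with the 40.97% quoted, and every row implies the same denominator to within rounding. The grouped address-translation functions then account for roughly a quarter of the whole run.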

You could perhaps redo the same experiment using user-mode QEMU.
That'll give you another interesting data point.
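Part of why user mode is an interesting comparison: in system mode every guest memory access goes through an emulated MMU, while in user mode the guest address space is mapped into the QEMU process at a fixed offset. The toy model below illustrates that difference; it is not QEMU's code, and the class, function and parameter names (SoftTLB, page_table, user_mode_translate, guest_base, TLB_SIZE) are hypothetical. The miss path loosely corresponds to what the profile shows as helper_le_ldq_mmu, cpu_x86_handle_mmu_fault and tlb_set_page, but that mapping is an approximation.

```python
# Toy model (not QEMU's implementation) of system-mode vs user-mode
# address translation for guest memory accesses.

PAGE_BITS = 12
PAGE_MASK = ~((1 << PAGE_BITS) - 1)
TLB_SIZE = 256  # hypothetical number of direct-mapped entries


class SoftTLB:
    """Direct-mapped software TLB: guest virtual page -> host address addend."""

    def __init__(self, page_table: dict[int, int]):
        # Hypothetical guest page table: virtual page base -> host page base.
        self.page_table = page_table
        self.tags: list[int | None] = [None] * TLB_SIZE
        self.addends = [0] * TLB_SIZE
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpage = vaddr & PAGE_MASK
        index = (vaddr >> PAGE_BITS) % TLB_SIZE
        if self.tags[index] == vpage:
            # Fast path: a tag compare, cheap enough to inline in generated code.
            self.hits += 1
        else:
            # Slow path: walk the page table and refill the entry. A missing
            # mapping raises KeyError, standing in for a guest page fault.
            self.misses += 1
            host_base = self.page_table[vpage]
            self.tags[index] = vpage
            self.addends[index] = host_base - vpage
        return vaddr + self.addends[index]


def user_mode_translate(vaddr: int, guest_base: int) -> int:
    """User mode: the guest address space sits at a fixed host offset."""
    return guest_base + vaddr
```

For example, with `tlb = SoftTLB({0x1000: 0x7f0000})`, the first `tlb.translate(0x1004)` misses and returns `0x7f0004`, and a following access to the same page hits. The user-mode function has no miss path at all, which is why address-translation helpers should largely drop out of a user-mode profile.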

Another experiment is kernel booting: it is likely to run most code
only once, which should push the code translation functions up the
profile.
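The reasoning here is about amortization: translating a block only pays off if the translated code is executed again. The sketch below models a translation cache keyed by guest PC and counts translations against executions for a hot loop and for a run-once, boot-like trace. It is an illustration under that assumption, not QEMU's data structure, and TranslationCache, lookup, invalidate and run are hypothetical names. The invalidate method stands in for the self-modifying-code case, where a block must be dropped and retranslated.

```python
from collections.abc import Callable, Iterable


class TranslationCache:
    """Toy translation-block cache keyed by guest PC (hypothetical, not QEMU's API)."""

    def __init__(self, translate: Callable[[int], str]):
        self.translate = translate
        self.blocks: dict[int, str] = {}
        self.translations = 0
        self.executions = 0

    def lookup(self, pc: int) -> str:
        """Return host code for pc, translating it only on a cache miss."""
        block = self.blocks.get(pc)
        if block is None:
            block = self.translate(pc)  # the expensive step
            self.blocks[pc] = block
            self.translations += 1
        self.executions += 1
        return block

    def invalidate(self, pc: int) -> None:
        """Drop a block, e.g. after the guest writes to its code page."""
        self.blocks.pop(pc, None)


def run(trace: Iterable[int]) -> TranslationCache:
    """Execute a sequence of block start addresses through a fresh cache."""
    cache = TranslationCache(lambda pc: f"host code for {pc:#x}")
    for pc in trace:
        cache.lookup(pc)
    return cache
```

A hot two-block loop such as `run([0x100, 0x140] * 1000)` translates 2 blocks for 2000 executions, while a boot-like trace of 2000 distinct blocks, each run once, translates every block it executes, so translation cost is no longer amortized.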


Laurent


