qemu-devel

Re: [Qemu-devel] save compiled qemu traces.


From: Xin Tong
Subject: Re: [Qemu-devel] save compiled qemu traces.
Date: Thu, 12 Dec 2013 13:07:40 +0900

See my responses inline below.

On Tue, Dec 10, 2013 at 12:25 AM, Alex Bennée <address@hidden> wrote:
>
> address@hidden writes:
>
>> Does anyone have profiles of how much time QEMU spends translating
>> instructions? QEMU has neither a baseline interpreter nor
>> trace-granularity translation, so I imagine it must spend quite a bit
>> of time translating instructions.
>
> Not as much as you'd think. The translation stage isn't very complex and
> blocks only get translated once (modulo exceptions and self modifying
> code). If you run perf on your task you should see most of the time is
> spent in the generated code - if not please send the test case to the
> list.

I took a profile running SPEC CPU2006 403.gcc with the test input on an
Intel Xeon machine. We spent only 44.76% of the time in the code cache
(i.e. about 1.34M ticks in the code cache), while 40.97% of the time was
spent in qemu-system-x86_64 itself. Some of the hot functions in
qemu-system-x86_64 are listed below.

*You are right*: we do not spend much time in the translation routines.
Instead, we spend a significant amount of time in address translation
code.

CPU_CLK_UNHALTED       %  Symbol/Functions
         1340512  100.00  anon (tgid:7106 range:0x7f97815ca000-0x7f979a692000)

CPU_CLK_UNHALTED       %  Symbol/Functions
          314655   25.64  address_space_translate_internal
          308942   25.18  cpu_x86_exec
          128922   10.51  ldq_phys
           92345    7.53  cpu_x86_handle_mmu_fault
           62456    5.09  tlb_set_page
           49332    4.02  memory_region_is_ram
           31055    2.53  helper_le_ldq_mmu
           22048    1.80  memory_region_get_ram_addr
           19223    1.57  memory_region_section_get_iotlb
           15873    1.29  tcg_optimize
           14526    1.18  get_page_addr_code
           12601    1.03  memory_region_get_ram_ptr
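
To make the split in the listing above easier to see, here is a rough
sketch that sums the listed samples by category. The grouping of symbols
into "memory path" and "translation" is an assumption based only on the
symbol names, not something measured separately, and the estimated
per-binary total is simply backed out from the largest entry's sample
count and percentage.

```python
# Samples and percentages copied from the qemu-system-x86_64 listing above.
PROFILE = {
    "address_space_translate_internal": (314655, 25.64),
    "cpu_x86_exec": (308942, 25.18),
    "ldq_phys": (128922, 10.51),
    "cpu_x86_handle_mmu_fault": (92345, 7.53),
    "tlb_set_page": (62456, 5.09),
    "memory_region_is_ram": (49332, 4.02),
    "helper_le_ldq_mmu": (31055, 2.53),
    "memory_region_get_ram_addr": (22048, 1.80),
    "memory_region_section_get_iotlb": (19223, 1.57),
    "tcg_optimize": (15873, 1.29),
    "get_page_addr_code": (14526, 1.18),
    "memory_region_get_ram_ptr": (12601, 1.03),
}

# Assumption: everything that is not the execution loop (cpu_x86_exec) or
# the TCG optimizer is counted as guest memory access / address translation.
TRANSLATION = {"tcg_optimize"}
MEMORY_PATH = set(PROFILE) - TRANSLATION - {"cpu_x86_exec"}


def share(symbols):
    """Return (total samples, total percent) for the given symbols."""
    samples = sum(PROFILE[name][0] for name in symbols)
    percent = sum(PROFILE[name][1] for name in symbols)
    return samples, percent


def estimated_total():
    """Back out the binary's total sample count from the largest entry."""
    samples, percent = PROFILE["address_space_translate_internal"]
    return samples / (percent / 100.0)


if __name__ == "__main__":
    for label, group in (("memory path", MEMORY_PATH),
                         ("translation", TRANSLATION)):
        samples, percent = share(group)
        print(f"{label:12s} {samples:8d} samples  {percent:6.2f}%")
    print(f"estimated total for the binary: {estimated_total():.0f} samples")
```

Under that grouping, the memory-path symbols account for well over half
of the samples attributed to qemu-system-x86_64, while the only listed
translation-side symbol is tcg_optimize.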

Xin


>
> I suspect the more useful statistic would be getting a break down of the
> translation blocks and seeing which ones are the most heavily used and
> examining if QEMU has done as good a job as it can of translating them.
>
>> Is it possible for QEMU to avoid some of the translations by attaching a
>> signature (e.g. a hash) to every translated basic block and reusing
>> translated basic blocks based on the signature as much as possible? Reuse
>> can result from rerunning programs or from the same libraries being
>> statically linked into different programs.
>
> You're right, a translation cache *could* save some translation time,
> especially if you end up translating the same program over and over
> again. Having said that you might find the cost of computing the
> checksum obviates any speed-up from skipping the translation. After all
> QEMU only needs to look at each subject instruction once normally.
>
> Using QEMU linux-user for cross building would be the obvious pain
> point. However as the usual use case is building for embedded platforms
> most users are just happy to fully utilise their 80-core build machines
> in preference to having a farm of slow embedded processors.
>
>> This could end up saving some translation time.
>
> I think you would need to do some performance analysis and come up with
> some numbers before you made that assumption.
>
> Cheers,
>
> --
> Alex Bennée
> QEMU/KVM Hacker for Linaro
>
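
For illustration only, the signature idea discussed in the quoted text
could be sketched roughly as follows. This is a toy model, not QEMU code:
SignatureCache, toy_translate and the choice of SHA-256 are all
hypothetical, and a real signature would probably also need to cover
whatever CPU state affects code generation, not just the guest bytes.

```python
import hashlib


class SignatureCache:
    """Toy cache of translated blocks keyed by a hash of the guest bytes."""

    def __init__(self, translate):
        self._translate = translate  # callable: guest bytes -> translated block
        self._blocks = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def signature(guest_bytes):
        # Assumption: the guest bytes alone identify a block. The hash is
        # computed on every lookup, which is the cost raised in the reply.
        return hashlib.sha256(guest_bytes).digest()

    def lookup(self, guest_bytes):
        key = self.signature(guest_bytes)
        cached = self._blocks.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        translated = self._translate(guest_bytes)
        self._blocks[key] = translated
        return translated


def toy_translate(guest_bytes):
    # Stand-in for a real translator: one fake host op per guest byte.
    return tuple(("op", byte) for byte in guest_bytes)


if __name__ == "__main__":
    cache = SignatureCache(toy_translate)
    block = bytes([0x55, 0x48, 0x89, 0xE5])
    cache.lookup(block)  # first sight: translated and stored
    cache.lookup(block)  # same bytes again: served from the cache
    print(f"hits={cache.hits} misses={cache.misses}")
```

Even in this toy form, every lookup pays for a hash over the block, so
whether the scheme wins depends on how that cost compares with the
translation it avoids, which is what would need to be measured.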


