Re: [Qemu-devel] Profiling sparc64 emulation
From: Artyom Tarasenko
Subject: Re: [Qemu-devel] Profiling sparc64 emulation
Date: Thu, 9 May 2013 22:11:45 +0200
On Thu, May 9, 2013 at 8:30 PM, Aurelien Jarno <address@hidden> wrote:
> On Wed, May 08, 2013 at 11:02:24PM +0200, Artyom Tarasenko wrote:
>> On Wed, May 8, 2013 at 12:57 AM, Aurelien Jarno <address@hidden> wrote:
>> > On Tue, May 07, 2013 at 11:29:20PM +0200, Artyom Tarasenko wrote:
>> >> On Tue, May 7, 2013 at 1:38 PM, Torbjorn Granlund <address@hidden> wrote:
>> >> > The 2nd table of http://gmplib.org/devel/testsystems.html shows all
>> >> > emulated systems I am using, most of which are qemu-based.
>> >>
>> >> Do I read it correctly that qemu-system-ppc64, with a slowdown factor
>> >> of 33, is ~3 times faster than qemu-system-sparc64, with a slowdown
>> >> factor of 96?
>> >> Do they both use a Debian Wheezy guest? You have a remark that ppc64 has
>> >> problems with its clock. Was that taken into account when the slowdown
>> >> factors were calculated?
>> >>
>> >
>> > Clock or not, it should be noted that qemu-system-sparc64 is undoubtedly
>> > slower (at least 5 to 10 times) than qemu-system-{arm,ppc,mips,...} on
>> > some types of load, such as Perl scripts.
>>
>> That's interesting. Actually it should be possible to launch perl under user
>> mode qemu-sparc32plus. Is it possible to launch perl under user mode
>> qemu-ppc{32,64} too?
>>
>> That would make it possible to tell whether the bad performance has to do
>> with TCG or with the rest of the system emulation.
>
> I haven't done that yet, but I have run perf top while running a Perl
> script (lintian) on both qemu-system-sparc64 and qemu-system-ppc64. The
> results are quite different:
>
> qemu-system-ppc64
> -----------------
> 49,73% perf-10672.map [.] 0x7f7853ab4e0f
> 13,23% qemu-system-ppc64 [.] cpu_ppc_exec
> 13,16% libglib-2.0.so.0.3200.4 [.] g_hash_table_lookup
> 8,18% libglib-2.0.so.0.3200.4 [.] g_str_hash
> 2,47% qemu-system-ppc64 [.] object_class_dynamic_cast
> 1,97% qemu-system-ppc64 [.] type_is_ancestor
> 1,05% libglib-2.0.so.0.3200.4 [.] g_str_equal
> 0,91% qemu-system-ppc64 [.] ppc_cpu_do_interrupt
> 0,90% qemu-system-ppc64 [.] object_dynamic_cast_assert
> 0,79% libc-2.13.so [.] __sigsetjmp
> 0,62% qemu-system-ppc64 [.] type_get_parent.isra.3
> 0,58% qemu-system-ppc64 [.] type_get_by_name
> 0,57% qemu-system-ppc64 [.] qemu_log_mask
> 0,54% qemu-system-ppc64 [.] object_dynamic_cast
>
> qemu-system-sparc64
> -------------------
> 17,43% perf-8154.map [.] 0x7f6ac10245c8
> 10,46% qemu-system-sparc64 [.] tcg_optimize
> 10,36% qemu-system-sparc64 [.] cpu_sparc_exec
> 6,35% qemu-system-sparc64 [.] tb_flush_jmp_cache
> 4,75% qemu-system-sparc64 [.] get_physical_address_data
> 4,45% qemu-system-sparc64 [.] tcg_liveness_analysis
> 4,35% qemu-system-sparc64 [.] tcg_reg_alloc_op
> 2,90% qemu-system-sparc64 [.] tlb_flush_page
> 2,35% qemu-system-sparc64 [.] disas_sparc_insn
> 2,28% qemu-system-sparc64 [.] get_physical_address_code
> 2,21% qemu-system-sparc64 [.] tlb_flush
> 1,64% qemu-system-sparc64 [.] tcg_out_opc
> 1,22% qemu-system-sparc64 [.] tcg_out_modrm_sib_offset.constprop.41
> 1,20% qemu-system-sparc64 [.] helper_ld_asi
> 1,14% qemu-system-sparc64 [.] gen_intermediate_code_pc
> 1,04% qemu-system-sparc64 [.] helper_st_asi
> 1,00% qemu-system-sparc64 [.] object_class_dynamic_cast
> 0,98% qemu-system-sparc64 [.] tb_find_pc
> 0,94% qemu-system-sparc64 [.] get_page_addr_code
> 0,92% qemu-system-sparc64 [.] tcg_gen_code_search_pc
> 0,91% qemu-system-sparc64 [.] tlb_set_page
> 0,83% qemu-system-sparc64 [.] reset_temp
> 0,82% qemu-system-sparc64 [.] tcg_reg_alloc_start
>
>
> The perf-xxxx.map entries correspond to the code execution. As you can see,
> their share is a lot lower on sparc, while a lot of smaller code
> generation/MMU entries appear. It seems that the optimizations have to be
> focused on the system part, not the TCG part, at least for now.
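
To put rough numbers on that comparison, here is a minimal sketch that sums the two perf top listings above by category. The percentages are copied from the listings; the grouping of symbols into "translation", "mmu/tlb" and "other" is an assumption made for illustration (by symbol-name prefix), not an official QEMU classification.

```python
from collections import defaultdict

# perf top output as quoted above (decimal commas kept as in the original).
PPC64 = """\
49,73% perf-10672.map [.] 0x7f7853ab4e0f
13,23% qemu-system-ppc64 [.] cpu_ppc_exec
13,16% libglib-2.0.so.0.3200.4 [.] g_hash_table_lookup
8,18% libglib-2.0.so.0.3200.4 [.] g_str_hash
2,47% qemu-system-ppc64 [.] object_class_dynamic_cast
1,97% qemu-system-ppc64 [.] type_is_ancestor
1,05% libglib-2.0.so.0.3200.4 [.] g_str_equal
0,91% qemu-system-ppc64 [.] ppc_cpu_do_interrupt
0,90% qemu-system-ppc64 [.] object_dynamic_cast_assert
0,79% libc-2.13.so [.] __sigsetjmp
0,62% qemu-system-ppc64 [.] type_get_parent.isra.3
0,58% qemu-system-ppc64 [.] type_get_by_name
0,57% qemu-system-ppc64 [.] qemu_log_mask
0,54% qemu-system-ppc64 [.] object_dynamic_cast
"""

SPARC64 = """\
17,43% perf-8154.map [.] 0x7f6ac10245c8
10,46% qemu-system-sparc64 [.] tcg_optimize
10,36% qemu-system-sparc64 [.] cpu_sparc_exec
6,35% qemu-system-sparc64 [.] tb_flush_jmp_cache
4,75% qemu-system-sparc64 [.] get_physical_address_data
4,45% qemu-system-sparc64 [.] tcg_liveness_analysis
4,35% qemu-system-sparc64 [.] tcg_reg_alloc_op
2,90% qemu-system-sparc64 [.] tlb_flush_page
2,35% qemu-system-sparc64 [.] disas_sparc_insn
2,28% qemu-system-sparc64 [.] get_physical_address_code
2,21% qemu-system-sparc64 [.] tlb_flush
1,64% qemu-system-sparc64 [.] tcg_out_opc
1,22% qemu-system-sparc64 [.] tcg_out_modrm_sib_offset.constprop.41
1,20% qemu-system-sparc64 [.] helper_ld_asi
1,14% qemu-system-sparc64 [.] gen_intermediate_code_pc
1,04% qemu-system-sparc64 [.] helper_st_asi
1,00% qemu-system-sparc64 [.] object_class_dynamic_cast
0,98% qemu-system-sparc64 [.] tb_find_pc
0,94% qemu-system-sparc64 [.] get_page_addr_code
0,92% qemu-system-sparc64 [.] tcg_gen_code_search_pc
0,91% qemu-system-sparc64 [.] tlb_set_page
0,83% qemu-system-sparc64 [.] reset_temp
0,82% qemu-system-sparc64 [.] tcg_reg_alloc_start
"""

# Assumed grouping by symbol-name prefix (illustrative only).
TRANSLATION = ("tcg_", "disas_", "gen_intermediate_code", "reset_temp")
MMU = ("tlb_", "get_physical_address", "get_page_addr_code",
       "tb_flush_jmp_cache")


def bucket(dso, symbol):
    """Map one perf entry to a coarse category."""
    if dso.startswith("perf-"):
        return "generated code"  # JIT output shows up under perf-<pid>.map
    if symbol.startswith(TRANSLATION):
        return "translation"
    if symbol.startswith(MMU):
        return "mmu/tlb"
    return "other"


def summarize(listing):
    """Sum the percentages of a perf top listing per category."""
    totals = defaultdict(float)
    for line in listing.splitlines():
        pct, dso, _marker, symbol = line.split()
        totals[bucket(dso, symbol)] += float(pct.rstrip("%").replace(",", "."))
    return dict(totals)


for name, listing in (("ppc64", PPC64), ("sparc64", SPARC64)):
    print(name, {k: round(v, 2) for k, v in summarize(listing).items()})
```

With this grouping, the sparc64 listing comes out at about 28% translation and 20% MMU/TLB handling against 17% generated code, while the ppc64 listing has no entries in either of the first two groups and about 50% generated code.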
>
> A quick look at the MMU seems to show a performance issue here, due
> to the split code/data MMU on SPARC64, while the QEMU TLB is a joint
> one. As a consequence one can see a lot of ping-pong: a given page is
> set to read or read/write, then to execute, and later to read or
> read/write again. My guess is that it's related to constant tables in
> the same page as the code.
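
The ping-pong described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `count_refills` and its one-entry-per-page table are hypothetical and do not reflect QEMU's actual TLB data structures. The model assumes that a joint entry installed for a code fetch cannot satisfy a data access to the same page, and vice versa.

```python
def count_refills(accesses, split):
    """Count TLB refills for a sequence of (page, kind) accesses.

    kind is "code" or "data". With split=True each kind has its own
    entry per page. With split=False a single joint entry per page
    remembers only the last kind installed, so switching kind on the
    same page forces a refill (the assumed ping-pong).
    """
    tlb = {}
    refills = 0
    for page, kind in accesses:
        key = (page, kind) if split else page
        if tlb.get(key) != kind:
            tlb[key] = kind  # miss: (re)install the entry
            refills += 1
    return refills


# Hypothetical trace: code fetches and constant-table loads that
# alternate on the same page, 100 times each.
trace = [(0x4000, "code"), (0x4000, "data")] * 100

print("split tables:", count_refills(trace, split=True))
print("joint table: ", count_refills(trace, split=False))
```

In this model the split tables need 2 refills for the whole trace, while the joint table refills on every one of the 200 accesses.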
>
> It should also be noted that tcg_optimize starts to take a
> non-negligible time, in both cases. The code has grown quite a lot
> recently, and it might be time to rework it. It's nice to have optimized
> code, but not if the gain is lower than the optimization time.
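
The trade-off in the last sentence can be written down as a simple break-even rule. The sketch below uses hypothetical function names and made-up numbers; it only shows that optimizing a block pays off once the block runs often enough for the accumulated savings to exceed the one-time optimization cost.

```python
def pays_off(optimize_cost, saving_per_run, runs):
    """True if the total time saved exceeds the time spent optimizing.

    All arguments use the same integer time unit (e.g. nanoseconds).
    """
    return saving_per_run * runs > optimize_cost


def break_even_runs(optimize_cost, saving_per_run):
    """Smallest number of runs after which optimizing is a net win.

    Returns None if the optimization saves nothing per run.
    """
    if saving_per_run <= 0:
        return None
    return optimize_cost // saving_per_run + 1


# Made-up example: optimizing costs 1000 units, each run saves 50.
print(break_even_runs(1000, 50))
```

If translated blocks are thrown away and retranslated frequently, each block runs fewer times before being discarded, which makes it harder to reach the break-even point.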
Is it possible to disable some of the optimisations, or the
optimisation pass as a whole?
I see no command line switches for that.