Re: [Qemu-devel] [RFC PATCH] include/exec/cpu-defs.h: try and make SoftM
From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC PATCH] include/exec/cpu-defs.h: try and make SoftMMU page size match target
Date: Mon, 10 Jul 2017 16:17:17 +0100
User-agent: mu4e 0.9.19; emacs 25.2.50.3

Peter Maydell <address@hidden> writes:
> On 10 July 2017 at 15:28, Alex Bennée <address@hidden> wrote:
>> While the SoftMMU is not emulating the target MMU of a system there is
>> a relationship between its page size and that of the target. If the
>> target MMU is full featured the functions called to re-fill the
>> entries in the SoftMMU entries start moving up the perf profiles. If
>> we can we should try and prevent too much thrashing around by having
>> the page sizes the same.
>>
>> Ideally we should use TARGET_PAGE_BITS_MIN but that potentially
>> involves a fair bit of #include re-jigging so I went for 10 bits (1k
>> pages) which I think is the smallest of all our emulated systems.
>
> The figures certainly show an improvement, but it's not clear
> to me why this is related to the target's page size rather than
> just being a "bigger is better" kind of thing?

Well, this was driven by a discussion with Pranith last week. In his
(admittedly memory-intensive) benchmarking he was seeing around 30%
overhead coming from MMU-related functions, with the hottest being
get_phys_addr_lpae() followed by address_space_do_translate(). We
theorised that, even given the high hit rate of the fast path, the slow
path was being triggered by moving over SoftMMU's effective page
boundary. A quick experiment in extending the size of the TLB made his
hot spots disappear.
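
As a toy illustration of why a larger TLB could remove those refills
(this is not QEMU code, just a sketch under stated assumptions): model
the SoftMMU TLB as a direct-mapped table indexed by the low bits of the
page number, assume 4k guest pages, and count how often a lookup would
have to fall back to the slow path. The names count_refills and
PAGE_BITS are made up for this example, and the victim TLB is ignored.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BITS 12 /* assumption: 4k guest pages */

/*
 * Walk `npages` consecutive guest pages `passes` times through a
 * direct-mapped TLB with 2^tlb_bits entries and return the number of
 * lookups that missed and needed a slow-path refill.
 * Returns 0 if the table cannot be allocated.
 */
unsigned long count_refills(unsigned tlb_bits, unsigned npages,
                            unsigned passes)
{
    size_t nentries = (size_t)1 << tlb_bits;
    uint64_t *tags = malloc(nentries * sizeof(*tags));
    unsigned long refills = 0;

    if (!tags) {
        return 0;
    }
    /* All-ones tag marks an empty entry; no simulated page uses it. */
    memset(tags, 0xff, nentries * sizeof(*tags));

    for (unsigned pass = 0; pass < passes; pass++) {
        for (unsigned p = 0; p < npages; p++) {
            uint64_t addr = (uint64_t)p << PAGE_BITS;
            uint64_t page = addr >> PAGE_BITS;
            size_t idx = page & (nentries - 1);

            if (tags[idx] != page) {
                /* Miss: the real slow path would walk the guest tables. */
                tags[idx] = page;
                refills++;
            }
        }
    }
    free(tags);
    return refills;
}
```

With a working set of 1024 pages walked four times, a 256-entry table
misses on every access because each page evicts the one that mapped to
the same slot, while a 1024-entry table only misses on the first pass.
Real guest access patterns are far less regular, so this only shows the
direction of the effect, not its size.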

I don't see quite such a hot spot in my simple boot/build benchmark,
but after helper_lookup_tb_ptr quite a lot of the hits are in the
re-fill chain:

16.37% qemu-system-aar qemu-system-aarch64 [.] helper_lookup_tb_ptr
3.43% qemu-system-aar qemu-system-aarch64 [.] victim_tlb_hit
2.73% qemu-system-aar qemu-system-aarch64 [.] tlb_set_page_with_attrs
2.60% qemu-system-aar qemu-system-aarch64 [.] get_phys_addr_lpae
2.36% qemu-system-aar qemu-system-aarch64 [.] qht_lookup
1.53% qemu-system-aar qemu-system-aarch64 [.] arm_regime_tbi1
1.37% qemu-system-aar qemu-system-aarch64 [.] tcg_optimize
1.34% qemu-system-aar qemu-system-aarch64 [.] tcg_gen_code
1.31% qemu-system-aar qemu-system-aarch64 [.] arm_regime_tbi0
1.28% qemu-system-aar qemu-system-aarch64 [.] address_space_ldq_le
1.22% qemu-system-aar qemu-system-aarch64 [.] object_dynamic_cast_assert
1.11% qemu-system-aar qemu-system-aarch64 [.] address_space_translate_internal
1.03% qemu-system-aar qemu-system-aarch64 [.] tb_htable_lookup
0.98% qemu-system-aar qemu-system-aarch64 [.] get_page_addr_code
0.98% qemu-system-aar qemu-system-aarch64 [.] address_space_do_translate
0.87% qemu-system-aar qemu-system-aarch64 [.] object_class_dynamic_cast_assert
0.82% qemu-system-aar qemu-system-aarch64 [.] get_phys_addr
0.75% qemu-system-aar qemu-system-aarch64 [.] tb_cmp
0.63% qemu-system-aar qemu-system-aarch64 [.] liveness_pass_1
0.59% qemu-system-aar qemu-system-aarch64 [.] helper_le_ldq_mmu
--
Alex Bennée