From: Peter Maydell
Subject: Re: [Qemu-arm] [RFC PATCH] include/exec/cpu-defs.h: try and make SoftMMU page size match target
Date: Mon, 10 Jul 2017 16:23:07 +0100

On 10 July 2017 at 16:17, Alex Bennée <address@hidden> wrote:
>
> Peter Maydell <address@hidden> writes:
>
>> On 10 July 2017 at 15:28, Alex Bennée <address@hidden> wrote:
>>> Although the SoftMMU is not emulating the target system's MMU, there
>>> is a relationship between its page size and that of the target. If the
>>> target MMU is full-featured, the functions called to re-fill the
>>> SoftMMU entries start moving up the perf profiles. Where we can, we
>>> should try to prevent too much thrashing by making the page sizes
>>> the same.
>>>
>>> Ideally we would use TARGET_PAGE_BITS_MIN, but that potentially
>>> involves a fair bit of #include re-jigging, so I went for 10 bits (1k
>>> pages), which I think is the smallest page size of all our emulated systems.
>>
>> The figures certainly show an improvement, but it's not clear
>> to me why this is related to the target's page size rather than
>> just being a "bigger is better" kind of thing?
>
> Well, this was driven by a discussion with Pranith last week. In his
> (admittedly memory-intensive) benchmarking he was seeing around 30% of
> the overhead coming from MMU-related functions, the hottest being
> get_phys_addr_lpae() followed by address_space_do_translate(). We
> theorised that, even given the high hit rate of the fast path, the slow
> path was being triggered by crossing the SoftMMU's effective page
> boundary. A quick experiment extending the size of the TLB made his hot
> spots disappear.
>
> I don't see quite such a hot spot in my simple boot/build benchmark,
> but after helper_lookup_tb_ptr quite a lot of the hits are part of the
> re-fill chain:

Right, but how do we know that the target page size is what matters,
rather than this just being "smaller TLB -> more TLB misses -> more calls
to the slow path -> functions called in the slow path appear more
in profiling"?

thanks
-- PMM
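The question above turns on TLB reach (number of entries × page size). As an illustration only, here is a toy model, not QEMU code: it assumes a direct-mapped table indexed by page number, roughly like the SoftMMU fast path, and ignores permissions, MMIO and the rest of the real fill path. The function name `count_misses` and the workload parameters are hypothetical. It sweeps a 1 MiB working set several times and counts refills for three configurations.

```python
# Toy model of a direct-mapped software TLB (hypothetical, not QEMU code).
# The slot is chosen from the page number: idx = (addr >> page_bits) & (entries - 1).
# A miss stands in for a call into the slow-path refill.

def count_misses(page_bits: int, entries: int, working_set: int,
                 stride: int, passes: int) -> int:
    """Sweep [0, working_set) `passes` times with `stride` and count refills."""
    if entries & (entries - 1) != 0:
        raise ValueError("entries must be a power of two")
    tags = [None] * entries          # page number currently cached in each slot
    misses = 0
    for _ in range(passes):
        for addr in range(0, working_set, stride):
            page = addr >> page_bits
            idx = page & (entries - 1)
            if tags[idx] != page:    # miss: evict whatever was there and refill
                misses += 1
                tags[idx] = page
    return misses


if __name__ == "__main__":
    ws, stride, passes = 1 << 20, 64, 4   # 1 MiB working set, 64-byte stride
    for page_bits, entries in [(10, 256), (12, 256), (10, 1024)]:
        reach_kib = (entries << page_bits) >> 10
        m = count_misses(page_bits, entries, ws, stride, passes)
        print(f"{1 << page_bits}-byte pages x {entries} entries "
              f"(reach {reach_kib} KiB): {m} misses")
```

With these parameters the three runs give 4096, 256 and 1024 misses. Both larger pages and more entries remove the conflict misses once the reach covers the working set, so in a model like this a drop in slow-path calls does not by itself single out page size as the cause.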


