qemu-discuss


Re: [Qemu-discuss] ASan'ed binaries start up very slow under qemu-aarch64.


From: Maxim Ostapenko
Subject: Re: [Qemu-discuss] ASan'ed binaries start up very slow under qemu-aarch64.
Date: Tue, 19 Jul 2016 12:22:41 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

On 18/07/16 18:51, Peter Maydell wrote:
> (CCing qemu-devel, which is more likely to get developer attention)

Peter, thank you for your answer.


> On 18 July 2016 at 15:45, Maxim Ostapenko <address@hidden> wrote:
>> 1) AddressSanitizer mmaps quite large regions of memory for its shadow
>> memory and the shadow gap. In particular, for a 39-bit address space it
>> mmaps:
>>
>> || `[0x1400000000, 0x1fffffffff]` || HighShadow || 48 GiB ||
>> || `[0x1200000000, 0x13ffffffff]` || ShadowGap  ||  8 GiB ||
>> || `[0x1000000000, 0x11ffffffff]` || LowShadow  ||  8 GiB ||
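The sizes follow directly from the inclusive address ranges. A small Python check (plain arithmetic, not ASan or QEMU code) gives 48 GiB for HighShadow and shows that ShadowGap and LowShadow each span 8 GiB:

```python
# Inclusive [first, last] address ranges from the 39-bit layout table.
REGIONS = {
    "HighShadow": (0x1400000000, 0x1FFFFFFFFF),
    "ShadowGap": (0x1200000000, 0x13FFFFFFFF),
    "LowShadow": (0x1000000000, 0x11FFFFFFFF),
}

GIB = 1 << 30


def region_size(first: int, last: int) -> int:
    """Size in bytes of an inclusive [first, last] address range."""
    return last - first + 1


for name, (first, last) in REGIONS.items():
    print(f"{name:10s} {region_size(first, last) // GIB:3d} GiB")
```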

>> 2) In QEMU, page_set_flags is called for these ranges. It splits the given
>> range into individual pages and sets the flags on each one. With a 4 KiB
>> page size, the inner loop runs 2097152 times for an 8 GiB range and
>> 12582912 times for a 48 GiB range. This is obviously a performance
>> bottleneck.
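A hypothetical Python model of the per-page update described in point 2 makes the linear cost concrete (the function names are made up for illustration; this is not QEMU's page_set_flags). There is one loop iteration per 4 KiB page, so the iteration counts are simply the range size divided by the page size:

```python
PAGE_SIZE = 4096  # 4 KiB pages, as in the message
GIB = 1 << 30


def page_count(size: int) -> int:
    """Number of PAGE_SIZE pages needed to cover `size` bytes."""
    return (size + PAGE_SIZE - 1) // PAGE_SIZE


def naive_set_flags(page_flags: dict, start: int, end: int, flags: int) -> int:
    """Hypothetical per-page update of the page-aligned range [start, end).

    Does one dictionary write per page and returns the number of iterations.
    """
    iterations = 0
    for addr in range(start, end, PAGE_SIZE):
        page_flags[addr // PAGE_SIZE] = flags
        iterations += 1
    return iterations


print(page_count(8 * GIB))   # 2097152
print(page_count(48 * GIB))  # 12582912

table = {}
print(naive_set_flags(table, 0x1000000000, 0x1000000000 + 16 * PAGE_SIZE, 0x1))  # 16
```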
> Mmm, the algorithm here is pretty simple and basically assumes the
> guest isn't going to be doing enormous allocations like that.
> (If the host process doesn't happen to have a suitable big lump of its
> VA space free then the mmap will fail anyway.)

Hm, it seems that ASan is really special here. Actually, I think this slowdown is not critical for individual runs, but it certainly is critical for people who rely on QEMU in their builds (e.g. in an AArch64 chroot). I'm not sure it's a common case, though.


>> 3) The same issue may happen later, in the page_check_range function,
>> when ASan tries to read /proc/self/maps after it has already mmapped the
>> HighShadow, ShadowGap and LowShadow regions.
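Checking a range has the same linear shape. The sketch below is a hypothetical model, not QEMU's page_check_range, and the PAGE_READ/PAGE_WRITE bit values are assumptions for illustration only. A successful check over N pages has to look at every one of them:

```python
PAGE_SIZE = 4096
PAGE_READ = 0x1   # hypothetical flag bits for this sketch
PAGE_WRITE = 0x2


def naive_check_range(page_flags: dict, start: int, length: int, required: int):
    """Return (ok, pages_visited) for a page-by-page check of [start, start + length).

    A page passes only if all bits in `required` are set; unknown pages count as 0.
    """
    if length == 0:
        return True, 0
    visited = 0
    for page in range(start // PAGE_SIZE, (start + length - 1) // PAGE_SIZE + 1):
        visited += 1
        if (page_flags.get(page, 0) & required) != required:
            return False, visited  # stop at the first failing page
    return True, visited


base = 0x1000000000 // PAGE_SIZE
flags = {p: PAGE_READ for p in range(base, base + 1024)}
print(naive_check_range(flags, 0x1000000000, 1024 * PAGE_SIZE, PAGE_READ))   # (True, 1024)
print(naive_check_range(flags, 0x1000000000, 1024 * PAGE_SIZE, PAGE_WRITE))  # (False, 1)
```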

>> Could someone help me figure out how to mitigate this performance issue?
>> Do we really need to set flags for each page in the entire (quite big)
>> memory region?
> Well, we do need to do some things:
>   * we're populating the PageDesc data structure which we later use
>     to cache generated code
>   * if we're marking the range as writeable and it wasn't previously
>     writeable, we need to check whether there's already generated code
>     anywhere in this memory range and invalidate those translations
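The two duties listed above can be modelled together in a hypothetical Python sketch. PageDescModel is a stand-in invented here (not QEMU's PageDesc), and cached translations are represented by opaque ids. The invalidation check only fires for pages that go from non-writable to writable:

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096
PAGE_READ = 0x1   # hypothetical flag bits for this sketch
PAGE_WRITE = 0x2


@dataclass
class PageDescModel:
    """Hypothetical per-page record (a stand-in, not QEMU's PageDesc)."""
    flags: int = 0
    translations: set = field(default_factory=set)  # ids of cached code blocks


def set_flags_with_invalidation(pages: dict, start: int, end: int, flags: int) -> list:
    """Set `flags` on every page of the page-aligned range [start, end).

    Pages that become writable lose their cached translations; the ids of
    the dropped translations are returned (sorted within each page).
    """
    invalidated = []
    for page in range(start // PAGE_SIZE, end // PAGE_SIZE):
        # First duty: make sure a per-page record exists.
        desc = pages.setdefault(page, PageDescModel())
        # Second duty: invalidate only on a non-writable to writable transition.
        became_writable = bool(flags & PAGE_WRITE) and not (desc.flags & PAGE_WRITE)
        if became_writable and desc.translations:
            invalidated.extend(sorted(desc.translations))
            desc.translations.clear()
        desc.flags = flags
    return invalidated


pages = {5: PageDescModel(flags=PAGE_READ, translations={"tb_a", "tb_b"})}
print(set_flags_with_invalidation(pages, 0, 8 * PAGE_SIZE, PAGE_READ | PAGE_WRITE))
# ['tb_a', 'tb_b']
```

Both duties still run once per page here, which is exactly the cost the thread is about.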

> This could probably be done in a way that doesn't iterate naively
> through every page, though.
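One way to avoid the per-page walk is to store flags as runs of addresses rather than per page. The sketch below swaps in a simple interval map: a sorted Python list of non-overlapping [start, end) runs standing in for a balanced tree. It illustrates the idea only and is not QEMU's implementation. Setting flags on a 64 GiB range costs work proportional to the number of runs it overlaps, not the number of pages:

```python
import bisect


class IntervalFlags:
    """Flags stored as sorted, non-overlapping half-open [start, end) runs."""

    def __init__(self):
        self.starts = []  # sorted run start addresses (for bisect)
        self.runs = []    # parallel list of (start, end, flags)

    def set_flags(self, start: int, end: int, flags: int) -> None:
        if start >= end:
            return
        # The run just before `start` may extend into the new range.
        i = bisect.bisect_right(self.starts, start)
        if i > 0 and self.runs[i - 1][1] > start:
            i -= 1
        j = i
        pieces = []
        # Visit only runs overlapping [start, end); keep their outside parts.
        while j < len(self.runs) and self.runs[j][0] < end:
            s, e, f = self.runs[j]
            if s < start:
                pieces.append((s, start, f))
            if e > end:
                pieces.append((end, e, f))
            j += 1
        pieces.append((start, end, flags))
        pieces.sort()
        self.runs[i:j] = pieces
        self.starts[i:j] = [p[0] for p in pieces]

    def get_flags(self, addr: int) -> int:
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0 and self.runs[i][0] <= addr < self.runs[i][1]:
            return self.runs[i][2]
        return 0


fl = IntervalFlags()
fl.set_flags(0x1000000000, 0x2000000000, 0x1)  # 64 GiB in one call
fl.set_flags(0x1200000000, 0x1400000000, 0x0)  # carve out a sub-range
print(len(fl.runs))  # 3
```

Adjacent runs with identical flags are not merged in this sketch; a fuller version would probably coalesce them to keep the list short.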

Oh, I see. Perhaps we could restrict QEMU to use some well-defined pages for generated code?
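One reading of this suggestion is that invalidation only needs to visit pages that actually hold translated code. A hypothetical sketch: keep a sorted index of guest page numbers that have translations (CodePageIndex is invented here, not a QEMU structure), so making a huge range writable costs a binary search plus the number of code pages inside it, rather than a walk over every page:

```python
import bisect

PAGE_SIZE = 4096


class CodePageIndex:
    """Sorted set of guest page numbers that currently hold translated code."""

    def __init__(self):
        self.pages = []

    def add(self, page: int) -> None:
        i = bisect.bisect_left(self.pages, page)
        if i == len(self.pages) or self.pages[i] != page:
            self.pages.insert(i, page)

    def pop_range(self, start: int, end: int) -> list:
        """Remove and return code pages overlapping the byte range [start, end)."""
        lo = bisect.bisect_left(self.pages, start // PAGE_SIZE)
        hi = bisect.bisect_left(self.pages, (end + PAGE_SIZE - 1) // PAGE_SIZE)
        hit = self.pages[lo:hi]
        del self.pages[lo:hi]
        return hit


idx = CodePageIndex()
for addr in (0x400000, 0x401000, 0x1500000000):
    idx.add(addr // PAGE_SIZE)
hits = idx.pop_range(0x1000000000, 0x2000000000)  # a 64 GiB range
print([hex(p) for p in hits])       # ['0x1500000']
print([hex(p) for p in idx.pages])  # ['0x400', '0x401']
```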

Thanks,
-Maxim


> thanks
> -- PMM





