qemu-devel


From: David Hildenbrand
Subject: Re: [PATCH v1 0/3] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
Date: Wed, 21 Jul 2021 10:23:55 +0200

On 20.07.21 16:45, Daniel P. Berrangé wrote:
> On Wed, Jul 14, 2021 at 01:23:03PM +0200, David Hildenbrand wrote:
>> #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
>> global variables and prepare for concurrency and #3 makes os_mem_prealloc()
>> safe to be called from multiple threads concurrently.
>>
>> Details regarding MADV_POPULATE_WRITE can be found in the introducing upstream
>> Linux commit 4ca9b3859dac ("mm/madvise: introduce
>> MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
>> page patch [1].
>
> Looking at that commit message, I see your caveat about POPULATE_WRITE
> used together with shared file mappings, causing an undesirable glut
> of dirty pages that needs to be flushed back to the underlying storage.
>
> Is this something we need to be concerned with for the hostmem-file.c
> implementation ? While it is mostly used to point to files on tmpfs
> or hugetlbfs, I think users do sometimes point it to a plain file
> on a normal filesystem.  So will we need to optimize to use the
> fallocate+POPULATE_READ combination at some point ?

In the future, it might make sense to use only fallocate() for shared file mappings.

AFAIKS os_mem_prealloc() currently serves the following purposes:

1) Preallocate anonymous memory or backend storage (file, hugetlbfs, ...)
2) Apply mbind() policy, preallocating it from the right node when applicable.
3) Prefault page tables
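
For anonymous memory, 1) and 3) can be sketched roughly as follows. This is only an illustration, not the code from this series: prealloc_sketch() is a made-up name, the fallback #define assumes the value from the Linux uapi headers, and the page-touching fallback leaves out the SIGBUS handling and huge page sizes that real code has to care about.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may lack this; 23 is the Linux uapi value. */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: preallocate and prefault [addr, addr + size) writable.
 * Returns 0 on success, -1 with errno set on failure.
 */
int prealloc_sketch(void *addr, size_t size)
{
    assert(addr != NULL && size > 0);

    /* Preferred path: let the kernel populate the whole range in one call. */
    if (madvise(addr, size, MADV_POPULATE_WRITE) == 0) {
        return 0;
    }
    if (errno != EINVAL) {
        return -1; /* a real failure, e.g. ENOMEM */
    }

    /*
     * EINVAL: assume the kernel does not know MADV_POPULATE_WRITE (it was
     * added in Linux 5.14). Fall back to touching every base page; reading
     * and writing back the same byte keeps existing content intact.
     */
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *p = addr;
    for (size_t off = 0; off < size; off += pagesize) {
        p[off] = p[off];
    }
    return 0;
}
```

The fallback only approximates what MADV_POPULATE_WRITE does: it cannot report allocation failures, it just triggers SIGBUS/OOM behavior instead.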

For shared mappings, it's a bit more difficult, though: mbind() does not seem to work on shared mappings (which to some degree makes sense logically, but I don't think QEMU users are aware of it): "The specified policy will be ignored for any MAP_SHARED mappings in the specified memory range. Rather the pages will be allocated according to the memory policy of the thread that caused the page to be allocated. Again, this may not be the thread that called mbind()."

So 2) does not apply. A simple fallocate() can get 1) done more efficiently.

So whether we want to use MADV_POPULATE_READ depends entirely on whether we want 3). Prefaulting page tables can make sense for RT workloads; however, there is usually nothing stopping the OS from dropping the page cache and requiring a refault later, except with mlock.

So whether we want fallocate() or fallocate()+MADV_POPULATE_READ for shared file mappings really depends on the use case, and on the system setup. If the system won't immediately free up the page cache and undo what MADV_POPULATE_READ did, it might make sense to use it.
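
To make the two options concrete, here is a minimal sketch of "fallocate() only" vs. "fallocate() + MADV_POPULATE_READ" for a shared file mapping. It is not taken from QEMU: map_shared_prealloc() and its prefault parameter are hypothetical, the fallback #define assumes the Linux uapi value, and treating EINVAL as "kernel too old, skip the prefault" is just one possible policy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Older libc headers may lack this; 22 is the Linux uapi value. */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif

/*
 * Hypothetical helper: preallocate backing storage for fd and map it shared.
 * With prefault == true, also prefault page tables readable.
 * Returns the mapping, or MAP_FAILED with errno set.
 */
void *map_shared_prealloc(int fd, size_t size, bool prefault)
{
    assert(fd >= 0 && size > 0);

    /* 1) Allocate backend storage without dirtying pages via the mapping. */
    if (fallocate(fd, 0, 0, (off_t)size) != 0) {
        return MAP_FAILED;
    }

    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        return MAP_FAILED;
    }

    /*
     * 3) Optional: prefault page tables. EINVAL is assumed to mean that the
     * kernel lacks MADV_POPULATE_READ; the prefault is then simply skipped.
     */
    if (prefault && madvise(addr, size, MADV_POPULATE_READ) != 0 &&
        errno != EINVAL) {
        int saved_errno = errno;
        munmap(addr, size);
        errno = saved_errno;
        return MAP_FAILED;
    }
    return addr;
}
```

Without mlock() on the range, nothing in this sketch prevents the kernel from reclaiming the page cache again afterwards, so the prefault is best effort only.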

Long story short: it's complicated :)

--
Thanks,

David / dhildenb



