Re: [PATCH] util/mmap: optimize qemu_ram_mmap() alignment
From: Steven Sistare
Subject: Re: [PATCH] util/mmap: optimize qemu_ram_mmap() alignment
Date: Tue, 11 Apr 2023 10:39:53 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.9.1
On 4/11/2023 3:57 AM, David Hildenbrand wrote:
> On 10.04.23 17:46, Steve Sistare wrote:
>> Guest RAM created with memory-backend-memfd is aligned to a
>> QEMU_VMALLOC_ALIGN=2M boundary, and memory-backend-memfd does not support
>> the "align" parameter to change the default. This is sub-optimal on
>> aarch64 kernels with a 64 KB page size and 512 MB huge page size, as the
>> range will not be backed by huge pages. Moreover, any shared allocation
>> using qemu_ram_mmap() will be sub-optimal on such a system if the align
>> parameter is less than 512 MB.
>>
>> The kernel is smart enough to return a hugely aligned pointer for MAP_SHARED
>> mappings when /sys/kernel/mm/transparent_hugepage/shmem_enabled allows it.
>> However, the qemu function qemu_ram_mmap() calls mmap() twice to perform
>> its own alignment:
>>
>> guardptr = mmap(0, total, PROT_NONE, flags, ...
>> flags |= shared ? MAP_SHARED : MAP_PRIVATE;
>> ptr = mmap(guardptr + offset, size, prot, flags | map_sync_flags, ...
>>
>> On the first call, flags has MAP_PRIVATE, hence the kernel does not apply
>> its shared memory policy, and returns a non-huge-aligned guardptr.
>>
>> To fix, for shared mappings, pass MAP_SHARED to both mmap calls.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> util/mmap-alloc.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
>> index 5ed7d29..37a0d1e 100644
>> --- a/util/mmap-alloc.c
>> +++ b/util/mmap-alloc.c
>> @@ -121,7 +121,7 @@ static bool map_noreserve_effective(int fd, uint32_t qemu_map_flags)
>> * Reserve a new memory region of the requested size to be used for mapping
>> * from the given fd (if any).
>> */
>> -static void *mmap_reserve(size_t size, int fd)
>> +static void *mmap_reserve(size_t size, int fd, int final_flags)
>> {
>> int flags = MAP_PRIVATE;
>> @@ -144,6 +144,7 @@ static void *mmap_reserve(size_t size, int fd)
>> #else
>> fd = -1;
>> flags |= MAP_ANONYMOUS;
>> + flags |= final_flags & MAP_SHARED;
>
> Setting both MAP_PRIVATE and MAP_SHARED sure sounds very wrong.
Yes, thanks. I introduced that mistake when I ported the fix from an earlier
qemu that did not have mmap_reserve. Should be:

    fd = -1;
    flags = MAP_ANONYMOUS;
    flags |= final_flags & (MAP_SHARED | MAP_PRIVATE);
> The traditional way to handle that is via QEMU_VMALLOC_ALIGN.
>
> I think you'd actually want to turn QEMU_VMALLOC_ALIGN into a function that
> is able to special-case like
> hw/virtio/virtio-mem.c:virtio_mem_default_thp_size() does for
> "qemu_real_host_page_size() == 64 * KiB".
>
> If you set QEMU_VMALLOC_ALIGN properly, you'll also have any DIMMs get that
> alignment in guest physical address space.
If we increase QEMU_VMALLOC_ALIGN, then all allocations will be 512 MB
aligned. If we make many small allocations, we could conceivably run out of
VA space. Further, the processor may use low-order address bits in internal
hashes, and offset X in every allocated range would then have the same value
in the low 29 bits, possibly causing more collisions and reducing
performance. Further, the huge alignment is not even needed if huge pages
for shmem have been disabled via sysfs.

We could avoid that by adding logic that also considers the allocation size
when choosing alignment, and that checks the sysfs tunables. Or, we can just
let the kernel do so, by telling it the truth about the memory flavor when
calling mmap.
- Steve