Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
From: Peter Lieven
Subject: Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
Date: Tue, 28 Jun 2016 14:14:24 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0
On 28.06.2016 at 13:37, Paolo Bonzini wrote:
On 28/06/2016 11:01, Peter Lieven wrote:
I recently found that Qemu is using several hundred megabytes of RSS memory
more than older versions such as Qemu 2.2.0. So I started tracing
memory allocation and found 2 major reasons for this.
1) We changed the qemu coroutine pool to have a per-thread pool and a global release
pool. The chosen pool size and the changed algorithm could lead to up to
192 free coroutines with just a single iothread, with each of the coroutines
in the pool having 1MB of stack memory.
But the fix, as you correctly note, is to reduce the stack size. It
would be nice to compile block-obj-y with -Wstack-usage=2048 too.
To reveal whether there are any big stack allocations in the block layer?
It seems that reducing the stack size to 64kB breaks live migration in some
(non-reproducible) cases.
The question is which way to go: reduce the stack size and fix the big stack
allocations, or keep the stack size at 1MB?
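For illustration, here is a minimal sketch of the kind of code that -Wstack-usage=2048 is meant to reveal. GCC warns about any function whose stack frame may exceed the given number of bytes. Both functions and the SCRATCH_SIZE constant are hypothetical and are not taken from the QEMU block layer.

```c
/* Compile with: gcc -c -Wstack-usage=2048 example.c
 * Hypothetical example functions, not QEMU code. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 65536 /* assumed buffer size, for illustration only */

/* Keeps a 64 kB buffer in its stack frame. This is the pattern the
 * warning is intended to flag; on a 64 kB coroutine stack this frame
 * alone would not fit. */
size_t checksum_on_stack(const unsigned char *src, size_t len)
{
    unsigned char scratch[SCRATCH_SIZE];
    size_t i, sum = 0;

    assert(src != NULL);
    if (len > sizeof(scratch)) {
        len = sizeof(scratch);
    }
    memcpy(scratch, src, len);
    for (i = 0; i < len; i++) {
        sum += scratch[i];
    }
    return sum;
}

/* Same logic with the buffer moved to the heap, so the stack frame
 * stays small. Returns 0 if the allocation fails. */
size_t checksum_on_heap(const unsigned char *src, size_t len)
{
    unsigned char *scratch;
    size_t i, sum = 0;

    assert(src != NULL);
    scratch = malloc(SCRATCH_SIZE);
    if (scratch == NULL) {
        return 0;
    }
    if (len > SCRATCH_SIZE) {
        len = SCRATCH_SIZE;
    }
    memcpy(scratch, src, len);
    for (i = 0; i < len; i++) {
        sum += scratch[i];
    }
    free(scratch);
    return sum;
}
```

Fixing every function flagged this way is the "reduce the stack size and fix the big stack allocations" option from the question above.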
2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced, which led to delayed freeing
of memory. This led to higher heap allocations which could not effectively
be returned to the kernel (most likely due to fragmentation).
I agree that some of the exec.c allocations need some care, but I would
prefer to use a custom free list or lazy allocation instead of mmap.
This would only help if the elements from the free list were allocated using
mmap? The issue is that RCU delays the freeing, so the number of concurrent
allocations is high and then a bunch is freed at once. If the memory had been
malloced, it would still have caused trouble.
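To make the malloc versus mmap distinction concrete, here is a minimal sketch of a private anonymous mapping helper, assuming Linux or another POSIX system that provides MAP_ANONYMOUS. The names anon_alloc, anon_free and round_up_to_page are hypothetical and are not the helper introduced in patch 05. The point is that munmap hands the pages back to the kernel at once, regardless of what else lives on the heap.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a size up to a multiple of the system page size. */
size_t round_up_to_page(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    return (size + page - 1) / page * page;
}

/* Allocate zero-filled memory backed by a private anonymous mapping.
 * Returns NULL on failure. */
void *anon_alloc(size_t size)
{
    void *p;

    assert(size > 0);
    p = mmap(NULL, round_up_to_page(size), PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Unmap the region. The caller must pass the same size that was given
 * to anon_alloc(), because munmap needs the length of the mapping. */
void anon_free(void *p, size_t size)
{
    if (p != NULL) {
        munmap(p, round_up_to_page(size));
    }
}
```

The cost is that every allocation occupies at least one full page and needs a system call, which is why this only pays off for larger objects that really are freed.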
Changing allocations to use mmap also is not really useful if you do it
for objects that are never freed (as in patches 8-9-10-15 at least, and
probably 11 too which is one of the most contentious).
9 actually frees the memory ;-)
15 frees the memory as soon as the vnc client disconnects.
I agree on the others. Whether the objects in patch 11 are ever freed needs to be checked.
In other words, the effort tracking down the allocation is really,
really appreciated. But the patches look like you only had a hammer at
hand, and everything looked like a nail. :)
I have just observed that forcing ptmalloc to use mmap for everything
above 4kB significantly reduced the RSS usage.
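The mail does not say how the threshold was forced. One possible way with glibc, shown here as a sketch under that assumption, is mallopt() with M_MMAP_THRESHOLD; the function names below are hypothetical. Setting the threshold explicitly also turns off glibc's dynamic adjustment of it.

```c
#include <assert.h>
#include <malloc.h> /* glibc-specific: mallopt(), M_MMAP_THRESHOLD */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MMAP_THRESHOLD_BYTES (4 * 1024)

/* Ask glibc malloc to serve requests of at least 4 kB via mmap.
 * Returns 1 on success and 0 on error, which is mallopt's convention. */
int force_mmap_above_4k(void)
{
    return mallopt(M_MMAP_THRESHOLD, MMAP_THRESHOLD_BYTES);
}

/* Allocate, touch and free a block. With the threshold set, blocks of
 * 4 kB or more are expected to be unmapped on free() instead of staying
 * in the heap. Returns 1 on success, 0 if malloc fails. */
int large_alloc_roundtrip(size_t size)
{
    void *p;

    assert(size > 0);
    p = malloc(size);
    if (p == NULL) {
        return 0;
    }
    memset(p, 0xab, size);
    free(p);
    return 1;
}
```

The same setting can be tried without recompiling by exporting MALLOC_MMAP_THRESHOLD_=4096 in the environment before starting the process.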
Peter