
Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
Date: Tue, 28 Jun 2016 13:37:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1


On 28/06/2016 11:01, Peter Lieven wrote:
> I recently found that Qemu is using several hundred megabytes of RSS memory
> more than older versions such as Qemu 2.2.0. So I started tracing
> memory allocation and found 2 major reasons for this.
> 
> 1) We changed the qemu coroutine pool to have a per thread and a global release
>    pool. The chosen pool size and the changed algorithm could lead to up to
>    192 free coroutines with just a single iothread, each of the coroutines
>    in the pool having 1MB of stack memory.

But the fix, as you correctly note, is to reduce the stack size.  It
would be nice to compile block-obj-y with -Wstack-usage=2048 too.
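
To illustrate what such a warning would catch, here is a minimal sketch. It is not QEMU code: the function names and the 64 KiB scratch size are made up for the example. Compiled with GCC and -Wstack-usage=2048, the first function is expected to be flagged (the exact report depends on the optimization level), while the second keeps its frame small by putting the buffer on the heap.

```c
/* Sketch only; check with: gcc -Wstack-usage=2048 -c sketch.c */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE (64 * 1024)  /* hypothetical scratch buffer size */

/* The 64 KiB array lives in this function's frame, so GCC is expected
 * to warn that stack usage exceeds the 2048-byte limit. */
unsigned long sum_bytes_stack(const unsigned char *data, size_t len)
{
    unsigned char scratch[SCRATCH_SIZE];
    unsigned long sum = 0;

    assert(len <= SCRATCH_SIZE);
    memcpy(scratch, data, len);
    for (size_t i = 0; i < len; i++) {
        sum += scratch[i];
    }
    return sum;
}

/* Same work with the scratch buffer on the heap; the frame stays small.
 * Returns 0 on success, -1 if the allocation fails. */
int sum_bytes_heap(const unsigned char *data, size_t len, unsigned long *out)
{
    unsigned char *scratch = malloc(len ? len : 1);
    unsigned long sum = 0;

    if (!scratch) {
        return -1;
    }
    memcpy(scratch, data, len);
    for (size_t i = 0; i < len; i++) {
        sum += scratch[i];
    }
    free(scratch);
    *out = sum;
    return 0;
}
```

With a small coroutine stack, a frame like the first one would eat the whole stack in a single call, which is why a compile-time check is attractive.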

> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced, which led to delayed freeing
>    of memory. This led to higher heap allocations which could not effectively
>    be returned to the kernel (most likely due to fragmentation).

I agree that some of the exec.c allocations need some care, but I would
prefer to use a custom free list or lazy allocation instead of mmap.
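
As a rough idea of what I mean by a custom free list, here is a minimal sketch for fixed-size objects. It is an illustration only, not existing QEMU code: the FreeList type and the freelist_* names are hypothetical, and it is not thread-safe. Released objects are cached up to a cap and reused, so repeated allocate/free cycles do not churn the heap, and the amount of retained memory stays bounded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A cached object is reinterpreted as a list node while it is free. */
typedef struct FreeNode {
    struct FreeNode *next;
} FreeNode;

typedef struct {
    FreeNode *head;     /* singly linked list of cached objects */
    size_t obj_size;    /* size of every object handed out */
    size_t cached;      /* number of objects currently cached */
    size_t max_cached;  /* cap on cached objects */
} FreeList;

void freelist_init(FreeList *fl, size_t obj_size, size_t max_cached)
{
    assert(obj_size >= sizeof(FreeNode));
    fl->head = NULL;
    fl->obj_size = obj_size;
    fl->cached = 0;
    fl->max_cached = max_cached;
}

/* Reuse a cached object if there is one, otherwise fall back to malloc. */
void *freelist_get(FreeList *fl)
{
    if (fl->head) {
        FreeNode *node = fl->head;
        fl->head = node->next;
        fl->cached--;
        return node;
    }
    return malloc(fl->obj_size);
}

/* Cache the object for reuse, or really free it once the cap is reached. */
void freelist_put(FreeList *fl, void *obj)
{
    if (fl->cached < fl->max_cached) {
        FreeNode *node = obj;
        node->next = fl->head;
        fl->head = node;
        fl->cached++;
    } else {
        free(obj);
    }
}

/* Release everything that is still cached. */
void freelist_drain(FreeList *fl)
{
    while (fl->head) {
        FreeNode *node = fl->head;
        fl->head = node->next;
        free(node);
    }
    fl->cached = 0;
}
```

The cap is the important knob here: an unbounded cache would simply move the RSS problem from the heap into the list.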

Changing allocations to use mmap is also not really useful if you do it
for objects that are never freed (as in patches 8, 9, 10 and 15 at least,
and probably 11 too, which is one of the most contentious).
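
The reasoning is that the benefit of mmap shows up at release time: munmap hands the pages back to the kernel immediately, regardless of heap fragmentation. A minimal sketch of an anonymous mapping pair follows, assuming Linux and glibc. The anon_alloc/anon_free names are hypothetical and are not the helper from the series.

```c
#define _DEFAULT_SOURCE 1  /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map at least `size` bytes of zero-filled private anonymous memory.
 * The length actually mapped (rounded up to whole pages) is stored in
 * *mapped_len and must be passed back to anon_free().
 * Returns NULL on failure. */
void *anon_alloc(size_t size, size_t *mapped_len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size + page - 1) / page * page;
    void *p;

    assert(size > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    *mapped_len = len;
    return p;
}

/* Unmap the region; the pages go back to the kernel right away.
 * Returns 0 on success, -1 on error (as munmap does). */
int anon_free(void *p, size_t mapped_len)
{
    return munmap(p, mapped_len);
}
```

If anon_free() is never called for an object, the mapping costs at least a full page per allocation and gains nothing over malloc.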

In other words, the effort tracking down the allocation is really,
really appreciated.  But the patches look like you only had a hammer at
hand, and everything looked like a nail. :)

Paolo

> The following series is what I came up with. Besides the coroutine patches I changed
> some allocations to forcibly use mmap. None of these allocations is made repeatedly
> during runtime, so the impact of using mmap should be negligible.
> 
> There are still some big malloced allocations left which cannot be easily changed
> (e.g. the pixman buffers in VNC). So it might be an idea to set a lower mmap threshold for
> malloc, since this threshold seems to be on the order of several megabytes on modern systems.
> 
> Peter Lieven (15):
>   coroutine-ucontext: mmap stack memory
>   coroutine-ucontext: add a switch to monitor maximum stack size
>   coroutine-ucontext: reduce stack size to 64kB
>   coroutine: add a knob to disable the shared release pool
>   util: add a helper to mmap private anonymous memory
>   exec: use mmap for subpages
>   qapi: use mmap for QmpInputVisitor
>   virtio: use mmap for VirtQueue
>   loader: use mmap for ROMs
>   vmware_svga: use mmap for scratch pad
>   qom: use mmap for bigger Objects
>   util: add a function to realloc mmapped memory
>   exec: use mmap for PhysPageMap->nodes
>   vnc-tight: make the encoding palette static
>   vnc: use mmap for VncState
> 
>  configure                 | 33 ++++++++++++++++++--
>  exec.c                    | 11 ++++---
>  hw/core/loader.c          | 16 +++++-----
>  hw/display/vmware_vga.c   |  3 +-
>  hw/virtio/virtio.c        |  5 +--
>  include/qemu/mmap-alloc.h |  7 +++++
>  include/qom/object.h      |  1 +
>  qapi/qmp-input-visitor.c  |  5 +--
>  qom/object.c              | 20 ++++++++++--
>  ui/vnc-enc-tight.c        | 21 ++++++-------
>  ui/vnc.c                  |  5 +--
>  ui/vnc.h                  |  1 +
>  util/coroutine-ucontext.c | 66 +++++++++++++++++++++++++++++++++++++--
>  util/mmap-alloc.c         | 27 ++++++++++++++++
>  util/qemu-coroutine.c     | 79 ++++++++++++++++++++++++++---------------------
>  15 files changed, 225 insertions(+), 75 deletions(-)
> 


