Re: [Qemu-devel] [RFC PATCH 2/3] cpus-common: Cache allocated work items


From: Pranith Kumar
Subject: Re: [Qemu-devel] [RFC PATCH 2/3] cpus-common: Cache allocated work items
Date: Mon, 28 Aug 2017 17:51:59 -0400

On Mon, Aug 28, 2017 at 3:05 PM, Emilio G. Cota <address@hidden> wrote:
> On Sun, Aug 27, 2017 at 23:53:25 -0400, Pranith Kumar wrote:
>> Using heaptrack, I found that quite a few of our temporary allocations
>> come from allocating work items. Instead of allocating and freeing them
>> continuously, we can cache the allocated items and reuse them instead
>> of freeing them.
>>
>> This reduces the number of allocations by 25% (200000 -> 150000 for
>> ARM64 boot+shutdown test).
>>
>
> But what is the perf difference, if any?
>
> Adding a lock (or a cmpxchg) here is not a great idea. However, this is
> not yet immediately obvious because of other scalability bottlenecks.
> (If you boot many arm64 cores you'll see that most of the time is spent
> idling on the BQL, see
>   https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg05207.html )
>
> You're most likely better off using glib's slices, see
>   https://developer.gnome.org/glib/stable/glib-Memory-Slices.html
> These slices use per-thread lists, so scalability should be OK.

I think we should modify our g_malloc() to internally use this. Seems
like an idea worth trying out.
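
As a rough sketch (not the actual cpus-common.c code, and with the work
item struct abbreviated), switching the work-item alloc/free to GLib
slices could look something like the following; g_slice_new0() and
g_slice_free() are the real GLib calls, everything else here is
illustrative:

#include <glib.h>
#include <stdbool.h>

/* Abbreviated stand-in for the work item used in cpus-common.c. */
struct qemu_work_item {
    struct qemu_work_item *next;
    void (*func)(void *data);
    void *data;
    bool free, exclusive, done;
};

static struct qemu_work_item *work_item_alloc(void)
{
    /* g_slice_new0() allocates from per-thread magazines, so frequent
     * alloc/free cycles mostly avoid contending on the global heap. */
    return g_slice_new0(struct qemu_work_item);
}

static void work_item_free(struct qemu_work_item *wi)
{
    g_slice_free(struct qemu_work_item, wi);
}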

>
> I also suggest profiling with either or both of jemalloc/tcmalloc
> (build with --enable-jemalloc/tcmalloc) in addition to using glibc's
> allocator, and then based on perf numbers decide whether this is something
> worth optimizing.
>

OK, I will try to get some perf numbers.
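
For context, a rough sketch of the kind of free-list reuse being
discussed (hypothetical helper names, abbreviated struct, not the actual
patch): every get/put goes through a cmpxchg loop (or would need a lock),
which is exactly the cost raised above, and a real implementation would
also have to deal with ABA on the list head.

#include <glib.h>
#include <stdatomic.h>
#include <stdbool.h>

struct qemu_work_item {
    struct qemu_work_item *next;   /* doubles as the free-list link */
    void (*func)(void *data);
    void *data;
    bool free, exclusive, done;
};

static struct qemu_work_item *_Atomic free_list;

static struct qemu_work_item *work_item_get(void)
{
    struct qemu_work_item *wi = atomic_load(&free_list);

    /* Pop the head of the free list; fall back to a fresh allocation. */
    while (wi && !atomic_compare_exchange_weak(&free_list, &wi, wi->next)) {
        /* a failed cmpxchg reloads wi; just retry */
    }
    return wi ? wi : g_new0(struct qemu_work_item, 1);
}

static void work_item_put(struct qemu_work_item *wi)
{
    struct qemu_work_item *head = atomic_load(&free_list);

    do {
        wi->next = head;
    } while (!atomic_compare_exchange_weak(&free_list, &head, wi));
}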

-- 
Pranith


