From: Ming Lei
Subject: Re: [Qemu-devel] [PATCH 01/15] qemu coroutine: support bypass mode
Date: Thu, 31 Jul 2014 16:59:47 +0800

On Thu, Jul 31, 2014 at 7:37 AM, Paolo Bonzini <address@hidden> wrote:
> On 30/07/2014 19:15, Ming Lei wrote:
>> On Wed, Jul 30, 2014 at 9:45 PM, Paolo Bonzini <address@hidden> wrote:
>>> On 30/07/2014 13:39, Ming Lei wrote:
>>>> This patch introduces several APIs for bypassing the qemu coroutine
>>>> in cases where it is not necessary, for performance's sake.
>>>
>>> No, this is wrong.  Dataplane *must* use the same code as non-dataplane,
>>> anything else is a step backwards.
>>
>> As we saw, coroutines have caused a performance regression on
>> dataplane, and it isn't necessary to use them in some cases, is it?
>
> Yes, and it's not necessary on non-dataplane either.  It's not necessary
> on virtio-scsi, and it will not be necessary on virtio-scsi dataplane
> either.
>
>>> If you want to bypass coroutines, bdrv_aio_readv/writev must detect the
>>> conditions that allow doing that and call the driver's bdrv_aio_readv/writev
>>> hooks directly.
>>
>> That is easy to detect; please see the 5th patch.
>
> No, that's not enough.  Dataplane right now prevents block jobs, but
> that's going to change and it could require coroutines even for raw devices.
>
>>> To begin with, have you benchmarked QEMU and can you provide a trace of
>>> *where* the coroutine overhead lies?
>>
>> I guess it may be caused by the stack switch; at least on one of
>> my boxes, bypassing coroutines improves throughput by ~7%, and by
>> ~15% on another box.
>
> No guesses please.  Actually that's also my guess, but since you are
> submitting the patch you must do better and show profiles where stack
> switching disappears after the patches.
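To be clear about what the numbers below actually toggle: the bypass
amounts to a fast-path test of roughly this shape in bdrv_aio_readv()
(an illustrative sketch only, not the series itself; apart from
s->raw_format in patch 5/15, the conditions shown are my assumptions
about what such a check has to cover):

    BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
                                     QEMUIOVector *qiov, int nb_sectors,
                                     BlockDriverCompletionFunc *cb, void *opaque)
    {
        /* Fast path: nothing about this request needs a coroutine, so
         * call the driver's native AIO hook directly and skip the
         * coroutine creation and stack switch entirely. */
        if (bs->drv->bdrv_aio_readv &&     /* driver has a native AIO hook */
            !bs->io_limits_enabled &&      /* no I/O throttling */
            !bs->copy_on_read) {           /* no copy-on-read */
            return bs->drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                                           cb, opaque);
        }
        /* Slow path: the existing coroutine-based implementation. */
        ...
    }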

Below are the hardware events reported by 'perf stat' when running a
fio randread benchmark for 2 minutes in the VM (single vq, 2 jobs):

sudo ~/bin/perf stat -e L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,\
instructions,branch-instructions,branch-misses,branch-loads,\
branch-load-misses,dTLB-loads,dTLB-load-misses \
./nqemu-start-mq 4 1
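
(The bracketed percentages in the output below are perf's multiplexing
coverage: more events were requested than there are hardware counters,
so each event was counted only part of the time and scaled up.)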

1) Without bypassing the coroutine (forcing 's->raw_format' to false;
see patch 5/15):

- throughput: 95K

 Performance counter stats for './nqemu-start-mq 4 1':

     69,231,035,842      L1-dcache-loads                                              [40.10%]
      1,909,978,930      L1-dcache-load-misses     #    2.76% of all L1-dcache hits   [39.98%]
    263,731,501,086      cpu-cycles                                                   [40.03%]
    232,564,905,115      instructions              #    0.88  insns per cycle         [50.23%]
     46,157,868,745      branch-instructions                                          [49.82%]
        785,618,591      branch-misses             #    1.70% of all branches         [49.99%]
     46,280,342,654      branch-loads                                                 [49.95%]
     34,934,790,140      branch-load-misses                                           [50.02%]
     69,447,857,237      dTLB-loads                                                   [40.13%]
        169,617,374      dTLB-load-misses          #    0.24% of all dTLB cache hits  [40.04%]

      161.991075781 seconds time elapsed


2) With bypassing the coroutine:

- throughput: 115K

 Performance counter stats for './nqemu-start-mq 4 1':

     76,784,224,509      L1-dcache-loads                                              [39.93%]
      1,334,036,447      L1-dcache-load-misses     #    1.74% of all L1-dcache hits   [39.91%]
    262,697,428,470      cpu-cycles                                                   [40.03%]
    255,526,629,881      instructions              #    0.97  insns per cycle         [50.01%]
     50,160,082,611      branch-instructions                                          [49.97%]
        564,407,788      branch-misses             #    1.13% of all branches         [50.08%]
     50,331,510,702      branch-loads                                                 [50.08%]
     35,760,766,459      branch-load-misses                                           [50.03%]
     76,706,000,951      dTLB-loads                                                   [40.00%]
        123,291,001      dTLB-load-misses          #    0.16% of all dTLB cache hits  [40.02%]

      162.333465490 seconds time elapsed
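
Comparing the two runs directly (same workload, ~162s elapsed, and an
essentially identical cycle count), derived from the numbers above:

                            no bypass    bypass    delta
    throughput              95K          115K      +21%
    insns per cycle         0.88         0.97      +10%
    L1-dcache miss rate     2.76%        1.74%     -37%
    branch miss rate        1.70%        1.13%     -34%
    dTLB miss rate          0.24%        0.16%     -33%

So in the same number of cycles the bypass run retires ~10% more
instructions and sees markedly fewer L1-dcache, branch and dTLB misses,
which is consistent with the coroutine stack switches polluting those
structures, though a 'perf record -g' profile would be needed to pin
the cost on the switch itself.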


