
From: Ming Lei
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Date: Wed, 6 Aug 2014 21:53:04 +0800

On Wed, Aug 6, 2014 at 4:50 PM, Paolo Bonzini <address@hidden> wrote:
> On 06/08/2014 10:38, Ming Lei wrote:
>> On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini <address@hidden> wrote:
>>> On 06/08/2014 07:33, Ming Lei wrote:
>>>>>> I played a bit with the following, I hope it's not too naive. I couldn't
>>>>>> see a difference with your patches, but at least one reason for this is
>>>>>> probably that my laptop SSD isn't fast enough to make the CPU the
>>>>>> bottleneck. Haven't tried ramdisk yet, that would probably be the next
>>>>>> thing. (I actually wrote the patch up just for some profiling on my own,
>>>>>> not for comparing throughput, but it should be usable for that as well.)
>>>> This might not be good for the test, since it is basically a sequential
>>>> read test, which the kernel can optimize a lot. And I always use a
>>>> randread benchmark.
>>>
>>> A microbenchmark already exists in tests/test-coroutine.c, and doesn't
>>> really tell us much; it's obvious that coroutines execute more code, but
>>> the question is why that affects iops performance.
>>
>> Could you take a look at the coroutine benchmark I wrote?  The results
>> show that coroutines decrease performance a lot compared with bypassing
>> them, as this patchset does.
>
> Your benchmark is synchronous, while disk I/O is asynchronous.

It can be thought of as asynchronous too, since it doesn't sleep the
way synchronous I/O does.
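
To make this concrete, here is a minimal sketch of the kind of
enter/yield ping-pong such a benchmark measures (not the exact code I
posted), written against the qemu_coroutine_* API in this tree, i.e.
the two-argument qemu_coroutine_enter(); the loop count is illustrative:

#include "block/coroutine.h"

static void coroutine_fn yield_loop(void *opaque)
{
    unsigned int *counter = opaque;

    while (*counter > 0) {
        (*counter)--;
        qemu_coroutine_yield();        /* switch back to the caller's stack */
    }
}

static void benchmark_yield(void)
{
    unsigned int i = 100000000;        /* 10^8 enter/yield round trips */
    Coroutine *co = qemu_coroutine_create(yield_loop);

    while (i > 0) {
        qemu_coroutine_enter(co, &i);  /* resume until the counter drains */
    }
}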

Basically the I/O thread is CPU bound in the linux-aio case, since
neither submission nor completion normally blocks the CPU, so my
benchmark still applies if we treat the completion as a nop.
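
For reference, here is a standalone sketch (plain libaio, not QEMU
code; build with -laio, error handling trimmed, and the file path is
only a placeholder) of why neither side needs to sleep: io_submit()
returns once the request is queued, and io_getevents() with min_nr = 0
and a zero timeout polls for completions without blocking:

#define _GNU_SOURCE                  /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event events[1];
    struct timespec zero = { 0, 0 };
    void *buf;
    int fd = open("/tmp/testfile", O_RDONLY | O_DIRECT);

    posix_memalign(&buf, 512, 4096);
    io_setup(128, &ctx);

    io_prep_pread(&cb, fd, buf, 4096, 0);
    io_submit(ctx, 1, cbs);          /* returns as soon as it is queued */

    /* min_nr = 0 and a zero timeout: report what is ready, never sleep */
    while (io_getevents(ctx, 0, 1, events, &zero) == 0) {
        /* completion not ready yet; the thread stays busy, not asleep */
    }

    io_destroy(ctx);
    close(fd);
    free(buf);
    return 0;
}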

The current puzzle is that the single coroutine benchmark suggests the
stack switch itself doesn't hurt performance, but in Kevin's block aio
benchmark, bypassing the coroutine still obtains an observable
improvement.
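
One structural difference between the two tests may matter here (this
is only my reading, not code from either series, and the names are
illustrative): the yield microbenchmark reuses a single coroutine,
while the block layer creates a fresh coroutine for every request, so
per-request creation and teardown costs only show up in the latter:

/* microbenchmark pattern: one coroutine, many enter/yield switches */
static void reuse_one_coroutine(unsigned int n)
{
    Coroutine *co = qemu_coroutine_create(yield_loop);  /* see sketch above */

    while (n > 0) {
        qemu_coroutine_enter(co, &n);
    }
}

static void coroutine_fn run_once(void *opaque)
{
    /* issue one request and return; the coroutine then terminates */
}

/* block-layer pattern: a fresh coroutine per request */
static void coroutine_per_request(unsigned int n)
{
    while (n-- > 0) {
        Coroutine *co = qemu_coroutine_create(run_once);
        qemu_coroutine_enter(co, NULL);
    }
}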

>
> Your benchmark doesn't add much compared to "time tests/test-coroutine
> -m perf  -p /perf/yield".  It takes 8 seconds on my machine, and 10^8
> function calls obviously take less than 8 seconds.  I've sent a patch to
> add a "baseline" function call benchmark to test-coroutine.
>
>>> The sequential read should be the right workload.  For fio, you want to
>>> get as many iops as possible to QEMU and so you need randread.  But
>>> qemu-img is not run in a guest and if the kernel optimizes sequential
>>> reads then the bypass should have even more benefits because it makes
>>> userspace proportionally more expensive.
>
> Do you agree with this?

Yes, I have posted the benchmark results, and they look basically
similar to my previous dataplane test.

Thanks,


