
Re: [Qemu-block] [Qemu-devel] Combining synchronous and asynchronous IO


From: Kevin Wolf
Subject: Re: [Qemu-block] [Qemu-devel] Combining synchronous and asynchronous IO
Date: Mon, 18 Mar 2019 10:50:44 +0100
User-agent: Mutt/1.11.3 (2019-02-01)

On 17.03.2019 at 06:58, Fam Zheng wrote:
> > On Mar 15, 2019, at 01:31, Sergio Lopez <address@hidden> wrote:
> > 
> > Hi,
> > 
> > Our current AIO path does a great job at unloading the work from the VM,
> > and combined with IOThreads provides a good performance in most
> > scenarios. But it also comes with its costs, in both a longer execution
> > path and the need of the intervention of the scheduler at various
> > points.
> > 
> > There's one particular workload that suffers from this cost, and that's
> > when you have just 1 or 2 cores on the Guest issuing synchronous
> > requests. This happens to be a pretty common workload for some DBs and,
> > in a general sense, on small VMs.
> > 
> > I did a quick'n'dirty implementation on top of virtio-blk to get some
> > numbers. This comes from a VM with 4 CPUs running on an idle server,
> > with a secondary virtio-blk disk backed by a null_blk device with a
> > simulated latency of 30us.
> > 
> > - Average latency (us)
> > 
> > ----------------------------------------
> > |        | AIO+iothread | SIO+iothread |
> > | 1 job  |      70      |      55      |
> > | 2 jobs |      83      |      82      |
> > | 4 jobs |      90      |     159      |
> > ----------------------------------------
> > 
> > In this case the intuition matches the reality, and synchronous IO wins
> > when there's just 1 job issuing the requests, while it loses hard when
> > there are 4.
> > 
> > While my first thought was to implement this as a tunable, it turns out
> > we have a hint about the nature of the workload in the number of
> > requests in the VQ. So I updated the code to use SIO if there's just 1
> > request and AIO otherwise (a sketch of this dispatch heuristic follows
> > the quoted text below), with these results:
> > 
> > -----------------------------------------------------------
> > |        | AIO+iothread | SIO+iothread | AIO+SIO+iothread |
> > | 1 job  |      70      |      55      |        55        |
> > | 2 jobs |      83      |      82      |        78        |
> > | 4 jobs |      90      |     159      |        90        |
> > -----------------------------------------------------------
> > 
> > This data makes me think this is something worth pursuing, but I'd like
> > to hear your opinion on it.
> 
> Nice. In many cases the coroutines just forward the raw read/write to
> the raw file (no qcow2 LBA translation, backup, throttling, etc. in the
> data path), so being able to transparently (and dynamically, since that
> condition can change at any time for any request) bypass the block
> layer would be a very interesting idea to explore. The challenge is how
> not to totally break existing features (e.g. live snapshots and
> everything).
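
For illustration only, a minimal C sketch of the kind of dispatch heuristic
Sergio describes above (not his actual patch): submit_sync() and
submit_async() are hypothetical stand-ins for the real virtio-blk and
block-backend submission calls.

/*
 * Illustrative sketch: pick the I/O path per batch based on how many
 * requests were popped from the virtqueue.  submit_sync() and
 * submit_async() are hypothetical placeholders.
 */
#include <stddef.h>
#include <stdbool.h>

typedef struct VirtQueueReq VirtQueueReq;

/* Hypothetical helpers: complete inline vs. go through the AIO path. */
void submit_sync(VirtQueueReq *req);
void submit_async(VirtQueueReq *req);

/*
 * With a single request in flight the shorter synchronous path wins on
 * latency; with more requests queued the asynchronous path keeps the
 * backend busy.
 */
static void dispatch_batch(VirtQueueReq **reqs, size_t n_reqs)
{
    bool single = (n_reqs == 1);

    for (size_t i = 0; i < n_reqs; i++) {
        if (single) {
            submit_sync(reqs[i]);
        } else {
            submit_async(reqs[i]);
        }
    }
}

The tables above suggest such a switch keeps the 1-job latency of the
synchronous path without regressing the 4-job case.
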

Before jumping to conclusions, maybe we should first try to find out
which part of what we're bypassing is actually the important one. Is it
really the indirections through the block layer (and if so, which part
of them? Maybe some things can be selectively bypassed), or is it
bypassing the thread pool or Linux AIO overhead? The latter could easily
be implemented inside file-posix.

Kevin
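
For illustration, a rough C sketch of that second option (bypassing the
thread pool / Linux AIO submission inside file-posix), under the assumption
that the decision is again driven by the number of requests in flight.
RawState, raw_should_bypass() and raw_submit_async() are hypothetical
names, not the actual driver code.

/*
 * Rough sketch: issue preadv() directly from the request path when
 * nothing else is queued, instead of submitting to the thread pool or
 * via io_submit().  All names below are hypothetical.
 */
#define _GNU_SOURCE
#include <sys/uio.h>
#include <unistd.h>
#include <stdbool.h>
#include <errno.h>

typedef struct RawState {
    int fd;          /* the image file descriptor */
    int in_flight;   /* requests currently queued for this file */
} RawState;

/* The existing asynchronous path (thread pool or io_submit()); assumed here. */
ssize_t raw_submit_async(RawState *s, struct iovec *iov, int iovcnt,
                         off_t offset);

static bool raw_should_bypass(RawState *s)
{
    /* Mirror the virtqueue heuristic: only bypass when this is the sole request. */
    return s->in_flight == 1;
}

/* Returns the number of bytes read, or -errno on failure. */
static ssize_t raw_preadv_sketch(RawState *s, struct iovec *iov, int iovcnt,
                                 off_t offset)
{
    if (raw_should_bypass(s)) {
        /* Synchronous fast path: no thread-pool hop, no io_submit(). */
        ssize_t ret = preadv(s->fd, iov, iovcnt, offset);
        return ret < 0 ? -errno : ret;
    }

    /* Otherwise take the normal asynchronous submission path. */
    return raw_submit_async(s, iov, iovcnt, offset);
}

If the win really comes from skipping the thread pool rather than the
block layer itself, a change along these lines would leave features like
throttling and live snapshots on their normal path.
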


