
Re: [Qemu-devel] Combining synchronous and asynchronous IO


From: Kevin Wolf
Subject: Re: [Qemu-devel] Combining synchronous and asynchronous IO
Date: Mon, 18 Mar 2019 16:35:06 +0100
User-agent: Mutt/1.11.3 (2019-02-01)

Am 18.03.2019 um 13:44 hat Sergio Lopez geschrieben:
> Kevin Wolf writes:
> > Am 15.03.2019 um 16:33 hat Sergio Lopez geschrieben:
> >> Stefan Hajnoczi writes:
> >> > On Thu, Mar 14, 2019 at 06:31:34PM +0100, Sergio Lopez wrote:
> >> >> Our current AIO path does a great job of offloading work from the VM,
> >> >> and combined with IOThreads it provides good performance in most
> >> >> scenarios. But it also comes at a cost: a longer execution path and
> >> >> the need for scheduler intervention at various points.
> >> >> 
> >> >> There's one particular workload that suffers from this cost: when you
> >> >> have just 1 or 2 cores in the Guest issuing synchronous requests. This
> >> >> happens to be a pretty common workload for some DBs and, more
> >> >> generally, for small VMs.
> >> >> 
> >> >> I did a quick'n'dirty implementation on top of virtio-blk to get some
> >> >> numbers. This comes from a VM with 4 CPUs running on an idle server,
> >> >> with a secondary virtio-blk disk backed by a null_blk device with a
> >> >> simulated latency of 30us.
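
For reference, the "AIO path" mentioned above is the blk_aio_*() API
that device models use today: submit the request, then get completion
via a callback from the event loop. A minimal sketch, where 'MyRequest',
'my_read_complete' and 'my_request_finish' are illustrative names, not
taken from the thread:

    static void my_read_complete(void *opaque, int ret)
    {
        MyRequest *req = opaque;

        /* Runs later, from the AioContext (main loop or IOThread). */
        my_request_finish(req, ret);
    }

    /* Submit and return immediately; completion arrives via callback. */
    blk_aio_preadv(blk, offset, &req->qiov, 0, my_read_complete, req);

Everything that happens between submission and callback (coroutine
creation, request tracking, thread pool hand-off) adds to the execution
path described above.
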
> >> >
> >> > Can you describe the implementation in more detail?  Does "synchronous"
> >> > mean that hw/block/virtio-blk.c makes a blocking preadv()/pwritev() call
> >> > instead of calling blk_aio_preadv/pwritev()?  If so, then you are also
> >> > bypassing the QEMU block layer (coroutines, request tracking, etc) and
> >> > that might explain some of the latency.
> >> 
> >> The first implementation, the one I used to get these numbers, is
> >> just preadv()/pwritev() calls from virtio-blk.c, as you correctly
> >> guessed. I know the comparison is unfair, but I wanted to look at the
> >> best possible scenario first, and then measure the cost of the other
> >> layers.
> >> 
> >> I'm now working on non-coroutine counterparts for
> >> blk_co_[preadv|pwritev], so we get synchronous I/O (SIO) without
> >> bypassing the block layer.
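
For illustration, the quick'n'dirty synchronous path presumably boils
down to something like the following sketch; how the fd and offset are
obtained, and the error handling, are assumptions here, not Sergio's
actual patch:

    /* Blocking read straight from virtio-blk, bypassing the block
     * layer entirely (no coroutine, no request tracking, raw-format
     * image only). */
    ssize_t ret;

    do {
        ret = preadv(fd, req->qiov.iov, req->qiov.niov, offset);
    } while (ret < 0 && errno == EINTR);

    virtio_blk_req_complete(req, ret < 0 ? VIRTIO_BLK_S_IOERR
                                         : VIRTIO_BLK_S_OK);
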
> >
> > Maybe try to keep the change local to file-posix.c? I think you would
> > only have to modify raw_thread_pool_submit() so that it doesn't go
> > through the thread pool, but just calls func directly.
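
A minimal sketch of that suggestion, with the function body
reconstructed from memory rather than copied, and the bypass condition
as a made-up placeholder:

    static int raw_thread_pool_submit(BlockDriverState *bs,
                                      ThreadPoolFunc func, void *arg)
    {
        /* Hypothetical: some per-BDS knob requesting synchronous I/O. */
        if (bdrv_wants_sync_io(bs)) {
            return func(arg);    /* run in the caller's thread */
        }

        ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
        return thread_pool_submit_co(pool, func, arg);
    }
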
> 
> I already tried something similar, but I'd like to explore the
> possibility of avoiding the coroutine/aio_poll dance to trim down
> another ~10us.
> 
> If it's deemed to be too complex or hard to maintain, we can always fall
> back to something simpler.
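
For context, the coroutine/aio_poll dance in question is the pattern
the block layer's synchronous wrappers use to serve non-coroutine
callers; roughly, simplified from memory rather than quoted verbatim:

    if (qemu_in_coroutine()) {
        /* Already in coroutine context: run the request directly. */
        co_entry(&rwco);
    } else {
        /* Spawn a coroutine and poll the AioContext until the request
         * signals completion via rwco.ret. */
        Coroutine *co = qemu_coroutine_create(co_entry, &rwco);
        bdrv_coroutine_enter(blk_bs(blk), co);
        BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
    }

A non-coroutine counterpart would avoid the coroutine creation and the
BDRV_POLL_WHILE() loop for synchronous callers, which is presumably
where the extra ~10us goes.
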

I suspect it will add complexity that we won't like, but at the moment
I don't have your exact patches, command lines and numbers, so I can't
say much yet.

When you think you've measured enough, I'd appreciate it if you could
send all of this information to the list, so we can see what to make of
it.

Kevin


