qemu-block
From: Klaus Jensen
Subject: Re: [RFC PATCH 00/11] hw/nvme: reimplement all multi-aio commands with custom aiocbs
Date: Mon, 7 Jun 2021 13:02:55 +0200

On Jun  7 13:24, Vladimir Sementsov-Ogievskiy wrote:
07.06.2021 13:00, Klaus Jensen wrote:
On Jun  7 10:11, Vladimir Sementsov-Ogievskiy wrote:
07.06.2021 09:17, Klaus Jensen wrote:
On Jun  7 08:14, Vladimir Sementsov-Ogievskiy wrote:
04.06.2021 09:52, Klaus Jensen wrote:

I've kept the RFC since I'm still new to using the block layer like
this. I was hoping that Stefan could find some time to look over this -
this is a huge series, so I don't expect non-nvme folks to spend a large
amount of time on it, but I would really like feedback on my approach in
the reimplementation of flush and format.

If I understand your code correctly, you start the next io operation from the
callback of the previous one. It works, and it is similar to how the mirror
block-job operated some time ago (it did maintain several in-flight requests
simultaneously, but I mean its use of the _aio_ functions). Still, mirror
doesn't use the _aio_ functions like this now.

A better approach for calling several block-layer io functions one-by-one is to
create a coroutine. You may just add a coroutine function that does all your
linear logic as you want, without any callbacks, like:

nvme_co_flush()
{
  for (...) {
     blk_co_flush();
  }
}

(and you'll need qemu_coroutine_create() and qemu_coroutine_enter() to start a 
coroutine).
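
A minimal sketch of what that could look like, assuming "qemu/coroutine.h" and
"sysemu/block-backend.h"; NvmeFlushContext, its fields and nvme_flush_complete()
are made-up names, not from the actual series:

static void coroutine_fn nvme_co_flush(void *opaque)
{
    NvmeFlushContext *ctx = opaque;   /* hypothetical per-command context */

    for (int i = 0; i < ctx->nr_namespaces; i++) {
        /* yields here; the coroutine is resumed when the flush completes */
        ctx->ret = blk_co_flush(ctx->namespaces[i]->blkconf.blk);
        if (ctx->ret < 0) {
            break;
        }
    }

    nvme_flush_complete(ctx);         /* e.g. post the completion queue entry */
}

/* ... and in the command handler: */
Coroutine *co = qemu_coroutine_create(nvme_co_flush, ctx);
qemu_coroutine_enter(co);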


So, this is definitely a tempting way to implement this. I must admit that I 
did not consider it like this because I thought this was at the wrong level of 
abstraction (looked to me like this was something that belonged in block/, not 
hw/). Again, I referred to the Trim implementation in hw/ide as the source of 
inspiration on the sequential AIOCB approach.

No, I think it's OK from an abstraction point of view. Everybody is welcome to
use coroutines if it is appropriate, and especially for doing sequential IOs :)
Actually, it's just more efficient: the way I propose, you create one coroutine
which does all your logic as you want, whereas the blk_aio_ functions actually
create a coroutine under the hood each time (I don't think it noticeably
affects performance, but the logic becomes more straightforward)

The only problem is that this way we don't have a cancellation API, so you
can't use it for cancellation anyway :(


Yeah, I'm not really feeling up for adding that :P


Still, I'm not sure that moving from issuing several IO commands simultaneously
to issuing them sequentially is a good idea..
And this way you of course can't use blk_aio_cancel.. This leads to my last
doubt:

One more thing I don't understand after a quick look at the series: how does
cancellation work? It seems to me that you just call cancel on the nested
AIOCBs produced by blk_<io_functions>, but none of them implement cancel.. I
see only four implementations of the .cancel_async callback in the whole QEMU
code: in iscsi, in ide/core.c, in dma-helpers.c and in thread-pool.c.. None of
them seem related to blk_aio_flush() and the other blk_* functions you call in
the series. Or what do I miss?
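
(For reference, the hook in question is the .cancel_async member of AIOCBInfo,
abridged here from include/block/aio.h:

typedef struct AIOCBInfo {
    void (*cancel_async)(BlockAIOCB *acb);
    size_t aiocb_size;
    /* ... */
} AIOCBInfo;

blk_aio_cancel()/blk_aio_cancel_async() can only make an AIOCB finish early if
its aiocb_info provides .cancel_async; otherwise cancellation degenerates into
waiting for normal completion.)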


Right now, cancellation is only initiated by the device when a submission queue
is deleted. This causes blk_aio_cancel() to be called on each BlockAIOCB
(NvmeRequest.aiocb) for outstanding requests. In most cases this BlockAIOCB is
a DMAAIOCB from softmmu/dma-helpers.c, which implements .cancel_async. Prior to
this patchset, Flush, DSM, Copy and so on did not have any BlockAIOCB to cancel
since we did not keep references to the ongoing AIOs.
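
Roughly, the queue deletion path does something like this (the list and field
names here are illustrative, not copied verbatim from hw/nvme/ctrl.c):

NvmeRequest *r;

QTAILQ_FOREACH(r, &sq->out_req_list, entry) {
    if (r->aiocb) {
        /* blocks until the AIOCB completes; it can only complete early
         * (with -ECANCELED) if its aiocb_info implements .cancel_async */
        blk_aio_cancel(r->aiocb);
    }
}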

Hmm. Looking at flush for example, I don't see where a DMAAIOCB comes into play.

You do

 iocb->aiocb = blk_aio_flush(ns->blkconf.blk, nvme_flush_ns_cb, iocb);

which calls blk_aio_prwv(), which calls blk_aio_get() with blk_aio_em_aiocb_info,
which doesn't implement .cancel_async..


I meant that most I/O in the regular path (read/write) uses the dma helpers
(since it does DMA). We might use blk_aio_p{read,write} directly when we read
from/write to memory on the device (the controller memory buffer), but that is
not the common case.
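
For that common case the handlers go through the dma helpers roughly like this
(a sketch only; nvme_rw_cb and the exact arguments are placeholders, not quoted
from hw/nvme/ctrl.c):

req->aiocb = dma_blk_read(ns->blkconf.blk, &req->sg.qsg, data_offset,
                          BDRV_SECTOR_SIZE, nvme_rw_cb, req);

dma_blk_read() hands back a DMAAIOCB (softmmu/dma-helpers.c) whose aiocb_info
does implement .cancel_async, so blk_aio_cancel() on it actually cancels.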

You are correct that BlkAioEmAIOCB does not implement cancel, but the "wrapper"
(NvmeFlushAIOCB) *does*. This means that, from the NVMe controller's
perspective, we can cancel the flush in between the (un-cancellable,
blk_aio_flush-initiated) flushes of the individual namespaces.

Hm. But it will work the same way if you just don't implement .cancel_async in
nvme_flush_aiocb_info.


Oh. Crap, I see. I screwed this up in flush.

blk_aio_cancel_async(iocb->aiocb) will be a no-op and ret will never be
-ECANCELED. I think I do this correctly in the other reimplementations (by
explicitly setting iocb->ret), but not here in flush.
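
Something along these lines should do it (an untested sketch; the field names
are assumed from the flush reimplementation in this series):

static void nvme_flush_cancel(BlockAIOCB *acb)
{
    NvmeFlushAIOCB *iocb = container_of(acb, NvmeFlushAIOCB, common);

    /* record the cancellation ourselves; the nested blk_aio_flush() AIOCB
     * has no .cancel_async, so it will never return -ECANCELED for us */
    iocb->ret = -ECANCELED;

    if (iocb->aiocb) {
        blk_aio_cancel_async(iocb->aiocb);
        iocb->aiocb = NULL;
    }
}

static const AIOCBInfo nvme_flush_aiocb_info = {
    .aiocb_size   = sizeof(NvmeFlushAIOCB),
    .cancel_async = nvme_flush_cancel,
};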
