


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [RFC PATCH 00/11] hw/nvme: reimplement all multi-aio commands with custom aiocbs
Date: Mon, 7 Jun 2021 08:14:38 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

04.06.2021 09:52, Klaus Jensen wrote:
From: Klaus Jensen <k.jensen@samsung.com>

This series reimplements flush, dsm, copy, zone reset and format nvm to
allow cancellation. I posted an RFC back in March ("hw/block/nvme:
convert ad-hoc aio tracking to aiocb") and I've applied some feedback
from Stefan and reimplemented the remaining commands.

The basic idea is to define custom AIOCBs for these commands. The custom
AIOCB takes care of issuing all the "nested" AIOs one by one instead of
blindly sending them off simultaneously without tracking the returned
aiocbs.

I'm not familiar with nvme. But intuitively, isn't it less efficient to send
multiple commands one by one? Overall, wouldn't it be slower? In the block layer
we mostly make the opposite transition: instead of doing I/O operations one by
one, we run them simultaneously to keep a non-empty queue on the device, even on
a single device. This way the overall performance is increased.

If you need to store the nested AIOCBs, you may for example store them in a list
and cancel them in a loop, while still starting all the flushes simultaneously.
If you send two flushes to two different disks, what is the reason to wait for
the first flush to finish before issuing the second? A rough sketch of what I
mean is below.
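
(A minimal, purely illustrative sketch of that idea; NvmeMultiAIOCB,
blk_for_namespace() and the fixed array size are hypothetical, not code from the
series:)

#include "qemu/osdep.h"
#include "sysemu/block-backend.h"

/* Hypothetical sketch: submit all flushes up front, remember the returned
 * AIOCBs, and cancel whatever is still in flight in a loop. */
typedef struct NvmeMultiAIOCB {
    BlockAIOCB *aiocbs[256];    /* one slot per possible namespace */
    int count;
} NvmeMultiAIOCB;

static void nvme_multi_flush_cb(void *opaque, int ret)
{
    /* per-flush completion: e.g. count down and complete the parent
     * request when the last nested flush has finished */
}

static void nvme_multi_flush(NvmeMultiAIOCB *iocb)
{
    for (int i = 0; i < iocb->count; i++) {
        /* every flush is started immediately, keeping the devices busy */
        iocb->aiocbs[i] = blk_aio_flush(blk_for_namespace(i),
                                        nvme_multi_flush_cb, iocb);
    }
}

static void nvme_multi_flush_cancel(NvmeMultiAIOCB *iocb)
{
    for (int i = 0; i < iocb->count; i++) {
        if (iocb->aiocbs[i]) {
            blk_aio_cancel_async(iocb->aiocbs[i]);
        }
    }
}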


I've kept the RFC since I'm still new to using the block layer like
this. I was hoping that Stefan could find some time to look over this -
this is a huge series, so I don't expect non-nvme folks to spend a large
amount of time on it, but I would really like feedback on my approach in
the reimplementation of flush and format.

If I understand your code correctly, you start the next I/O operation from the
callback of the previous one. It works, and it is similar to how the mirror
block job operated some time ago (it still maintained several in-flight requests
simultaneously, but I mean its use of the _aio_ functions). Still, mirror
doesn't use the _aio_ functions like this anymore.

A better approach to calling several block-layer I/O functions one by one is to
create a coroutine. You can just add a coroutine function that does all your
linear logic as you want, without any callbacks, like:

static void coroutine_fn nvme_co_flush(void *opaque)
{
    for (...) {              /* each namespace the command touches */
        blk_co_flush(...);   /* flush that namespace's BlockBackend */
    }
}

(and you'll need qemu_coroutine_create() and qemu_coroutine_enter() to start a 
coroutine).
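
For example (a minimal sketch; "req" is just a placeholder for whatever
per-command state nvme_co_flush() needs):

/* create the coroutine and kick it off on the current AioContext */
Coroutine *co = qemu_coroutine_create(nvme_co_flush, req);
qemu_coroutine_enter(co);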

Still, I'm not sure that moving from issuing several I/O commands simultaneously
to issuing them sequentially is a good idea.
And this way you of course can't use blk_aio_cancel(). This leads to my last
doubt:

One more thing I don't understand after a quick look at the series: how does
cancellation work? It seems to me that you just call cancel on the nested AIOCBs
produced by blk_<io_functions>, but none of them implements cancel. I see only
four implementations of the .cancel_async callback in the whole QEMU code: in
iscsi, in ide/core.c, in dma-helpers.c and in thread-pool.c. None of them seems
related to blk_aio_flush() and the other blk_* functions you call in the series.
Or what am I missing?
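
For reference, an AIOCB only becomes cancellable when its AIOCBInfo provides a
.cancel_async callback, roughly like this sketch (the struct and function names
are illustrative, not taken from the series):

#include "qemu/osdep.h"
#include "block/aio.h"
#include "sysemu/block-backend.h"

typedef struct NvmeCustomAIOCB {
    BlockAIOCB common;
    BlockAIOCB *aiocb;          /* nested request currently in flight */
    int ret;
} NvmeCustomAIOCB;

static void nvme_custom_cancel(BlockAIOCB *acb)
{
    NvmeCustomAIOCB *iocb = container_of(acb, NvmeCustomAIOCB, common);

    iocb->ret = -ECANCELED;
    if (iocb->aiocb) {
        /* propagate cancellation to the nested AIO, if it supports it */
        blk_aio_cancel_async(iocb->aiocb);
    }
}

static const AIOCBInfo nvme_custom_aiocb_info = {
    .aiocb_size   = sizeof(NvmeCustomAIOCB),
    .cancel_async = nvme_custom_cancel,
};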


Those commands are special in that they may issue AIOs to multiple namespaces
and thus to multiple block
backends. Since this device does not support iothreads, I've opted for
simply always returning the main loop aio context, but I wonder if this
is acceptable or not. It might be the case that this should contain an
assert of some kind, in case someone starts adding iothread support.
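
(Assuming the piece in question is the AIOCBInfo .get_aio_context hook, that
presumably amounts to something like the following sketch; the function name is
illustrative only:)

static AioContext *nvme_custom_get_aio_context(BlockAIOCB *acb)
{
    /* the device does not support iothreads, so always report the main
     * loop AioContext */
    return qemu_get_aio_context();
}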

Klaus Jensen (11):
   hw/nvme: reimplement flush to allow cancellation
   hw/nvme: add nvme_block_status_all helper
   hw/nvme: reimplement dsm to allow cancellation
   hw/nvme: save reftag when generating pi
   hw/nvme: remove assert from nvme_get_zone_by_slba
   hw/nvme: use prinfo directly in nvme_check_prinfo and nvme_dif_check
   hw/nvme: add dw0/1 to the req completion trace event
   hw/nvme: reimplement the copy command to allow aio cancellation
   hw/nvme: reimplement zone reset to allow cancellation
   hw/nvme: reimplement format nvm to allow cancellation
   Partially revert "hw/block/nvme: drain namespaces on sq deletion"

  hw/nvme/nvme.h       |   10 +-
  include/block/nvme.h |    8 +
  hw/nvme/ctrl.c       | 1861 ++++++++++++++++++++++++------------------
  hw/nvme/dif.c        |   64 +-
  hw/nvme/trace-events |   21 +-
  5 files changed, 1102 insertions(+), 862 deletions(-)



--
Best regards,
Vladimir


