qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 07/11] block: use int64_t instead of int in driver write_z


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH v5 07/11] block: use int64_t instead of int in driver write_zeroes handlers
Date: Sat, 5 Jun 2021 16:50:54 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

04.06.2021 23:09, Eric Blake wrote:
On Wed, May 05, 2021 at 10:49:57AM +0300, Vladimir Sementsov-Ogievskiy wrote:
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, convert driver write_zeroes handlers bytes parameter to int64_t.

The only caller of all updated function is bdrv_co_do_pwrite_zeroes().

bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of
callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s
max_write_zeroes is limited to INT_MAX. So, updated functions all are
safe, the will not get "bytes" larger than before.

they


Still, let's look through all updated functions, and add assertions to
the ones which are actually unprepared to values larger than INT_MAX.
For these drivers also set explicit max_pwrite_zeroes limit.

Let's go:

backup-top: Calls backup_top_cbw() and bdrv_co_pwrite_zeroes, both
   have 64bit argument.

backup_top_cbw has uint64_t argument (at least on current qemu.git; I
didn't spot if it was changed in the meantime earlier in this series
or if I'm missing review of a prerequisite), but we're safe since the
block layer does not pass in negative values.


blkdebug: calculations can't overflow, thanks to
   bdrv_check_qiov_request() in generic layer. rule_check() and
   bdrv_co_pwrite_zeroes() both have 64bit argument.

rule_check() is another function currently using uint64_t.


blklogwrites: pass to blk_log_writes_co_log() with 64bit argument.

That, and struct BlkLogWritesFileReq, still use uint64_t.


blkreply, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes() which 
is OK

blkreplay


file-posix: both handler calls raw_do_pwrite_zeroes, which is updated.
   In raw_do_pwrite_zeroes() calculations are OK due to
   bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes
   which is uint64_t.

We also have to check where that uint64_t gets handed;
handle_aiocb_write_zeroes_block() passes a uint64_t[2] to
ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate()
which takes off_t (and we compile to always have 64-bit off_t), as
does handle_aiocb_write_zeroes_unmap.  All look safe.


gluster: bytes go to GlusterAIOCB::size which is int64_t and to
   glfs_zerofill_async works with off_t.

iscsi: Aha, here we deal with iscsi_writesame16_task() that has
   uint32_t num_blocks argument and iscsi_writesame16_task() has

s/16/10/

   uint16_t argument. Make comments, add assertions and clarify
   max_pwrite_zeroes calculation.
   iscsi_allocmap_() functions already has int64_t argument
   is_byte_request_lun_aligned is simple to update, do it.

mirror_top: pass to bdrv_mirror_top_do_write which has uint16_t

s/16/64/

   argument

nbd: Aha, here we have protocol limitation, and NBDRequest::len is
   uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are
   OK for now.

nvme: Again, protocol limitation. And no inherent limit for
   write-zeroes at all. But from code that calculates cdw12 it's obvious
   that we do have limit and alignment. Let's clarify it. Also,
   obviously the code is not prepared to bytes=0. Let's handle this

to handle bytes=0

   case too.
   trace events already 64bit

qcow2: offset + bytes and alignment still works good (thanks to
   bdrv_check_qiov_request()), so tail calculation is OK
   qcow2_subcluster_zeroize() has 64bit argument, should be OK
   trace events updated

qed: qed_co_request wants int nb_sectors. Also in code we have size_t
   used for request length which may be 32bit.. So, let's just keep

Double dot looks odd.

   INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and
   don't care.

Yeah, even when size_t is 64-bit, qed is not our high priority so
sticking to 32-bit limit encourages people to switch away from qed ;)


raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both
   64bit.

I probably already mentioned it in earlier reviews, but hopefully by
the end of the series, we clean up raw_adjust_offset() to take
int64_t* instead of uint64_t* to get rid of ugly casts.  Doesn't have
to be done in this patch.


throttle: Both throttle_group_co_io_limits_intercept() and
   bdrv_co_pwrite_zeroes() are 64bit.

vmdk: pass to vmdk_pwritev which is 64bit

quorum: pass to quorum_co_pwritev() which is 64bit

preallocated: pass to handle_write() and bdrv_co_pwrite_zeroes(), both

File is named preallocate.c, the 'd' seems odd. Worth sorting this
before qcow2 in the commit message?

I think yes


   64bit.

Hooray!

At this point all block drivers are prepared to 64bit write-zero
requests or has explicitly set max_pwrite_zeroes.

are prepared to support 64bit write-zero requests, or have explicitly set


Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

+++ b/block/iscsi.c
@@ -1205,15 +1205,16 @@ out_unlock:
static int
  coroutine_fn iscsi_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
-                                    int bytes, BdrvRequestFlags flags)
+                                    int64_t bytes, BdrvRequestFlags flags)
  {
      IscsiLun *iscsilun = bs->opaque;
      struct IscsiTask iTask;
      uint64_t lba;
-    uint32_t nb_blocks;
+    uint64_t nb_blocks;
      bool use_16_for_ws = iscsilun->use_16_for_rw;
      int r = 0;
+
      if (!is_byte_request_lun_aligned(offset, bytes, iscsilun)) {

Why the added blank line here?

mistake


          return -ENOTSUP;
      }
@@ -1250,11 +1251,21 @@ coroutine_fn iscsi_co_pwrite_zeroes(BlockDriverState 
*bs, int64_t offset,
      iscsi_co_init_iscsitask(iscsilun, &iTask);
  retry:
      if (use_16_for_ws) {
+        /*
+         * iscsi_writesame16_task num_blocks argument is uint32_t. We rely here
+         * on our max_pwrite_zeroes limit.
+         */
+        assert(nb_blocks < UINT32_MAX);
          iTask.task = iscsi_writesame16_task(iscsilun->iscsi, iscsilun->lun, 
lba,
                                              iscsilun->zeroblock, 
iscsilun->block_size,
                                              nb_blocks, 0, !!(flags & 
BDRV_REQ_MAY_UNMAP),
                                              0, 0, iscsi_co_generic_cb, 
&iTask);
      } else {
+        /*
+         * iscsi_writesame10_task num_blocks argument is uint16_t. We rely here
+         * on our max_pwrite_zeroes limit.
+         */
+        assert(nb_blocks < UINT16_MAX);
          iTask.task = iscsi_writesame10_task(iscsilun->iscsi, iscsilun->lun, 
lba,
                                              iscsilun->zeroblock, 
iscsilun->block_size,
                                              nb_blocks, 0, !!(flags & 
BDRV_REQ_MAY_UNMAP),

Thanks - these assertions and comments are indeed a lifesaver for
future maintenance.

@@ -2074,10 +2085,10 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
Error **errp)
          bs->bl.pdiscard_alignment = iscsilun->block_size;
      }
- if (iscsilun->bl.max_ws_len < 0xffffffff / block_size) {
-        bs->bl.max_pwrite_zeroes =
-            iscsilun->bl.max_ws_len * iscsilun->block_size;
-    }
+    bs->bl.max_pwrite_zeroes =
+        MIN_NON_ZERO(iscsilun->bl.max_ws_len * iscsilun->block_size,
+                     max_xfer_len * iscsilun->block_size);

Works because max_xfer_len is 0xffff if we are stuck using
writesame10, or 0xffffffff if we are able to use writesame16.

+++ b/block/nvme.c
@@ -1266,19 +1266,29 @@ static coroutine_fn int nvme_co_flush(BlockDriverState 
*bs)
static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
                                                int64_t offset,
-                                              int bytes,
+                                              int64_t bytes,
                                                BdrvRequestFlags flags)
  {
      BDRVNVMeState *s = bs->opaque;
      NVMeQueuePair *ioq = s->queues[INDEX_IO(0)];
      NVMeRequest *req;
-
-    uint32_t cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;
+    uint32_t cdw12;
if (!s->supports_write_zeroes) {
          return -ENOTSUP;
      }
+ if (bytes == 0) {
+        return 0;
+    }
+
+    cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;
+    /*
+     * We should not loose information. pwrite_zeroes_alignment and

lose (this is a common English typo; "lose" rhymes with "use" and is
opposite of "gain", while "loose" rhymes with "goose" and is opposite
of "tighten")

+++ b/block/qed.c

@@ -1408,6 +1409,12 @@ static int coroutine_fn 
bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
       */
      QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, bytes);
+ /*
+     * QED is not prepared for 62bit write-zero requests, so rely on

64bit

+     * max_pwrite_zeroes.
+     */
+    assert(bytes <= INT_MAX);
+
      /* Fall back if the request is not aligned */
      if (qed_offset_into_cluster(s, offset) ||
          qed_offset_into_cluster(s, bytes)) {

Reviewed-by: Eric Blake <eblake@redhat.com>


Thanks!

--
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]