On Wed, May 05, 2021 at 10:49:57AM +0300, Vladimir Sementsov-Ogievskiy wrote:
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver write_zeroes handlers bytes parameter to int64_t.
The only caller of all updated function is bdrv_co_do_pwrite_zeroes().
bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of
callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s
max_write_zeroes is limited to INT_MAX. So, updated functions all are
safe, the will not get "bytes" larger than before.
they
Still, let's look through all updated functions, and add assertions to
the ones which are actually unprepared to values larger than INT_MAX.
For these drivers also set explicit max_pwrite_zeroes limit.
Let's go:
backup-top: Calls backup_top_cbw() and bdrv_co_pwrite_zeroes, both
have 64bit argument.
backup_top_cbw has uint64_t argument (at least on current qemu.git; I
didn't spot if it was changed in the meantime earlier in this series
or if I'm missing review of a prerequisite), but we're safe since the
block layer does not pass in negative values.
blkdebug: calculations can't overflow, thanks to
bdrv_check_qiov_request() in generic layer. rule_check() and
bdrv_co_pwrite_zeroes() both have 64bit argument.
rule_check() is another function currently using uint64_t.
blklogwrites: pass to blk_log_writes_co_log() with 64bit argument.
That, and struct BlkLogWritesFileReq, still use uint64_t.
blkreply, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes() which
is OK
blkreplay
file-posix: both handler calls raw_do_pwrite_zeroes, which is updated.
In raw_do_pwrite_zeroes() calculations are OK due to
bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes
which is uint64_t.
We also have to check where that uint64_t gets handed;
handle_aiocb_write_zeroes_block() passes a uint64_t[2] to
ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate()
which takes off_t (and we compile to always have 64-bit off_t), as
does handle_aiocb_write_zeroes_unmap. All look safe.
gluster: bytes go to GlusterAIOCB::size which is int64_t and to
glfs_zerofill_async works with off_t.
iscsi: Aha, here we deal with iscsi_writesame16_task() that has
uint32_t num_blocks argument and iscsi_writesame16_task() has
s/16/10/
uint16_t argument. Make comments, add assertions and clarify
max_pwrite_zeroes calculation.
iscsi_allocmap_() functions already has int64_t argument
is_byte_request_lun_aligned is simple to update, do it.
mirror_top: pass to bdrv_mirror_top_do_write which has uint16_t
s/16/64/
argument
nbd: Aha, here we have protocol limitation, and NBDRequest::len is
uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are
OK for now.
nvme: Again, protocol limitation. And no inherent limit for
write-zeroes at all. But from code that calculates cdw12 it's obvious
that we do have limit and alignment. Let's clarify it. Also,
obviously the code is not prepared to bytes=0. Let's handle this
to handle bytes=0
case too.
trace events already 64bit
qcow2: offset + bytes and alignment still works good (thanks to
bdrv_check_qiov_request()), so tail calculation is OK
qcow2_subcluster_zeroize() has 64bit argument, should be OK
trace events updated
qed: qed_co_request wants int nb_sectors. Also in code we have size_t
used for request length which may be 32bit.. So, let's just keep
Double dot looks odd.
INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and
don't care.
Yeah, even when size_t is 64-bit, qed is not our high priority so
sticking to 32-bit limit encourages people to switch away from qed ;)
raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both
64bit.
I probably already mentioned it in earlier reviews, but hopefully by
the end of the series, we clean up raw_adjust_offset() to take
int64_t* instead of uint64_t* to get rid of ugly casts. Doesn't have
to be done in this patch.
throttle: Both throttle_group_co_io_limits_intercept() and
bdrv_co_pwrite_zeroes() are 64bit.
vmdk: pass to vmdk_pwritev which is 64bit
quorum: pass to quorum_co_pwritev() which is 64bit
preallocated: pass to handle_write() and bdrv_co_pwrite_zeroes(), both
File is named preallocate.c, the 'd' seems odd. Worth sorting this
before qcow2 in the commit message?
64bit.
Hooray!
At this point all block drivers are prepared to 64bit write-zero
requests or has explicitly set max_pwrite_zeroes.
are prepared to support 64bit write-zero requests, or have explicitly set
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
+++ b/block/iscsi.c
@@ -1205,15 +1205,16 @@ out_unlock:
static int
coroutine_fn iscsi_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
- int bytes, BdrvRequestFlags flags)
+ int64_t bytes, BdrvRequestFlags flags)
{
IscsiLun *iscsilun = bs->opaque;
struct IscsiTask iTask;
uint64_t lba;
- uint32_t nb_blocks;
+ uint64_t nb_blocks;
bool use_16_for_ws = iscsilun->use_16_for_rw;
int r = 0;
+
if (!is_byte_request_lun_aligned(offset, bytes, iscsilun)) {
Why the added blank line here?
return -ENOTSUP;
}
@@ -1250,11 +1251,21 @@ coroutine_fn iscsi_co_pwrite_zeroes(BlockDriverState
*bs, int64_t offset,
iscsi_co_init_iscsitask(iscsilun, &iTask);
retry:
if (use_16_for_ws) {
+ /*
+ * iscsi_writesame16_task num_blocks argument is uint32_t. We rely here
+ * on our max_pwrite_zeroes limit.
+ */
+ assert(nb_blocks < UINT32_MAX);
iTask.task = iscsi_writesame16_task(iscsilun->iscsi, iscsilun->lun,
lba,
iscsilun->zeroblock,
iscsilun->block_size,
nb_blocks, 0, !!(flags &
BDRV_REQ_MAY_UNMAP),
0, 0, iscsi_co_generic_cb,
&iTask);
} else {
+ /*
+ * iscsi_writesame10_task num_blocks argument is uint16_t. We rely here
+ * on our max_pwrite_zeroes limit.
+ */
+ assert(nb_blocks < UINT16_MAX);
iTask.task = iscsi_writesame10_task(iscsilun->iscsi, iscsilun->lun,
lba,
iscsilun->zeroblock,
iscsilun->block_size,
nb_blocks, 0, !!(flags &
BDRV_REQ_MAY_UNMAP),
Thanks - these assertions and comments are indeed a lifesaver for
future maintenance.
@@ -2074,10 +2085,10 @@ static void iscsi_refresh_limits(BlockDriverState *bs,
Error **errp)
bs->bl.pdiscard_alignment = iscsilun->block_size;
}
- if (iscsilun->bl.max_ws_len < 0xffffffff / block_size) {
- bs->bl.max_pwrite_zeroes =
- iscsilun->bl.max_ws_len * iscsilun->block_size;
- }
+ bs->bl.max_pwrite_zeroes =
+ MIN_NON_ZERO(iscsilun->bl.max_ws_len * iscsilun->block_size,
+ max_xfer_len * iscsilun->block_size);
Works because max_xfer_len is 0xffff if we are stuck using
writesame10, or 0xffffffff if we are able to use writesame16.
+++ b/block/nvme.c
@@ -1266,19 +1266,29 @@ static coroutine_fn int nvme_co_flush(BlockDriverState
*bs)
static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
int64_t offset,
- int bytes,
+ int64_t bytes,
BdrvRequestFlags flags)
{
BDRVNVMeState *s = bs->opaque;
NVMeQueuePair *ioq = s->queues[INDEX_IO(0)];
NVMeRequest *req;
-
- uint32_t cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;
+ uint32_t cdw12;
if (!s->supports_write_zeroes) {
return -ENOTSUP;
}
+ if (bytes == 0) {
+ return 0;
+ }
+
+ cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;
+ /*
+ * We should not loose information. pwrite_zeroes_alignment and
lose (this is a common English typo; "lose" rhymes with "use" and is
opposite of "gain", while "loose" rhymes with "goose" and is opposite
of "tighten")
+++ b/block/qed.c
@@ -1408,6 +1409,12 @@ static int coroutine_fn
bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
*/
QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, bytes);
+ /*
+ * QED is not prepared for 62bit write-zero requests, so rely on
64bit
+ * max_pwrite_zeroes.
+ */
+ assert(bytes <= INT_MAX);
+
/* Fall back if the request is not aligned */
if (qed_offset_into_cluster(s, offset) ||
qed_offset_into_cluster(s, bytes)) {
Reviewed-by: Eric Blake <eblake@redhat.com>