qemu-block
Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
Date: Fri, 24 Apr 2020 17:27:31 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

24.04.2020 15:17, Kevin Wolf wrote:
On 24.04.2020 08:16, Vladimir Sementsov-Ogievskiy wrote:
23.04.2020 18:01, Kevin Wolf wrote:
If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf <address@hidden>
Reviewed-by: Max Reitz <address@hidden>
---
   block/qcow2-cluster.c |  2 +-
   block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
   2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 17f1363279..4b5fc8c4a7 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
       /* Caller must pass aligned values, except at image end */
       assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
       assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
-           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
+           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
       /* The zero flag is only supported by version 3 and newer */
       if (s->qcow_version < 3) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 9cfbdfc939..ad621fe404 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
       bs->supported_zero_flags = header.version >= 3 ?
                                  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
+    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
       /* Repair image if dirty */
       if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
@@ -4214,6 +4215,38 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
           g_assert_not_reached();
       }
+    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
+        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
+
+        /*
+         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
+         * requires a cluster-aligned start. The end may be unaligned if it is
+         * at the end of the image (which it is here).
+         */
+        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
+            goto fail;
+        }
+
+        /* Write explicit zeros for the unaligned head */
+        if (zero_start > old_length) {
+            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);

Why not s/s->cluster_size/zero_start - old_length/? We may save a lot
of pages if cluster_size is large.

Ok.

+            QEMUIOVector qiov;
+            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
+
+            qemu_co_mutex_unlock(&s->lock);

We are in an intermediate state here. Is it safe to unlock? Anything may
happen, up to another truncate..

I don't think it's worse than unlocking during normal writes, which we
have been doing for a long time. I don't see anything that would corrupt
any internal state.

Of course, doing truncate in parallel with other operations around EOF
has somewhat undefined semantics because the order could be anything.
But if you don't want to get undefined results, you just shouldn't make
such requests. It's similar to reading and writing the same area at the
same time.

If there were two truncate operations in parallel, we might end up with
s->l1_vm_state_index or bs->total_sectors not corresponding to the other
metadata. Not sure how bad that is..


+            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);

Better not to do it if this cluster is already ZERO.. But improving that
could be a later patch.

I doubt it's worth writing code to optimise a corner case inside a
corner case.

+            qemu_co_mutex_lock(&s->lock);
+
+            qemu_vfree(buf);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to zero out the new area");
+                goto fail;
+            }
+        }
+    }
+
       if (prealloc != PREALLOC_MODE_OFF) {
           /* Flush metadata before actually changing the image size */
           ret = qcow2_write_caches(bs);

Kevin



--
Best regards,
Vladimir


