Re: [PATCH v3 09/19] block: introduce FleecingState class

qemu-block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 09/19] block: introduce FleecingState class

From:	Hanna Reitz
Subject:	Re: [PATCH v3 09/19] block: introduce FleecingState class
Date:	Tue, 18 Jan 2022 17:37:26 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.0

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:

FleecingState represents state shared between copy-before-write filter
and upcoming fleecing block driver.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
  block/fleecing.h  | 135 ++++++++++++++++++++++++++++++++++
  block/fleecing.c  | 182 ++++++++++++++++++++++++++++++++++++++++++++++
  MAINTAINERS       |   2 +
  block/meson.build |   1 +
  4 files changed, 320 insertions(+)
  create mode 100644 block/fleecing.h
  create mode 100644 block/fleecing.c

diff --git a/block/fleecing.h b/block/fleecing.h
new file mode 100644
index 0000000000..fb7b2f86c4
--- /dev/null
+++ b/block/fleecing.h
@@ -0,0 +1,135 @@
+/*
+ * FleecingState
+ *
+ * The common state of image fleecing, shared between copy-before-write filter
+ * and fleecing block driver.

From this documentation, it’s unclear to me who owns the FleecingStateobject. I would have assumed it’s the fleecing node, and if it is, Iwonder why we even have this external interface instead of consideringFleecingState a helper object for the fleecing block driver (or ratherthe block driver’s opaque state, which it basically is, as far as I cansee from peeking into the next patch), and putting both into a singlefile with no external interface except forfleecing_mark_done_and_wait_readers().

+ *
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ *
+ * Fleecing scheme looks as follows:
+ *
+ * [guest blk]                   [nbd export]
+ *    |                              |
+ *    |root                          |
+ *    v                              v
+ * [copy-before-write]--target-->[fleecing drv]
+ *    |                          /   |
+ *    |file                     /    |file
+ *    v                        /     v
+ * [active disk]<--source-----/  [temp disk]
+ *
+ * Note that "active disk" is also called just "source" and "temp disk" is also
+ * called "target".
+ *
+ * What happens here:
+ *
+ * copy-before-write filter performs copy-before-write operations: on guest
+ * write we should copy old data to target child before rewriting. Note that we
+ * write this data through fleecing driver: it saves a possibility to implement
+ * a kind of cache in fleecing driver in future.

I don’t understand why this explanation is the first one given (and theonly one given explicitly as a reason) for why we want a fleecing blockdriver.

(1) If we implement caching later, I have a feeling that we’ll want newoptions for this. So a management layer that wants caching will need tobe updated at that point anyway (to specify these new options), so Idon’t understand how adding a fleecing block driver now would make iteasier later on to introduce caching.

(1b) It’s actually entirely possible that we will not want to use thefleecing driver for caching, because we decide that caching is much moreuseful as its own dedicated block driver.

(2) There are much better arguments below. This FleecingState youintroduce here makes it clear why we need a fleecing block driver; ithelps with synchronization, and it provides the “I’m done with this bit,I don’t care about it anymore” discard interface.

+ *
+ * Fleecing user is nbd export: it can read from fleecing node, which 
guarantees
+ * a snapshot-view for fleecing user. Fleecing user may also do discard
+ * operations.
+ *
+ * FleecingState is responsible for most of the fleecing logic:
+ *
+ * 1. Fleecing read. Handle reads of fleecing user: we should decide where from
+ * to read, from source node or from copy-before-write target node. In former
+ * case we need to synchronize with guest writes. See fleecing_read_lock() and
+ * fleecing_read_unlock() functionality.
+ *
+ * 2. Guest write synchronization (part of [1] actually). See
+ * fleecing_mark_done_and_wait_readers()
+ *
+ * 3. Fleecing discard. Used by fleecing user when corresponding area is 
already
+ * copied. Fleecing user may discard the area which is not needed anymore, that
+ * should result in:
+ *   - discarding data to free disk space
+ *   - clear bits in copy-bitmap of block-copy, to avoid extra 
copy-before-write
+ *     operations
+ *   - clear bits in access-bitmap of FleecingState, to avoid further wrong
+ *     access
+ *
+ * Still, FleecingState doesn't own any block children, so all real io
+ * operations (reads, writes and discards) are done by copy-before-write filter
+ * and fleecing block driver.

I find this a bit confusing, because for me, it raised the question of“why would it own block childen?”, which led to me wanting to know evenmore where the place of FleecingState is. This sentence makes it reallysound as if FleecingState is its own independent object floating aroundsomewhere, not owned by anything, and that feels very wrong.

(If FleecingState were owned by the fleecing driver, i.e. if itbasically were just the fleecing driver’s opaque data itself, then thequestion of what the FleecingState is, and whether it could own blockchildren wouldn’t even come up.)

+ */
+
+#ifndef FLEECING_H
+#define FLEECING_H
+
+#include "block/block_int.h"
+#include "block/block-copy.h"
+#include "block/reqlist.h"
+
+typedef struct FleecingState FleecingState;
+
+/*
+ * Create FleecingState.
+ *
+ * @bcs: link to block-copy owned by copy-before-write filter.


s/block-copy/block-copy state/

+ *
+ * @fleecing_node: should be fleecing block driver node. Used to create some

I think the “should be” should be dropped. This must be a fleecingblock driver, right?

(But then again, I really don’t understand why the FleecingState isseparate from BDRVFleecingState in the first place.)

+ * bitmaps in it.
+ */
+FleecingState *fleecing_new(BlockCopyState *bcs,
+                            BlockDriverState *fleecing_node,
+                            Error **errp);
+
+/* Free the state. Doesn't free block-copy state (@bcs) */
+void fleecing_free(FleecingState *s);
+
+/*
+ * Convenient function for thous who want to do fleecing read.


s/thous/those/

I kind of miss a quick summary here of what this function is for, i.e. to
(1) find out where to read the data from, and

(2) if it’s to be read from the source, to block the affected area fromwrites until the read is done.

But I don’t know how to phrase that concisely, so I’m also OK with nothaving such a summary.

+ *
+ * If requested region starts in "done" area, i.e. data is already copied to
+ * copy-before-write target node, req is set to NULL, pnum is set to available
+ * bytes to read from target. User is free to read @pnum bytes from target.
+ * Still, user is responsible for concurrent discards on target.
+ *
+ * If requests region starts in "not done" area, i.e. we have to read from


s/requests/requested/

+ * source node directly, than @pnum bytes of source node are frozen and


s/than/then/

+ * guaranteed not be rewritten until user calls cbw_snapshot_read_unlock().


s/guaranteed not be rewritten/guaranteed not to be rewritten/

(or perhaps also s/rewritten/modified/, but it’s probably just me towhom “rewritten” sounds a bit like “the same data is written again”)

+ *
+ * Returns 0 on success and -EACCES when try to read non-dirty area of
+ * access_bitmap.
+ */

This description doesn’t sufficiently describe the @req parameter. Itonly says that `*req == NULL` will be returned if the data is to be readfrom the target node, but other than that it doesn’t say whether *req isa pure “out” or “inout” parameter. It doesn’t say whether the callerhas to pre-fill it (and fleecing_read_lock() will set it to NULL if thecaller should read from the target), or whether fleecing_read_lock()will always set it (depending on whether to read from the source(non-NULL) or the target (NULL)).

Since it’s the latter (fleecing_read_lock() will allocate a BlockReq andreturn it), we also need to explain what we expect the user to do withthis, namely absolutely nothing except pass it again tofleecing_read_unlock().

+int fleecing_read_lock(FleecingState *f, int64_t offset,
+                       int64_t bytes, const BlockReq **req, int64_t *pnum);
+/* Called as closing pair for fleecing_read_lock() */

It isn’t quite clear from this summary whether this function should alsobe called if fleecing_read_lock() returned success, but *req == NULL. It shouldn’t, but given this description, I’d do it.

+void fleecing_read_unlock(FleecingState *f, const BlockReq *req);
+
+/*
+ * Called when fleecing user doesn't need the region anymore (for example the
+ * region is successfully read and backed up somewhere).
+ * This prevents extra copy-before-write operations in this area in future.
+ * Next fleecing read from this area will fail with -EACCES.
+ */
+void fleecing_discard(FleecingState *f, int64_t offset, int64_t bytes);
+
+/*
+ * Called by copy-before-write filter after successful copy-before-write
+ * operation to synchronize with parallel fleecing reads.
+ */
+void fleecing_mark_done_and_wait_readers(FleecingState *f, int64_t offset,
+                                         int64_t bytes);
+
+#endif /* FLEECING_H */
diff --git a/block/fleecing.c b/block/fleecing.c
new file mode 100644
index 0000000000..f75d11b892
--- /dev/null
+++ b/block/fleecing.c
@@ -0,0 +1,182 @@
+/*
+ * FleecingState
+ *
+ * The common state of image fleecing, shared between copy-before-write filter
+ * and fleecing block driver.
+ *
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "sysemu/block-backend.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/coroutines.h"
+#include "block/qdict.h"
+#include "block/block-copy.h"
+#include "block/reqlist.h"
+
+#include "block/fleecing.h"
+
+/*
+ * @bcs: link to block-copy state owned by copy-before-write filter which
+ * performs copy-before-write operations in context of fleecing scheme.
+ * FleecingState doesn't own the block-copy state and don't free it on cleanup.


s/don't/doesn't/

+ *
+ * @lock: protects access to @access_bitmap, @done_bitmap and @frozen_read_reqs
+ *
+ * @access_bitmap: represents areas allowed for reading by fleecing user.
+ * Reading from non-dirty areas leads to -EACCES. Discard operation among other

Since this is not really abort dirty or not, I’d prefer “clear(non-dirty)” instead of just “non-dirty”.

+ * things clears corresponding bits in this bitmaps.

It isn’t quite clear whether (A) the discard operation does variousthings, and one of them is to reset the corresponding area in theaccess_bitmap; or (B) the discard operation is only one of various waysto reset areas in the access_bitmap. (It’s (A), and so I’d just say“fleecing_discard() clears areas in this bitmap.”)

+ *
+ * @done_bitmap: represents areas that was successfully copied by


s/that was/that were/

+ * copy-before-write operations. So, for dirty areas fleecing user should read
+ * from target node and for clear areas - from source node.


I’d prefer “from target node, and for clear areas from source node”.

+ *
+ * @frozen_read_reqs: current read requests for fleecing user in source node.


Hmm, perhaps “ongoing” would be clearer than “current”?

+ * corresponding areas must not be rewritten by guest.

Not necessarily just the guest, so something like “Writing to theseareas must wait until the respective requests are settled.” would bemore general.

+ */
+typedef struct FleecingState {
+    BlockCopyState *bcs;
+
+    CoMutex lock;
+
+    BdrvDirtyBitmap *access_bitmap;
+    BdrvDirtyBitmap *done_bitmap;
+
+    BlockReqList frozen_read_reqs;
+} FleecingState;
+
+FleecingState *fleecing_new(BlockCopyState *bcs,
+                            BlockDriverState *fleecing_node,
+                            Error **errp)
+{
+    BdrvDirtyBitmap *bcs_bitmap = block_copy_dirty_bitmap(bcs),
+                    *done_bitmap, *access_bitmap;

I don’t really understand why you didn’t just start a new declarationhere, putting “BdrvDirtyBitmap” at the beginning of the line again.

+    int64_t cluster_size = block_copy_cluster_size(bcs);
+    FleecingState *s;
+
+    /* done_bitmap starts empty */
+    done_bitmap = bdrv_create_dirty_bitmap(fleecing_node, cluster_size, NULL,
+                                           errp);
+    if (!done_bitmap) {
+        return NULL;
+    }
+    bdrv_disable_dirty_bitmap(done_bitmap);
+
+    /* access_bitmap starts equal to bcs_bitmap */
+    access_bitmap = bdrv_create_dirty_bitmap(fleecing_node, cluster_size, NULL,
+                                             errp);
+    if (!access_bitmap) {
+        return NULL;
+    }
+    bdrv_disable_dirty_bitmap(access_bitmap);
+    if (!bdrv_dirty_bitmap_merge_internal(access_bitmap, bcs_bitmap,
+                                          NULL, true))
+    {
+        return NULL;
+    }

This function lacks a proper on-error clean-up path to free the dirtybitmaps.

+
+    s = g_new(FleecingState, 1);
+    *s = (FleecingState) {
+        .bcs = bcs,
+        .done_bitmap = done_bitmap,
+        .access_bitmap = access_bitmap,
+    };
+    qemu_co_mutex_init(&s->lock);
+    QLIST_INIT(&s->frozen_read_reqs);
+
+    return s;
+}
+
+void fleecing_free(FleecingState *s)
+{
+    if (!s) {
+        return;
+    }
+
+    bdrv_release_dirty_bitmap(s->access_bitmap);
+    bdrv_release_dirty_bitmap(s->done_bitmap);
+    g_free(s);
+}
+
+static BlockReq *add_read_req(FleecingState *s, uint64_t offset, uint64_t 
bytes)
+{
+    BlockReq *req = g_new(BlockReq, 1);
+
+    reqlist_init_req(&s->frozen_read_reqs, req, offset, bytes);
+
+    return req;
+}
+
+static void drop_read_req(BlockReq *req)
+{
+    reqlist_remove_req(req);
+    g_free(req);
+}
+
+int fleecing_read_lock(FleecingState *s, int64_t offset,
+                       int64_t bytes, const BlockReq **req,
+                       int64_t *pnum)
+{
+    bool done;
+
+    QEMU_LOCK_GUARD(&s->lock);
+
+    if (bdrv_dirty_bitmap_next_zero(s->access_bitmap, offset, bytes) != -1) {
+        return -EACCES;
+    }
+
+    bdrv_dirty_bitmap_status(s->done_bitmap, offset, bytes, &done, pnum);
+    if (!done) {
+        *req = add_read_req(s, offset, *pnum);
+    }
+
+    return 0;
+}
+
+void fleecing_read_unlock(FleecingState *s, const BlockReq *req)
+{
+    QEMU_LOCK_GUARD(&s->lock);
+
+    drop_read_req((BlockReq *)req);

In my opinion, any cast removing a `const` must be accompanied by anexplanatory comment.

As I understand it, this function takes a `const BlockReq *` pointer sobecause `fleecing_read_lock()` returns a `const BlockReq *` object,because we don’t want the user to modify that request, but still wantthem to be able to easily pass the object they’ve received to`fleecing_read_unlock()`, right?

The problem is of course that we need a mutable BlockReq object here,because QLIST_REMOVE() will modify it. Taking a `const BlockReq *` isjust not correct.

Perhaps instead of returning a `const BlockReq *` pointer to the caller,we should just return a `void *` opaque pointer that they are to pass tothis function again, that should ensure they don’t touch it just thesame, and we wouldn’t need this cast here.

+}
+
+void fleecing_discard(FleecingState *s, int64_t offset, int64_t bytes)
+{
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        bdrv_reset_dirty_bitmap(s->access_bitmap, offset, bytes);
+    }
+
+    block_copy_reset(s->bcs, offset, bytes);
+}
+
+void fleecing_mark_done_and_wait_readers(FleecingState *s, int64_t offset,
+                                         int64_t bytes)
+{
+    assert(QEMU_IS_ALIGNED(offset, block_copy_cluster_size(s->bcs)));
+    assert(QEMU_IS_ALIGNED(bytes, block_copy_cluster_size(s->bcs)));
+
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        bdrv_set_dirty_bitmap(s->done_bitmap, offset, bytes);
+        reqlist_wait_all(&s->frozen_read_reqs, offset, bytes, &s->lock);
+    }
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index 7f24ee4b92..78ea04e292 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2423,6 +2423,8 @@ F: block/reqlist.c
  F: include/block/reqlist.h
  F: block/copy-before-write.h
  F: block/copy-before-write.c
+F: block/fleecing.h
+F: block/fleecing.c
  F: include/block/aio_task.h
  F: block/aio_task.c
  F: util/qemu-co-shared-resource.c
diff --git a/block/meson.build b/block/meson.build
index 5065cf33ba..d30da90a01 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -18,6 +18,7 @@ block_ss.add(files(
    'crypto.c',
    'dirty-bitmap.c',
    'filter-compress.c',
+  'fleecing.c',
    'io.c',
    'mirror.c',
    'nbd.c',

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [PATCH v3 09/19] block: introduce FleecingState class, Hanna Reitz <=
- Re: [PATCH v3 09/19] block: introduce FleecingState class, Vladimir Sementsov-Ogievskiy, 2022/01/18

Prev by Date: [PATCH 00/12] Removal of Aiocontext lock through drains: protect bdrv_replace_child_noperm.
Next by Date: [PATCH 09/12] jobs: ensure sleep in job_sleep_ns is fully performed
Previous by thread: [PATCH 00/12] Removal of Aiocontext lock through drains: protect bdrv_replace_child_noperm.
Next by thread: Re: [PATCH v3 09/19] block: introduce FleecingState class
Index(es):
- Date
- Thread