Re: [Qemu-block] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-block] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
Date: Mon, 2 Jul 2018 14:47:49 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

29.06.2018 20:24, Eric Blake wrote:
On 06/29/2018 10:15 AM, Vladimir Sementsov-Ogievskiy wrote:
We need to synchronize backup job with reading from fleecing image
like it was done in block/replication.c.

Otherwise, the following situation is theoretically possible:


Grammar suggestions:

1. client start reading

client starts reading

2. client understand, that there is no corresponding cluster in
    fleecing image
3. client is going to read from backing file (i.e. active image)

client sees that no corresponding cluster has been allocated in the fleecing image, so the request is forwarded to the backing file

4. guest writes to active image
5. this write is stopped by backup(sync=none) and cluster is copied to
    fleecing image
6. guest write continues...
7. and client reads _new_ (or partly new) data from active image

Interesting race. Can it actually happen, or does our read code already serialize writes to the same area while a read is underway?

In short, I see what problem you are claiming exists: the moment the client starts reading from the backing file, that portion of the backing file must remain unchanged until after the client is done reading.  But I don't know enough details of the block layer to know if this is actually a problem, or if adding the new filter is just overhead.


Looking at the code, here is a more realistic example (though I still have no reproducer):

1. The client starts reading: it takes the qcow2 mutex in qcow2_co_preadv and gets as far as loading an L2 table (assume a cache miss).
2. The guest writes => backup COW => qcow2 write => tries to take the qcow2 mutex => waits.
3. The L2 table is loaded; we see that the cluster is UNALLOCATED, go to "case QCOW2_CLUSTER_UNALLOCATED", and unlock the mutex before bdrv_co_preadv(bs->backing, ...).
4. Aha, the mutex is unlocked, so the backup COW continues, the guest write finishes, and the cluster in the active disk changes.
5. Only now do we actually call bdrv_co_preadv(bs->backing, ...) and read the _new, updated_ data.
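
To make the window concrete, the relevant path in block/qcow2.c looks roughly like this (condensed from memory; the exact code differs between versions):

  static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
  {
      BDRVQcow2State *s = bs->opaque;
      ...
      qemu_co_mutex_lock(&s->lock);
      ...
      /* may load an L2 table from disk, i.e. yield with the lock held */
      ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, &cluster_offset);
      ...
      switch (ret) {
      case QCOW2_CLUSTER_UNALLOCATED:
          if (bs->backing) {
              /* The lock is dropped here, so a guest write that was
               * waiting on it (step 2 above), and the backup COW it
               * triggered, can complete before we issue the read. */
              qemu_co_mutex_unlock(&s->lock);
              ret = bdrv_co_preadv(bs->backing, offset, cur_bytes,
                                   &hd_qiov, 0);
              qemu_co_mutex_lock(&s->lock);
          }
          ...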





So, the fleecing filter should sit above the fleecing image; the whole
picture of fleecing looks like this:

     +-------+           +------------+
     |       |           |            |
     | guest |           | NBD client +<------+
     |       |           |            |       |
     ++-----++           +------------+       |only read
      |     ^                                 |
      | IO  |                                 |
      v     |                           +-----+------+
     ++-----+---------+                 |            |
     |                |                 |  internal  |
     |  active image  +----+            | NBD server |
     |                |    |            |            |
     +-+--------------+    |backup      +-+----------+
       ^                   |sync=none     ^
       |backing            |              |only read
       |                   |              |
     +-+--------------+    |       +------+----------+
     |                |    |       |                 |
     | fleecing image +<---+       | fleecing filter |
     |                |            |                 |
     +--------+-------+            +-----+-----------+
              ^                          |
              |                          |
              +--------------------------+
                        file

Can you also show the sequence of QMP commands to set up this structure (or maybe you do in 3/3, which I haven't looked at yet)?
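
For reference, the usual fleecing sequence with the filter layered on top would be something like the following (a sketch only: the node names are made up, and the filter's option names may not match the patch):

  { "execute": "blockdev-add",
    "arguments": { "driver": "qcow2", "node-name": "fleece",
                   "file": { "driver": "file", "filename": "fleece.qcow2" },
                   "backing": "active" } }
  { "execute": "blockdev-backup",
    "arguments": { "device": "active", "target": "fleece", "sync": "none" } }
  { "execute": "blockdev-add",
    "arguments": { "driver": "fleecing-filter", "node-name": "filter",
                   "file": "fleece" } }
  { "execute": "nbd-server-start",
    "arguments": { "addr": { "type": "unix",
                             "data": { "path": "/tmp/fleecing.sock" } } } }
  { "execute": "nbd-server-add",
    "arguments": { "device": "filter" } }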


Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
---
  qapi/block-core.json    |  6 ++--
  block/fleecing-filter.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++
  block/Makefile.objs     |  1 +
  3 files changed, 85 insertions(+), 2 deletions(-)
  create mode 100644 block/fleecing-filter.c

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 577ce5e999..43872c3d79 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2542,7 +2542,8 @@
            'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
            'null-aio', 'null-co', 'nvme', 'parallels', 'qcow', 'qcow2', 'qed',
              'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh',
-            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] }
+            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs',
+            'fleecing-filter' ] }

Missing a 'since 3.0' documentation blurb; also, this enum has been kept sorted, so your new filter needs to come earlier.
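
Concretely, something like this (the surrounding enum context is quoted from memory):

  # @fleecing-filter: Since 3.0
  ...
  { 'enum': 'BlockdevDriver',
    'data': [ 'blkdebug', 'blkverify', 'bochs', 'cloop', 'dmg', 'file',
              'fleecing-filter', 'ftp', 'ftps', 'gluster', ... ] }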

  ##
  # @BlockdevOptionsFile:
@@ -3594,7 +3595,8 @@
      'vmdk':       'BlockdevOptionsGenericCOWFormat',
        'vpc':        'BlockdevOptionsGenericFormat',
        'vvfat':      'BlockdevOptionsVVFAT',
-      'vxhs':       'BlockdevOptionsVxHS'
+      'vxhs':       'BlockdevOptionsVxHS',
+      'fleecing-filter': 'BlockdevOptionsGenericFormat'

Again, this has been kept sorted.

+static coroutine_fn int fleecing_co_preadv(BlockDriverState *bs,
+                                           uint64_t offset, uint64_t bytes,
+                                           QEMUIOVector *qiov, int flags)
+{
+    int ret;
+    BlockJob *job = bs->file->bs->backing->bs->job;
+    CowRequest req;
+
+    backup_wait_for_overlapping_requests(job, offset, bytes);
+    backup_cow_request_begin(&req, job, offset, bytes);
+
+    ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
+
+    backup_cow_request_end(&req);
+
+    return ret;
+}

So the idea here is that you force a serializing request to ensure that there are no other writes to the area in the meantime.
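
For comparison, block/replication.c brackets its reads from the secondary disk the same way (condensed from memory, not a verbatim quote):

  static coroutine_fn int replication_co_readv(BlockDriverState *bs,
                                               int64_t sector_num,
                                               int remaining_sectors,
                                               QEMUIOVector *qiov)
  {
      BDRVReplicationState *s = bs->opaque;
      BlockJob *job = s->secondary_disk->bs->job;
      CowRequest req;
      int ret;

      if (job) {
          uint64_t bytes = remaining_sectors * BDRV_SECTOR_SIZE;

          /* Wait until in-flight backup COWs touching this range finish,
           * then register our own request so new COWs wait for us. */
          backup_wait_for_overlapping_requests(job,
                                               sector_num * BDRV_SECTOR_SIZE,
                                               bytes);
          backup_cow_request_begin(&req, job,
                                   sector_num * BDRV_SECTOR_SIZE, bytes);
          ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors, qiov);
          backup_cow_request_end(&req);
          return ret;
      }

      return bdrv_co_readv(bs->file, sector_num, remaining_sectors, qiov);
  }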

+
+static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
+                                            uint64_t offset, uint64_t bytes,
+                                            QEMUIOVector *qiov, int flags)
+{
+    return -EINVAL;

and you force this to be a read-only interface. (Does the block layer actually require us to provide a pwritev callback, or can we leave it NULL instead?)

+BlockDriver bdrv_fleecing_filter = {
+    .format_name = "fleecing-filter",
+    .protocol_name = "fleecing-filter",
+    .instance_size = 0,
+
+    .bdrv_open = fleecing_open,
+    .bdrv_close = fleecing_close,
+
+    .bdrv_getlength = fleecing_getlength,
+    .bdrv_co_preadv = fleecing_co_preadv,
+    .bdrv_co_pwritev = fleecing_co_pwritev,
+
+    .is_filter = true,
+    .bdrv_recurse_is_first_non_filter = fleecing_recurse_is_first_non_filter,
+    .bdrv_child_perm        = bdrv_filter_default_perms,

No .bdrv_co_block_status callback?  That probably hurts querying for sparse regions.


Hm, worth adding... and it possibly needs synchronization with backup too.
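
A minimal version could simply delegate to the file child, as blkdebug does (if memory serves); whether it additionally needs the backup bracketing is the open question:

  +    .bdrv_co_block_status = bdrv_co_block_status_from_file,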

--
Best regards,
Vladimir



