Re: [Qemu-devel] Block Filters
From: Fam Zheng
Subject: Re: [Qemu-devel] Block Filters
Date: Fri, 6 Sep 2013 15:56:06 +0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, 09/03 18:24, Benoît Canet wrote:
>
> Hello list,
>
> I have been thinking about QEMU block filters lately.
>
> I am not a block.c/blockdev.c expert, so tell me what you think of the
> following.
>
> The use cases I see would be:
>
> - $user wants to have some real cryptography on top of qcow2/qed or another
>   format. Snapshots and other block features should continue to work.
>
> - $user wants to use a RAID-like feature like QUORUM in QEMU.
>   Other features should continue to work.
>
> - $user wants to use the future SSD deduplication implementation with
>   metadata on SSD and data on spinning disks.
>   Other features should continue to work.
>
> - $user wants to I/O throttle one drive of his VM.
>
> - $user wants to do Copy On Read.
>
> - $user wants to do a combination of the above.
>
> - $developer wants to take the minimum of required steps to keep changes
>   small.
>
> - $developer wants to keep user interface changes for later.
>
> Let's take the example of a user wanting to do I/O throttled, encrypted
> QUORUM on top of QCOW2.
>
> Assuming we want to implement throttling and encryption as something
> resembling a block filter, this makes a pretty complex BlockDriverState tree.
>
> The tree would look like the following:
>
>               I/O throttling BlockDriverState (bs)
>                            |
>                            |
>               Encryption BlockDriverState (bs)
>                            |
>                            |
>                 Quorum BlockDriverState (bs)
>                  /         |          \
>                 /          |           \
>            QCOW2 bs     QCOW2 bs     QCOW2 bs
>               |            |            |
>               |            |            |
>            RAW bs       RAW bs       RAW bs
>
> An external snapshot should result in a tree like the following.
>
>               I/O throttling BlockDriverState (bs)
>                            |
>                            |
>               Encryption BlockDriverState (bs)
>                            |
>                            |
>                 Quorum BlockDriverState (bs)
>                  /         |          \
>                 /          |           \
>            QCOW2 bs     QCOW2 bs     QCOW2 bs
>               |            |            |
>               |            |            |
>            QCOW2 bs     QCOW2 bs     QCOW2 bs
>               |            |            |
>               |            |            |
>            RAW bs       RAW bs       RAW bs
>
> In the current state of QEMU we can code some block drivers to implement this
> tree.
>
> However, when doing operations like snapshots, blockdev.c would have no real
> idea of what should be snapshotted and how. (The 3 top bs should be kept on
> top.)
>
> Moreover, it would have no easy way to manipulate this tree of
> BlockDriverStates, as each one is encapsulated in its parent.
>
> Also, there is no generic way to tell the block layer that two or more
> BlockDriverStates are siblings.
>
> This mail proposes some additional structures in order to cope with these
> problems.
>
> The overall strategy of the proposed structures is to push the relationships
> between BlockDriverStates out of each BlockDriverState.
>
> The idea is that it would be easier for the block layer to manipulate a
> well-known structure instead of being forced to enter into the specifics of
> each BlockDriverState.
>
> The first structure is the BlockStackNode.
>
> The BlockStackNode would be used to represent the relationship between the
> various BlockDriverStates.
>
> struct BlockStackNode {
>     BlockDriverState *bs; /* the BlockDriverState held by this node */
>
>     /* this doubly linked list entry points to the child node and the
>      * parent node
>      */
>     QLIST_ENTRY(BlockStackNode) down;
>
>     /* this doubly linked list entry points to the siblings of this node
>      */
>     QLIST_ENTRY(BlockStackNode) siblings;
>
>     /* a hash or an array of the siblings of this node for fast access;
>      * should be recomputed when updating the tree (pseudocode) */
>     QHASH_ENTRY<BlockStackNode, index> siblings_hash;
> };
>
> The BlockBackend would be the structure used to hold the "drive" the guest
> uses.
>
> struct BlockBackend {
>     /* the following doubly linked list head points to the top
>      * BlockStackNode; in our case it's the one containing the I/O
>      * throttling bs
>      */
>     QLIST_HEAD(, BlockStackNode) block_stack_head;
>     /* this is a pointer to the topmost node below the block filter chain;
>      * in our case the first QCOW2 sibling
>      */
>     BlockStackNode *top_node_below_filter;
> };
>
>
> Updated diagram:
>
> (Here bsn means BlockStackNode)
>
>     ------------------------BlockBackend
>     |                            |
>     |                     block_stack_head
>     |                            |
>     |          I/O throttling BlockStackNode (contains its bs)
>     |                            |
>     |                          down
>     |                            |
> top_node_below_filter  Encryption BlockStackNode (contains its bs)
>     |                            |
>     |                          down
>     |                            |
>     |              Quorum BlockStackNode (contains its bs)
>     |                  /
>     |               down
>     |                /
>     |               /
>     ------ QCOW2 bsn --siblings-- QCOW2 bsn --siblings-- QCOW2 bsn
>               |                       |                      |
>             down                    down                   down
>               |                       |                      |
>            RAW bsn                 RAW bsn                RAW bsn
>
> (each bsn contains a bs)
>
>
> Block driver point of view:
>
> To construct the tree, each BlockDriver would have some utility functions
> looking like:
>
> bdrv_register_child_bs(bs, child_bs, int index);
>
> Multiple calls to this function could be made to register multiple sibling
> children identified by their index.
>
> This way something like quorum could register multiple QCOW2 instances.
>
> Drivers would have a
> BlockDriverState *bdrv_access_child(bs, int index);
>
> to access their children.
>
> These functions can be implemented without the driver knowing about
> BlockStackNodes, using container_of.
>
> blockdev point of view: (here I need your help)
>
> When doing a snapshot, blockdev.c would access
> BlockBackend->top_node_below_filter and make a snapshot of the bs contained in
> this node and its siblings.
>
Since BlockDriver.bdrv_snapshot_create() is an optional operation, blockdev.c
can navigate down the tree from the top node until it hits a layer where the op
is implemented (the QCOW2 bs), so we can get rid of this top_node_below_filter
pointer.

Is this the only use case of top_node_below_filter?
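
To make the suggestion concrete, here is a rough sketch of such a walk. It is
not QEMU code: DemoDriver, DemoNode and the demo_* functions are made-up
stand-ins for BlockDriver, BlockDriverState and the proposed node structure,
and the optional op is reduced to a function pointer that is NULL when the
driver does not implement it. It also assumes that all siblings on a level use
the same driver, so following the first child is enough.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for BlockDriver: only the optional
 * snapshot op is modelled. */
typedef struct DemoDriver {
    const char *format_name;
    int (*snapshot_create)(void); /* NULL when the op is not implemented */
} DemoDriver;

/* Hypothetical stand-in for a node in the proposed tree. */
typedef struct DemoNode {
    const DemoDriver *drv;
    struct DemoNode *child;        /* first child, one level down */
    struct DemoNode *next_sibling; /* next node on the same level */
} DemoNode;

/* Placeholder op so that the qcow2 stand-in has a non-NULL callback. */
static int demo_qcow2_snapshot_create(void)
{
    return 0;
}

static const DemoDriver demo_throttle = { "throttle", NULL };
static const DemoDriver demo_crypto   = { "crypto", NULL };
static const DemoDriver demo_quorum   = { "quorum", NULL };
static const DemoDriver demo_qcow2    = { "qcow2", demo_qcow2_snapshot_create };
static const DemoDriver demo_raw      = { "raw", NULL };

/* Descend from the top node through first children and return the first
 * node whose driver implements the snapshot op, or NULL if none does. */
static DemoNode *demo_find_snapshot_layer(DemoNode *top)
{
    DemoNode *n;

    for (n = top; n != NULL; n = n->child) {
        if (n->drv->snapshot_create != NULL) {
            return n;
        }
    }
    return NULL;
}

/* Count a node plus the siblings that follow it on the same level; these
 * are the nodes a snapshot would have to cover. */
static int demo_count_layer(const DemoNode *first)
{
    int count = 0;

    for (; first != NULL; first = first->next_sibling) {
        count++;
    }
    return count;
}
```

With the throttle -> crypto -> quorum -> 3 x qcow2 -> raw tree from the mail,
demo_find_snapshot_layer() called on the throttle node stops at the first
qcow2 node, and the sibling list from there gives the three nodes to
snapshot, without any extra pointer stored in BlockBackend.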
Fam
> After each individual snapshot the linked lists and the hash/arrays would be
> updated to point to the new top bsn.
> The snapshot operation can be done without violating any of the top block
> filter BlockDriverState.
>
> What do you think of this idea?
> How would this fit in block.c/blockdev.c?
>
> Best regards
>
> Benoît