
[Qemu-devel] Block Filters


From: Benoît Canet
Subject: [Qemu-devel] Block Filters
Date: Tue, 3 Sep 2013 18:24:49 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Hello list,

I have been thinking about QEMU block filters lately.

I am not a block.c/blockdev.c expert, so tell me what you think of the following.

The use cases I see would be:

- $user wants real cryptography on top of qcow2/qed or another format;
  snapshots and other block features should continue to work.

- $user wants to use a RAID-like feature such as QUORUM in QEMU;
  other features should continue to work.

- $user wants to use the future SSD deduplication implementation, with
  metadata on SSD and data on spinning disks; other features should continue
  to work.

- $user wants to I/O throttle one drive of his VM.

- $user wants to do copy-on-read.

- $user wants to do a combination of the above.

- $developer wants to keep the number of required steps minimal, so that
  changes stay small.

- $developer wants to leave user interface changes for later.

Let's take an example: a user wanting I/O-throttled, encrypted QUORUM on top
of QCOW2.

Assuming we implement throttling and encryption as something resembling a
block filter, this makes for a fairly complex BlockDriverState tree.

The tree would look like the following:

                    I/O throttling BlockDriverState (bs)
                               |
                               |
                               |
                               |
                    Encryption BlockDriverState (bs)
                               |
                               |
                               |
                               |
                    Quorum BlockDriverState (bs)
                   /           |           \
                  /            |            \
                 /             |             \
                /              |              \
            QCOW2 bs       QCOW2 bs         QCOW2 bs
               |               |               |
               |               |               |
               |               |               |
               |               |               |
            RAW bs         RAW bs           RAW bs

An external snapshot should result in a tree like the following:

                    I/O throttling BlockDriverState (bs)
                               |
                               |
                               |
                               |
                    Encryption BlockDriverState (bs)
                               |
                               |
                               |
                               |
                    Quorum BlockDriverState (bs)
                   /           |           \
                  /            |            \
                 /             |             \
                /              |              \
            QCOW2 bs       QCOW2 bs         QCOW2 bs
               |               |               |
               |               |               |
               |               |               |
               |               |               |
            QCOW2 bs       QCOW2 bs         QCOW2 bs
               |               |               |
               |               |               |
               |               |               |
               |               |               |
            RAW bs         RAW bs           RAW bs

In the current state of QEMU, we could already write block drivers that
implement this tree.

However, when doing operations like snapshots, blockdev.c would have no real
idea of what should be snapshotted and how (the three top bs should be kept on
top).

Moreover, it would have no easy way to manipulate this tree of
BlockDriverStates, as each one is encapsulated in its parent.

Also, there is no generic way to tell the block layer that two or more
BlockDriverStates are siblings.

This mail proposes some additional structures in order to cope with these
problems.

The overall strategy of the proposed structures is to move the relationships
between BlockDriverStates out of each BlockDriverState.

The idea is that it would be easier for the block layer to manipulate a
well-known structure than to be forced to dig into the specifics of each
BlockDriverState.

The first structure is the BlockStackNode.

The BlockStackNode would be used to represent the relationships between the
various BlockDriverStates.

typedef struct BlockStackNode BlockStackNode;

struct BlockStackNode {
    BlockDriverState *bs;  /* the BlockDriverState held by this node */

    /* this doubly linked list entry points to the child node and the parent
     * node
     */
    QLIST_ENTRY(BlockStackNode) down;

    /* this doubly linked list entry points to the siblings of this node
     */
    QLIST_ENTRY(BlockStackNode) siblings;

    /* a hash or an array of the siblings of this node for fast access;
     * should be recomputed when updating the tree (for example a GHashTable
     * keyed by sibling index)
     */
    GHashTable *siblings_hash;
};
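To make the sibling bookkeeping concrete, here is a minimal standalone sketch of how such a node could be maintained. It is not QEMU code: it assumes plain parent/child/sibling pointers instead of the QLIST macros so that it compiles outside the tree, it models the "hash or array" as a fixed-size array (MAX_SIBLINGS is an arbitrary limit), and bsn_append_sibling(), bsn_rebuild_sibling_index() and bsn_sibling_at() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SIBLINGS 16 /* arbitrary limit chosen for this sketch */

/* Hypothetical stand-in for QEMU's BlockDriverState. */
typedef struct BlockDriverState {
    const char *name;
} BlockDriverState;

typedef struct BlockStackNode BlockStackNode;

struct BlockStackNode {
    BlockDriverState *bs;          /* the BlockDriverState held by this node */
    BlockStackNode *parent;        /* "down" link, towards the top */
    BlockStackNode *child;         /* "down" link: first node one level down */
    BlockStackNode *prev_sibling;  /* sibling list, kept in index order */
    BlockStackNode *next_sibling;
    int index;                     /* position among the siblings */
    /* fast-access copy of the sibling list, recomputed after tree updates */
    BlockStackNode *sibling_array[MAX_SIBLINGS];
    int nb_siblings;
};

/* Append @node at the end of the sibling list that @existing belongs to. */
void bsn_append_sibling(BlockStackNode *existing, BlockStackNode *node)
{
    BlockStackNode *last = existing;

    while (last->next_sibling) {
        last = last->next_sibling;
    }
    last->next_sibling = node;
    node->prev_sibling = last;
    node->next_sibling = NULL;
    node->parent = existing->parent;
}

/* Recompute index and sibling_array for every node of the sibling list. */
void bsn_rebuild_sibling_index(BlockStackNode *any_sibling)
{
    BlockStackNode *all[MAX_SIBLINGS];
    BlockStackNode *first = any_sibling;
    int n = 0;

    while (first->prev_sibling) {
        first = first->prev_sibling;
    }
    for (BlockStackNode *it = first; it; it = it->next_sibling) {
        assert(n < MAX_SIBLINGS);
        it->index = n;
        all[n++] = it;
    }
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            all[i]->sibling_array[j] = all[j];
        }
        all[i]->nb_siblings = n;
    }
}

/* O(1) access to the sibling at @index, or NULL if out of range. */
BlockStackNode *bsn_sibling_at(const BlockStackNode *node, int index)
{
    if (index < 0 || index >= node->nb_siblings) {
        return NULL;
    }
    return node->sibling_array[index];
}
```

Rebuilding the whole array after each update keeps lookups constant-time at the cost of a quadratic rebuild, which seems acceptable for the handful of siblings a quorum would have.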

The BlockBackend would be the structure used to hold the "drive" the guest uses.

struct BlockBackend {
    /* the following doubly linked list head points to the top BlockStackNode;
     * in our case it's the one containing the I/O throttling bs
     */
    QLIST_HEAD(, BlockStackNode) block_stack_head;
    /* this is a pointer to the topmost node below the block filter chain;
     * in our case the first QCOW2 sibling
     */
    BlockStackNode *top_node_below_filters;
};
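With the head and the boundary pointer in place, generic code could tell which nodes are filters without knowing anything about the drivers involved. A minimal sketch, assuming a reduced node that only keeps a first-child pointer; blk_count_filters() is a hypothetical helper, not an existing QEMU function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the proposed structures. */
typedef struct BlockDriverState {
    const char *name;
} BlockDriverState;

typedef struct BlockStackNode BlockStackNode;
struct BlockStackNode {
    BlockDriverState *bs;
    BlockStackNode *child;         /* first node one level down */
    BlockStackNode *next_sibling;
};

typedef struct BlockBackend {
    BlockStackNode *block_stack_head;       /* topmost node (I/O throttling) */
    BlockStackNode *top_node_below_filters; /* first QCOW2 sibling */
} BlockBackend;

/* Count the filter nodes stacked above top_node_below_filters.
 * Returns -1 if the boundary node is not on the first-child path. */
int blk_count_filters(const BlockBackend *blk)
{
    int count = 0;

    for (BlockStackNode *n = blk->block_stack_head; n; n = n->child) {
        if (n == blk->top_node_below_filters) {
            return count;
        }
        count++;
    }
    return -1;
}
```

For the throttled, encrypted QUORUM example this returns 3, matching the three top bs that a snapshot should leave in place.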


Updated diagram:

(Here bsn means BlockStackNode)

    ------------------------BlockBackend
    |                             |
    |                          block_stack_head
    |                             |
    |                             |
    |                       I/O throttling BlockStackNode (contains its bs)
    |                             |
    |                            down
    |                             |
    |                             |
top_node_below_filters    Encryption BlockStackNode (contains its bs)
    |                             |
    |                            down
    |                             |
    |                             |
    |                Quorum BlockStackNode (contains its bs)
    |               /
    |             down
    |             /               
    |            /     S              S
    ------  QCOW2 bsn--i---QCOW2 bsn--i------ QCOW2 bsn (each bsn contains a bs)
               |       b       |      b         |
             down      l      down    l        down
               |       i       |      i         |
               |       n       |      n         |
               |       g       |      g         |
               |       s       |      s         |
               |               |                |
            RAW bsn         RAW bsn           RAW bsn  (each bsn contains a bs)


Block driver point of view:

To construct the tree, each BlockDriver would have some utility functions
looking like:

void bdrv_register_child_bs(BlockDriverState *bs, BlockDriverState *child_bs,
                            int index);

Multiple calls to this function could be made to register multiple sibling
children, identified by their index.

This way, something like quorum could register multiple QCOW2 instances.

Drivers would have a

BlockDriverState *bdrv_access_child(BlockDriverState *bs, int index);

to access their children.

These functions can be implemented without the driver knowing about
BlockStackNodes, by using container_of.
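One way to read the container_of remark is that the block layer allocates each BlockDriverState inside its BlockStackNode, so it can recover the node from the bs pointer a driver passes in. The sketch below assumes exactly that (the bs is embedded by value, unlike the `BlockDriverState *bs` pointer in the struct above), keeps children in an index-sorted singly linked list, and uses hypothetical helpers bdrv_new_node() and bdrv_free_node(); none of this is existing QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same idea as the container_of() macro found in QEMU and Linux. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical minimal BlockDriverState: all the driver ever sees. */
typedef struct BlockDriverState {
    const char *name;
} BlockDriverState;

/* Block-layer private node; the bs is embedded by value (assumption). */
typedef struct BlockStackNode BlockStackNode;
struct BlockStackNode {
    BlockStackNode *parent;
    BlockStackNode *first_child;
    BlockStackNode *next_sibling;  /* siblings kept sorted by index */
    int index;
    BlockDriverState bs;
};

/* Hypothetical helper: allocate a node and return its embedded bs. */
BlockDriverState *bdrv_new_node(const char *name)
{
    BlockStackNode *node = calloc(1, sizeof(*node));

    assert(node != NULL);
    node->bs.name = name;
    return &node->bs;
}

void bdrv_free_node(BlockDriverState *bs)
{
    free(container_of(bs, BlockStackNode, bs));
}

void bdrv_register_child_bs(BlockDriverState *bs, BlockDriverState *child_bs,
                            int index)
{
    BlockStackNode *parent = container_of(bs, BlockStackNode, bs);
    BlockStackNode *child = container_of(child_bs, BlockStackNode, bs);
    BlockStackNode **link = &parent->first_child;

    /* walk to the insertion point that keeps siblings sorted by index */
    while (*link && (*link)->index < index) {
        link = &(*link)->next_sibling;
    }
    child->parent = parent;
    child->index = index;
    child->next_sibling = *link;
    *link = child;
}

BlockDriverState *bdrv_access_child(BlockDriverState *bs, int index)
{
    BlockStackNode *parent = container_of(bs, BlockStackNode, bs);

    for (BlockStackNode *n = parent->first_child; n; n = n->next_sibling) {
        if (n->index == index) {
            return &n->bs;
        }
    }
    return NULL;
}
```

A quorum driver would then only call bdrv_register_child_bs(quorum_bs, qcow2_bs, i) for each of its children and bdrv_access_child(quorum_bs, i) on I/O, never touching a BlockStackNode directly.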

Blockdev point of view (here I need your help):

When doing a snapshot, blockdev.c would access
BlockBackend->top_node_below_filters and snapshot the bs contained in this node
and in its siblings.

After each individual snapshot, the linked lists and the hash/arrays would be
updated to point to the new top bsn.
The snapshot operation can thus be done without disturbing any of the block
filter BlockDriverStates on top.
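The snapshot walk described above could look roughly like the following standalone sketch. It assumes the overlay images have already been created and opened (passed in as new_bs[]), that the boundary node and its siblings share one parent, and it leaves out the hash/array rebuild; blk_snapshot_below_filters() and insert_overlay() are hypothetical names, not existing blockdev.c functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for the proposed structures. */
typedef struct BlockDriverState {
    const char *name;
} BlockDriverState;

typedef struct BlockStackNode BlockStackNode;
struct BlockStackNode {
    BlockDriverState *bs;
    BlockStackNode *parent;
    BlockStackNode *child;        /* first node one level down */
    BlockStackNode *next_sibling;
};

typedef struct BlockBackend {
    BlockStackNode *block_stack_head;
    BlockStackNode *top_node_below_filters;
} BlockBackend;

/* Put a new node holding @new_bs on top of @old, taking @old's place in
 * its parent's child list; @old becomes the only child of the new node. */
static BlockStackNode *insert_overlay(BlockStackNode *old,
                                      BlockDriverState *new_bs)
{
    BlockStackNode *overlay = calloc(1, sizeof(*overlay));

    assert(overlay != NULL);
    overlay->bs = new_bs;
    overlay->parent = old->parent;
    overlay->next_sibling = old->next_sibling;
    overlay->child = old;

    if (old->parent) {
        /* find the link that points at @old and redirect it */
        BlockStackNode **link = &old->parent->child;
        while (*link != old) {
            link = &(*link)->next_sibling;
        }
        *link = overlay;
    }
    old->parent = overlay;
    old->next_sibling = NULL;
    return overlay;
}

/* Snapshot top_node_below_filters and each of its following siblings.
 * new_bs[i] is the already-opened overlay image for the i-th sibling. */
void blk_snapshot_below_filters(BlockBackend *blk, BlockDriverState **new_bs,
                                int n)
{
    BlockStackNode *old = blk->top_node_below_filters;
    BlockStackNode *first_overlay = NULL;

    for (int i = 0; i < n && old; i++) {
        BlockStackNode *next = old->next_sibling; /* saved before relinking */
        BlockStackNode *overlay = insert_overlay(old, new_bs[i]);

        if (blk->block_stack_head == old) {
            blk->block_stack_head = overlay;      /* no filters above */
        }
        if (!first_overlay) {
            first_overlay = overlay;
        }
        old = next;
    }
    if (first_overlay) {
        blk->top_node_below_filters = first_overlay;
    }
}
```

Only the parent's child list and top_node_below_filters change; the filter nodes above keep their bs pointers untouched, which is the property the proposal is after.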

What do you think of this idea?
How would this fit in block.c/blockdev.c?

Best regards

Benoît


