qemu-devel

[Qemu-devel] Moving beyond image files


From: Anthony Liguori
Subject: [Qemu-devel] Moving beyond image files
Date: Mon, 21 Mar 2011 10:05:20 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.14) Gecko/20110223 Lightning/1.0b2 Thunderbird/3.1.8

We've been evaluating block migration in a real environment to try to understand its overhead compared to normal migration. The results so far are pretty disappointing: the speed of local disks becomes a big bottleneck even before the network does.

This has got me thinking about what we could do to avoid local I/O via deduplication and other techniques. This has led me to wonder if it's time to move beyond simple image files into something a bit more sophisticated.

Ideally, I'd want a full Content Addressable Storage database like Venti but there are lots of performance concerns with something like that.

I've been thinking about a middle ground and am looking for some feedback. Here's my current thinking:

1) All block I/O goes through a daemon. There may be more than one daemon to support multi-tenancy.

2) The daemon maintains metadata for each image that includes an extent mapping and then a cluster allocation bitmap within each extent (similar to FVD).

At this point, it's basically sparse raw but through a single daemon.
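To make the extent/bitmap idea concrete, here's a minimal sketch of the lookup path in Python. The extent and cluster sizes are illustrative assumptions, not part of the proposal, and the class names are hypothetical:

```python
# Hypothetical sketch of the extent mapping + per-extent cluster
# allocation bitmap. Sizes below are assumed for illustration only.
EXTENT_SIZE = 1 << 30       # 1 GiB extents (assumed)
CLUSTER_SIZE = 64 << 10     # 64 KiB clusters (assumed)
CLUSTERS_PER_EXTENT = EXTENT_SIZE // CLUSTER_SIZE

class Extent:
    def __init__(self):
        # One bit per cluster: has this cluster ever been written?
        self.alloc_bitmap = bytearray(CLUSTERS_PER_EXTENT // 8)

    def is_allocated(self, cluster):
        return bool(self.alloc_bitmap[cluster // 8] & (1 << (cluster % 8)))

    def mark_allocated(self, cluster):
        self.alloc_bitmap[cluster // 8] |= 1 << (cluster % 8)

class Image:
    def __init__(self):
        # Extents are created lazily, which is what makes the image sparse.
        self.extents = {}   # extent index -> Extent

    def lookup(self, offset):
        """Map a guest byte offset to (extent index, cluster index)."""
        return offset // EXTENT_SIZE, (offset % EXTENT_SIZE) // CLUSTER_SIZE
```

A read of an unallocated cluster can return zeroes without touching the disk; a write marks the cluster allocated and goes straight through, which is why performance should track sparse raw.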

3) All writes result in a sha1 being calculated before the write is completed. The daemon maintains a mapping of sha1s to clusters; a single sha1 may map to many clusters. The sha1 mapping can be made eventually consistent using a journal or even a dirty bitmap, and it can be partially rebuilt easily.
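The write-path bookkeeping could look something like the following sketch (the class and method names are assumptions for illustration; the real daemon would persist this map via the journal or dirty bitmap mentioned above):

```python
import hashlib

class DedupIndex:
    """Hypothetical sketch of the daemon's sha1 -> clusters map."""

    def __init__(self):
        # One sha1 may be backed by many clusters (duplicate data).
        self.by_hash = {}   # sha1 digest -> set of cluster numbers

    def record_write(self, cluster, data):
        # Hash is computed before the write completes, so the map
        # only ever lags behind the data, never the reverse.
        digest = hashlib.sha1(data).digest()
        self.by_hash.setdefault(digest, set()).add(cluster)
        return digest
```

Because the map is derived purely from on-disk cluster contents, any lost or stale portion can be rebuilt by re-hashing just the affected clusters.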

I think this is where v1 stops. With just this level of functionality, I think we have some very interesting properties:

a) Performance should be pretty close to raw

b) Without doing any (significant) disk I/O, we know exactly what data an image is composed of. This means we can do rsync-style image streaming that uses potentially much less network I/O and potentially much less disk I/O.
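The streaming win falls out of comparing hash manifests rather than data. A minimal sketch of the idea, assuming the source can cheaply enumerate (cluster, sha1) pairs and the destination can answer "do I already have this hash?":

```python
def clusters_to_transfer(src_manifest, dst_hashes):
    """rsync-style cluster selection (illustrative sketch).

    src_manifest: iterable of (cluster_number, sha1) for the source image.
    dst_hashes:   set of sha1s already present anywhere on the destination.

    Returns the clusters whose data must actually cross the network;
    everything else can be satisfied from local storage by hash.
    """
    return [cluster for cluster, digest in src_manifest
            if digest not in dst_hashes]
```

If many images share a common base (same distro install, say), most clusters hash-match something already on the destination and never hit the network or the source's disk.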

In a v2, I think you can add some interesting features that take advantage of the hashing. For instance:

4) If you run out of disk space, you can look at a hash with a refcount > 1 and split off a reference, making it copy-on-write. Then you can treat the remaining references as free list entries.
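A rough sketch of that reclamation pass, assuming the sha1 -> clusters map from step 3 (all names here are hypothetical; a real implementation would also have to quiesce or journal in-flight writes to the affected clusters):

```python
def reclaim_duplicates(by_hash, cow_clusters, free_list):
    """Space reclamation sketch: for any hash backed by more than one
    cluster, keep one physical copy, redirect the rest to it
    copy-on-write, and hand their physical space to the free list.

    by_hash:      sha1 digest -> set of cluster numbers (refcount is len)
    cow_clusters: out-param, logical cluster -> shared backing cluster
    free_list:    out-param, physical clusters now reclaimable
    """
    for digest, clusters in by_hash.items():
        if len(clusters) > 1:
            keep, *dups = sorted(clusters)
            for c in dups:
                cow_clusters[c] = keep   # reads now served from 'keep'
                free_list.append(c)      # physical space freed
            by_hash[digest] = {keep}     # refcount collapses to 1
```

A later write to a redirected cluster triggers the copy-on-write: allocate fresh space (possibly from the free list), write the new data, and drop the entry from cow_clusters.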

5) Copy-on-write references potentially become very interesting for image streaming because you can avoid any I/O for blocks that are already stored locally.

This is not fully baked yet but I thought I'd at least throw it out there as a topic for discussion. I think we've focused almost entirely on single images so I think it's worth thinking a little about different storage models.

Regards,

Anthony Liguori






