qemu-devel

Re: [Qemu-devel] Block layer roadmap on wiki


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Block layer roadmap on wiki
Date: Tue, 23 Aug 2011 13:21:25 +0100

On Tue, Aug 23, 2011 at 12:25 PM, Kevin Wolf <address@hidden> wrote:
> Am 22.08.2011 23:01, schrieb Anthony Liguori:
>> On 08/22/2011 03:48 PM, Ryan Harper wrote:
>>> * Stefan Hajnoczi<address@hidden>  [2011-08-22 15:32]:
>>>> We wouldn't rm -rf block/* because we still need qemu-nbd.  It
>>>> probably makes sense to keep what we have today.  I'm talking more
>>>> about a shift from writing our own image format to integrating
>>>> existing storage support.
>>>
>>> I think this is a key point.  While I do like the idea of keeping QEMU
>>> focused on single VM, I think we don't help ourselves by not consuming
>>> the hypervisor platform services and integrating/exploiting those
>>> features to make using QEMU easier.
>>
>> Let's avoid the h-word here as it's not terribly relevant to the discussion.
>>
>> Configuring block devices is fundamentally a privileged operation.  QEMU
>> fundamentally is designed to be useful as an unprivileged user.
>>
>> That's the trouble with something like LVM.  Only root can create LVM
>> snapshots and it's an all-or-nothing security model.
>>
>> If you want to get QEMU out of the snapshot business, you need a file
>> system that's widely available that allows non-privileged users to take
>> snapshots of individual files.
>
> I agree with you there (and it's interesting how different perceptions of
> the BoF results can be ;-))
>
> It's probably true that there are ways to do certain things on host
> block devices and we should definitely support such use cases better
> (where "we" mostly means the management layer, but we could possibly
> integrate things into qemu, like a file-btrfs or lvm_device backend that
> supports snapshots or something).
>
> It isn't for everyone, though, and this is why I tried to point out in
> the BoF that image formats aren't going to go away and we still need
> good support for them. Providing only raw for running VMs and declaring
> the rest of the formats to be intended for import/export only doesn't work.

I have said that block/*.c doesn't go away.  But we need to look at
exploiting storage features rather than reinventing them.

Snapshots are an example: we do not have a scalable snapshot mechanism
in QEMU.  External snapshots are inefficient when you build up
multiple levels (due to having to follow the backing file chain) and
when you delete a snapshot (due to copying data back into the backing
file).  Internal snapshots in qcow2 involve operations that traverse
the image metadata.  This traversal becomes a problem when image files
grow large (e.g. 1 TB and beyond) because the I/O required can take
more than 1 second, which is problematic for taking snapshots while the
VM is running.
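To make these costs concrete, here is a toy Python model (illustration only, not QEMU code). Each image in an external snapshot chain is a dict from guest cluster index to data; a read walks from the active overlay toward the base, and deleting a snapshot means committing its clusters into the image below. The metadata estimate assumes qcow2's default 64 KiB cluster size and its 8-byte L2 entries. The function names are hypothetical.

```python
# Toy model of qcow2-style snapshots, for illustration only (not QEMU code).
# Each image is a dict mapping guest cluster index -> cluster data, and
# holds only the clusters written while it was the active image.

CLUSTER_SIZE = 64 * 1024   # qcow2's default cluster size (64 KiB)
L2_ENTRY_SIZE = 8          # a qcow2 L2 table entry is 8 bytes


def read_cluster(chain, index):
    """Read one cluster through an external snapshot chain.

    chain[0] is the active overlay and chain[-1] the base image. Returns
    (data, images_consulted); data is None if no image has the cluster
    allocated (a real driver would return zeroes).
    """
    for depth, image in enumerate(chain, start=1):
        if index in image:
            return image[index], depth
    return None, len(chain)


def commit(overlay, backing):
    """Delete a snapshot by copying its clusters down into its backing image.

    Returns how many clusters had to be copied; the caller then drops
    `overlay` from the chain.
    """
    for index, data in overlay.items():
        backing[index] = data
    return len(overlay)


def l2_metadata_bytes(image_size):
    """Bytes of L2 tables that map a fully allocated qcow2 image."""
    return (image_size // CLUSTER_SIZE) * L2_ENTRY_SIZE


if __name__ == "__main__":
    base = {0: b"A", 1: b"B"}
    snap1 = {1: b"B1"}
    active = {2: b"C"}
    chain = [active, snap1, base]
    print(read_cluster(chain, 0))   # (b'A', 3): had to walk to the base
    print(read_cluster(chain, 2))   # (b'C', 1): found in the overlay
    print(l2_metadata_bytes(1 << 40) // (1 << 20), "MiB of L2 tables for 1 TiB")
```

For a fully allocated 1 TiB image, `l2_metadata_bytes` gives 2^24 clusters × 8 bytes = 128 MiB of L2 tables. Creating a qcow2 internal snapshot walks those tables to bump the refcount of every cluster they reference. At an assumed 100 MB/s of metadata I/O, just reading them takes about 1.3 seconds, before any refcount writes.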

There are known ways of doing better internal snapshots along the
lines of what ZFS, btrfs, and thin-dev do.  But that means redesigning
the image metadata and reimplementing these storage systems in
userspace.
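A toy copy-on-write tree with lazily updated reference counts shows why those designs scale better. It is loosely modeled on the scheme btrfs uses for its B-trees; other systems use different bookkeeping. Taking a snapshot only bumps the root's count. A later write copies just the shared nodes on its path and pushes the extra reference down to their children at that point. `Node`, `shadow`, and `Volume` are hypothetical names in a Python sketch, not code from any real filesystem or from QEMU.

```python
# Toy copy-on-write tree with lazy reference counts, loosely in the spirit of
# the btrfs approach; illustration only (no balancing, no on-disk format,
# no freeing of unreferenced nodes).

class Node:
    def __init__(self, children=None, data=None):
        self.children = children   # list of Node for interior nodes, else None
        self.data = data           # payload for leaf nodes
        self.refcount = 1          # number of parents/volumes pointing here


def shadow(node):
    """Return (writable_node, copied) for a node reached from one parent."""
    if node.refcount == 1:
        return node, False         # exclusively owned: safe to modify in place
    node.refcount -= 1             # this path stops referencing the shared node
    copy = Node(list(node.children) if node.children is not None else None,
                node.data)
    for child in copy.children or []:
        child.refcount += 1        # the copy is a new parent of each child
    return copy, True


class Volume:
    def __init__(self, root):
        self.root = root

    def snapshot(self):
        """O(1): share the root and bump only its reference count."""
        self.root.refcount += 1
        return Volume(self.root)

    def read(self, path):
        """Follow child indices in `path` down to a leaf and return its data."""
        node = self.root
        for index in path:
            node = node.children[index]
        return node.data

    def write(self, path, data):
        """Write a leaf, copying only shared nodes on the path; returns copies."""
        self.root, copied = shadow(self.root)
        copies = int(copied)
        node = self.root
        for index in path:
            child, copied = shadow(node.children[index])
            node.children[index] = child
            copies += int(copied)
            node = child
        node.data = data
        return copies
```

Here `snapshot()` costs a single refcount increment however large the tree is. The refcount work is deferred to the first write that touches a shared node and is proportional to the depth of the written path, not to the size of the image. qcow2's current internal snapshots instead traverse all of the image's metadata up front.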

What I'm suggesting is that we draw the line here.  Keep what we've
got and continue the optimizations that we have in the pipeline.  But
when we hit significant new features, work with existing storage
systems.  Why?  Because we need to support existing storage anyway and
therefore reinventing our own is not a good use of resources.

Stefan


