qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Re: RFC v2: blockdev_add & friends, brief rationale, QM


From: Markus Armbruster
Subject: Re: [Qemu-devel] Re: RFC v2: blockdev_add & friends, brief rationale, QMP docs
Date: Fri, 18 Jun 2010 10:20:27 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

Kevin Wolf <address@hidden> writes:

> Am 17.06.2010 15:01, schrieb Anthony Liguori:
>> On 06/17/2010 03:20 AM, Kevin Wolf wrote:
>>> Am 16.06.2010 20:07, schrieb Anthony Liguori:
>>>    
>>>>>    But it requires that
>>>>> everything that -blockdev provides is accessible with -drive, too (or
>>>>> that we're okay with users hating us).
>>>>>
>>>>>        
>>>> I'm happy for -drive to die.  I think we should support -hda and
>>>> -blockdev.
>>>>      
>>> -hda is not sufficient for most users. It doesn't provide any options.
>>> It doesn't even support virtio. If -drive is going to die (and we seem
>>> to agree all on that), then -blockdev needs to be usable for users (and
>>> it's only you who contradicts so far).
>>>    
>> 
>> I've always thought we should have a -vda argument and an -sda argument 
>> specifically for specifying virtio and scsi disks.
>
> It would at least fix the most obvious problem. However, it still
> doesn't allow passing options.
>
>>>> -blockdev should be optimized for config files, not single
>>>> argument input.  IOW:
>>>>
>>>> [blockdev "blk2"]
>>>>    format = "raw"
>>>>    file = "/path/to/base.img"
>>>>    cache = "writeback"
>>>>
>>>> [blockdev "blk1"]
>>>>     format = "qcow2"
>>>>     file = "/path/to/leaf.img"
>>>>     cache="off"
>>>>     backing_dev = "blk2"
>>>>
>>>> [device "disk1"]
>>>>     driver = "ide-drive"
>>>>     blockdev = "blk1"
>>>>     bus = "0"
>>>>     unit = "0"
>>>>      
>>> You don't specify the backing file of an image on the command line (or
>>> in the configuration file).
>> 
>> But we really ought to allow it.  Backing files are implemented as part 
>> of the core block layer, not the actual block formats.  
>
> The generic block layer knows the name of the backing file, so it can be
> displayed in tools, but that's about it. Calling this the
> "implementation" of backing files is daring.

You're both partly wrong :)

COW format drivers retrieve the "name" of the backing file from the
image on open, and store it in a well-known place in their
BlockDriverState.  qcow2 also retrieves the format.

Then generic block code opens the backing "file", and stores a pointer
to the backing BlockDriverState in the COW format's BlockDriverState.

The COW format driver then uses the backing "file" BlockDriverState
however it sees fit.

Generic code takes care of closing it, and also has a hand in commit and
encryption.

Aside: I put "name" and "file" in quotes, because the name is really an
encoding of protocol + arguments.  It works just like -drive with
file=NAME (and format=FORMAT if qcow2).  But for the COW format driver,
it's just a string (or two) it uses to get access to a backing image.
It's fine with any format + protocol combination, so it leaves that to
generic code.

Aside^2: We can define the semantics of QCOW2 however we please.  But is
it really sane to interpret a backing file name "fat:duck" we find in
some VMDK image as "vvfat over directory duck"?

> I see no use case for specifying it on the command line. The only thing
> you can achieve better with it is corrupting your image because you
> specify the wrong/no backing file next time.

Anthony has a point on the conceptual level: COW is a stacking of
BlockDriverStates.

But Kevin is right on usability.  We could permit overriding of backing
file on the command line / config file / QMP, but it would be a
dangerous expert feature, and no replacement for storing references to
backing files in the COW images.

>> Today the block 
>> layer queries the block format for the name of the backing file but gets 
>> no additional options from the block format.  File isn't necessarily 
>> enough information to successfully open the backing device so why treat 
>> it specially?

Parameters for opening a block device are flags, format, filename (which
is really an encoding of protocol + arguments).  So the only piece of
information that's missing for backing files is flags.

Flags encode -drive's parameters snapshot, cache, aio, readonly.  For
backing files, we force readonly on and snapshot off.  We inherit cache
and aio settings from the backed file.  Which of these do you want to
control separately?

>> I think we should keep the current ability to query the block format for 
>> a backing file name but we should also support hooking up the backing 
>> device without querying the block format at all.  It makes the model 
>> much more elegant IMHO because then we're just creating block devices 
>> and hooking them up.  All block devices are created equal more or less.
>> 
>>>   It's saved as part of the image. It's more
>>> like this (for a simple raw image file):
>>>
>>> [blockdev-protocol "proto1"]
>>>     protocol = "file"
>>>     file = "/path/to/image.img"
>>>
>>> [blockdev "blk1"]
>>>     format = "raw"
>>>     cache="off"
>>>     protocol = "proto1"
>>>
>>> [device "disk1"]
>>>     driver = "ide-drive"
>>>     blockdev = "blk1"
>>>     bus = "0"
>>>     unit = "0"
>>>
>>> (This would be Markus' option 3, I think)
>>>    
>> 
>> I don't understand why we need two layers of abstraction here.  Why not 
>> just:
>> 
>> [blockdev "proto1"]
>>    protocol = "file"
>>    cache = "off"
>>    file = "/path/to/image.img"
>> 
>> Why does the cache option belong with raw and not with file and why 
>> can't we just use file directly?
>
> The cache option is shared along the chain, so it probably fits best in
> the blockdev.
>
> And we don't use file directly because it's wrong. Users say that their
> image is in raw format, and they don't get why they should have to make
> a difference between a raw image stored on a block device and one stored
> in a file.
>
>> As Christoph mentions, we really don't 
>> have stacked protocols and I'm

"missing the end of my sentence"?

What we *really* don't have is a common understanding of "format" and
"protocol"!

> The only question is if we call them stacked formats or stacked
> protocols. One of them exists.

It does internally.  But until we develop a stable understanding of
"format", "protocol" and how they stack, we better avoid exposing the
stacking at external interfaces.

>>>> not sure they make sense.
>>>>      
>>> Right, if we go for Christoph's suggestion, we don't need stacked
>>> protocols. We'll have stacked formats instead. I'm not sure if you like
>>> this any better. ;-)
>>>
>>> We do have stacking today. -hda blkdebug:test.cfg:foo.qcow2 is qcow2 on
>>> blkdebug on file. We need to be able to represent this.
>>>    
>> 
>> I think we need to do stacking in a device specific way.  When you look 
>> at something like vmdk, it should actually support multiple leafs since 
>> the format does support such a thing.  So what I'd suggest is:
>> 
>> [blockdev "part1"]
>>    format = "raw"
>>    file = "image000.vmdk"
>> 
>> [blockdev "part2"]
>>    format = "raw"
>>    file = "image001.vmdk"
>> 
>> [blockdev "image"]
>>    format = "vmdk"
>>    section0 = "part1"
>>    section1 = "part2"
>
> Actually, I'd prefer to read that information from the VMDK file instead
> of requiring the user to configure this manually...
>
>> Note, we'll need to support this sort of model in order to support a 
>> disk that creates an automatic partition table (which would be a pretty 
>> useful feature). 
>
> Sounds like a good example of a useful protocol.
>
> Markus, I'm afraid we've found an equivalent to Avi's mirror. If not
> even more complicated, because we'd need to accept any length for the
> list of partitions - possibly an option that should take an array?

I wouldn't say it's more complicated than Avi's mirror.  The
zero-one-infinity rule applies again: zero inferior protocols is
simplest (no stacking), then up to one inferior protocol (protocol
stack), then any number above one (protocol tree).

[...]



reply via email to

[Prev in Thread] Current Thread [Next in Thread]