qemu-devel

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy


From: Avi Kivity
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
Date: Mon, 28 Feb 2011 10:38:18 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Thunderbird/3.1.7

On 02/27/2011 07:41 PM, Anthony Liguori wrote:

I agree 100% the management tool cannot be the authoritative source of state.

My position is:
- the management tool should be 100% in control of configuration (how the guest is put together from its components)
- qemu should be 100% in control of state (memory, disk state, NVRAM in various components, cd-rom eject state, explosive bolts for payload separation, self-destruct mechanism, etc.)


There simply is not such a clean separation between the two, because things that the guest does affect the configuration of the guest.

Hot plug,

I don't think hotunplug works this way. When the guest ejects the pci or usb device, it simply stops working with the device and disconnects the power. There is nothing non-volatile going on, no spring-loaded lever that pushes the device out. If the server reboots immediately after hotunplug, but before the user physically removes the device, then the server will see the device when it boots up.

removable media eject,

Here, we do have a single bit of non-volatile storage.

persistent device settings (whether it's CMOS or EEPROM) all disrupt this model.

These are just arrays of bits, most of them with no standard interpretation. So a block device fits them perfectly.


If you really wanted to have this separation, you'd have to be very strict about making all guest settings not be specified in config. You would need to do:

qemu-img create -f e1000-eprom -o macaddr=12:23:45:67:78:90 e1000.0.rom
qemu-img create -f e1000-eprom -o macaddr=12:23:45:67:78:91 e1000.1.rom

qemu -device e1000,id=e1000.0,eeprom=e1000.0.rom -device e1000,id=e1000.1,eeprom=e1000.1.rom

And now I need a tool that lets me modify e1000-eprom images if I want to change the mac address dynamically (say I'm trying to clone a VM).

This type of model can be workable but as I said earlier, I think it's overengineering the problem.

In fact I don't think anyone wants this. Usually management wants the assigned MAC to be used without the guest playing games with it. So it's more or less pointless however it's implemented.


We don't separate configuration from guest state today. Instead of setting ourselves up for failure by setting an unrealistic standard that we try to achieve and never do, let's embrace the system that is working for us today. We are authoritative for everything and guest state is intimately tied to the virtual machine configuration.

"we are authoritative for everything" is a clean break from everything that's being done today. It's also a clean break from the model of central management plus database. We can't force it on people.

Non-volatile state is not intimately tied to configuration. We store block device state completely outside the configuration. What's left is the CD-ROM tray, CMOS memory, and network card EEPROM. We could argue back and forth about where exactly they belong, but they aren't really worth the conversation since they are meaningless for real-life use.



But beyond those races, QEMU is the only entity that knows with certainty what bits of information are important to persist in order to preserve a guest across shutdown/restart. The fact that we've punted this problem for so long has only ensured that management tools are either intrinsically broken or only support the most minimal subset of functionality we actually support.

I'm not arguing about that. I just want to stress again the difference between state and configuration. Qemu has no authority, in my mind, as to configuration. Only state.

Being the one that creates a guest based on configuration, I would say that we most certainly do.

That is not what being authoritative means.

In a virt-manager deployment, libvirt is the authoritative source of guest configuration. In a RHEV-M deployment, the RHEV-M database is the authoritative source of guest configuration. You can completely replace the host machine and your guest will recreate just fine as long as the authoritative source is intact.


Currently they contain the required guest configuration and a representation of the current live configuration, and they issue monitor commands to move the live configuration towards the required configuration (or just generate a qemu command line). What you're describing is completely different; I'm not even sure what it is.
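As a rough illustration of that reconciliation step, here is a minimal Python sketch. It assumes each configuration is reduced to a mapping from device id to driver name, which is a simplification made up for this example; `plan_monitor_commands` is a hypothetical helper, not part of libvirt, RHEV-M, or qemu. Only the QMP command names `device_add` and `device_del` are taken from qemu itself, and the sketch ignores that a real hot-unplug may need guest cooperation.

```python
def plan_monitor_commands(required, live):
    """Return QMP-style commands that move `live` toward `required`.

    Both arguments map a device id to a driver name, e.g. {"net0": "e1000"}.
    This representation is an assumption made for the sketch.
    """
    commands = []

    # Remove devices that are live but not required, or whose driver differs.
    for dev_id in sorted(live):
        if required.get(dev_id) != live[dev_id]:
            commands.append({"execute": "device_del",
                             "arguments": {"id": dev_id}})

    # Add devices that are required but missing, or were removed just above.
    for dev_id in sorted(required):
        if live.get(dev_id) != required[dev_id]:
            commands.append({"execute": "device_add",
                             "arguments": {"driver": required[dev_id],
                                           "id": dev_id}})
    return commands


if __name__ == "__main__":
    required = {"net0": "e1000", "net1": "virtio-net-pci"}
    live = {"net0": "e1000", "net2": "rtl8139"}
    for cmd in plan_monitor_commands(required, live):
        print(cmd)
```

The required configuration stays with the management tool throughout; qemu only ever sees the resulting commands.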

Management tools shouldn't have to think about how the monitor commands they issue impact the invocation options of QEMU.

They have to, when creating a guest from scratch.

But I admit, this throws a new light (for me) on things. What are the implications?

- must have a qemu instance running when editing configuration, even when the guest is down

QMP is an API. Whether a qemu instance is launched is an implementation detail. This could all be hidden completely with libqmp.

QMP is first and foremost a protocol.


- cannot add additional information to configuration; must store it in an external database and cross-reference it with the qemu data using the device ID

Don't confuse a management tool's notion of configuration with QEMU's configuration.

A management tool's config is used to initially create and then manipulate an existing guest. If the management tool supports out-of-band manipulation of a configuration file, then it needs to determine how the configuration file changed and execute the appropriate commands.

I wasn't talking about that. I was talking about data that is meaningful to a user but not meaningful to qemu. That sort of data doesn't store well if qemu is the authoritative source.
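To make the cross-referencing cost concrete, here is a small Python sketch of what a management tool would have to do if qemu held the device list and the tool kept user-facing data in its own store, keyed by device id. The record layout and the `annotate_devices` helper are hypothetical and exist only for illustration.

```python
def annotate_devices(qemu_devices, annotations):
    """Join qemu's device list with externally stored user data by device id.

    qemu_devices: list of dicts with at least an "id" key (assumed shape).
    annotations:  dict mapping device id -> dict of user-facing fields.
    Returns (merged, orphaned), where `orphaned` lists annotation ids that
    no longer match any device qemu reports.
    """
    merged = []
    seen = set()
    for dev in qemu_devices:
        seen.add(dev["id"])
        # User fields are layered on top of what qemu reports.
        merged.append({**dev, **annotations.get(dev["id"], {})})

    # Annotations whose device has disappeared from qemu's view.
    orphaned = sorted(set(annotations) - seen)
    return merged, orphaned


if __name__ == "__main__":
    devices = [{"id": "net0", "driver": "e1000"}]
    notes = {"net0": {"owner": "alice"}, "net1": {"owner": "bob"}}
    print(annotate_devices(devices, notes))
```

The `orphaned` list is the awkward part: once a device vanishes from the qemu side, the external store is left holding data that refers to nothing.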

Yes, it is. libvirt kind of cheats here and just deletes the old VM and creates a new one when editing the XML IIUC.

- no transactions/queries/etc except on non-authoritative source
- issues with shared-nothing design (well, can store the configuration file using DRBD).

In both cases, today a management tool races with QEMU so both of these points are currently true.

No, it doesn't. If the guest ejects a network card, the network card is still there. Queries against the database still return correct results.


If you look at management tools, they believe they are the authoritative source of configuration information (not guest state, which is more or less ignored).

It's because we've given them no other option.

It's the natural way of doing it. You have a web interface that talks to a database. When you want to list all VMs that have network cards on the production subnet, you issue a database query and get a recordset. How do you do that when the authoritative source of information is spread across a cluster?
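For illustration, the central-database case looks roughly like the Python sketch below. The schema (`vms` and `nics` tables with a `subnet` column) and the sample rows are invented for this example and do not describe any real management tool's database.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical VM/NIC schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vms (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE nics (vm_id INTEGER REFERENCES vms(id),
                           mac TEXT, subnet TEXT);
        INSERT INTO vms VALUES (1, 'web1'), (2, 'db1'), (3, 'build1');
        INSERT INTO nics VALUES
            (1, '52:54:00:00:00:01', 'production'),
            (2, '52:54:00:00:00:02', 'production'),
            (2, '52:54:00:00:00:03', 'backup'),
            (3, '52:54:00:00:00:04', 'lab');
    """)
    return conn


def vms_on_subnet(conn, subnet):
    """Return the names of all VMs with at least one NIC on `subnet`."""
    rows = conn.execute(
        "SELECT DISTINCT vms.name FROM vms "
        "JOIN nics ON nics.vm_id = vms.id "
        "WHERE nics.subnet = ? ORDER BY vms.name",
        (subnet,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(vms_on_subnet(make_demo_db(), "production"))
```

With the configuration in one place this is a single query; with it spread across the qemu instances of a cluster, the same question means contacting every host and merging the answers.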

This problem still exists today. A guest can eject a network card on its own (without the management tool issuing a device_del command). QEMU will delete the NIC when this happens.

I think that's a bug.

The same is true with CDROM eject.

CDROM tray position is state, not configuration.


Management tools are simply not authoritative today.

Regards,

Anthony Liguori


--
error compiling committee.c: too many arguments to function



