
Re: [Qemu-devel] [PATCH VERSION 3] Disk image exclusive and shared locks


From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH VERSION 3] Disk image exclusive and shared locks.
Date: Tue, 15 Dec 2009 12:45:13 -0600
User-agent: Thunderbird 2.0.0.23 (X11/20090825)

Richard W.M. Jones wrote:
> Thanks for looking at this.

Thanks for following up :-)

> The use case is "cluster filesystem with an admin tool that must be
> run exclusively".  Cluster nodes open the block device for write, but
> with a shared lock.  The admin tool needs exclusive access (no nodes
> must be writing), so it tries to open the device with lock=exclusive.
> This only succeeds if the normal cluster nodes have backed off.

This seems like a reasonably uncommon scenario, and my concern is that it really hurts the interface. lock=on|off, I think, qualifies as "The obvious use is the correct one", whereas lock=exclusive|shared|none would qualify as "Read the implementation and you'll get it right."

And that's mainly due to the number of corner cases that have to be considered in order for exclusive to degrade into shared.
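
The semantics being debated here map onto ordinary flock()-style advisory locking. A minimal sketch of the idea, assuming flock() on the image file descriptor; nothing below is taken from the patch itself:

/* Cluster nodes open the image with a shared lock; the admin tool asks
 * for an exclusive one, which only succeeds once every shared holder
 * has backed off.  Illustrative only. */
#include <sys/file.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int open_image(const char *path, int exclusive)
{
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (flock(fd, (exclusive ? LOCK_EX : LOCK_SH) | LOCK_NB) < 0) {
        perror("flock");    /* someone else still holds the image */
        close(fd);
        return -1;
    }
    return fd;
}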

> However, the patch would be a bit simpler if we just had lock=on|off
> at the moment, and it wouldn't stop us from adding the shared case in
> future.  (For my needs I don't care about the shared case).

Let's stick with lock=on|off. If anyone comes along and actually wants shared|exclusive, we can revisit the thread :-)
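
Concretely, the agreed-on interface would presumably look something like this on the command line (illustrative spelling only; the exact option name and values are whatever the final patch ends up with):

qemu -drive file=cluster.img,if=virtio,lock=on     # refuse to start if the image is locked elsewhere
qemu -drive file=cluster.img,if=virtio,lock=off    # today's behaviour, no locking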


>> I really dislike this as an interface. I think we need to make a decision about whether we delay open until after migration has completed. I think unless there's a really compelling argument against it, this is probably what we should do.

> I'm guessing this needs some quite major surgery to qemu so that block
> device opening is delayed in the case of migration.  I can take a look
> at this.

I think it's actually easier than I expected it to be.

We really just need to delay lock acquisition until it's really needed (a read/write op). For probing, we can get away without holding a lock, since probing is read-only and we only use it to decide which bdrv to use.

So what I'd expect is that we wouldn't acquire a lock until we actually ran the VM. We would basically have a bdrv_lock_all() that could fail if some lock couldn't be acquired. We would execute that right before running a VM (which means it happens after incoming migration completes).
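
A rough sketch of what bdrv_lock_all()/bdrv_unlock_all() could look like, assuming flock()-based locks and pretending the open image fds are collected in a plain array (the real code would walk QEMU's list of BlockDriverStates; everything below is illustrative, not the patch):

#include <sys/file.h>

static int drive_fds[32];
static int nb_drives;

int bdrv_lock_all(void)
{
    int i;

    for (i = 0; i < nb_drives; i++) {
        if (flock(drive_fds[i], LOCK_EX | LOCK_NB) < 0) {
            /* One lock failed: back out the ones we already hold and
             * report failure so the VM is never started. */
            while (--i >= 0) {
                flock(drive_fds[i], LOCK_UN);
            }
            return -1;
        }
    }
    return 0;
}

void bdrv_unlock_all(void)
{
    int i;

    for (i = 0; i < nb_drives; i++) {
        flock(drive_fds[i], LOCK_UN);
    }
}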

After we completed migration (and did an fsck), we would call bdrv_unlock_all() to drop the locks. Before we did 'cont' again, we would issue bdrv_lock_all().

As long as we do a bdrv_unlock_all() before writing the last byte of the migration stream, it's not at all racy. We also don't have to close the fd on the source which means that we maintain the same level of robustness.
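
In other words, the hand-off on the source side would be ordered roughly as follows; the two write_*() helpers are made-up placeholders for the relevant pieces of the migration path, and only the ordering matters:

/* Illustrative ordering only, not real migration code. */
int bdrv_lock_all(void);    /* hypothetical, as sketched above */
void bdrv_unlock_all(void);

static void write_migration_state(void) { /* everything but the last byte */ }
static void write_final_byte(void)      { /* the actual hand-off point    */ }

static void source_side_handoff(void)
{
    write_migration_state();   /* device and RAM state, minus one byte   */
    bdrv_unlock_all();         /* drop our locks, but keep the fds open  */
    write_final_byte();        /* destination may now bdrv_lock_all()    */
    /* If migration aborts and we 'cont' here instead, bdrv_lock_all()
     * has to succeed again before the guest resumes. */
}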


>> As it stands, we cannot make lock=!none the default if it takes an extra monitor command to allow for live migration. I think if we're going to introduce this functionality, we probably should be enabling it by default.

> Xen has a similar feature, and it is enabled by default.

And I really, really dislike it :-) There are a lot of corner cases around requiring the use of '!', to the point where we ended up just using it for everything in our management tools. IIRC, they don't use advisory locking and instead try to be clever with something similar to lsof.
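
For anyone who hasn't run into it, the '!' in question is (if I remember the xm config syntax right) a suffix on the disk mode that overrides Xen's "device already in use" check; from memory, so double check against the Xen docs:

disk = [ 'phy:/dev/vg/cluster,xvda,w'  ]    # normal read-write; refused if another domain has it
disk = [ 'phy:/dev/vg/cluster,xvda,w!' ]    # '!' forces it to be shared anyway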

Regards,

Anthony Liguori



