From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-block] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
Date: Tue, 3 Jul 2018 19:11:40 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0
03.07.2018 14:15, Kevin Wolf wrote:
> On 02.07.2018 at 14:09, Vladimir Sementsov-Ogievskiy wrote:
>> 29.06.2018 20:40, Eric Blake wrote:
>>> On 06/29/2018 12:30 PM, John Snow wrote:
>>>> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> We need to synchronize the backup job with reads from the fleecing
>>>>> image, as was done in block/replication.c. Otherwise, the following
>>>>> situation is theoretically possible:
>>>>> 1. the client starts reading
>>>>> 2. the client finds that there is no corresponding cluster in the
>>>>>    fleecing image
>>>> I don't think the client refocuses the read, but QEMU does. (The
>>>> client has no idea whether it is reading from the fleecing node or
>>>> the backing file.) ... but understood:
>>>>> 3. the client is going to read from the backing file (i.e. the
>>>>>    active image)
>>>>> 4. the guest writes to the active image
>>>> My question here is whether QEMU will allow reads and writes to
>>>> interleave in general. If the client is reading from the backing
>>>> file, the active image, will QEMU allow a write to modify that data
>>>> before we're done getting it?
>>>>> 5. this write is intercepted by backup(sync=none) and the cluster
>>>>>    is copied to the fleecing image
>>>>> 6. the guest write continues...
>>>>> 7. and the client reads _new_ (or partly new) data from the active
>>>>>    image
>>>> I can't answer this for myself one way or the other right now, but I
>>>> imagine you observed a failure in practice in your test setups, which
>>>> motivated this patch? A test or some proof would help justify this
>>>> patch. It would also help prove that it solves what it claims to!
>>> In fact, do we really need a new filter, or do we just need to make
>>> the "sync":"none" blockdev-backup job take the appropriate
>>> synchronization locks?
>> How? We would need an additional mechanism like serializing requests,
>> or a way to reuse serializing requests. Using backup's internal
>> synchronization looks simpler, and it is already used in block
>> replication.
> But that is also just an ugly hack that fixes one special case and
> leaves everything else broken. replication is usually not a good
> example for anything; it always gives me bad surprises when I have to
> look at it.
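[Editor's note] The race in steps 1-7 can be modeled outside QEMU with a toy copy-before-write setup. All names here are illustrative, not QEMU APIs: the point is only that "pick a read source" and "complete the read" are two separate operations, and a guest write can interleave between them.

```c
#include <string.h>

/* Toy model of the fleecing race. One "cluster" of data: fleecing[]
 * is the snapshot overlay, allocated says whether copy-before-write
 * has populated it. Illustrative names, not QEMU code. */

enum { CLUSTER = 8 };
static char active[CLUSTER + 1]   = "old-data";  /* active image     */
static char fleecing[CLUSTER + 1];               /* fleecing overlay */
static int  allocated = 0;

/* guest write path: copy-before-write into the overlay, then write */
static void guest_write(const char *new_data)
{
    if (!allocated) {
        memcpy(fleecing, active, CLUSTER);  /* backup(sync=none) copy */
        allocated = 1;
    }
    memcpy(active, new_data, CLUSTER);
}

/* steps 1-3: QEMU (on the client's behalf) decides where to read
 * from; with nothing allocated in the overlay it picks the active
 * image */
static const char *pick_read_source(void)
{
    return allocated ? fleecing : active;
}

/* returns 1 if the snapshot read observed post-snapshot data */
static int demo_race(void)
{
    const char *src = pick_read_source();  /* steps 1-3 */
    guest_write("new-data");               /* steps 4-6 interleave */
    return memcmp(src, "new-data", CLUSTER) == 0;  /* step 7 */
}
```

In the model, `demo_race()` returns 1: the read that already resolved to the active image sees the new data even though the copy-before-write itself ran correctly, which is exactly the corruption described above.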
> We'll have to figure out where to fix this problem (or what it really
> is, once you look at more than just fleecing), but I think requiring
> the user to add a filter driver to work around missing serialisation
> in other code, and corrupting their image if they forget to, is not a
> reasonable solution.
>
> I see at least two things wrong in this context:
>
> * The permissions don't seem to match reality. The NBD server
>   unconditionally shares PERM_WRITE, which is wrong in this case. The
>   client wants to see a point-in-time snapshot that never changes.
>   This should become an option so that it can be properly reflected
>   in the permissions used.
>
> * Once we have proper permissions, the fleecing setup breaks down
>   because the guest needs PERM_WRITE on the backing file, but the
>   fleecing overlay allows that only if the NBD client allows it
>   (which it doesn't for fleecing).
>
>   Now we can implement an exception right into backup that installs a
>   backup filter driver between source and target if the source is the
>   backing file of the target. The filter driver would be similar to
>   the commit filter driver in that it simply promises !PERM_WRITE to
>   its parents, but allows PERM_WRITE on the source because it has
>   installed the before_write_notifier that guarantees this condition.
>
> All writes to the target that are made by the backup job in this
> setup (including before_write_notifier writes) need to be marked as
> serialising so that any concurrent reads are completed first.
>
> And if we decide to add a target filter to backup, we should probably
> at the same time use a filter driver for intercepting source writes
> instead of using before_write_notifier.
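[Editor's note] "Marked as serialising so that any concurrent reads are completed first" boils down to an overlap check between a pending write and in-flight reads. The sketch below is a toy model of that discipline (QEMU's real mechanism is the serialising-request tracking in block/io.c; the names and structures here are invented for illustration):

```c
/* Toy model of serialising requests: a write that overlaps an
 * in-flight read must wait until that read completes. Illustrative
 * names, not QEMU's actual request-tracking structures. */

typedef struct {
    long start, end;   /* byte range [start, end) */
    int  in_flight;
} Request;

/* two ranges overlap iff neither ends before the other starts */
static int overlaps(const Request *a, const Request *b)
{
    return a->start < b->end && b->start < a->end;
}

/* a serialising write may proceed only once no overlapping read is
 * still in flight */
static int write_may_proceed(const Request *write,
                             const Request *reads, int n)
{
    for (int i = 0; i < n; i++) {
        if (reads[i].in_flight && overlaps(write, &reads[i])) {
            return 0;  /* must wait for this read to drain */
        }
    }
    return 1;
}
```

Under this rule, the step-3 read from the active image would drain before the step-5 copy-on-write lands, closing the race window.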
Hmm, is it possible to do all of this in one super filter driver, which we insert into the tree like this:

   top blk       fleecing qcow2
      +             +
      |             | backing
      v             |
   super filter <---+
      +
      | file
      v
   active image

And the super filter does the following:

1. copy-on-write: before forwarding a write to its file child, it does a serializing write to the fleecing qcow2
2. fakes .bdrv_child_perm for the fleecing qcow2, like in block commit

and no block job is needed.
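[Editor's note] The proposed super filter's write path (item 1) can be sketched as follows. This is a toy model under the stated assumptions, not QEMU code; the function names, the byte-granularity copy-on-write map, and the flat arrays standing in for block nodes are all invented for illustration:

```c
#include <string.h>

/* Sketch of the super filter's write path: before a guest write is
 * forwarded to the active image (the filter's "file" child), the old
 * contents of the range are copied into the fleecing overlay. In real
 * QEMU this copy would have to be a serializing write, as item 1 says. */

enum { DISK = 16 };
static char active_img[DISK + 1] =
    "0123456789abcdef";           /* the filter's file child          */
static char fleecing_img[DISK];   /* overlay whose backing is the filter */
static char cow_done[DISK];       /* per-byte: already copied out?    */

static void super_filter_write(long off, const char *data, long len)
{
    for (long i = 0; i < len; i++) {
        if (!cow_done[off + i]) {
            /* copy-on-write into the fleecing image first */
            fleecing_img[off + i] = active_img[off + i];
            cow_done[off + i] = 1;
        }
    }
    /* then forward the write to the file child */
    memcpy(active_img + off, data, len);
}

/* a read through the fleecing node sees the point-in-time snapshot */
static char fleecing_read(long off)
{
    return cow_done[off] ? fleecing_img[off] : active_img[off];
}
```

After `super_filter_write(4, "XXXX", 4)`, `fleecing_read(4)` still returns the pre-write byte `'4'` while the active image holds `'X'`, which is the snapshot semantics the filter is meant to guarantee.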
> Max, I think you intended to make both source and target children of
> the same block job node (or at least for mirror). But wouldn't that
> create loops in a setup like this? I think we may need two filters
> that are only connected through the block job, but not with block
> graph edges.
>
> Kevin
--
Best regards,
Vladimir