[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] KVM Forum block no[td]es
From: |
Denis V . Lunev |
Subject: |
Re: [Qemu-block] KVM Forum block no[td]es |
Date: |
Fri, 16 Nov 2018 07:47:57 +0000 |
On 11/12/18 1:25 AM, Max Reitz wrote:
> This is what I’ve taken from two or three BoF-like get-togethers on
> blocky things. Amendments are more than welcome, of course.
>
>
>
> Permission system
> =================
>
> GRAPH_MOD
> ---------
>
> We need some way for the commit job to prevent graph changes on its
> chain while it is running. Our current blocker doesn’t do the job,
> however. What to do?
>
> - We have no idea how to make a *permission* work. Maybe the biggest
> problem is that it just doesn’t work as a permission, because the
> commit job doesn’t own the BdrvChildren that would need to be
> blocked (namely the @backing BdrvChild).
>
> - A property of BdrvChild that can be set by a non-parent seems more
> feasible, e.g. a counter where changing the child is possible only
> if the counter is 0. This also actually makes sense in what it
> means.
> (We never quite knew what “taking the GRAPH_PERMISSION” or
> “unsharing the GRPAH_MOD permission” was supposed to mean. Figuring
> that out always took like half an our in any face-to-face meeting,
> and then we decided it was pretty much useless for any case we had
> at hand.)
>
>
> Reopen
> ------
>
> How should permissions be handled while the reopen is under way?
> Maybe we should take the union of @perm before and after, and the
> intersection of @shared before and after?
>
> - Taking permissions is a transaction that can fail. Reopen, too, is
> a transaction, and we want to go from the intermediate to the final
> permissions in reopen’s commit part, so that transition is not
> allowed to fail.
> Since with the above model we would only relax things during that
> transition (relinquishing bits from @perm and adding bits to
> @shared), this transition should in theory be possible without any
> failure. However, in practice things are different, as permission
> changes with file-posix nodes imply lock changes on the filesystem
> -- which may always fail. Arguably failures from changing the
> file-posix locks can be ignored, because that just means that the
> file claims more permissions to be taken and less to be shared than
> is actually the case. Which means you may not be able to open the
> file in some other application, while you should be, but that’s the
> benign kind of error. You won’t be able to access data in a way
> you shouldn’t be able to.
> - Note that we have this issue already, so in general dropping
> permissions sometimes aborts because code assumes that dropping
> permissions is always safe and can never result in an error. It
> seems best to ignore such protocol layer errors in the generic
> block layer rather than handling this in every protocol driver
> itself.
> (The block layer should discard errors from dropping permissions
> on the protocol layer.)
>
> - Is it possible that changing an option may require taking an
> intermediate permission that is required neither before nor after
> the reopen process?
> Changing a child link comes to mind (like changing a child from one
> BDS to another, where the visible data changes, which would mean we
> may want to e.g. unshare CONSISTENT_READ during the reopen).
> However:
> 1. It is unfeasible to unshare that for all child changes.
> Effectively everything requires CONSISTENT_READ, and for good
> reason.
> 2. Why would a user even change a BDS to something of a different
> content?
> 3. Anything that currently allows you to change a child node assumes
> that the user always changes it to something of the same content
> (some take extra care to verify this, like mirror, which makes
> sure that @replaces and the target are connected, and there are
> only filter nodes in between).
> Always using the same enforcing model as mirror does (no. 3 above)
> does not really work, though, because one use case is to copy a
> backing file offline to some different storage and then replace the
> files via QMP. To qemu, both files are completely unrelated.
>
>
> Block jobs, including blockdev-copy
> ===================================
>
> Example for use of the fleecing filter:
> - The real target is on slow storage. Put an overlay on fast storage
> on top of it. Then use that overlay as the target of the fleecing
> filter (and commit the data later or on the side), so that the
> backup job does not slow down the guest.
>
> For a unified copy job, having a backup/fleecing filter is not a
> problem on the way. One thing we definitely have to and can do is to
> copy common functionality into a shared file so that the different
> jobs can at least share that.
>
> COR/Stream:
> - There should be a way to discard ranges that have been copied into
> the overlay from the backing files to save space
> - Also, the COR filter should integrated with the stream job (at some
> point, as always)
>
> Hole punching with active commit:
> - Putting data into the target and punching holes in the overlays to
> make it visible on the active disk may be reasonable for some, but
> not for others -- it should be an option. You want this if saving
> space is important, but you may not want this if speed is more
> important (depends on your backing chain length and other factors
> then, but that’s your choice).
>
> - Another thing: If we don’t need to punch any holes because the
> intermediate layers aren’t allocated anyway, we don’t need to write
> the data into the active disk either. This can probably be done
> indiscriminately, because the check for this does not concern the
> protocol layer but only qemu-controlled metadata, so it should be
> deterministically fast (want_zero=false).
>
>
> qcow2
> =====
>
> Recovering corrupt images:
> - Salvaging qemu-img convert would help (one that doesn’t abort
> everything on encountering a single I/O error)
> - We may want to add an in-sync L1 table copy to recover from the
> worst kinds of corruptions. Checksumming would be a good idea
> (then), too.
> - Should we update the checksum every time? If it’s just the sum of
> all L1 entry values, why not, doing the update is trivial then and
> does not involve looking at any but the entries modified.
>
> Online check:
> - This would need to be a block job
> - The check function would probably need to be a proper coroutine
> (that does not just lock everything)
> - Would be very complicated if you wanted it to work on R/W images.
> It’s probably the best to focus on making this work for read-only
> images, because you can always just put a temporary snapshot over
> the image for the time of the test and then commit it down after the
> check is done.
>
>
> Bitmaps
> =======
>
> (Got this section from sneaking into a BoF I wasn’t invited to. Oh
> well. Won’t hurt to include them here.)
>
> Currently, when dirty bitmaps are loaded, all IN_USE bitmaps are just
> not loaded at all and completely ignored. That isn’t correct, though,
> they should either still be loaded (and automatically treated and
> written back as fully dirty), or at least qemu-img check should
> “repair” them (i.e. fully dirtying them).
>
> Sometimes qemu (running in a mode as bare as possible) is better than
> using qemu-img convert, for instance. It gives you more control
> (through QMP; you get e.g. better progress reporting), you get all of
> the mirror optimizations (we do have optimizations for convert, too,
> but whether it’s any good to write the same (or different?)
> optimizations twice is another question), and you get a common
> interface for everything (online and offline).
> Note that besides a bare qemu we’ve also always wanted to convert as
> many qemu-img operations into frontends for block jobs as possible.
> We have only done this for commit, however, even though convert looked
> like basically the ideal target. It was just too hard with too little
> apparent gain, like always (and convert supports additional features
> like concatenation which we don’t have in the runtime block layer
> yet).
>
> Someone (not that someone™, but actually some specific someone) is
> about to make qemu-img info display the list of persistent bitmaps.
> Potential reviewers should be aware of the fact that this should be
> done bye adding that information to ImageInfoSpecificQCow2.
/me
Den
- Re: [Qemu-block] KVM Forum block no[td]es, (continued)
- Re: [Qemu-block] KVM Forum block no[td]es, Kevin Wolf, 2018/11/16
- Re: [Qemu-block] KVM Forum block no[td]es, Max Reitz, 2018/11/16
- Re: [Qemu-block] KVM Forum block no[td]es, Kevin Wolf, 2018/11/16
- Re: [Qemu-block] KVM Forum block no[td]es, Max Reitz, 2018/11/16
- Re: [Qemu-block] KVM Forum block no[td]es, Kevin Wolf, 2018/11/16
- Re: [Qemu-block] KVM Forum block no[td]es, Max Reitz, 2018/11/16
- Re: [Qemu-block] KVM Forum block no[td]es, Alberto Garcia, 2018/11/16
- Re: [Qemu-block] KVM Forum block no[td]es, Alberto Garcia, 2018/11/19
- Re: [Qemu-block] KVM Forum block no[td]es, Max Reitz, 2018/11/16
- Re: [Qemu-block] KVM Forum block no[td]es, Alberto Garcia, 2018/11/16
Re: [Qemu-block] KVM Forum block no[td]es,
Denis V . Lunev <=