Re: qemu-img convert vs writing another copy tool
Thu, 23 Jan 2020 13:21:28 -0600
On 1/23/20 12:35 PM, Richard W.M. Jones wrote:
I guess some people are aware that virt-v2v, which is a tool which
converts guests from VMware to run on KVM, and some other
OpenStack-OpenStack migration tools we have, use "qemu-img convert" to
copy the data around.
Historically we've had bugs here. The most recent was discussed in
the thread on this list called "Bug? qemu-img convert to preallocated
image makes it sparse".
We've been kicking around the idea of writing some alternate tool. My
proposal would be a tool (not yet written, maybe it will never be
written) called nbdcp for copying between NBD servers and local files.
An outline manual page for this proposed tool is attached.
Some of the things which this tool might do which qemu-img convert
cannot do right now:
- Hint that the target already contains zeroes. It's almost always
the case that we know this, but we cannot tell qemu. This was the
cause of a big performance regression last year.
This has just recently been proposed:
I'm also working on a patch that I will post soon that extends the NBD
protocol to advertise this information (it will help the situation where
the destination is NBD, but as that requires a new enough server to
advertise the information, having the feature as a command-line option
allows the same speedup even without the server supporting the extension).
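
As a rough illustration of the command-line form, here is a minimal sketch. It assumes a qemu-img new enough to carry the option under the name --target-is-zero (the same spelling as in the nbdcp synopsis) and that it has to be combined with -n, so the destination must already exist. The helper name convert_prezeroed_cmd is made up for this example, and it only prints the command rather than running it.

```sh
# Print (dry run) a qemu-img convert command for a destination that was
# created in advance and is known to read as zeroes.
# Assumption: this qemu-img knows --target-is-zero, which must be used
# together with -n (do not create the destination).
convert_prezeroed_cmd() {
    src=$1 dst=$2 fmt=${3:-raw}
    # Arguments are printed unquoted; paths with spaces would need quoting.
    printf 'qemu-img convert -n --target-is-zero -O %s %s %s\n' \
        "$fmt" "$src" "$dst"
}
```

Calling convert_prezeroed_cmd src.img nbd://localhost:10809/dst prints the command with the default raw output format; a third argument selects another format.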
- Declare that we want the target to be either sparse or
preallocated. qemu-img convert can sort of do this in a
round-about way (create the target in advance and use the -n
option), but also it's broken at the moment.
- NBD multi-conn. In my tests this makes a really massive
performance difference in certain situations. Again, virt-v2v has
a lot of information that we cannot pass to qemu: we know, for
example, exactly if the server supports the feature, how many
threads are available, in some situations even have information
about the network and backing disks that the data will travel over
/ be stored on.
Multi-conn for reading the source allows better parallelism. Multi-conn
for writing is a bit trickier - it should be safe if the different
connections are only touching distinct segments of the export (no
overlaps), but as qemu does not advertise multiconn in such situations,
you may still need a command-line switch to force multiple writers in
spite of the server not advertising it. Here, I'm not aware of anyone
with patches underway, but I also think it would be a good ground for
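
To make the "distinct segments" condition concrete, the following is a minimal sketch of one way a copy tool could divide an export among writers so that no two connections touch the same bytes. The function name split_segments and the simple equal-split policy are assumptions for illustration; a real tool would probably also align segment boundaries to the export's block size.

```sh
# Split an export of SIZE bytes into N contiguous, non-overlapping
# segments, one per connection. Prints "index offset length" per line.
# The last segment absorbs the remainder when SIZE is not a multiple of N.
split_segments() (
    size=$1 n=$2
    chunk=$((size / n))
    i=0
    while [ "$i" -lt "$n" ]; do
        offset=$((i * chunk))
        if [ "$i" -eq $((n - 1)) ]; then
            length=$((size - offset))   # last writer takes what is left
        else
            length=$chunk
        fi
        echo "$i $offset $length"
        i=$((i + 1))
    done
)
```

Because each segment starts exactly where the previous one ends, the ranges cover the whole export without overlap, which is the property that makes parallel writers safe here.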
- Machine-parsable progress bars. You can, sort of, parse the
progress bar from qemu-img convert, but it's not as easy as it
could be. In particular it would be nice if the format was treated
as ABI, and if there was a way to have the tool write the progress
bar info to a precreated file descriptor.
Would be nice, but I'm not aware of anyone currently planning to add it.
- External block lists. This is a rather obscure requirement, but
it's necessary in the case where we can get the allocated block map
from another source (eg. pyvmomi) and then want to use that with an
NBD source that does not support extents (eg. nbdkit-ssh-plugin /
libssh / sftp). [Having said that, it may be possible to implement
this as an nbdkit filter, so maybe this is not a blocking feature.]
How are you intending to use this? I'm guessing you have some way of
feeding in information to qemu-img of which portions of the source image
you want to copy, and ignore remaining portions.
Note that it IS already possible to use qemu's copy-on-read feature as a
way to copy only a subset of a source file over to a destination file.
When demonstrating incremental backup, I wrote this shell function:
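copyif() {
    if test $# -lt 2 || test $# -gt 3; then
        echo 'usage: copyif src dst [bitmap]'
        return 1
    fi
    if test -z "$3"; then
        map_from="-f raw nbd://localhost:10809/$1"
        state=true
    else
        # x-dirty-bitmap reports dirty regions as "data": false
        map_from="--image-opts driver=nbd,export=$1,server.type=inet,server.host=localhost"
        map_from+=",server.port=10809,x-dirty-bitmap=qemu:dirty-bitmap:$3"
        state=false
    fi
    $qemu_img info -f raw nbd://localhost:10809/$1 || return
    $qemu_img info -f qcow2 $2 || return
    ret=0  # $qemu_img is expected to hold the path to qemu-img
    $qemu_img rebase -u -f qcow2 -F raw -b nbd://localhost:10809/$1 $2
    while read line; do
        [[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.$state.* ]] || continue
        start=${BASH_REMATCH[1]} len=${BASH_REMATCH[2]}
        echo " $start $len:"
        qemu-io -C -c "r $start $len" -f qcow2 $2 || ret=1  # copy-on-read pulls the range into dst
    done < <($qemu_img map --output=json $map_from)
    $qemu_img rebase -u -f qcow2 -b '' $2  # detach the temporary backing file
    if test $ret = 0; then echo 'Success!'; fi
    return $ret
}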
The key lines here are 'qemu-io -C -c "r $start $len" -f qcow2 $2',
which is performed in a loop to read just targeted portions of the
destination qcow2 file with copy-on-read set to pull in that portion
from its backing file, and '<($qemu_img map --output=json $map_from)'
which was used to derive the extent map driving which portions of the
file to read.
We also have 'qemu-img dd' that can copy subsets of a file, although it
is not currently the ideal interface, and probably needs to be enhanced
(I have a branch where I had tried working on patches for it, but where
the feedback was that we want the improvements to be more generic, or
even teach 'qemu-img convert' to support offsets the way 'qemu-img dd'
tries to; I'd need to revisit that branch...)
One thing which qemu-img convert can do which nbdcp could not:
- Read or write from qcow2 files.
Although you could still couple things together: nbdcp for new features
plus qemu-nbd to drive an NBD wrapper around qcow2 (as source or as
destination).
So instead of splitting the ecosystem and writing a new tool that
doesn't do as much as qemu-img convert, I wonder what qemu developers
think about the above missing features? For example, are they in
scope for qemu-img convert?
I could see all of these being viable additions to qemu-img, but also
wonder if writing nbdcp would get those features available in a faster
nbdcp [-a|--target-allocation allocated|sparse]
These make sense for any qemu-img format.
[-m|--multi-conn <n>] [-M|--multi-conn-target <n>]
These might make more sense as tunables for how to set up NBD client
(destination) or server (source), rather than directly as qemu-img
options. That is, I could imagine that we'd use qemu-img
--image-format, and then expose new blockdev-style knobs for setting up
the NBD endpoint to enable multiconn usage of that endpoint.
[-p|--progress-bar] [-S|--sparse-detect <n>]
[-T|--threads <n>] [-z|--target-is-zero]
And these options also seem like they are useful to qemu-img proper.
This program cannot: copy from file to file (use cp(1) or dd(1)), copy
to or from formats other than raw (use qemu-img(1) convert), or access
servers other than NBD servers (also use qemu-img(1)).
Again, depending on how we want to mix-and-match things, using qemu-nbd
to create the NBD endpoint for the nbdcp source or destination may be
worthwhile (which is different than directly using qemu-img); we'd want
some decent examples of building such chains between tools. Or it could
help us decide whether we can cut out some overhead by consolidating
typical uses into one tool rather than requiring convoluted chains.
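
One such chain might look like the sketch below, which only prints the commands (a dry run). It assumes qemu-nbd serving a qcow2 image read-only over a Unix socket, with the raw-only copy tool connecting to that socket through an nbd+unix URI. Since nbdcp has not been written, its argument order here is hypothetical, and the helper name qcow2_source_chain and the default socket path are illustrative only.

```sh
# Print (dry run) a two-step chain: export a qcow2 image over a Unix
# socket with qemu-nbd, then copy the guest-visible data from it.
# Assumptions: "nbdcp" is the proposed, not-yet-written tool, so its
# arguments are hypothetical; the default socket path is illustrative.
qcow2_source_chain() {
    image=$1 dest=$2 sock=${3:-/tmp/nbdcp-src.sock}
    printf 'qemu-nbd --read-only --persistent --format=qcow2 --socket=%s %s &\n' \
        "$sock" "$image"
    printf 'nbdcp nbd+unix:///?socket=%s %s\n' "$sock" "$dest"
}
```

The same pattern, without --read-only, would apply when the qcow2 file is the destination rather than the source.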
Load the list of extents from an external file. nbdcp considers
this to be the truth for source extents. The file should contain
one record per line in the same format as nbdkit-sh-plugin(1), ie:
offset length type
with "offset" and "length" in bytes, and the "type" field being a
comma-separated list of the words "hole" and "zero". For example:
1M 9M hole,zero
Could we also teach this to parse 'qemu-img map --output=json' format?
And/or add 'qemu-img map --output=XYZ' (different from the current
'--output=human') that gives sufficient information? (Note:
--output=human is NOT suitable for extent lists - it intentionally
outputs only the data portions, and in so doing coalesces 'hole' and
'hole,zero' segments to be indistinguishable).
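
Until either tool learns the other's format, a small filter could bridge the two. The sketch below converts 'qemu-img map --output=json' records into "offset length type" lines. The mapping is an assumption, not something either tool documents for this purpose: "data": false is treated as "hole", "zero": true as "zero", and an extent that is neither gets no type field (assumed to mean plain allocated data). The function name map_json_to_extents is made up, and the sed expression relies on each JSON record sitting on its own line with "zero" listed before "data".

```sh
# Convert `qemu-img map --output=json` records read from stdin into
# "offset length [type]" lines.
# Assumed mapping: "data": false -> hole ; "zero": true -> zero ;
# neither -> no type field (allocated data).
map_json_to_extents() {
    sed -n 's/.*"start": \([0-9]*\), "length": \([0-9]*\).*"zero": \([a-z]*\).*"data": \([a-z]*\).*/\1 \2 \3 \4/p' |
    while read -r start length zero data; do
        type=
        if [ "$data" = false ]; then
            type=hole
        fi
        if [ "$zero" = true ]; then
            type=${type:+$type,}zero
        fi
        echo "$start $length${type:+ $type}"
    done
}
```

Usage would be along the lines of piping the output of qemu-img map --output=json for the source image through map_json_to_extents and saving the result as the external extents file.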
Display a progress bar during copying.
Write a machine-readable progress bar to file descriptor "FD".
This progress bar prints lines with the format "COPIED/TOTAL"
(where "COPIED" and "TOTAL" are 64 bit unsigned integers).
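
A consumer of this format can be very small, which is much of its appeal. As a minimal sketch, the filter below reads "COPIED/TOTAL" lines on stdin and prints a whole-number percentage for each one. The name progress_percent is made up for this example. Shell arithmetic is signed, so the sketch does not cover the full unsigned 64-bit range and would overflow for values of COPIED above roughly 2^63 / 100.

```sh
# Read "COPIED/TOTAL" lines from stdin and print a whole-number
# percentage for each. Lines that do not match the format are skipped.
progress_percent() {
    while IFS=/ read -r copied total; do
        case "$copied$total" in
            ''|*[!0-9]*) continue ;;    # empty or non-numeric input
        esac
        [ -n "$copied" ] && [ -n "$total" ] && [ "$total" -gt 0 ] || continue
        echo "$((copied * 100 / total))%"
    done
}
```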
Supporting optional arguments to long options is okay, but supporting
optional arguments to short options gets tricky when using getopt. I
would recommend two separate options, '-p' with no argument as shorthand
for progress to stderr, and '-P description' with a mandatory argument for
where to send progress, rather than trying to let '-p' have an optional
argument.
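
The two-option split is straightforward to parse. Here is a minimal sketch using POSIX getopts, where the option string 'pP:' declares -p as taking no argument and -P as requiring one. The function name parse_progress_opts and the "fd:" output format are invented for the example, and both option letters follow the suggestion above rather than any existing interface.

```sh
# Parse "-p" (no argument: progress to stderr) and "-P ARG" (mandatory
# argument: where to send machine-readable progress) with POSIX getopts.
# Prints the resulting setting; runs in a subshell so OPTIND and the
# variables do not leak into the caller.
parse_progress_opts() (
    progress=off
    OPTIND=1
    while getopts 'pP:' opt; do
        case $opt in
            p) progress=stderr ;;
            P) progress="fd:$OPTARG" ;;
            *) echo 'usage: [-p] [-P fd]' >&2; exit 2 ;;
        esac
    done
    echo "progress=$progress"
)
```

With this layout "-P 3" and "-P3" both work, and a bare "-P" is rejected as missing its argument, which is the ambiguity an optional argument to -p would reintroduce.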
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org