[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: qemu-img convert vs writing another copy tool

From: Eric Blake
Subject: Re: qemu-img convert vs writing another copy tool
Date: Thu, 23 Jan 2020 13:21:28 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1

On 1/23/20 12:35 PM, Richard W.M. Jones wrote:
I guess some people are aware that virt-v2v, which is a tool which
converts guests from VMware to run on KVM, and some other
OpenStack-OpenStack migration tools we have, use "qemu-img convert" to
copy the data around.

Historically we've had bugs here.  The most recent was discussed in
the thread on this list called "Bug? qemu-img convert to preallocated
image makes it sparse"

We've been kicking around the idea of writing some alternate tool.  My
proposal would be a tool (not yet written, maybe it will never be
written) called nbdcp for copying between NBD servers and local files.
An outline manual page for this proposed tool is attached.

Some of the things which this tool might do which qemu-img convert
cannot do right now:

  - Hint that the target already contains zeroes.  It's almost always
    the case that we know this, but we cannot tell qemu.  This was the
    cause of a big performance regression last year.

This has just recently been proposed:

I'm also working on a patch that I will post soon that extends the NBD protocol to advertise this information (it will help the situation where the destination is NBD, but as that requires a new enough server to advertise the information, having the feature as a command-line option allows the same speedup even without the server supporting the extension).

  - Declare that we want the target to be either sparse or
    preallocated.  qemu-img convert can sort of do this in a
    round-about way (create the target in advance and use the -n
    option), but also it's broken at the moment.

  - NBD multi-conn.  In my tests this makes a really massive
    performance difference in certain situations.  Again, virt-v2v has
    a lot of information that we cannot pass to qemu: we know, for
    example, exactly if the server supports the feature, how many
    threads are available, in some situations even have information
    about the network and backing disks that the data will travel over
    / be stored on.

Multi-conn for reading the source allows better parallelism. Multi-conn for writing is a bit trickier - it should be safe if the different connections are only touching distinct segments of the export (no overlaps), but as qemu does not advertise multiconn in such situations, you may still need a command-line switch to force multiple writers in spite of the server not advertising it. Here, I'm not aware of anyone with patches underway, but I also think it would be a good ground for exploring.

  - Machine-parsable progress bars.  You can, sort of, parse the
    progress bar from qemu-img convert, but it's not as easy as it
    could be.  In particular it would be nice if the format was treated
    as ABI, and if there was a way to have the tool write the progress
    bar info to a precreated file descriptor.

Would be nice, but I'm not aware of anyone currently planning to add it.

  - External block lists.  This is a rather obscure requirement, but
    it's necessary in the case where we can get the allocated block map
    from another source (eg. pyvmomi) and then want to use that with an
    NBD source that does not support extents (eg. nbdkit-ssh-plugin /
    libssh / sftp).  [Having said that, it may be possible to implement
    this as an nbdkit filter, so maybe this is not a blocking feature.]

How are you intending to use this? I'm guessing you have some way of feeding in information to qemu-img of which portions of the source image you want to copy, and ignore remaining portions.

Note that it IS already possible to use qemu's copy-on-read feature as a way to copy only a subset of a source file over to a destination file. When demonstrating incremental backup, I wrote this shell function:

copyif() {
if test $# -lt 2 || test $# -gt 3; then
  echo 'usage: copyif src dst [bitmap]'
  return 1
if test -z "$3"; then
  map_from="-f raw nbd://localhost:10809/$1"
  map_from="--image-opts driver=nbd,export=$1,server.type=inet"
$qemu_img info -f raw nbd://localhost:10809/$1 || return
$qemu_img info -f qcow2 $2 || return
$qemu_img rebase -u -f qcow2 -F raw -b nbd://localhost:10809/$1 $2
while read line; do
[[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.$state.* ]] || continue
  start=${BASH_REMATCH[1]} len=${BASH_REMATCH[2]}
  echo " $start $len:"
  qemu-io -C -c "r $start $len" -f qcow2 $2
done < <($qemu_img map --output=json $map_from)
$qemu_img rebase -u -f qcow2 -b '' $2
if test $ret = 0; then echo 'Success!'; fi
return $ret

The key lines here are 'qemu-io -C -c "r $start $len" -f qcow2 $2', which is performed in a loop to read just targetted portions of the destination qcow2 file with copy-on-read set to pull in that portion from its backing file, and '<($qemu_img map --output=json $map_from)' which was used to derive the extent map driving which portions of the file to read.

We also have 'qemu-img dd' that can copy subsets of a file, although it is not currently the ideal interface, and probably needs to be enhanced (I have a branch where I had tried working on patches for it, but where the feedback was that we want the improvements to be more generic, or even teach 'qemu-img convert' to support offsets the way 'qemu-img dd' tries to; I'd need to revisit that branch...)

One thing which qemu-img convert can do which nbdcp could not:

  - Read or write from qcow2 files.

Although you could still couple things together: nbdcp for new features plus qemu-nbd to drive an NBD wrapper around qcow2 (as source or as destination).

So instead of splitting the ecosystem and writing a new tool that
doesn't do as much as qemu-img convert, I wonder what qemu developers
think about the above missing features?  For example, are they in
scope for qemu-img convert?

I could see all of these being viable additions to qemu-img, but also wonder if writing nbdcp would get those features available in a faster manner.

         nbdcp [-a|--target-allocation allocated|sparse]
               [-b|--block-list <blocksfile>]

These make sense for any qemu-img format.

               [-m|--multi-conn <n>] [-M|--multi-conn-target <n>]

These might make more sense as tunables for how to set up NBD client (destination) or server (source), rather than directly as qemu-img options. That is, I could imagine that we'd use qemu-img --image-format, and then expose new blockdev-style knobs for setting up the NBD endpoint to enable multiconn usage of that endpoint.

               [-p|--progress-bar] [-S|--sparse-detect <n>]
               [-T|--threads <n>] [-z|--target-is-zero]
               'nbd://...'|DISK.IMG 'nbd://...'|DISK.IMG

And these options also seem like they are useful to qemu-img proper.

        This program cannot: copy from file to file (use cp(1) or dd(1)), copy
        to or from formats other than raw (use qemu-img(1) convert), or access
        servers other than NBD servers (also use qemu-img(1)).

Again, depending on how we want to mix-and-match things, using qemu-nbd to create the NBD endpoint for the nbdcp source or destination may be worthwhile (which is different than directly using qemu-img); we'd want some decent examples of building such chains between tools. Or it could help us decide whether we can cut out some overhead by consolidating typical uses into one tool rather than requiring convoluted chains.

        -b BLOCKSFILE
            Load the list of extents from an external file.  nbdcp considers
            this to be the truth for source extents.  The file should contain
            one record per line in the same format as nbdkit-sh-plugin(1), ie:

             offset length type

            with "offset" and "length" in bytes, and the "type" field being a
            comma-separated list of the words "hole" and "zero".  For example:

             0  1M
             1M 9M  hole,zero

Could we also teach this to parse 'qemu-img map --output=json' format? And/or add 'qemu-img map --output=XYZ' (different from the current --output=human') that gives sufficient information? (Note: --output=human is NOT suitable for extent lists - it intentionally outputs only the data portions, and in so doing coalesces 'hole' and 'hole,zero' segments to be indistinguishable).

            Display a progress bar during copying.

        -p machine:FD
            Write a machine-readable progress bar to file descriptor "FD".
            This progress bar prints lines with the format "COPIED/TOTAL"
            (where "COPIED" and "TOTAL" are 64 bit unsigned integers).

Supporting optional arguments to long options is okay, but supporting optional arguments to short options gets tricky when using getopt. I would recommend two separate options, '-p' with no argument as shorthand for progress to stderr, and '-P description' with mandatory option for where to send progress, rather than trying to let '-p' have optional argument.

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

reply via email to

[Prev in Thread] Current Thread [Next in Thread]