Re: [PATCH 0/2] qcow2: Force preallocation with data-file-raw


From: Nir Soffer
Subject: Re: [PATCH 0/2] qcow2: Force preallocation with data-file-raw
Date: Mon, 22 Jun 2020 18:50:14 +0300

On Mon, Jun 22, 2020 at 12:47 PM Max Reitz <mreitz@redhat.com> wrote:
>
> On 22.06.20 00:25, Nir Soffer wrote:
> > On Fri, Jun 19, 2020 at 1:40 PM Max Reitz <mreitz@redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> As discussed here:
> >>
> >> https://lists.nongnu.org/archive/html/qemu-block/2020-02/msg00644.html
> >> https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg00329.html
> >> https://lists.nongnu.org/archive/html/qemu-block/2020-06/msg00240.html
> >>
> >> I think that qcow2 images with data-file-raw should always have
> >> preallocated 1:1 L1/L2 tables, so that the image always looks the same
> >> whether you respect or ignore the qcow2 metadata.
> >
> > I don't know the internals of qcow2 data_file, but are we really using
> > qcow2 metadata when accessing the data file?
>
> Yes.
>
> > This may have unwanted performance consequences.
>
> I don’t think so, because in practice normal lookups of L1/L2 mappings
> generally don’t cost that much performance.
>
> > If I understand correctly, qcow2 metadata is needed only for keeping
> > bitmaps (or maybe future extensions) for raw data file, and reading
> > from the qcow2 image should be read directly from the raw file without
> > any extra work.
> >
> > Writing to the data file should also bypass the qcow2 metadata, since
> > the bitmap is updated in memory.
>
> Well, with this series, writing would no longer update the metadata at
> least, because it would always be preallocated already.
>
> >>  The easiest way to
> >> achieve that is to enforce at least metadata preallocation whenever
> >> data-file-raw is given.
> >
> > But preallocation is not free, even on file systems, it can be even
> > slow (NFS < 4.2).
>
> Metadata preallocation with an external data file should be the same
> speed on every file system.  We only need to create the metadata
> structures, which, with the default cluster size (64k) take up a bit
> more than 1/8192 of the full image size.
>
> Sure, it’s not free.  But if we decide we should indeed fully ignore the
> L1/L2 tables for data-file-raw images, the qcow2 spec must be amended.
> As I can read it, it currently doesn’t say so.
>
> (By the way, this is not a trivial change.  Right now, data-file-raw is
> an autoclear flag: If a version of qemu that doesn’t support it accesses
> the image, it will automatically clear the flag, but the image stays
> valid.  If we decide to completely ignore the L1/L2 tables (i.e. not
> even create them), then this can no longer be an autoclear flag.  We’d
> need a new incompatible flag.  (Because without L1/L2 tables, the image
> becomes useless to older qemu versions.))
>
> > With block storage this means you need to allocate the entire image size on
> > storage for writing the metadata.
> >
> > While oVirt does not use qcow2 with data_file, having preallocated qcow2
> > will make this very hard to use, for example for 500 GiB disk we will
> > have to allocate 500 GiB disk for the raw data file and 500 GiB disk for
> > the qcow2 metadata disk which will be 99% unused.
>
> I don’t understand this.  When you use an external data file, the qcow2
> file will only contain the metadata:
>
> $ qemu-img create -f qcow2 \
>     -o data_file=foo.data,data_file_raw=on,preallocation=metadata \
>     foo.qcow2 8G
> Formatting 'foo.qcow2', fmt=qcow2 size=8589934592 data_file=foo.data
> data_file_raw=on cluster_size=65536 preallocation=metadata
> lazy_refcounts=off refcount_bits=16
> $ ls -l foo.qcow2
> ... 1310720 ... foo.qcow2
> $ ls -l foo.data
> ... 8589934592 ... foo.data

When preallocating metadata in a regular qcow2 image, we need to allocate
the entire device (plus extra space for the metadata overhead):

# qemu-img create -f qcow2 -o preallocation=metadata foo.qcow2 500g
Formatting 'foo.qcow2', fmt=qcow2 size=536870912000 cluster_size=65536
preallocation=metadata lazy_refcounts=off refcount_bits=16

# qemu-img check foo.qcow2
No errors were found on the image.
8192000/8192000 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 536953094144

But I see that with an external data file, the qcow2 (metadata) file
allocates much less:

# qemu-img create -f qcow2 \
    -o data_file=foo.data,data_file_raw=on,preallocation=metadata \
    foo.qcow2 500g
Formatting 'foo.qcow2', fmt=qcow2 size=536870912000 data_file=foo.data
data_file_raw=on cluster_size=65536 preallocation=metadata
lazy_refcounts=off refcount_bits=16

# qemu-img check foo.qcow2
No errors were found on the image.
8192000/8192000 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 65798144
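
To put the two "Image end offset" values side by side, here is a rough
Python sketch. The variable names are made up for illustration; the numbers
are copied from the qemu-img output above, and nothing here queries qemu.

```python
# Values copied from the qemu-img create/check output above.
virtual_size = 536870912000      # 500g virtual size, in bytes
end_regular = 536953094144       # end offset, preallocation=metadata, no data file
end_data_file = 65798144         # end offset, same options plus data_file_raw=on

MiB = 1024 * 1024

# For the regular image, the part of the end offset beyond the virtual
# size is what the metadata adds on top of the data clusters.
overhead_regular = end_regular - virtual_size

print(f"regular qcow2 end offset:    {end_regular / MiB:.2f} MiB")
print(f"  beyond the virtual size:   {overhead_regular / MiB:.2f} MiB")
print(f"with data file, end offset:  {end_data_file / MiB:.2f} MiB")
```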

I also tested this with a block device:

# lvcreate --size 500g --name foo.data test
  Logical volume "foo.data" created.

# lvcreate --size 128m --name foo.qcow2 test
  Logical volume "foo.qcow2" created.

# time qemu-img create -f qcow2 \
    -o data_file=/dev/test/foo.data,data_file_raw=on,preallocation=metadata \
    /dev/test/foo.qcow2 500g
Formatting '/dev/test/foo.qcow2', fmt=qcow2 size=536870912000
data_file=/dev/test/foo.data data_file_raw=on cluster_size=65536
preallocation=metadata lazy_refcounts=off refcount_bits=16

real 0m4.263s
user 0m0.149s
sys 0m0.387s

# qemu-img info /dev/test/foo.qcow2
image: /dev/test/foo.qcow2
file format: qcow2
virtual size: 500 GiB (536870912000 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    data file: /dev/test/foo.data
    data file raw: true
    corrupt: false

# qemu-img check /dev/test/foo.qcow2
No errors were found on the image.
8192000/8192000 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 65798144


The overhead of about 63 MiB per 500 GiB seems reasonable, and preallocating
the metadata is not that bad.
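
A quick sanity check of that number, as a sketch only. It assumes 8-byte L2
entries, one per 64k cluster, which is where the "a bit more than 1/8192"
figure quoted earlier would come from. The clusters left over after the L2
tables are simply reported, not broken down.

```python
cluster_size = 65536             # from the qemu-img output
l2_entry_size = 8                # assumption: one 8-byte L2 entry per cluster
virtual_size = 536870912000      # 500g, from the qemu-img output
image_end_offset = 65798144      # from "qemu-img check"

# Number of guest clusters; qemu-img check reports 8192000.
clusters = virtual_size // cluster_size

# Space taken by the L2 entries alone.
l2_bytes = clusters * l2_entry_size

# Whatever else the qcow2 file holds besides the L2 entries.
rest_bytes = image_end_offset - l2_bytes
rest_clusters = rest_bytes // cluster_size

print(f"clusters:      {clusters}")
print(f"L2 entries:    {l2_bytes} bytes ({l2_bytes / 2**20:.2f} MiB)")
print(f"remaining:     {rest_bytes} bytes ({rest_clusters} clusters)")
print(f"L2 / virtual:  1/{virtual_size // l2_bytes}")
```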

> > I don't think that kubevirt is planning to use this either, but if
> > they decide to use this it may be a problem for them as well when
> > using block storage.
> >
> > It looks like we abuse preallocation for getting the side effect that
> > the backing file will be rejected, instead of adding the validation
> > rejecting backing file in this case.
>
> That isn’t the case.
>
> I want to use preallocation because I interpret the spec such that it
> requires metadata preallocation.  It says when accessing a qcow2 file
> with data-file-raw, you can ignore the L1/L2 tables.  To me, that means
> that the L1/L2 tables must give a 1:1 mapping so that you get the same
> result whether you interpret them or not.

I agree that this is reasonable, and we will be able to use this if we need to.

Not having to allocate metadata at all and never using the 1:1 mapping
would be even better.
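
To illustrate what "1:1 mapping" means in this discussion, here is a toy
Python model. It is not qcow2 code: the list below is a hypothetical
stand-in for preallocated L2 tables, and the function names are invented
for this sketch. It only shows that with an identity mapping, a lookup
through the tables and a lookup that ignores them give the same offset.

```python
CLUSTER_SIZE = 65536

def identity_l2(num_clusters):
    # Toy stand-in for preallocated L2 tables:
    # entry i holds the data-file offset of guest cluster i.
    return [i * CLUSTER_SIZE for i in range(num_clusters)]

def offset_via_tables(l2, guest_offset):
    # Respect the metadata: look up the cluster, then add the
    # offset within the cluster.
    index, within = divmod(guest_offset, CLUSTER_SIZE)
    return l2[index] + within

def offset_ignoring_tables(guest_offset):
    # Ignore the metadata: treat the data file as a plain raw image.
    return guest_offset

l2 = identity_l2(16)
samples = (0, 1, CLUSTER_SIZE - 1, CLUSTER_SIZE, 5 * CLUSTER_SIZE + 123)
results = [(offset_via_tables(l2, o), offset_ignoring_tables(o)) for o in samples]
for via, direct in results:
    print(via, direct, via == direct)
```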

Nir



