
Re: [Qemu-devel] [PATCH v5 1/3] qcow2: Add qcow2_shrink_l1_and_l2_table


From: Jun Li
Subject: Re: [Qemu-devel] [PATCH v5 1/3] qcow2: Add qcow2_shrink_l1_and_l2_table for qcow2 shrinking
Date: Tue, 27 Jan 2015 22:06:02 +0800
User-agent: Mutt/1.5.23 (2014-03-12)

On Thu, 01/22 14:14, Max Reitz wrote:
> On 2015-01-19 at 08:16, Jun Li wrote:
> >On Thu, 01/15 13:47, Max Reitz wrote:
> >>On 2015-01-03 at 07:23, Jun Li wrote:
> >>>On Fri, 11/21 11:56, Max Reitz wrote:
> >>>>So, as for what I think we do need to do when shrinking (and keep in mind:
> >>>>The offset given to qcow2_truncate() is the guest size! NOT the host image
> >>>>size!):
> >>>>
> >>>>(1) Determine the first L2 table and the first entry in the table which
> >>>>will lie beyond the new guest disk size.
> >>>This is not always correct. Because of COW, using the offset to calculate
> >>>the first entry of the first L2 table will be incorrect.
> >>Again: This is *not* about the host disk size or the host offset of some
> >>cluster, but about the *guest* disk size.
> >>
> >>Let's make up an example. You have a 2 GB disk but you want to resize it to
> >>1.25 GB. The cluster size is 64 kB, therefore we have 2 GB / 64 kB = 32,768
> >>data clusters (as long as there aren't any internal snapshots, which is a
> >>prerequisite for resizing qcow2 images).
> >>
> >>Every L2 table contains 65,536 / 8 = 8,192 entries; there are thus 32,768 /
> >>8,192 = 4 L2 tables.
> >>
> >>As you can see, one can directly derive the number of data clusters and L2
> >>tables from the guest disk size (as long as there aren't any internal
> >>snapshots).
> >>
> >>So of course we can do the same for the target disk size: 1.25 GB / 64 kB =
> >>20,480 data clusters; 20,480 / 8,192 = 2.5 L2 tables, therefore we need
> >>three L2 tables but only half of the last one (4,096 entries).
> >>
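To make the arithmetic above concrete, here is a small sketch in Python (not
QEMU code). It assumes 8-byte L2 entries and no internal snapshots, as in the
example; the helper name l2_layout is made up for illustration.

```python
def l2_layout(guest_size, cluster_size=64 * 1024, l2_entry_size=8):
    """Return (data_clusters, l2_tables, entries_used_in_last_table).

    Hypothetical helper that only mirrors the arithmetic of the example;
    it is not how qcow2 itself computes these values.
    """
    # Round up: a partially covered cluster still needs a whole cluster.
    data_clusters = -(-guest_size // cluster_size)
    # One L2 table occupies one cluster and holds fixed-size entries.
    entries_per_l2 = cluster_size // l2_entry_size
    l2_tables = -(-data_clusters // entries_per_l2)
    # Entries in use in the last L2 table (a full table if it divides evenly).
    if l2_tables == 0:
        used_in_last = 0
    else:
        used_in_last = data_clusters - (l2_tables - 1) * entries_per_l2
    return data_clusters, l2_tables, used_in_last


GiB = 1024 ** 3
print(l2_layout(2 * GiB))       # 2 GB image from the example
print(l2_layout(5 * GiB // 4))  # 1.25 GB target size
```

For the 1.25 GB target this gives three L2 tables with 4,096 entries used in
the last one, which matches the numbers quoted above.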
> >Sorry, that was my misunderstanding last time. If we do not use
> >qcow2_truncate(), I think the issue above does not exist.
> >
> >What I originally wanted to say was:
> >Sometimes the second L2 table will contain an entry whose pointer refers
> >to a cluster whose address is larger than 1.25 GB.
> 
> Correct.
> 
> >So if we do not use qcow2_truncate(), we won't discard the above cluster
> >whose address is larger than 1.25 GB.

Sorry, I did not express my meaning clearly.

What I want to say is:

Some entry (let's call it entry1) will point to a cluster (let's call it
cluster1) whose address is larger than 1.25 GB. If we use qcow2_truncate(),
cluster1 will be discarded, so entry1 will be invalid afterwards. If we do not
use qcow2_truncate(), cluster1 will not be discarded.

> 
> I'm sorry, I can't really follow what you are trying to say here, so I'll
> just try to reply with things that may or may not be what you wanted to talk
> about.
> 
> If you are using qemu-img resize and thus subsequently qcow2_truncate() to
> shrink an image, you cannot expect the image to shrink to the specified file
> length, for several reasons.
> 
> First, if you shrink it to 1 GB, but only half of that is actually used, the
> image might of course very well have a length below 1 GB.
> 
> Second, there is metadata overhead. So if you are changing the guest disk
> size to 1 GB (all of which is occupied), the host file size will exceed 1 GB
> because of that overhead.
> 
> Third, I keep repeating myself here, but file length is not file size. So
> you may observe a file length of 10 GB or more because the clusters are
> spread all over the image file. This is something we'd have to combat with
> defragmentation; but the question is whether we really need to (see below
> for more on that). The point is that it doesn't matter whether the image has
> a file length of 10 GB; the file size will be around 1 GB anyway.
> 
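The difference between file length and file size can be seen on any sparse
file. The sketch below is only an illustration and assumes a POSIX system
(where st_blocks is counted in 512-byte units) and a file system that supports
holes; the helper name length_and_allocated is made up.

```python
import os
import tempfile


def length_and_allocated(path):
    """Return (file length, bytes actually allocated) for path."""
    st = os.stat(path)
    # Assumption: POSIX semantics, st_blocks is in units of 512 bytes.
    return st.st_size, st.st_blocks * 512


with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    # Extend the file to 10 MiB without writing any data: one big hole.
    f.truncate(10 * 1024 * 1024)

try:
    length, allocated = length_and_allocated(path)
    print("file length:", length)
    print("allocated:  ", allocated)
finally:
    os.remove(path)
```

On a file system with sparse file support, the allocated value should be far
below the file length, which is the effect described in the third point above.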
> >But I still have another worry.
> >
> >Suppose "virtual size" and "disk size" are both 2G. After we resize it to
> >1.25G, it seems "virtual size" will be 1.25G but "disk size" will still be 2G
> 
> No, it won't. I can prove it to you:

Yes, you are right. I have double-checked my PATCH v5. It seems I don't use
qcow2_process_discards() (this function calls bdrv_discard() to discard
clusters on the host) in patch v5. I will submit a new version of the patch.
Thanks.

Regards,
Jun Li

> 
> $ qemu-img create -f qcow2 test.qcow2 64M
> $ qemu-io -c 'write 0 64M' test.qcow2
> $ qemu-img info test.qcow2
> ...
> disk size: 64M
> ...
> 
> Okay, so far it's just what we'd expect. Now let's implement my proposal for
> truncation: Let's assume the image should be shrunk to 32 MB, so we
> discard all clusters starting at 32 MB (guest offset) (which is 64 MB - 32
> MB = 32 MB of data):
> 
> $ qemu-io -c 'discard 32M 32M' test.qcow2
> $ qemu-img info test.qcow2
> ...
> disk size: 32M
> ...
> 
> Great!
> 
> >if we do not use "qcow2_truncate()" to truncate the file (yes, I know using
> >qcow2_truncate is not a solution). This seems strange and not ideal.
> >
> >>We know that every cluster referenced somewhere after that limit (that is,
> >>every entry in the fourth L2 table and every entry starting with index 4,096
> >>in the third L2 table) is a data cluster with a guest offset somewhere
> >>beyond 1.25 GB, so we don't need it anymore.
> >>
> >>Thus, we simply discard all those data clusters and after that we can
> >>discard the fourth L2 table. That's it.
> >>
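The discard step described above can be sketched as follows. This is only an
illustration in Python, not the patch itself: it assumes 8-byte L2 entries, no
internal snapshots, and 0-based table indices (so the "third" L2 table is index
2 and the "fourth" is index 3). The name shrink_plan is made up.

```python
def shrink_plan(old_size, new_size, cluster_size=64 * 1024, l2_entry_size=8):
    """Describe what to discard when shrinking the guest disk size.

    Hypothetical helper; all offsets are guest offsets, not host offsets.
    """
    if new_size > old_size:
        raise ValueError("new_size must not exceed old_size")
    entries_per_l2 = cluster_size // l2_entry_size
    # First guest cluster that lies entirely beyond the new size (round up).
    first_cluster = -(-new_size // cluster_size)
    old_clusters = -(-old_size // cluster_size)
    old_l2_tables = -(-old_clusters // entries_per_l2)
    # Position of the first L2 entry that is no longer needed.
    l1_index, l2_index = divmod(first_cluster, entries_per_l2)
    # An L2 table is unused only if none of its entries is still needed.
    first_unused_l2 = l1_index if l2_index == 0 else l1_index + 1
    return {
        "discard_offset": first_cluster * cluster_size,
        "discard_length": (old_clusters - first_cluster) * cluster_size,
        "first_entry": (l1_index, l2_index),
        "unused_l2_tables": list(range(first_unused_l2, old_l2_tables)),
    }


GiB = 1024 ** 3
MiB = 1024 ** 2
print(shrink_plan(2 * GiB, 5 * GiB // 4))  # the 2 GB -> 1.25 GB example
print(shrink_plan(64 * MiB, 32 * MiB))     # the 64 MB -> 32 MB test image
```

For the 64 MB test image this yields a discard of 32 MB starting at guest
offset 32 MB, the same range as the qemu-io discard command used earlier.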
> >>If we really want to we can calculate the highest cluster host offset in use
> >>and truncate the image accordingly. But that's optional, see the last point
> >>in my "problems with this approach" list (having discarded the clusters
> >>should save us all the space already). Furthermore, as I'm saying in that
> >>list, to really solve this issue, we'd need qcow2 defragmentation.
> >>
> >Do we already have an implementation of "qcow2 defragmentation"?
> 
> No, we don't. The only way to defragment a qcow2 image right now is using
> qemu-img convert to create a (defragmented) copy and then delete the old
> image, which has the disadvantage of temporarily requiring double the disk
> space and being an offline operation.
> 
> So far, nobody has implemented online defragmentation, mainly for two
> reasons: It would probably be pretty complicated (it'd probably need to be a
> block job which links into a pretty low-level function provided by qcow2
> (defragment_some_clusters or something)) and second, so far there has been
> little demand. Disk space is not an issue (as said before), because it
> doesn't really matter to a modern file system whether your file has a length
> of 100 MB or 100 GB; that's just some number. What really matters is how
> much of that space is actually used; and if all unused clusters are
> discarded, there won't be any space used for them (well, maybe there is some
> metadata overhead, but that should be negligible).
> 
> There are a couple of reasons why you'd want to defragment an image:
> 
> First, it makes you feel better. I can relate to that, but it's not a real
> reason.
> 
> Second, it may improve performance: The guest may expect consecutive reads
> to be fast; but if the clusters are sprinkled all over the host, consecutive
> guest reads no longer necessarily translate to consecutive reads on the host
> (same for writes, of course). Defragmentation would probably fix that, but
> if you want to rely on this, you'd better use preallocated image files.
> 
> Third, it looks better. People expect the file length to be a rough indicator of
> the file size. However, for me this is related to "it makes you feel
> better", because this also is not a really good reason.
> 
> Fourth, using a non-modern file system may let your file size explode
> because suddenly, file length is actually equal to the file size. But I
> think, in this case you should just use a better file system.
> 
> I don't know whether "cp" copies holes in files; its manpage says it does
> create sparse images, but I don't know how well it works; but I just assume
> it works well enough.
> 
> Max
> 
> >Jun Li
> 



