[Qemu-block] [PULL 4/9] qcow2: Document some maximum size constraints

From:	Kevin Wolf
Subject:	[Qemu-block] [PULL 4/9] qcow2: Document some maximum size constraints
Date:	Mon, 19 Nov 2018 15:29:39 +0100

From: Eric Blake <address@hidden>

Although off_t permits up to 63 bits (8EB) of file offsets, in
practice, we're going to hit other limits first.  Document some
of those limits in the qcow2 spec (some are inherent, others are
implementation choices of qemu), and how choice of cluster size
can influence some of the limits.

While we cannot map any uncompressed virtual cluster to any
address higher than 64 PB (56 bits) (due to the current L1/L2
field encoding stopping at bit 55), qemu's cap of 8M for the
refcount table can still access larger host addresses for some
combinations of large clusters and small refcount_order.  For
comparison, ext4 with 4k blocks caps files at 16PB.

Another interesting limit: for compressed clusters, the L2 layout
requires an ever-smaller maximum host offset as cluster size gets
larger, down to a 512 TB maximum with 2M clusters.  In particular,
note that with a cluster size of 8k or smaller, the L2 entry for
a compressed cluster could technically point beyond the 64PB mark,
but when you consider that with 8k clusters and refcount_order = 0,
you cannot access beyond 512T without exceeding qemu's limit of an
8M cap on the refcount table, it is unlikely that any image in the
wild has attempted to do so.  To be safe, let's document that bits
beyond 55 in a compressed cluster must be 0.

Signed-off-by: Eric Blake <address@hidden>
Signed-off-by: Kevin Wolf <address@hidden>
---
 docs/interop/qcow2.txt | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)
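
The compressed-cluster limit described in the commit message follows from the descriptor layout in the spec, where the host offset field is x bits wide with x = 62 - (cluster_bits - 8). The sketch below reproduces that arithmetic under the proposed rule that bits beyond 55 must be 0. The function and constant names are hypothetical and chosen for illustration; they are not taken from qemu.

```python
# Illustrative sketch only; names are hypothetical, not qemu identifiers.

L2_OFFSET_BITS = 56  # L1/L2 host offset fields stop at bit 55 (64 PB)


def compressed_offset_field_bits(cluster_bits: int) -> int:
    """Width x of the host offset field: x = 62 - (cluster_bits - 8)."""
    return 62 - (cluster_bits - 8)


def max_compressed_host_offset(cluster_bits: int) -> int:
    """Exclusive upper bound on a compressed cluster's host offset,
    assuming bits beyond 55 are required to be zero."""
    usable_bits = min(compressed_offset_field_bits(cluster_bits),
                      L2_OFFSET_BITS)
    return 1 << usable_bits


if __name__ == "__main__":
    # 512-byte, 8k, 16k, 64k and 2M clusters
    for cluster_bits in (9, 13, 14, 16, 21):
        x = compressed_offset_field_bits(cluster_bits)
        limit = max_compressed_host_offset(cluster_bits)
        print(f"cluster_bits={cluster_bits:2d}  x={x:2d}  "
              f"limit=2^{limit.bit_length() - 1} bytes")
```

With 2M clusters (cluster_bits = 21) the field is 49 bits wide, which gives the 512 TB ceiling; with 8k clusters (cluster_bits = 13) the field is 57 bits wide, so only the zero-bit rule keeps offsets below 64 PB.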

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 845d40a086..fb5cb47245 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -40,7 +40,18 @@ The first cluster of a qcow2 image contains the file header:
                     with larger cluster sizes.
 
          24 - 31:   size
-                    Virtual disk size in bytes
+                    Virtual disk size in bytes.
+
+                    Note: qemu has an implementation limit of 32 MB as
+                    the maximum L1 table size.  With a 2 MB cluster
+                    size, it is unable to populate a virtual cluster
+                    beyond 2 EB (61 bits); with a 512 byte cluster
+                    size, it is unable to populate a virtual size
+                    larger than 128 GB (37 bits).  Meanwhile, L1/L2
+                    table layouts limit an image to no more than 64 PB
+                    (56 bits) of populated clusters, and an image may
+                    hit other limits first (such as a file system's
+                    maximum size).
 
          32 - 35:   crypt_method
                     0 for no encryption
@@ -326,6 +337,17 @@ in the image file.
 It contains pointers to the second level structures which are called refcount
 blocks and are exactly one cluster in size.
 
+Although a large enough refcount table can reserve clusters past 64 PB
+(56 bits) (assuming the underlying protocol can even be sized that
+large), note that some qcow2 metadata such as L1/L2 tables must point
+to clusters prior to that point.
+
+Note: qemu has an implementation limit of 8 MB as the maximum refcount
+table size.  With a 2 MB cluster size and a default refcount_order of
+4, it is unable to reference host resources beyond 2 EB (61 bits); in
+the worst case, with a 512 byte cluster size and refcount_order of 6, it is
+unable to access beyond 32 GB (35 bits).
+
 Given an offset into the image file, the refcount of its cluster can be
 obtained as follows:
 
@@ -365,6 +387,16 @@ The L1 table has a variable size (stored in the header) and may use multiple
 clusters, however it must be contiguous in the image file. L2 tables are
 exactly one cluster in size.
 
+The L1 and L2 tables have implications on the maximum virtual file
+size; for a given L1 table size, a larger cluster size is required for
+the guest to have access to more space.  Furthermore, a virtual
+cluster must currently map to a host offset below 64 PB (56 bits)
+(although this limit could be relaxed by putting reserved bits into
+use).  Additionally, as cluster size increases, the maximum host
+offset for a compressed cluster is reduced (a 2M cluster size requires
+compressed clusters to reside below 512 TB (49 bits), and this limit
+cannot be relaxed without an incompatible layout change).
+
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
@@ -427,7 +459,9 @@ Standard Cluster Descriptor:
 Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
 
     Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
-                    cluster or sector boundary!
+                    cluster or sector boundary!  If cluster_bits is
+                    small enough that this field includes bits beyond
+                    55, those upper bits must be set to 0.
 
          x - 61:    Number of additional 512-byte sectors used for the
                     compressed data, beyond the sector containing the offset
-- 
2.19.1
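
The size limits quoted in the patch can be cross-checked with a few lines of arithmetic. The sketch below assumes 8-byte L1, L2 and refcount table entries (as in the qcow2 spec, though not restated in this patch) and uses qemu's implementation caps of 32 MB for the L1 table and 8 MB for the refcount table as defaults. The function names are hypothetical and do not correspond to qemu code.

```python
# Illustrative sketch only; names are hypothetical, not qemu identifiers.

ENTRY_BYTES = 8  # assumed size of L1, L2 and refcount table entries


def max_virtual_size(cluster_bits: int,
                     l1_table_bytes: int = 32 * 1024 * 1024) -> int:
    """Largest virtual size (bytes) reachable through a capped L1 table."""
    cluster_size = 1 << cluster_bits
    l1_entries = l1_table_bytes // ENTRY_BYTES
    l2_entries = cluster_size // ENTRY_BYTES  # one L2 table is one cluster
    return l1_entries * l2_entries * cluster_size


def max_refcounted_size(cluster_bits: int, refcount_order: int,
                        reftable_bytes: int = 8 * 1024 * 1024) -> int:
    """Largest host size (bytes) covered by a capped refcount table."""
    cluster_size = 1 << cluster_bits
    reftable_entries = reftable_bytes // ENTRY_BYTES
    refcount_bits = 1 << refcount_order
    # One refcount block is one cluster of refcount_bits-wide entries.
    refcounts_per_block = cluster_size * 8 // refcount_bits
    return reftable_entries * refcounts_per_block * cluster_size


if __name__ == "__main__":
    print(max_virtual_size(21) == 1 << 61)        # 2 MB clusters: 2 EB
    print(max_virtual_size(9) == 1 << 37)         # 512 byte clusters: 128 GB
    print(max_refcounted_size(21, 4) == 1 << 61)  # 2 EB
    print(max_refcounted_size(9, 6) == 1 << 35)   # 32 GB
    print(max_refcounted_size(13, 0) == 1 << 49)  # 512 TB
```

Each comparison matches a figure stated in the patch or its commit message, including the 8k cluster, refcount_order = 0 case that bounds compressed clusters at 512 TB.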
