Re: [Qemu-block] [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster

From:	Roman Kagan
Subject:	Re: [Qemu-block] [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation
Date:	Thu, 13 Apr 2017 19:42:51 +0300
User-agent:	Mutt/1.8.0 (2017-02-23)

On Thu, Apr 13, 2017 at 04:27:35PM +0200, Kevin Wolf wrote:
> On 13.04.2017 at 16:15, Alberto Garcia wrote:
> > On Thu 13 Apr 2017 03:51:55 PM CEST, Kevin Wolf wrote:
> > >> This invariant is already broken by the very design of the qcow2
> > >> format, subclusters don't really add anything new there. For any
> > >> given cluster size you can write 4k in every odd cluster, then do the
> > >> same in every even cluster, and you'll get an equally fragmented
> > >> image.
> > >
> > > Because this scenario has appeared repeatedly in this thread: Can we
> > > please use a more realistic one that shows an actual problem? Because
> > > with 8k or more for the cluster size you don't get any qcow2
> > > fragmentation with 4k even/odd writes (which is a pathological case
> > > anyway), and the file systems are clever enough to cope with it, too.
> > >
> > > Just to confirm this experimentally, I ran this short script:
> > >
> > > ----------------------------------------------------------------
> > > #!/bin/bash
> > > ./qemu-img create -f qcow2 /tmp/test.qcow2 64M
> > >
> > > echo even blocks
> > > for i in $(seq 0 32767); do echo "write $((i * 8))k 4k"; done | ./qemu-io /tmp/test.qcow2 > /dev/null
> > > echo odd blocks
> > > for i in $(seq 0 32767); do echo "write $((i * 8 + 4))k 4k"; done | ./qemu-io /tmp/test.qcow2 > /dev/null
> > >
> > > ./qemu-img map /tmp/test.qcow2
> > > filefrag -v /tmp/test.qcow2
> > > ----------------------------------------------------------------
> > 
> > But that's because while you're writing on every other 4k block the
> > cluster size is 64k, so you're effectively allocating clusters in
> > sequential order. That's why you get this:
> > 
> > > Offset          Length          Mapped to       File
> > > 0               0x4000000       0x50000         /tmp/test.qcow2
> > 
> > You would need to either have 4k clusters, or space writes even more.
> > 
> > Here's a simpler example, mkfs.ext4 on an empty drive gets you something
> > like this:
> > [...]
> 
> My point wasn't that qcow2 doesn't fragment, but that Denis and you were
> both using a really bad example. You were trying to construct an
> artificially bad image and you actually ended up constructing a perfect
> one.
> 
> > Now, I haven't measured the effect of this on I/O performance, but
> > Denis's point seems in principle valid to me.
> 
> In principle yes, but especially his fear of host file system
> fragmentation seems a bit exaggerated. If I use 64k even/odd writes in
> the script, I end up with a horribly fragmented qcow2 image, but still
> perfectly contiguous layout of the image file in the file system.
> 
> We can and probably should do something about the qcow2 fragmentation
> eventually (I guess a more intelligent cluster allocation strategy could
> go a long way there), but I wouldn't worry too much about the host file
> system.

I beg to disagree.  I didn't have QEMU with subcluster allocation
enabled (you did, didn't you?) so I went ahead with a raw file:

# truncate --size 64k bbb
# filefrag -v bbb
Filesystem type is: ef53
File size of bbb is 65536 (16 blocks of 4096 bytes)
bbb: 0 extents found
# for i in {0..7}; do echo write $[(i * 2) * 4]k 4k; done | qemu-io bbb
...
# for i in {0..7}; do echo write $[(i * 2 + 1) * 4]k 4k; done | qemu-io bbb
...
# filefrag -v bbb
Filesystem type is: ef53
File size of bbb is 65536 (16 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       1:   65860793..  65860794:      2:
   1:        2..       2:   65859644..  65859644:      1:   65860795:
   2:        3..       3:   65859651..  65859651:      1:   65859645:
   3:        4..       4:   65859645..  65859645:      1:   65859652:
   4:        5..       5:   65859652..  65859652:      1:   65859646:
   5:        6..       6:   65859646..  65859646:      1:   65859653:
   6:        7..       7:   65859653..  65859653:      1:   65859647:
   7:        8..       8:   65859647..  65859647:      1:   65859654:
   8:        9..       9:   65859654..  65859654:      1:   65859648:
   9:       10..      10:   65859648..  65859648:      1:   65859655:
  10:       11..      11:   65859655..  65859655:      1:   65859649:
  11:       12..      12:   65859649..  65859649:      1:   65859656:
  12:       13..      13:   65859656..  65859656:      1:   65859650:
  13:       14..      14:   65859650..  65859650:      1:   65859657:
  14:       15..      15:   65859657..  65859657:      1:   65859651: last,eof
bbb: 15 extents found

So the host filesystem did a very poor job here (ext4 on top of two-way
raid0 on top of rotating disks).
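
For readers who want to replay the write pattern, the two loops above can be folded into one helper. This is only a sketch: the function name gen_writes and its parity argument are made up for illustration, and it uses $(( )) in place of the older $[ ] arithmetic syntax. It prints the same qemu-io commands as the transcript, even blocks for parity 0 and odd blocks for parity 1.

```shell
#!/bin/bash
# Hypothetical helper (not part of QEMU): print qemu-io "write" commands
# for every second 4k block of a 64k file.
#   parity 0 -> blocks 0, 2, ..., 14 (offsets 0k, 8k, ..., 56k)
#   parity 1 -> blocks 1, 3, ..., 15 (offsets 4k, 12k, ..., 60k)
gen_writes() {
    local parity=$1 i
    for i in {0..7}; do
        echo "write $(( (i * 2 + parity) * 4 ))k 4k"
    done
}
```

It would be used as "gen_writes 0 | qemu-io bbb" followed by "gen_writes 1 | qemu-io bbb". Running the two passes as separate qemu-io invocations is what lets the file system place the even blocks before it ever sees the odd ones.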

Naturally, replacing truncate with fallocate in the above example gives
no fragmentation:

...
# filefrag -v bbb
Filesystem type is: ef53
File size of bbb is 65536 (16 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      15:  183616784.. 183616799:     16:             last,eof
bbb: 1 extent found
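
The truncate versus fallocate comparison can also be tried without qemu-io. The sketch below rests on assumptions that are not in the experiment above: dd stands in for qemu-io, conv=fsync is used to push each block to disk before the next write, and the function names are hypothetical. Whether the sparse case actually fragments will depend on the file system and its allocator, so treat it as a starting point rather than a guaranteed reproduction.

```shell
#!/bin/bash
# Sketch only: dd replaces qemu-io here, which is an assumption.
SIZE=65536   # 64k file, i.e. 16 blocks of 4096 bytes

# Write every second 4k block of FILE, starting at block START
# (START=0 writes the even blocks, START=1 the odd ones).
write_alternate_blocks() {
    local file=$1 start=$2 blk
    for blk in $(seq "$start" 2 15); do
        dd if=/dev/urandom of="$file" bs=4096 count=1 seek="$blk" \
           conv=notrunc,fsync status=none
    done
}

# Create FILE with METHOD (truncate = sparse, fallocate = preallocated),
# then fill it with the even blocks first and the odd blocks second.
fill_interleaved() {
    local method=$1 file=$2
    rm -f "$file"
    case $method in
        truncate)  truncate --size "$SIZE" "$file" ;;
        fallocate) fallocate --length "$SIZE" "$file" ;;
        *) echo "unknown method: $method" >&2; return 1 ;;
    esac
    write_alternate_blocks "$file" 0
    write_alternate_blocks "$file" 1
}
```

For example, "fill_interleaved truncate bbb; filefrag -v bbb" followed by "fill_interleaved fallocate bbb; filefrag -v bbb" shows the two layouts side by side.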

Roman.
