Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}

qemu-devel

From:	Paolo Bonzini
Subject:	Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations
Date:	Wed, 14 Mar 2012 08:41:12 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1

On 13/03/2012 20:13, Richard Laager wrote:
> > To be completely correct, I suggest the following behavior:
> > >      1. Add a discard boolean option to the disk layer.
> > >      2. If discard is not specified:
> > >               * For files, detect a true/false value by comparing
> > >                 stat.st_blocks != stat.st_size>>9.
> > >               * For devices, assume a fixed value (true?).
> > >      3. If discard is true, issue discards.
> > >      4. If discard is false, do not issue discards.
> >
> > The problem is, who will use this interface?
> I'm a libvirt and virt-manager user; virt-manager already differentiates
> between thin and thick provisioning. So I'm envisioning passing that
> information to libvirt, which would save it in a config file and use
> that to set discard=true vs. discard=false when using QEMU.

Yeah, it could be set also at the pool level for libvirt.

> > >       * For SCSI, report an unmap_granularity to the guest as follows:
> > >       max(logical_block_size, discard_granularity) / logical_block_size
> >
> > This is more or less already in place later in the series.
> I didn't see it. Which patch number?

Patch 11:

+    discard_granularity = s->qdev.conf.discard_granularity;
+    if (discard_granularity == -1) {
+        s->qdev.conf.discard_granularity = s->qdev.conf.logical_block_size;
+    } else if (discard_granularity < s->qdev.conf.logical_block_size) {
+        error_report("scsi-block: invalid discard_granularity");
+        return -1;
+    } else if (discard_granularity & (discard_granularity - 1)) {
+        error_report("scsi-block: discard_granularity not a power of two");
+        return -1;
+    }
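
To make the relation to the quoted formula explicit, here is a
standalone sketch of the same checks plus the
max(logical_block_size, discard_granularity) / logical_block_size
computation.  The function names and the plain int64_t parameters are
assumptions for illustration; the actual patch works on
s->qdev.conf as shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the checks in the hunk above.  Returns 0 and stores the
 * effective granularity in bytes, or -1 if the value is invalid.
 * A configured value of -1 means "not set". */
static int effective_discard_granularity(int64_t configured,
                                         int64_t logical_block_size,
                                         int64_t *out)
{
    assert(logical_block_size > 0 && out != 0);
    if (configured == -1) {
        *out = logical_block_size;      /* default to one logical block */
        return 0;
    }
    if (configured < logical_block_size) {
        return -1;                      /* smaller than a logical block */
    }
    if (configured & (configured - 1)) {
        return -1;                      /* not a power of two */
    }
    *out = configured;
    return 0;
}

/* Granularity in logical blocks, as in the quoted formula. */
static int64_t unmap_granularity_blocks(int64_t discard_granularity,
                                        int64_t logical_block_size)
{
    int64_t g = discard_granularity > logical_block_size
                    ? discard_granularity : logical_block_size;
    return g / logical_block_size;
}
```

With 512-byte logical blocks and a 4096-byte discard granularity this
yields 8 blocks; once the validation has passed, the max() can no longer
pick logical_block_size over a smaller granularity, so it is only a
safety net.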

> > If you have a new kernel that supports SEEK_HOLE/SEEK_DATA, it can also
> > be done by skipping the zero write on known holes.
> > 
> > This could even be done at the block layer level using bdrv_is_allocated.
> 
> Would we want to make all write_zeros operations check for and skip
> holes, or is write_zeros different from a discard in that it SHOULD/MUST
> allocate space?

I think that's pretty much the question to answer for this patch to graduate
from the RFC state (the rest is just technicalities, so to speak).  So far,
write_zeroes was intended to be an efficient operation (it avoids allocating
a cluster in qed and will do the same in qcow3, which is why I decided to
merge it with discard).
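
As a rough sketch of the "skip the zero write on known holes" idea for
raw files, something like the following could work on kernels that
support SEEK_DATA.  The helper name is made up and this is not how the
series implements it; in particular it says nothing about whether
write_zeroes is allowed to leave the range unallocated, which is the
open question above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3 /* Linux value; fallback if the headers hide it */
#endif

/* True only if [offset, offset + len) is known to contain no data.
 * Any other error (for example EINVAL where SEEK_DATA is unsupported)
 * is treated as "may contain data", so the caller still writes zeros.
 * Note that lseek() moves the file offset of fd, and that ENXIO is also
 * returned when offset is at or past end of file. */
static bool range_is_hole(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);
    off_t data = lseek(fd, offset, SEEK_DATA);
    if (data == (off_t)-1) {
        return errno == ENXIO;   /* no data at or after offset */
    }
    return data >= offset + len; /* next data starts after the range */
}
```

A write_zeroes path could call this first and return early when it
reports a hole, falling back to the normal zero write otherwise.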

> > > If we could probe for FALLOC_FL_PUNCH_HOLE support, then we could avoid
> > > advertising discard support based on FALLOC_FL_PUNCH_HOLE when it is not
> > > going to work. This would side step these problems.
> >
> > ... and introduce others when migrating if your datacenter doesn't have
> > homogeneous kernel versions and/or filesystems. :(
> I hadn't thought of the migration issues. Thanks for bringing that up.
> 
> Worst case, you end up doing a bunch of zero writing if and only if you
> migrate from a discard_zeros_data host to one that doesn't (or doesn't
> do discard at all). But this only lasts until the guest reboots
> (assuming we also add a behavior of re-probing on guest reboot--or until
> it shuts down if we don't or can't). As far as I can see, this is
> unavoidable, though. And this is no worse than writing zeros ALL of the
> time that fallocate() fails, which is the behavior of your patch series,
> right?

It is worse in that we do not want the hardware parameters exposed to the
guest to change behind the scenes, except if you change the machine type
or if you use the default unversioned type.

> This might be another use case for a discard option on the disk. If some
> but not all of one's hosts support discard, a system administrator might
> want to set discard=0 to avoid this.

A discard option is already there (discard_granularity=0).  Libvirt could
choose to expose it even now, but it would be of little use in the absence
of support for unaligned discards.

Paolo
