

From: Paolo Bonzini
Subject: Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations
Date: Wed, 14 Mar 2012 13:14:18 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1

On 14/03/2012 13:01, Kevin Wolf wrote:
> On 14.03.2012 08:41, Paolo Bonzini wrote:
>> On 13/03/2012 20:13, Richard Laager wrote:
>>>> If you have a new kernel that supports SEEK_HOLE/SEEK_DATA, it can also
>>>> be done by skipping the zero write on known holes.
>>>>
>>>> This could even be done at the block layer level using bdrv_is_allocated.
>>>
>>> Would we want to make all write_zeroes operations check for and skip
>>> holes, or is write_zeroes different from a discard in that it SHOULD/MUST
>>> allocate space?
>>
>> I think that's pretty much the question to answer for this patch to graduate
>> from the RFC state (the rest is just technicalities, so to speak).  So far,
>> write_zeroes was intended to be an efficient operation (it avoids allocating
>> a cluster in qed and will do the same in qcow3, which is why I decided to
>> merge it with discard).
> 
> Yes, for qcow3 and to some degree also for QED, setting the zero flag is
> the natural implementation for both discard and write_zeroes. The big
> question is what happens with other formats.

Also raw if used with a sparse file.
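
For raw on a sparse file, the hole check mentioned earlier in the thread
could look roughly like the sketch below.  range_is_hole is a made-up
name, not an existing block layer function, and the sketch assumes a
kernel and filesystem that support SEEK_DATA and a caller that stays
within the image size.  When the answer is unknown it returns 0, so the
caller falls back to writing zeros.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback (Linux value) in case the headers were already included
 * without _GNU_SOURCE. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif

/*
 * Hypothetical helper, not a QEMU function.
 * Returns 1 if [start, start + len) is known to be a hole, 0 if it may
 * contain data or if the answer is unknown.
 * Note that lseek moves the file offset, so this only suits callers
 * that do their I/O with pread/pwrite.
 */
static int range_is_hole(int fd, off_t start, off_t len)
{
    off_t data;

    if (start < 0 || len <= 0) {
        return 0;
    }
    data = lseek(fd, start, SEEK_DATA);
    if (data < 0) {
        /* ENXIO: no data at or after start (hole up to EOF, or start
         * is at/after EOF).  Any other error means "unknown". */
        return errno == ENXIO;
    }
    /* The next data starts at or after the end of the range. */
    return data >= start + len;
}
```

A zero write would then be skipped only when range_is_hole returns 1.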

> Paolo mentioned a use case as a fast way for guests to write zeros, but
> is it really faster than a normal write when we have to emulate it by a
> bdrv_write with a temporary buffer of zeros? 

No, of course not.

> On the other hand we have
> the cases where discard really means "I don't care about the data any
> more" and emulating it by writing zeros is just a waste of resources there.
> 
> So I think we only want to advertise that discard zeroes data if we can
> do it efficiently. This means that the format does support it, and that
> the device is able to communicate the discard granularity (= cluster
> size) to the guest OS.

Note that the discard granularity is only a hint, so it's really more of
a maximum suggested value than a true granularity.  For any part of a
request that does not cover a whole cluster, the format would still have
to write zeros manually.
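
To make the cluster-boundary point concrete, here is a rough sketch of
how a zero request could be split.  ZeroSplit and split_zero_request
are invented names for illustration, not existing QEMU code, and the
sketch assumes byte offsets and a nonzero cluster size.

```c
#include <stdint.h>

/* Hypothetical type, for illustration only. */
typedef struct ZeroSplit {
    uint64_t head_len;    /* unaligned bytes at the start: write zeros */
    uint64_t middle_off;  /* cluster-aligned start of the middle part */
    uint64_t middle_len;  /* whole clusters: zero flag or discard */
    uint64_t tail_len;    /* unaligned bytes at the end: write zeros */
} ZeroSplit;

static ZeroSplit split_zero_request(uint64_t offset, uint64_t len,
                                    uint64_t cluster_size)
{
    ZeroSplit s = { 0, 0, 0, 0 };
    uint64_t end = offset + len;
    /* First cluster boundary at or after offset (round up). */
    uint64_t first = (offset + cluster_size - 1) / cluster_size * cluster_size;
    /* Last cluster boundary at or before end (round down). */
    uint64_t last = end / cluster_size * cluster_size;

    if (first >= last) {
        /* No whole cluster is covered: everything is written manually. */
        s.head_len = len;
        s.middle_off = offset;
        return s;
    }
    s.head_len = first - offset;
    s.middle_off = first;
    s.middle_len = last - first;
    s.tail_len = end - last;
    return s;
}
```

With 64k clusters, a request for 200000 bytes at offset 1000 leaves
64536 bytes at the head and 4392 bytes at the tail to be written as
zeros; only the two whole clusters in between can be handled cheaply.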

Also, Linux for example will only round the number of sectors down to
the granularity, not the start sector.  Rereading the code, for SCSI we
want to advertise a zero granularity (i.e. do whatever you want);
otherwise we may get only misaligned discard requests and end up writing
zeroes inefficiently all the time.
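
A small worked example of the misalignment, assuming the rounding
behaves as described above (length rounded down, start left alone).
The helper names are invented for this illustration; units are sectors.

```c
#include <stdint.h>

/* Round the sector count down to a multiple of the granularity. */
static uint64_t round_len_down(uint64_t nr_sectors, uint64_t gran)
{
    return nr_sectors - nr_sectors % gran;
}

/* Number of whole, aligned clusters inside [start, start + nr_sectors). */
static uint64_t aligned_clusters_in(uint64_t start, uint64_t nr_sectors,
                                    uint64_t gran)
{
    uint64_t end = start + nr_sectors;
    uint64_t first = (start + gran - 1) / gran;  /* first whole cluster */
    uint64_t last = end / gran;                  /* one past the last */

    return last > first ? last - first : 0;
}
```

With a granularity of 128 sectors, a 300-sector request starting at
sector 8 is rounded to 256 sectors but covers only one whole cluster,
and a 128-sector request starting at sector 100 covers none at all.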

The problem is that advertising discard_zeroes_data based on the backend
calls for trouble as soon as you migrate between storage formats,
filesystems or disks.

(BTW, if the backing file allows discard and discard zeroes the data,
efficient write-zeroes could be done in qcow2 by allocating a cluster and
discarding its contents.  It's similar to how you do preallocated
metadata.)

Paolo


