[Qemu-devel] Re: Caching modes
From: Kevin Wolf
Subject: [Qemu-devel] Re: Caching modes
Date: Tue, 21 Sep 2010 10:15:56 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.12) Gecko/20100907 Fedora/3.0.7-1.fc12 Thunderbird/3.0.7

On 21.09.2010 02:18, Anthony Liguori wrote:
> On 09/20/2010 06:17 PM, Christoph Hellwig wrote:
>> On Mon, Sep 20, 2010 at 03:11:31PM -0500, Anthony Liguori wrote:
>>
>>>>> All read and write requests SHOULD avoid any type of caching in the
>>>>> host. Any write request MUST complete after the next level of storage
>>>>> reports that the write request has completed. A flush from the guest
>>>>> MUST complete after all pending I/O requests for the guest have been
>>>>> completed.
>>>>>
>>>>> As an implementation detail, with the raw format, these guarantees are
>>>>> only in place for preallocated images. Sparse images do not provide as
>>>>> strong of a guarantee.
>>>>>
>>>>>
>>>> That's not how cache=none ever worked nor works currently.
>>>>
>>>>
>>> How does it work today compared to what I wrote above?
>>>
>> From the guest's point of view it works exactly as you describe
>> cache=writeback. There are no ordering or cache flushing guarantees. By
>> using O_DIRECT we do bypass the host file cache, but we don't even try
>> on the others (disk cache, committing metadata transactions that are
>> required to actually see the committed data for sparse, preallocated or
>> growing images).
>>
>
> O_DIRECT alone to a pre-allocated file on a normal file system should
> result in the data being visible without any additional metadata
> transactions.
>
> The only time when that isn't true is when dealing with CoW or other
> special filesystem features.

I think preallocated files are the exception; usually people use sparse
files. And even with preallocation, the disk cache still remains.

>> What you describe above is the equivalent of O_DSYNC|O_DIRECT which
>> doesn't exist in current qemu, except that O_DSYNC|O_DIRECT also
>> guarantees the semantics for sparse images. Sparse images really aren't
>> special in any way - preallocation using posix_fallocate or COW
>> filesystems like btrfs, nilfs2 or zfs have exactly the same issues.
>>
>>
>>>>                       | WC enable | WC disable
>>>> -----------------------------------------------
>>>> direct                |           |
>>>> buffer                |           |
>>>> buffer + ignore flush |           |
>>>>
>>>> currently we only have:
>>>>
>>>> cache=none          direct + WC enable
>>>> cache=writeback     buffer + WC enable
>>>> cache=writethrough  buffer + WC disable
>>>> cache=unsafe        buffer + ignore flush + WC enable
>>>>
>>>>
>>> Where does O_DSYNC fit into this chart?
>>>
>> O_DSYNC is used for all WC disable modes.
>>
>>
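The chart and the O_DSYNC rule quoted above can be written down as a small lookup. This is only an illustrative sketch, not qemu's actual code: the names `CACHE_MODES` and `open_flags` are made up here, and the flag choice simply follows the statements in this thread (O_DIRECT for "direct", O_DSYNC for every "WC disable" mode).

```python
import os

# O_DIRECT is Linux-specific; fall back to 0 so the sketch imports elsewhere.
O_DIRECT = getattr(os, "O_DIRECT", 0)
O_DSYNC = getattr(os, "O_DSYNC", 0)

# Hypothetical table: mode -> (bypass host page cache,
#                              guest sees a write cache,
#                              guest flushes are ignored)
CACHE_MODES = {
    "none":         (True,  True,  False),
    "writeback":    (False, True,  False),
    "writethrough": (False, False, False),
    "unsafe":       (False, True,  True),
}

def open_flags(mode):
    """Return the open(2) flags implied by a cache mode in this sketch."""
    direct, wc_enabled, _ignore_flush = CACHE_MODES[mode]
    flags = os.O_RDWR
    if direct:
        flags |= O_DIRECT      # bypass the host file cache
    if not wc_enabled:
        flags |= O_DSYNC       # "O_DSYNC is used for all WC disable modes"
    return flags
```

Note that no row combines O_DIRECT with O_DSYNC, which matches the remark that this combination does not exist in current qemu.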
>>> Do all modern filesystems implement O_DSYNC without generating
>>> additional barriers per request?
>>>
>>> Having a barrier per-write request is ultimately not the right semantic
>>> for any of the modes. However, without the use of O_DSYNC (or
>>> sync_file_range(), which I know you dislike), I don't see how we can
>>> have reasonable semantics without always implementing write back caching
>>> in the host.
>>>
>> Barriers are a Linux-specific implementation detail that is in the
>> process of going away, probably in Linux 2.6.37. But if you want
>> O_DSYNC semantics with a volatile disk write cache there is no way
>> around using a cache flush or the FUA bit on all I/O caused by it.
>
> If you have a volatile disk write cache, then we don't need O_DSYNC
> semantics.

What do the semantics of a qemu option have to do with the host disk write
cache? We always need to provide the same semantics. If anything, we can
take advantage of a host providing write-through/no caches so that we
don't have to issue the flushes ourselves.

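To make the point concrete, here is a rough sketch of the decision as I understand it. The function name and its two parameters are hypothetical and exist only for this illustration; the guest-visible semantics stay fixed and only the work on the host side changes.

```python
def flush_after_write(guest_sees_write_cache, host_cache_is_volatile):
    """Return True if the host side must flush (or use FUA) for each write.

    Hypothetical helper for illustration; not a qemu function.
    """
    if guest_sees_write_cache:
        # The guest was told a cache exists, so it is expected to send
        # its own flushes when it needs data to be stable.
        return False
    # The guest was told there is no cache: a completed write must be
    # stable. A flush is only needed if the host really has a volatile cache.
    return host_cache_is_volatile
```

With a write-through or cache-less host, the function returns False in every case, which is the optimization mentioned above.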
>> We
>> currently use the cache flush, and although I plan to experiment a bit
>> more with the FUA bit for O_DIRECT | O_DSYNC writes I would be very
>> surprised if they actually are any faster.
>>
>
> The thing I struggle with understanding is that if the guest is sending
> us a write request, why are we sending the underlying disk a write +
> flush request? That doesn't seem logical at all to me.
>
> Even if we advertise WC disable, it should be up to the guest to decide
> when to issue flushes.

Why should a guest ever flush a cache when it's told that this cache
doesn't exist?

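A toy model of what goes wrong otherwise. Everything here (`Disk`, `guest_write`, `run`) is invented for illustration and assumes a disk whose write cache is lost on power failure; it is not how qemu or any real driver is structured.

```python
class Disk:
    """Toy disk with a volatile write cache."""

    def __init__(self):
        self.cache = {}    # lost on power failure
        self.stable = {}   # survives power failure

    def write(self, sector, data):
        self.cache[sector] = data

    def flush(self):
        self.stable.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        self.cache.clear()


def guest_write(disk, sector, data, wc_advertised, host_flushes):
    """One guest write; the guest never flushes when told there is no cache."""
    disk.write(sector, data)
    if wc_advertised:
        return             # the guest would send its own flush later
    if host_flushes:
        disk.flush()       # host makes the write stable before completing it


def run(host_flushes):
    disk = Disk()
    guest_write(disk, 0, b"journal commit",
                wc_advertised=False, host_flushes=host_flushes)
    disk.power_loss()
    return disk.stable.get(0)
```

In this model `run(host_flushes=True)` still finds the data after the power loss, while `run(host_flushes=False)` returns `None`: the guest believed the write was stable and had no reason to flush.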
Kevin