
[Qemu-devel] Re: Caching modes


From: Kevin Wolf
Subject: [Qemu-devel] Re: Caching modes
Date: Tue, 21 Sep 2010 10:15:56 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.12) Gecko/20100907 Fedora/3.0.7-1.fc12 Thunderbird/3.0.7

On 21.09.2010 02:18, Anthony Liguori wrote:
> On 09/20/2010 06:17 PM, Christoph Hellwig wrote:
>> On Mon, Sep 20, 2010 at 03:11:31PM -0500, Anthony Liguori wrote:
>>    
>>>>> All read and write requests SHOULD avoid any type of caching in the
>>>>> host.  Any write request MUST complete after the next level of storage
>>>>> reports that the write request has completed.  A flush from the guest
>>>>> MUST complete after all pending I/O requests for the guest have been
>>>>> completed.
>>>>>
>>>>> As an implementation detail, with the raw format, these guarantees are
>>>>> only in place for preallocated images.  Sparse images do not provide as
>>>>> strong a guarantee.
>>>>>
>>>>>          
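For concreteness, here is a minimal sketch of what the stated guarantees would require on a POSIX host. This is illustrative only, not qemu's actual I/O path (and, as the replies below note, not what current qemu provides): a write completes only when the next level of storage reports completion, and a guest flush maps to fdatasync(2), which on Linux also flushes a volatile disk write cache.

    /* Illustrative sketch of the guarantees above; not qemu code.
     * O_DIRECT bypasses the host page cache; O_DSYNC makes each write
     * return only after the next storage level reports completion.
     * Buffers passed to O_DIRECT writes must be suitably aligned. */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Guest write: MUST complete only when storage reports completion. */
    ssize_t guest_write(int fd, const void *buf, size_t len, off_t off)
    {
        return pwrite(fd, buf, len, off);  /* fd opened O_DIRECT | O_DSYNC */
    }

    /* Guest flush: MUST complete after all pending requests have completed.
     * On Linux, fdatasync() also forces a volatile disk cache flush. */
    int guest_flush(int fd)
    {
        return fdatasync(fd);
    }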
>>>> That's not how cache=none ever worked, nor how it works currently.
>>>>
>>>>        
>>> How does it work today compared to what I wrote above?
>>>      
>> From the guest's point of view it works exactly as you describe
>> cache=writeback.  There are no ordering or cache flushing guarantees.  By
>> using O_DIRECT we do bypass the host file cache, but we don't even try
>> on the others (the disk cache, or committing the metadata transactions
>> that are required to actually see the committed data for sparse,
>> preallocated or growing images).
>>    
> 
> O_DIRECT alone to a pre-allocated file on a normal file system should 
> result in the data being visible without any additional metadata 
> transactions.
> 
> The only time when that isn't true is when dealing with CoW or other 
> special filesystem features.

I think preallocated files are the exception; usually people use sparse
files. And even with preallocation, the disk cache still remains.
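As an aside, "preallocated" in the strong sense means every block is actually allocated and written, e.g. something like the sketch below (illustrative only; path and chunk size are arbitrary). Note Christoph's point just after this: posix_fallocate() is faster but may leave extents marked unwritten, which still need a metadata update on first write, so it has the same visibility issue.

    /* Sketch: creating a fully-allocated raw image by writing real
     * zeros, so later O_DIRECT writes need no metadata transaction to
     * become visible. Assumes size is a multiple of 1 MB. */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int preallocate(const char *path, off_t size)
    {
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            return -1;

        char *zeros = calloc(1, 1 << 20);        /* 1 MB of zeros */
        if (!zeros) {
            close(fd);
            return -1;
        }
        for (off_t off = 0; off < size; off += 1 << 20) {
            if (pwrite(fd, zeros, 1 << 20, off) < 0) {
                free(zeros);
                close(fd);
                return -1;
            }
        }
        free(zeros);
        return close(fd);
    }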

>> What you describe above is the equivalent of O_DSYNC|O_DIRECT, which
>> doesn't exist in current qemu, except that O_DSYNC|O_DIRECT also
>> guarantees the semantics for sparse images.  Sparse images really aren't
>> special in any way - preallocation using posix_fallocate or CoW
>> filesystems like btrfs, nilfs2 or zfs have exactly the same issues.
>>
>>    
>>>>                       | WC enable | WC disable
>>>> ----------------------+-----------+-----------
>>>> direct                |           |
>>>> buffer                |           |
>>>> buffer + ignore flush |           |
>>>>
>>>> currently we only have:
>>>>
>>>>   cache=none           direct + WC enable
>>>>   cache=writeback      buffer + WC enable
>>>>   cache=writethrough   buffer + WC disable
>>>>   cache=unsafe         buffer + ignore flush + WC enable
>>>>
>>>>        
>>> Where does O_DSYNC fit into this chart?
>>>      
>> O_DSYNC is used for all WC disable modes.
>>
>>    
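To make the mapping concrete, here is a hedged sketch of how the four modes in the table above could translate into open(2) flags on a Linux host, given that O_DSYNC covers the WC-disable case. Illustrative only; the real logic lives in qemu's block layer, and the "ignore flush" bit is handled above the open flags.

    /* Sketch: mapping the cache= modes above to open(2) flags.
     * Not qemu's actual block-layer code. */
    #define _GNU_SOURCE              /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <string.h>

    int open_image(const char *path, const char *cache_mode)
    {
        int flags = O_RDWR;

        if (strcmp(cache_mode, "none") == 0) {
            flags |= O_DIRECT;       /* direct + WC enable */
        } else if (strcmp(cache_mode, "writethrough") == 0) {
            flags |= O_DSYNC;        /* buffer + WC disable */
        }
        /* "writeback" (buffer + WC enable) needs no extra flags;
         * "unsafe" (buffer + ignore flush + WC enable) additionally
         * turns guest flushes into no-ops at a higher layer. */

        return open(path, flags);
    }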
>>> Do all modern filesystems implement O_DSYNC without generating
>>> additional barriers per request?
>>>
>>> Having a barrier per write request is ultimately not the right semantic
>>> for any of the modes.  However, without the use of O_DSYNC (or
>>> sync_file_range(), which I know you dislike), I don't see how we can
>>> have reasonable semantics without always implementing write-back caching
>>> in the host.
>>>      
>> Barriers are a Linux-specific implementation detail that is in the
>> process of going away, probably in Linux 2.6.37.  But if you want
>> O_DSYNC semantics with a volatile disk write cache, there is no way
>> around using a cache flush or the FUA bit on all I/O caused by it.
> 
> If you have a volatile disk write cache, then we don't need O_DSYNC 
> semantics.

What do the semantics of a qemu option have to do with the host disk
write cache? We always need to provide the same semantics. If anything,
we can take advantage of a host providing a write-through cache or no
cache at all, so that we don't have to issue the flushes ourselves.

>>    We
>> currently use the cache flush, and although I plan to experiment a bit
>> more with the FUA bit for O_DIRECT | O_DSYNC writes, I would be very
>> surprised if they are actually any faster.
>>    
> 
> The thing I struggle to understand is: if the guest is sending us a
> write request, why are we sending the underlying disk a write + flush
> request?  That doesn't seem logical at all to me.
> 
> Even if we advertise WC disable, it should be up to the guest to decide 
> when to issue flushes.

Why should a guest ever flush a cache when it's told that this cache
doesn't exist?

Kevin


