Re: [PATCH 0/7] qcow2: compressed write cache


From: Max Reitz
Subject: Re: [PATCH 0/7] qcow2: compressed write cache
Date: Tue, 9 Feb 2021 14:25:05 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0

On 29.01.21 17:50, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> I know, I have several series waiting for a resend, but I had to switch
> to another task spawned from our customer's bug.
>
> Original problem: we use O_DIRECT for all VM images in our product, it's
> the policy. The only exception is the backup target qcow2 image for
> compressed backup, because compressed backup is extremely slow with
> O_DIRECT (due to unaligned writes). The customer complains that backup
> produces a lot of page cache.
>
> So we can either implement some internal cache or use fadvise somehow.
> Backup has several async workers, which write simultaneously, so either
> way we have to track host cluster filling (before dropping the cache
> corresponding to the cluster). So, if we have to track anyway, let's
> try to implement the cache.
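
(For context, and as a stand-alone illustration rather than anything from the series: with O_DIRECT, Linux requires the file offset, the request length and the buffer address to be aligned to the logical block size, so every small unaligned compressed write has to be padded or turned into a read-modify-write of whole blocks.)

/*
 * Stand-alone illustration, not from the series: the kernel rejects
 * unaligned O_DIRECT requests, so small compressed writes cannot go to
 * the device as-is.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("scratch.img", O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Unaligned offset/length/buffer: fails with EINVAL under O_DIRECT. */
    char small[100];
    memset(small, 1, sizeof(small));
    if (pwrite(fd, small, sizeof(small), 3) < 0) {
        printf("unaligned pwrite: %s\n", strerror(errno));
    }

    /* The same data padded into an aligned 4k buffer at an aligned offset
     * goes through; this padding/read-modify-write is exactly the work that
     * makes unaligned compressed writes slow with O_DIRECT. */
    void *aligned;
    if (posix_memalign(&aligned, 4096, 4096) == 0) {
        memset(aligned, 0, 4096);
        memcpy(aligned, small, sizeof(small));
        if (pwrite(fd, aligned, 4096, 0) == 4096) {
            printf("aligned pwrite of a full block succeeded\n");
        }
        free(aligned);
    }

    close(fd);
    return 0;
}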

I wanted to be excited here, because that sounds like it would be very easy to implement caching. Like, just keep the cluster that free_byte_offset points into cached until free_byte_offset moves to a different cluster, then flush that cluster.

But then I see like 900 new lines of code, and I’m much less excited...
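
To make that concrete, roughly what I was picturing, as a sketch with made-up names and plain POSIX I/O (so nothing resembling the actual patches):

/*
 * Sketch, hypothetical names: buffer the host cluster currently being
 * filled (the one free_byte_offset points into), append small compressed
 * writes to it, and write the whole cluster out in one aligned request
 * once allocation moves on to a different cluster.  Writes crossing a
 * cluster boundary are ignored here for brevity.
 */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define CLUSTER_SIZE 65536          /* qcow2 default cluster size */

typedef struct ClusterCache {
    int fd;                         /* image file, typically O_DIRECT */
    uint64_t cluster_offset;        /* host offset of the buffered cluster */
    size_t bytes_used;              /* valid bytes in buf */
    uint8_t *buf;                   /* CLUSTER_SIZE bytes, suitably aligned */
} ClusterCache;

/* Write the buffered cluster out (zero-padded to full size) and reset. */
int cluster_cache_flush(ClusterCache *c)
{
    if (c->bytes_used == 0) {
        return 0;
    }
    memset(c->buf + c->bytes_used, 0, CLUSTER_SIZE - c->bytes_used);
    if (pwrite(c->fd, c->buf, CLUSTER_SIZE, c->cluster_offset) != CLUSTER_SIZE) {
        return -1;
    }
    c->bytes_used = 0;
    return 0;
}

/* Buffer one small compressed write; flush first if it starts a new cluster. */
int cluster_cache_write(ClusterCache *c, uint64_t host_offset,
                        const void *data, size_t bytes)
{
    uint64_t cluster = host_offset & ~(uint64_t)(CLUSTER_SIZE - 1);

    if (c->bytes_used > 0 && cluster != c->cluster_offset) {
        if (cluster_cache_flush(c) < 0) {
            return -1;
        }
    }
    if (c->bytes_used == 0) {
        c->cluster_offset = cluster;
    }
    memcpy(c->buf + (host_offset - cluster), data, bytes);
    c->bytes_used = (size_t)(host_offset - cluster) + bytes;
    return 0;
}

With several async backup workers writing concurrently, a single-cluster version like this is of course not enough; I assume that concurrency, flushing, reads and error handling are where most of those extra lines go.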

> Idea is simple: cache small unaligned writes and flush the cluster when
> it is filled.
>
> Performance results are very good (the numbers in the table are times, in
> seconds, of a compressed backup of a 1000M disk filled with ones):

“Filled with ones” really is an edge case, though.

> ---------------  -----------  -----------
>                  backup(old)  backup(new)
> ssd:hdd(direct)  3e+02        4.4
>                               -99%
> ssd:hdd(cached)  5.7          5.4
>                               -5%
> ---------------  -----------  -----------
>
> So, we have a benefit even for the cached mode! And the fastest variant is
> O_DIRECT with the newly implemented cache. So I suggest enabling the new
> cache by default (which is what this series does).

First, I’m not sure how relevant O_DIRECT really is here, because I don’t really see the point of it when writing compressed images.

Second, it feels a bit like cheating to say there is a huge improvement for the no-cache case when, actually, well, you just added a cache. The no-cache case only became faster because there is a cache now.

Well, I suppose I could follow the argument that if O_DIRECT doesn’t make much sense for compressed images, qemu’s format drivers are free to introduce some caching for collecting compressed writes (because technically the cache.direct option only applies to the protocol driver). That conclusion makes both of my complaints kind of moot.

*shrug*

Third, what is the real-world impact on the page cache? You described that as the reason why you need the cache in qemu: otherwise the page cache is polluted too much. How big is the difference really? (I don’t know how good the compression ratio is for real-world images.)

Related to that, I remember that a long time ago we had some discussion about letting qemu-img convert set a special cache mode for the target image that would make Linux drop everything before the last offset written (i.e., I suppose, fadvise() with POSIX_FADV_SEQUENTIAL). You discarded that idea on the grounds that implementing a cache in qemu would be simple, but it isn’t, really. What would the impact of POSIX_FADV_SEQUENTIAL be? (One advantage of using that would be that we could reuse it for non-compressed images that are written by backup or qemu-img convert.)
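
For illustration, a sketch of what such a mechanism could look like on the backup/qemu-img side (a hypothetical helper, not an existing qemu option; note that the advice that actually lets the kernel drop already-written pages is POSIX_FADV_DONTNEED, while POSIX_FADV_SEQUENTIAL mainly tunes readahead):

/*
 * Hypothetical helper, sketch only: after a chunk of sequential output has
 * been written, flush it and tell the kernel we are done with it, so the
 * written range can be evicted from the page cache.
 */
#define _GNU_SOURCE
#include <fcntl.h>

int drop_written_range(int fd, off_t start, off_t len)
{
    /* Start writeback of the range and wait for it, so the pages are clean
     * and can actually be dropped... */
    if (sync_file_range(fd, start, len,
                        SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER) < 0) {
        return -1;
    }
    /* ...then advise the kernel that this range will not be touched again. */
    return posix_fadvise(fd, start, len, POSIX_FADV_DONTNEED);
}

Called every few megabytes of output, something like this would bound the page-cache footprint, and it would work just as well for non-compressed targets written by backup or qemu-img convert.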

(I don’t remember why that qemu-img discussion died back then.)


Fourth, regarding the code, would it be simpler if it were a pure write cache? I.e., on read, everything is flushed, so we don’t have to deal with reads from the cache at all. I don’t think there are many valid cases where a compressed image is both written to and read from at the same time. (Just asking, because I’d really want this code to be simpler. I can imagine that reading from the cache is the smallest part of the complexity, but perhaps...)
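
Just to spell out how little the read path would then need, continuing the hypothetical ClusterCache sketch from above:

/* Sketch: with a pure write cache, a read simply flushes whatever is
 * buffered and then goes straight to the image file, so the read path
 * never has to peek into the cache itself. */
ssize_t cluster_cache_read(ClusterCache *c, uint64_t offset,
                           void *data, size_t bytes)
{
    if (cluster_cache_flush(c) < 0) {
        return -1;
    }
    return pread(c->fd, data, bytes, offset);
}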

Max



