
Re: [PATCH 0/7] qcow2: compressed write cache


From: Max Reitz
Subject: Re: [PATCH 0/7] qcow2: compressed write cache
Date: Tue, 9 Feb 2021 15:47:15 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0

On 09.02.21 15:10, Vladimir Sementsov-Ogievskiy wrote:
09.02.2021 16:25, Max Reitz wrote:
On 29.01.21 17:50, Vladimir Sementsov-Ogievskiy wrote:
Hi all!

I know, I have several series waiting for a resend, but I had to switch
to another task spawned from our customer's bug.

Original problem: we use O_DIRECT for all vm images in our product, it's
the policy. The only exclusion is backup target qcow2 image for
compressed backup, because compressed backup is extremely slow with
O_DIRECT (due to unaligned writes). Customer complains that backup
produces a lot of pagecache.

So we can either implement some internal cache or use fadvise somehow.
Backup has several async workers which write simultaneously, so either way
we have to track host cluster filling (before dropping the cache
corresponding to the cluster).  So, if we have to track that anyway, let's
try to implement the cache.

I wanted to be excited here, because that sounds like it would be very easy to implement caching.  Like, just keep the cluster at free_byte_offset cached until the cluster it points to changes, then flush the cluster.

The problem is that the chunks are written asynchronously. That's why this all is not so easy.


But then I see like 900 new lines of code, and I’m much less excited...

The idea is simple: cache small unaligned writes and flush the cluster when
it is filled.
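
(For illustration, a minimal sketch of such a per-cluster cache, assuming a single strictly sequential writer and that no compressed chunk crosses a host cluster boundary; the series actually has to handle several asynchronous workers, which is what makes the real code larger. Names like write_full_cluster() are hypothetical, not taken from the series:)

#include <stdint.h>
#include <string.h>

#define CLUSTER_SIZE 65536

typedef struct ClusterCache {
    int64_t cluster_offset;     /* host offset of cached cluster, -1 if empty */
    uint64_t filled;            /* bytes already accumulated in data[] */
    uint8_t data[CLUSTER_SIZE]; /* buffered compressed chunks */
} ClusterCache;

/* hypothetical backend hook: one full, aligned cluster write (O_DIRECT-friendly) */
int write_full_cluster(int64_t host_offset, const uint8_t *buf, size_t len);

/* Initialize with cluster_offset = -1 and filled = 0 before first use. */
static int cached_compressed_write(ClusterCache *c, int64_t host_offset,
                                   const uint8_t *buf, size_t len)
{
    int64_t cluster = host_offset & ~(int64_t)(CLUSTER_SIZE - 1);
    int ret;

    if (c->cluster_offset != cluster) {
        /* Writes moved on to a new cluster: flush what we have (zero-padded). */
        if (c->cluster_offset >= 0 && c->filled > 0) {
            ret = write_full_cluster(c->cluster_offset, c->data, CLUSTER_SIZE);
            if (ret < 0) {
                return ret;
            }
        }
        c->cluster_offset = cluster;
        c->filled = 0;
        memset(c->data, 0, CLUSTER_SIZE);
    }

    memcpy(c->data + (host_offset - cluster), buf, len);
    c->filled += len;

    if (c->filled >= CLUSTER_SIZE) {
        /* Cluster completely filled: write it out and drop it from the cache. */
        ret = write_full_cluster(cluster, c->data, CLUSTER_SIZE);
        c->cluster_offset = -1;
        c->filled = 0;
        return ret;
    }
    return 0;
}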

The performance results are very good (the numbers in the table are the time,
in seconds, of a compressed backup of a 1000M disk filled with ones):

“Filled with ones” really is an edge case, though.

Yes, I think all clusters are compressed to rather small chunks :)


---------------  -----------  -----------
                 backup(old)  backup(new)
ssd:hdd(direct)  3e+02        4.4   (-99%)
ssd:hdd(cached)  5.7          5.4   (-5%)
---------------  -----------  -----------

So, we have a benefit even in cached mode! And the fastest option is O_DIRECT
with the newly implemented cache. So, I suggest enabling the new cache by
default (which is done by the series).

First, I’m not sure how O_DIRECT really is relevant, because I don’t really see the point for writing compressed images.

Compressed backup is a case in point.

(Perhaps irrelevant, but just to be clear:) I meant the point of using O_DIRECT, which one can decide to not use for backup targets (as you have done already).

Second, I find it a bit cheating if you say there is a huge improvement for the no-cache case, when actually, well, you just added a cache.  So the no-cache case just became faster because there is a cache now.

Still, the performance comparison is relevant to show that O_DIRECT as-is is unusable for compressed backup.

(Again, perhaps irrelevant, but:) Yes, but my first point was exactly whether O_DIRECT is even relevant for writing compressed images.

Well, I suppose I could follow that if O_DIRECT doesn’t make much sense for compressed images, qemu’s format drivers are free to introduce some caching (because technically the cache.direct option only applies to the protocol driver) for collecting compressed writes.

Yes, I was thinking along those lines when enabling the cache by default.

That conclusion makes both of my complaints kind of moot.

*shrug*

Third, what is the real-world impact on the page cache?  You described that that’s the reason why you need the cache in qemu, because otherwise the page cache is polluted too much.  How much is the difference really?  (I don’t know how good the compression ratio is for real-world images.)

Hm, I don't know the ratio. The customer reported that most of RAM was polluted by QEMU's page cache, and we use O_DIRECT for everything except the target of the compressed backup. The pollution may relate to several backups, and of course it is simple enough to drop the cache after each backup. But I think that even one backup of a 16T disk may pollute RAM significantly.

Oh, sorry, I just realized I had a brain fart there. I was referring to whether this series improves the page cache pollution. But obviously it will if it allows you to re-enable O_DIRECT.

Related to that, I remember a long time ago we had some discussion about letting qemu-img convert set a special cache mode for the target image that would make Linux drop everything before the last offset written (i.e., I suppose fadvise() with POSIX_FADV_SEQUENTIAL).  You discarded that idea based on the fact that implementing a cache in qemu would be simple, but it isn’t, really.  What would the impact of POSIX_FADV_SEQUENTIAL be?  (One advantage of using that would be that we could reuse it for non-compressed images that are written by backup or qemu-img convert.)

The problem is that writes are async. And therefore, not sequential.

In theory, yes, but all compressed writes still go through qcow2_alloc_bytes() right before submitting the write, so I wonder whether in practice the writes aren’t usually sufficiently sequential to make POSIX_FADV_SEQUENTIAL work fine.

So I have to track the writes and wait until the whole cluster is filled. It's simple to use fadvise as an option in my cache: instead of caching the data and writing it out when the cluster is filled, we can instead mark the cluster POSIX_FADV_DONTNEED.
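
(A sketch of that alternative, just to make it concrete: posix_fadvise() is the real POSIX call, but the per-cluster tracking around it, e.g. cluster_is_filled(), is hypothetical and would come from the same write tracking described above.)

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdint.h>

#define CLUSTER_SIZE 65536

/* hypothetical: true once every byte of the host cluster has been written */
int cluster_is_filled(int64_t cluster_offset);

/* Called after each compressed chunk has been written (and flushed): once
 * its host cluster is complete, ask the kernel to drop those pages.
 * DONTNEED only reliably drops clean pages, so the range should be flushed
 * (e.g. fdatasync()) before calling this. */
static void maybe_drop_cluster_pagecache(int fd, int64_t host_offset)
{
    int64_t cluster = host_offset & ~(int64_t)(CLUSTER_SIZE - 1);

    if (cluster_is_filled(cluster)) {
        (void)posix_fadvise(fd, cluster, CLUSTER_SIZE, POSIX_FADV_DONTNEED);
    }
}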


(I don’t remember why that qemu-img discussion died back then.)


Fourth, regarding the code, would it be simpler if it were a pure write cache?  I.e., on read, everything is flushed, so we don’t have to deal with that.  I don’t think there are many valid cases where a compressed image is both written to and read from at the same time. (Just asking, because I’d really want this code to be simpler.  I can imagine that reading from the cache is the least bit of complexity, but perhaps...)


Hm. I really didn't want to support reads, and did so only to make it possible to enable the cache by default. Still, the read function is really simple, and I don't think that dropping it would simplify the code significantly.
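
(Roughly, reusing the hypothetical ClusterCache sketch from above: the read path only has to check whether the requested range lies in the not-yet-flushed cluster and copy from the buffer, otherwise fall through to the image; real code would also have to deal with ranges that straddle the cached cluster.)

/* hypothetical fallthrough to the normal image read path */
int read_from_image(int64_t host_offset, uint8_t *buf, size_t len);

static int cached_compressed_read(ClusterCache *c, int64_t host_offset,
                                  uint8_t *buf, size_t len)
{
    if (c->cluster_offset >= 0 &&
        host_offset >= c->cluster_offset &&
        host_offset + (int64_t)len <= c->cluster_offset + CLUSTER_SIZE) {
        /* Whole range is still sitting in the cache buffer. */
        memcpy(buf, c->data + (host_offset - c->cluster_offset), len);
        return 0;
    }
    /* Ranges partially overlapping the cached cluster would need a flush
     * first; omitted in this sketch. */
    return read_from_image(host_offset, buf, len);
}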

That’s too bad.

So the only question I have left is what POSIX_FADV_SEQUENTIAL actually would do in practice.

(But even then, the premise just doesn’t motivate me sufficiently yet...)

Max



