qemu-devel

Re: qemu-img cache modes with Linux cgroup v1


From: Daniel P . Berrangé
Subject: Re: qemu-img cache modes with Linux cgroup v1
Date: Mon, 31 Jul 2023 18:19:55 +0100
User-agent: Mutt/2.2.9 (2022-11-12)

On Mon, Jul 31, 2023 at 11:40:36AM -0400, Stefan Hajnoczi wrote:
> Hi,
> qemu-img -t writeback -T writeback is not designed to run with the Linux
> cgroup v1 memory controller because dirtying too much page cache leads
> to process termination instead of usual non-cgroup and cgroup v2
> throttling behavior:
> https://bugzilla.redhat.com/show_bug.cgi?id=2196072

Ewww, a horrible behavioural change v1 is imposing on apps :-(

QEMU happens to hit it because we do lots of I/O, but plenty of
other apps do major I/O and can fall into the same trap :-( I
can imagine that simply running a big "tar zxvf" would have much
the same effect in terms of masses of I/O in a short time.
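
For reference, here is a minimal Python sketch of how a tool could tell whether it is running under a cgroup v1 memory hierarchy, which is the case where this bites. It assumes the documented /proc/<pid>/cgroup format, "hierarchy-ID:controller-list:cgroup-path", in which the unified v2 entry has ID 0 and an empty controller list. The function name is hypothetical. A True result only means that a v1 memory hierarchy is in use. It does not mean a limit is actually configured.

```python
def under_cgroup_v1_memory(proc_cgroup_text: str) -> bool:
    """Return True if a cgroup v1 hierarchy with the memory controller is listed.

    Each line of /proc/<pid>/cgroup reads "hierarchy-ID:controller-list:path";
    the cgroup v2 (unified) entry has ID 0 and an empty controller list.
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, _path = parts
        if hierarchy_id != "0" and "memory" in controllers.split(","):
            return True
    return False
```

On Linux, passing it the contents of /proc/self/cgroup checks the calling process.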

> I wanted to share my thoughts on this issue.
> 
> cache=none bypasses the host page cache and will not hit the cgroup
> memory limit. It's an easy solution to avoid exceeding the cgroup v1
> memory limit.

I'd go further and say that is a good recommendation even without
this bug in cgroups v1.

writeback caching helps if you have lots of free memory, but on
virtualization hosts memory is usually the biggest VM density
constraint, so apps shouldn't generally expect there to be lots
of free host memory to burn as I/O cache.

If you're using qemu-img in preparation for running qemu-system-XXX
and the latter will use cache=none anyway, then it is even less
desirable for qemu-img to fill the host cache with pages that won't
be accessed again when the VM starts in qemu-system-XXX.

> However, not all Linux file systems support O_DIRECT and qemu-img's I/O
> pattern may perform worse under cache=none than cache=writeback.
> 
> 1. Which file systems support O_DIRECT in Linux 6.5?
> 
> I searched the Linux source code for file systems that implement
> .direct_IO or set FMODE_CAN_ODIRECT. This is not exhaustive and may not
> be 100% accurate.
> 
> The big name file systems (ext4, XFS, btrfs, nfs, smb, ceph) support
> O_DIRECT. The most obvious omission is tmpfs.

Rather than trying to figure out a list of FS types, in OpenStack,
a bit of code was added to simply attempt to open a test file with
O_DIRECT on the target filesystem. If that works then run qemu-img
/ qemu-system-XXX with cache=none, otherwise use cache=writeback.
IOW, a "best effort" to avoid host cache where supported.
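
Here is a minimal Python sketch of that kind of probe. The function names are hypothetical, and this is not OpenStack's actual code. It creates a scratch file in the target directory, tries to reopen it with O_DIRECT, and treats EINVAL as "not supported". A more thorough probe could also issue a small aligned read or write, in case a file system accepts the flag at open() time but rejects the I/O itself.

```python
import errno
import os
import tempfile


def supports_o_direct(directory: str) -> bool:
    """Best-effort probe: can files in `directory` be opened with O_DIRECT?"""
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:
        return False  # platform does not expose O_DIRECT at all
    fd, path = tempfile.mkstemp(dir=directory, prefix=".odirect-probe-")
    os.close(fd)
    try:
        probe_fd = os.open(path, os.O_RDWR | o_direct)
    except OSError as e:
        if e.errno == errno.EINVAL:
            return False  # file system rejects O_DIRECT (e.g. tmpfs)
        raise
    else:
        os.close(probe_fd)
        return True
    finally:
        os.unlink(path)  # always remove the scratch file


def qemu_img_cache_mode(directory: str) -> str:
    """Pick 'none' where host cache bypass works, else fall back to 'writeback'."""
    return "none" if supports_o_direct(directory) else "writeback"
```

The result could then be passed as the '-t' (destination cache) option of "qemu-img convert", or as the cache mode for qemu-system-XXX.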

Could there be justification for QEMU to support a "best effort"
host cache bypass mode natively, to avoid every app needing to
re-implement this logic to check for support of O_DIRECT?

e.g. a QEMU 'cache=trynone' option instead of 'cache=none'?
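
Here is a Python sketch of what such a hypothetical 'trynone' mode might do at open time. Neither 'cache=trynone' nor open_trynone() exists in QEMU today. The sketch retries without O_DIRECT when the kernel answers EINVAL, and it still propagates real errors such as ENOENT or EACCES.

```python
import errno
import os


def open_trynone(path: str, flags: int = os.O_RDWR) -> tuple[int, bool]:
    """Open `path`, bypassing the host page cache when the file system allows.

    Returns (fd, direct), where `direct` tells the caller whether O_DIRECT
    is in effect and aligned I/O is therefore required.
    """
    o_direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without O_DIRECT
    if o_direct:
        try:
            return os.open(path, flags | o_direct), True
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise  # real failure (ENOENT, EACCES, ...), not lack of support
    return os.open(path, flags), False
```

Callers must check the returned flag. With O_DIRECT in effect, buffers, offsets and lengths have to be suitably aligned.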


> 2. Is qemu-img performance with O_DIRECT acceptable?
> 
> The I/O pattern matters more with O_DIRECT because every I/O request is
> sent to the storage device. This means buffer sizes matter more (more
> small I/Os have higher overhead than fewer large I/Os). Concurrency can
> also help saturate the storage device.
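
To illustrate what "buffer sizes matter" means in practice, here is a Linux-only Python sketch of O_DIRECT writes issued as large, aligned requests. The 4096-byte alignment and 1 MiB request size are assumptions for this example. Real code should query the device's logical block size. Anonymous mmap() memory is used because it is page-aligned, and O_DIRECT generally requires an aligned buffer address.

```python
import mmap
import os

BLOCK = 4096           # assumed alignment; real code should query the device
CHUNK = 1024 * 1024    # 1 MiB per request: fewer, larger I/Os (arbitrary choice)


def write_direct(path: str, data: bytes) -> None:
    """Write `data` to `path` with O_DIRECT, in CHUNK-sized aligned requests."""
    if len(data) % BLOCK:
        raise ValueError("O_DIRECT length must be a multiple of BLOCK")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT, 0o644)
    try:
        for off in range(0, len(data), CHUNK):  # offsets stay BLOCK-aligned
            piece = data[off:off + CHUNK]
            # Anonymous mmap memory is page-aligned, satisfying O_DIRECT's
            # buffer-address alignment requirement.
            with mmap.mmap(-1, len(piece)) as buf:
                buf.write(piece)
                if os.pwrite(fd, buf, off) != len(piece):
                    raise OSError("short O_DIRECT write")
    finally:
        os.close(fd)
```

Allocating one mapping per request keeps the sketch short. A real implementation would reuse a single aligned buffer.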

"qemu-img convert" supports the '-m' flag to use many
coroutines for I/O.
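
qemu-img uses coroutines rather than threads. The underlying idea of keeping several requests in flight can still be sketched in Python with a thread pool and positional pread()/pwrite() calls, so that workers never share a file offset. The function and its defaults (8 workers, 1 MiB requests) are illustrative assumptions, not qemu-img's implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_copy(src: str, dst: str, workers: int = 8,
                  chunk: int = 1024 * 1024) -> None:
    """Copy `src` to `dst`, keeping up to `workers` requests in flight."""
    size = os.path.getsize(src)
    sfd = os.open(src, os.O_RDONLY)
    dfd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(dfd, size)  # set the final size up front; chunks finish in any order

        def copy_one(off: int) -> None:
            # pread()/pwrite() take explicit offsets, so workers never race
            # on a shared file position.
            data = os.pread(sfd, chunk, off)
            view = memoryview(data)
            while view:  # handle short writes
                n = os.pwrite(dfd, view, off)
                view, off = view[n:], off + n

        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() drains the iterator so any worker exception is re-raised.
            list(pool.map(copy_one, range(0, size, chunk)))
    finally:
        os.close(sfd)
        os.close(dfd)
```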

> If you switch to O_DIRECT and encounter performance problems then
> qemu-img can be optimized to send I/O patterns with less overhead. This
> requires performance analysis.

Since we're in pretty direct control of the I/O pattern qemu-img imposes,
it feels very sensible to optimize it such that cache=none achieves
ideal performance.


> 3. Using buffered I/O because O_DIRECT is not universally supported?
> 
> If you can't use O_DIRECT, then qemu-img could be extended to manage its
> dirty page cache set carefully. This consists of picking a budget and
> writing back to disk when the budget is exhausted.
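
Here is a rough, Linux-only Python sketch of the budget scheme described above. The names are hypothetical and the 64 MiB default is arbitrary. Once the budget of written-but-unflushed bytes is used up, the code forces writeback with fdatasync(). It then uses POSIX_FADV_DONTNEED to advise the kernel that the now-clean pages can be dropped. fdatasync() waits for the I/O to complete. A C implementation might prefer sync_file_range() to start writeback earlier with less blocking.

```python
import os

# Hypothetical default: allow at most 64 MiB of written-but-unflushed data.
DIRTY_BUDGET = 64 * 1024 * 1024


def write_with_budget(path: str, chunks, budget: int = DIRTY_BUDGET) -> None:
    """Buffered sequential write that caps dirty page cache at ~`budget` bytes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        dirty = 0
        for chunk in chunks:
            view = memoryview(chunk)
            while view:  # os.write() may write less than asked
                view = view[os.write(fd, view):]
            dirty += len(chunk)
            if dirty >= budget:
                # Budget exhausted: write back dirty pages and wait for completion...
                os.fdatasync(fd)
                # ...then advise the kernel it may drop the now-clean pages.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                dirty = 0
        os.fdatasync(fd)  # flush whatever is left below the budget
    finally:
        os.close(fd)
```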

IOW, re-implementing what the kernel should already be doing for us :-(

This feels like the least desirable thing for QEMU to take on, especially
since cgroups v1 is an evolutionary dead-end, with v2 increasingly taking
over the world.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



