qemu-devel
Re: [PATCH] util: NUMA aware memory preallocation


From: Daniel P. Berrangé
Subject: Re: [PATCH] util: NUMA aware memory preallocation
Date: Thu, 12 May 2022 09:15:22 +0100
User-agent: Mutt/2.2.1 (2022-02-19)

On Thu, May 12, 2022 at 09:41:29AM +0200, Paolo Bonzini wrote:
> On 5/11/22 18:54, Daniel P. Berrangé wrote:
> > On Wed, May 11, 2022 at 01:07:47PM +0200, Paolo Bonzini wrote:
> > > On 5/11/22 12:10, Daniel P. Berrangé wrote:
> > > > I expect creating/deleting I/O threads is cheap in comparison to
> > > > the work done for preallocation. If libvirt is using -preconfig
> > > > and object-add to create the memory backend, then we could have
> > > > option of creating the I/O threads dynamically in -preconfig mode,
> > > > create the memory backend, and then delete the I/O threads again.
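The -preconfig flow sketched above could look something like the following QMP sequence; the object ids and backend properties here are hypothetical illustrations, not taken from the patch (iothreads and host memory backends are created with object-add and removed with object-del):

```json
{ "execute": "object-add",
  "arguments": { "qom-type": "iothread", "id": "prealloc-iot0" } }
{ "execute": "object-add",
  "arguments": { "qom-type": "memory-backend-memfd", "id": "mem0",
                 "size": 4294967296, "prealloc": true,
                 "prealloc-threads": 8 } }
{ "execute": "object-del",
  "arguments": { "id": "prealloc-iot0" } }
```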
> > > 
> > > I think this is very overengineered.  Michal's patch is doing the
> > > obvious thing and if it doesn't work that's because Libvirt is
> > > trying to micromanage QEMU.
> > 
> > Calling it micromanaging is putting a very negative connotation on
> > this. What we're trying to do is enforce a host resource policy for
> > QEMU, in a way that a compromised QEMU can't escape, which is a
> > valuable protection.
> 
> I'm sorry if that was a bit exaggerated, but the negative connotation was
> intentional.
> 
> > > As mentioned on IRC, if the reason is to prevent moving around threads
> > > in realtime (SCHED_FIFO, SCHED_RR) classes, that should be fixed at
> > > the kernel level.
> > 
> > We use cgroups where it is available to us, but we don't always have
> > the freedom that we'd like.
> 
> I understand.  I'm thinking of a new flag to sched_setscheduler that fixes
> the CPU affinity and policy of the thread and prevents changing it in case
> QEMU is compromised later.  The seccomp/SELinux sandboxes can prevent
> setting the SCHED_FIFO class without this flag.
> 
> In addition, my hunch is that this works only because the RT setup of QEMU
> is not safe against priority inversion.  IIRC the iothreads are set with a
> non-realtime priority, but actually they should have a _higher_ priority
> than the CPU threads, and the thread pool I/O bound workers should have an
> even higher priority; otherwise you have a priority inversion situation
> where an interrupt is pending that would wake up the CPU, but the iothreads
> cannot process it because they have a lower priority than the CPU.

At least for RHEL deployments of KVM-RT, IIUC the expectation is that
the vCPUs with RT priority never do I/O, and that there is at least one
additional non-RT vCPU from which the guest OS performs I/O. IOW, the RT
vCPUs work in a completely self-contained manner with no interaction
with any other QEMU threads. If that's not the case, then you would
have to make sure those other threads have priority / scheduler
adjustments to avoid priority inversion.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



