Re: Unpredictable performance degradation in QEMU KVMs

From: Frantisek Rysanek
Subject: Re: Unpredictable performance degradation in QEMU KVMs
Date: Wed, 06 Oct 2021 22:33:58 +0200

Hello Parnell,

I'm just a part-time Linux admin / enthusiast, by no means a DevOps 
professional. I have some historical experience under the hood as a 
HW/OS troubleshooter. So whatever I voice here is just my "two cents".

One last note regarding the hypothetical "architecture discrepancy": 
the x86_64 instruction set is "modular". QEMU has an option (or maybe 
it's the default) to "pass through" the CPU feature set from the host 
to guest (see the "flags" row in /proc/cpuinfo). Thus, especially 
given VT-x, no instructions need to be "emulated" - software in the 
guest can see what the host CPU can provide in bare metal, and all 
the instructions in the VM guest run on bare metal of the host CPU.
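
One way to check the "architecture discrepancy" hypothesis is to compare the 
"flags" row of /proc/cpuinfo on the host and in the guest. The following is a 
minimal Python sketch, assuming you have saved both files somewhere; the 
function names and the file names in the usage note are made up for 
illustration. With KVM, the pass-through mode mentioned above is usually 
requested with "-cpu host", but check how your own guests are started.

```python
import sys


def parse_flags(cpuinfo_text):
    """Return the feature flags of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_in_guest(host_text, guest_text):
    """Flags the host CPU offers but the guest does not see, sorted by name."""
    return sorted(parse_flags(host_text) - parse_flags(guest_text))


# Only runs when two file names are given: saved copies of /proc/cpuinfo
# from the host and from the guest (hypothetical names, see below).
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as host, open(sys.argv[2]) as guest:
        for flag in missing_in_guest(host.read(), guest.read()):
            print(flag)
```

Usage would be something like "python3 cpuflags.py host-cpuinfo.txt 
guest-cpuinfo.txt". A few differences are expected even with pass-through: 
for instance, "vmx" is typically hidden from the guest unless nested 
virtualization is enabled.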

If I understand correctly, when the "performance degradation" 
happens, the affected VM guest is still basically functional, 
accessible, can be inspected, given the right software tools it can 
collect "metrics" and either store them locally or make the data 
available over the network - correct?

When debugging intermittent phenomena, my favourite approach is 
to measure, record and graph whatever interesting data I can come 
across. At one-second intervals, if need be.

There are ready-made tools such as Nagios (and many others in that 
vein) which can help you do this data collection and graphing in a 
centralized fashion - provided that you can learn to work with those 
tools.
In my daily practice, rather than install and configure Nagios (which 
I'm not familiar with), I tend to cobble together simple scripts or 
dedicated C proggies that produce timestamped textual CSV format, 
which can then be graphed using e.g. Gnuplot. I use this in a 
different area of interest; I've never felt a need to collect basic 
system stats, but my approach should be easily applicable...
I can provide some examples if you want.
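
As an illustration of the "timestamped CSV" approach (not the author's own 
scripts, which are not shown in this message), here is a minimal Python 
sketch that samples the aggregate "cpu" line of /proc/stat once per second 
and prints the share of time spent in each state. The function names and the 
"--run" switch are hypothetical. The "iowait" and "steal" columns are the 
ones most likely to help tell storage-bound from CPU-contention problems.

```python
import csv
import sys
import time

# Column order of the aggregate "cpu" line in /proc/stat (first 8 fields).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_totals(stat_text):
    """Parse the aggregate "cpu" line of /proc/stat into a dict of jiffies."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))
    raise ValueError("no aggregate cpu line found")


def percent_deltas(prev, cur):
    """Share of time per state between two samples, in percent."""
    deltas = {k: cur[k] - prev[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * deltas[k] / total for k in FIELDS}


def read_totals():
    with open("/proc/stat") as f:
        return parse_cpu_totals(f.read())


def sample_forever(out=sys.stdout, interval=1.0):
    """Write one CSV row per interval: UNIX timestamp, then percentages."""
    writer = csv.writer(out)
    writer.writerow(("timestamp",) + FIELDS)
    prev = read_totals()
    while True:
        time.sleep(interval)
        cur = read_totals()
        pct = percent_deltas(prev, cur)
        writer.writerow([f"{time.time():.3f}"] + [f"{pct[k]:.1f}" for k in FIELDS])
        out.flush()
        prev = cur


# "--run" is a made-up switch so that importing the file stays side-effect free.
if __name__ == "__main__" and "--run" in sys.argv:
    sample_forever()
```

Redirecting the output into a file gives a CSV that Gnuplot can read after 
"set datafile separator ','".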

Even if your goal, for a particular "metric", is not to turn it into 
a time-series chart (and therefore invest some effort to extract the 
data from some half-convenient "data source"), it may make good sense 
just to log available data in the raw format they are in, into a 
file, with timestamps, for later reference...
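
Logging raw data with timestamps can be as simple as the following Python 
sketch, which appends a snapshot of any text file (for example 
/proc/interrupts) to a log once per second. The function names and the 
"=== ... ===" header layout are arbitrary choices made for this example.

```python
import sys
import time


def stamp_block(source, raw_text, now=None):
    """Wrap raw text in a header line carrying a timestamp and the source name."""
    if now is None:
        now = time.time()
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(now))
    return f"=== {stamp} {source} ===\n{raw_text.rstrip()}\n"


def append_snapshot(source_path, log_path):
    """Read the source file as-is and append it, stamped, to the log file."""
    with open(source_path) as src:
        raw = src.read()
    with open(log_path, "a") as log:
        log.write(stamp_block(source_path, raw))


# Only runs when a source file and a log file are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    while True:
        append_snapshot(sys.argv[1], sys.argv[2])
        time.sleep(1)
```

The log grows quickly at a one-second interval, so a longer sleep or log 
rotation may be appropriate for long runs.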

Apart from top and latencytop, there are iotop and iostat (the latter 
from the sysstat package); for network traffic there are nethogs and 
iftop. My general objection to these tools is that many of them are 
"interactive" / full screen = do not produce a "scrolling output on 
stdout", suitable for storing in a log for later use... For continuous 
collection, either the particular tool has a cmdline option for 
non-interactive streaming output, or you need to look for a different 
tool.

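
Some of the tools mentioned do have such a mode, e.g. "top -b" (batch mode) 
or "iostat -x 1" (extended stats every second). Their output often lacks a 
timestamp on each line, though. A minimal Python sketch that runs such a 
command and prefixes every line with a UNIX timestamp might look like this; 
the function names and the script name in the comment are hypothetical.

```python
import subprocess
import sys
import time


def stamp_lines(lines, clock=time.time):
    """Yield each non-empty input line prefixed with a timestamp and a comma."""
    for line in lines:
        text = line.rstrip("\n")
        if text.strip():
            yield f"{clock():.3f},{text}"


def follow(command):
    """Run a streaming command and print its output with timestamps."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for stamped in stamp_lines(proc.stdout):
            print(stamped, flush=True)


# Example invocation: python3 stamp.py iostat -x 1 >> iostat.log
if __name__ == "__main__" and len(sys.argv) > 1:
    follow(sys.argv[1:])
```

The timestamp is taken when the line is read, not when the tool measured the 
value, which is close enough for one-second sampling.
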
One possibility is to install snmpd = the SNMP agent, which comes 
packaged with a subagent dedicated to local system monitoring. Not 
sure what variables are served and how exhaustive or useful for your 
case these are.

SNMP can be polled using tools such as Nagios and friends, or using 
custom/dedicated tools - here's one of my own:

For interactive browsing of the SNMP tree, you can use a tool called 
"snmpb" or some commercial work-alikes...

And that's probably not the end of your options.

Just use tools that are familiar to you - thus saving time needed to 
configure the data collection and analysis "framework"... 


On 6 Oct 2021 at 10:58, Parnell Springmeyer wrote:
> Hi Frantisek, thanks for replying. 
> I've not checked using `latencytop`. I will do that, thanks for the 
> suggestion.
> The most frustrating problem is that the degradation in performance 
> is so far very hard to reproduce manually so we haven't really been 
> able to determine if it's a CPU performance issue, storage IO, or 
> contention.
> Not dumb questions, you're talking to someone who doesn't work on 
> this sort of technology much, so it is very helpful to get an idea of 
> what I might or should look at.
> I know we use the same architecture so we can eliminate that as an 
> issue.
> Thanks for the feedback, I'll see if I can discover anything 
> interesting given the ideas you've suggested I poke around at.
> On Wed, Oct 6, 2021 at 4:06 AM Frantisek Rysanek 
> <Frantisek.Rysanek@post.cz> wrote:
>     On 5 Oct 2021 at 18:58, Parnell Springmeyer wrote:
>     >
>     > Hi, we use QEMU VMs for running our integration testing
>     > infrastructure and have run into a very difficult to debug problem:
>     > occasionally we will see a severe performance degradation in some of
>     > our QEMU VMs.
>     >
>     If memory serves, QEMU guests appear to run as processes in the Linux
>     host instance. I'm not "in the know enough" to tell you, how much is
>     possibly happening under the hood in the kernel support side of
>     things, which is potentially not well described by that superficial
>     abstraction visible in "top".
>     Esoteric issues aside (CPU arch incompatibilities between host and
>     guest), have you tried inspecting what the load looks like, in the
>     guest and in the host OS instance? What does "top" show? With CPU
>     cores expanded? (press "1")
>     Have you tried "latencytop" by any chance?
>     Are you sure this is a CPU performance/emulation issue?
>     What storage are your VM's using? Could storage be the bottleneck?
>     Isn't the observed "sluggishness" storage-io-bound, rather than CPU
>     bound? Can you tell the difference? (Heck... apologies, that's
>     probably a series of dumb questions to someone @arista.com)
>     Stuff can get sluggish when IRQ's don't work right. Any signs of that
>     in the guest instance? Interesting messages in dmesg, interesting
>     numbers in /proc/interrupts?
>     CPU arch emulation issues (guest vs. host) might also be an issue. If
>     you specify a different CPU core for the guest than the host actually
>     has, you may get some fringe parts of the instruction set, even
>     within the x86_64 family, that needs to be tediously emulated for the
>     guest instance... also, I'd hazard a guess 32bit vs. 64bit *might*
>     play a role, albeit marginal. I have fond memories of the 387 math
>     co-processor emulation (and its effects on program runtime), but
>     that's a *long* time ago :-)
>     I've seen EXT3 and EXT4 hang for no apparent reason, on bare metal,
>     under heavy IOps stress. CPU consumption at 0%, disk IOps at pure 0,
>     but the filesystem would block forever in a standstill. If I recall
>     correctly, I used Bonnie++ to generate that kind of stress
>     reproducibly, against fast block storage (HW RAID back then). There
>     was no QEMU in the game.
>     = feel free to add some juicy detail for us to ponder :-)
>     Frank
> --
> Parnell Springmeyer
