Re: [Qemu-devel] performance monitor
From: Rob Landley
Subject: Re: [Qemu-devel] performance monitor
Date: Fri, 4 Jan 2008 02:49:22 -0600
User-agent: KMail/1.9.6 (enterprise 0.20070907.709405)
On Thursday 03 January 2008 15:38:02 Clemens Kolbitsch wrote:
> Does anyone have an idea on how I can measure performance in qemu to a
> somewhat accurate level?
hwclock --show > time1
tar xvjf linux-2.6.23.tar.bz2 && cd linux-2.6.23 && make allnoconfig && make
cd ..
hwclock --show > time2
Do that on host and guest, and the ratio of the two elapsed times gives you
the performance of qemu relative to your host, good to within a few percent.
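The "do the math" step can be scripted. A minimal sketch, assuming GNU date
and the classic hwclock --show output line (the timestamps below are made-up
sample contents of the time1/time2 files, not real measurements):

```shell
# Convert two captured hwclock lines to epoch seconds and subtract.
# hwclock --show prints something like
#   "Fri 04 Jan 2008 02:49:22 AM CST  -0.123456 seconds"
# so we strip the trailing drift field before handing it to date(1).
t1="Fri 04 Jan 2008 02:49:22 AM CST  -0.123456 seconds"   # contents of time1
t2="Fri 04 Jan 2008 03:14:10 AM CST  -0.098765 seconds"   # contents of time2
s1=$(date -d "${t1%  *}" +%s)
s2=$(date -d "${t2%  *}" +%s)
echo "elapsed: $((s2 - s1)) seconds"   # → elapsed: 1488 seconds
```

The exact output format varies between hwclock versions, so check yours
before trusting the parsing.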
> I have modified qemu (the memory handling) and the
> linux kernel and want to find out the penalty this introduced... does
> anyone have any comments / ideas on this?
If it's something big, you can compare the results in minutes and seconds.
That's probably the best you're going to do. (Although really you want
hwclock --show before and after, and then do the math. hwclock tunnels out to
the host system to get its idea of the time, which doesn't get thrown off by
timer interrupt delivery (as a signal) getting deferred by the host system's
scheduler.) Of course the fact that hwclock _takes_ a second or so to read
the clock is a bit of a downer, but anything that takes less than a minute or
so to run isn't going to give you a very accurate time anyway: the
performance of qemu isn't constant, so your results are going to skew all
over the place.
Especially for small things, the performance varies from run to run. Start by
imagining qemu as having the mother of all page fault latencies. The cost of
faulting code into the L2 cache includes dynamic recompilation, which is
expensive.
Worse, when the dynamic recompilation buffer fills up it blanks the whole
thing, and recompiles every new page it hits one at a time until the buffer
fills up again. (What is it these days, 16 megs of translated code before it
resets?) No LRU or anything, no cache management at _all_, just "when the
bucket fills up, dump it and start over". (Well, that's what it did back
around the last stable release anyway. It has been almost a year since then,
so maybe it's changed. I've been busy with other things and not really
keeping track of changes that didn't affect what I could and couldn't get to
run.)
So anyway, depending on what code you run in what order, the performance can
_differ_ from one run to the next due to when the cache gets blanked and
stuff gets retranslated. By a lot. There's no obvious way to predict this
or control it. And the "software" clock inside your emulated system can lie
to you about it if timer interrupts get deferred.
All this should pretty much average out if you do something big with lots of
execs (like build a linux kernel from source). But if you do something small
expect serious butterfly effects. Expect microbenchmarks to swing around
wildly.
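Given that swing, about the only mitigation is to repeat the measurement and
look at the spread rather than trust any single number. A rough sketch (the
dd invocation is just a stand-in workload for illustration; substitute the
kernel build or whatever you're actually measuring):

```shell
# Run the workload several times and print each elapsed time, so the
# run-to-run variation is visible. dd is only a placeholder workload.
runs=5
for i in $(seq "$runs"); do
    start=$(date +%s)
    dd if=/dev/zero of=/dev/null bs=1M count=256 2>/dev/null
    end=$(date +%s)
    echo "run $i: $((end - start)) s"
done
```

If the numbers are all over the place, no amount of averaging is going to
rescue a microbenchmark.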
Quick analogy: you know the performance difference faulting your executable in
from disk vs running it out of cache? Imagine a daemon that makes random
intermittent calls to "echo 1 > /proc/sys/vm/drop_caches", and now try to do
a sane benchmark. No matter what you use to measure, what you're measuring
isn't going to be consistent from one run to the next.
Performance should be better (and more stable) with kqemu or kvm. Maybe you
can benchmark those sanely; I wouldn't know. Ask somebody else. :)
P.S. Take the above with a large grain of salt, I'm not close to an expert in
this area...
Rob
--
"One of my most productive days was throwing away 1000 lines of code."
- Ken Thompson.