qemu-devel

Re: [Qemu-devel] performance monitor


From: Clemens Kolbitsch
Subject: Re: [Qemu-devel] performance monitor
Date: Fri, 4 Jan 2008 16:09:18 +0100
User-agent: KMail/1.9.6

On Friday 04 January 2008 09:49:22 Rob Landley wrote:
> On Thursday 03 January 2008 15:38:02 Clemens Kolbitsch wrote:
> > Does anyone have an idea on how I can measure performance in qemu to a
> > somewhat accurate level?
>
> hwclock --show > time1
> tar xvjf linux-2.6.23.tar.bz2 && cd linux-2.6.23 && make allnoconfig &&
> make && cd ..
> hwclock --show > time2
>
> Do that on host and client, and you've got a ratio of the performance of
> qemu to your host that should be good to within a few percent.
>
> > I have modified qemu (the memory handling) and the
> > linux kernel and want to find out the penalty this introduced... does
> > anyone have any comments / ideas on this?
>
> If it's something big, you can compare the result in minutes and seconds.
> That's probably the best you're going to do.  (Although really you want
> hwclock --show before and after, and then do the math.  That tunnels out to
> the host system to get its idea of the time, which doesn't get thrown off
> by timer interrupt delivery (as a signal) getting deferred by the host
> system's scheduler.  Of course the fact that hwclock _takes_ a second or so
> to read the clock is a bit of a downer, but anything that takes less than a
> minute or so to run isn't going to give you a very accurate time because
> the performance of qemu isn't constant, and your results are going to skew
> all over the place.)
>
> Especially for small things, the performance varies from run to run.  Start
> by imagining qemu as having the mother of all page fault latencies.  The
> cost of faulting code into the L2 cache includes dynamic recompilation,
> which is expensive.
>
> Worse, when the dynamic recompilation buffer fills up it blanks the whole
> thing, and recompiles every new page it hits one at a time until the buffer
> fills up again.  (What is it these days, 16 megs of translated code before
> it resets?)  No LRU or anything, no cache management at _all_, just "when
> the bucket fills up, dump it and start over".  (Well, that's what it did
> back around the last stable release anyway.  It has been almost a year
> since then, so maybe it's changed.  I've been busy with other things and
> not really keeping track of changes that didn't affect what I could and
> couldn't get to run.)
>
> So anyway, depending on what code you run in what order, the performance
> can _differ_ from one run to the next due to when the cache gets blanked
> and stuff gets retranslated.  By a lot.  There's no obvious way to predict
> this or control it.  And the "software" clock inside your emulated system
> can lie to you about it if timer interrupts get deferred.
>
> All this should pretty much average out if you do something big with lots
> of execs (like build a linux kernel from source).  But if you do something
> small expect serious butterfly effects.  Expect microbenchmarks to swing
> around wildly.
>
> Quick analogy: you know the performance difference faulting your executable
> in from disk vs running it out of cache?  Imagine a daemon that makes random
> intermittent calls to "echo 1 > /proc/sys/vm/drop_caches", and now try to
> do a sane benchmark.  No matter what you use to measure, what you're
> measuring isn't going to be consistent from one run to the next.
>
> Performance should be better (and more stable) with kqemu or kvm.  Maybe
> that you can benchmark sanely, I wouldn't know.  Ask somebody else. :)
>
> P.S.  Take the above with a large grain of salt, I'm not close to an expert
> in this area...

:-)

Ok. What you've said pretty much covers how I've made up my mind in the last 
couple of hours trying to think about the problem *g*

Guess I'll have to be happy counting TLB misses and page faults, adding up 
executed instructions (in user/kernel mode) per process and doing some timing 
stuff... then running the examples a lot of times, making an average of all 
numbers and finally just ignoring them since I *know* that they are bogus ;-)
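
A minimal sketch of the "run it a lot of times and average" step, assuming the per-run numbers (TLB misses, page faults, instruction counts, or timings) have already been collected into a plain list. The `summarize` helper and the sample values are hypothetical and not part of QEMU. Reporting the spread next to the mean shows how far a single run can be trusted.

```python
import statistics


def summarize(samples):
    """Summarize one counter (or timing) collected over several runs."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    return {
        "runs": len(samples),
        "mean": mean,
        "median": statistics.median(samples),
        "stdev": stdev,
        # Relative spread: run-to-run noise as a fraction of the mean.
        "rel_stdev": stdev / mean if mean else float("inf"),
    }


if __name__ == "__main__":
    # Hypothetical TLB-miss counts from four runs of the same workload.
    tlb_misses = [1000, 1100, 900, 1000]
    for key, value in summarize(tlb_misses).items():
        print(f"{key}: {value}")
```

A large `rel_stdev` suggests the workload is too small for the translation-cache effects described above to average out.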

No, seriously... I understand the problem, but I think the above is the best I 
can do since I'm really only interested in the effect it has on QEMU for the 
moment :-)
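
One way to turn those numbers into a penalty figure, sketched under the assumption that the same workload was measured several times on an unmodified and on a modified QEMU. The `penalty` function, its noise criterion, and the sample timings are illustrative assumptions rather than an established method. It compares medians and only calls the difference meaningful when it exceeds the run-to-run spread.

```python
import statistics


def penalty(baseline, modified):
    """Compare two sets of measurements of the same workload.

    Returns (ratio, noise, meaningful):
      ratio      - median(modified) / median(baseline)
      noise      - larger of the two relative ranges, (max - min) / median
      meaningful - True if the ratio differs from 1.0 by more than the noise
    """
    base_med = statistics.median(baseline)
    mod_med = statistics.median(modified)
    if base_med <= 0 or mod_med <= 0:
        raise ValueError("measurements must be positive")
    ratio = mod_med / base_med
    noise = max(
        (max(baseline) - min(baseline)) / base_med,
        (max(modified) - min(modified)) / mod_med,
    )
    return ratio, noise, abs(ratio - 1.0) > noise


if __name__ == "__main__":
    # Hypothetical wall-clock seconds for a kernel build in the guest.
    unmodified = [100.0, 102.0, 98.0]
    patched = [150.0, 155.0, 145.0]
    ratio, noise, meaningful = penalty(unmodified, patched)
    print(f"slowdown x{ratio:.2f}, noise {noise:.1%}, meaningful: {meaningful}")
```

The median is used instead of the mean so that a single run hit by a translation-buffer flush at a bad moment does not dominate the result.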

Thanks again for your ideas!!





