Re: [Qemu-devel] Instruction counting instrumentation for ARM + initial patch
Sat, 23 May 2009 15:23:45 +0200
On Wed, May 20, 2009 at 10:35 PM, Vince Weaver <address@hidden> wrote:
> I wonder if a simplistic stats gathering framework could be added to Qemu.
> The problem is there currently are at least 3 users of Qemu:
> 1. People who want fast simulation
> 2. People who are doing virtualization
> 3. People trying to do instrumentation/research
> Unfortunately those three groups have conflicting interests.
> The main problem is that adding instrumentation infrastructure will either
> slow down the common case, or else introduce lots of #ifdefs all over the
> code. Neither is very attractive.
I don't think adding options that are enabled from the command line
will slow down the standard translation in a measurable way, provided,
for instance, that the option isn't checked before running every
translated block. If it's checked before/after translating a block,
then it shouldn't affect
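To make the distinction concrete, here is a minimal sketch of testing an option at translation time rather than at execution time. It is a toy model, not QEMU's actual code: every name in it (toy_tb, toy_translate, toy_execute, toy_count_enabled) is hypothetical, and a "translated block" is reduced to a short list of operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; these are NOT QEMU's real data structures. */
enum toy_op { TOY_OP_COUNT_INSNS, TOY_OP_RUN_GUEST };

#define TOY_MAX_OPS 2

struct toy_tb {
    enum toy_op ops[TOY_MAX_OPS];
    size_t nb_ops;
    unsigned guest_insns;   /* guest instructions covered by this block */
};

static bool toy_count_enabled;        /* would be set once from the command line */
static unsigned long toy_insn_total;  /* running guest instruction count */

/* Translation time: the option is tested here, once per translated block. */
static void toy_translate(struct toy_tb *tb, unsigned guest_insns)
{
    tb->nb_ops = 0;
    tb->guest_insns = guest_insns;
    if (toy_count_enabled) {
        /* Emit the counting op only when instrumentation is requested. */
        tb->ops[tb->nb_ops++] = TOY_OP_COUNT_INSNS;
    }
    tb->ops[tb->nb_ops++] = TOY_OP_RUN_GUEST;
    assert(tb->nb_ops <= TOY_MAX_OPS);
}

/* Execution time: toy_count_enabled is never tested on this path. */
static void toy_execute(const struct toy_tb *tb)
{
    for (size_t i = 0; i < tb->nb_ops; i++) {
        switch (tb->ops[i]) {
        case TOY_OP_COUNT_INSNS:
            toy_insn_total += tb->guest_insns;
            break;
        case TOY_OP_RUN_GUEST:
            /* The translated guest code would run here. */
            break;
        }
    }
}
```

With the option off, the block simply contains no counting op, so the execution loop pays nothing for it. The cost of the check is paid once per translation, not once per execution.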
> It would be nice if maybe a limited instrumentation architecture could
> be put into qemu, that could be configured out. It would save the various
> researchers the problem of everyone re-implementing it differently.
> It would be nice to have:
> 1. A way to dump an instruction trace (address, length (for CISC),
> and opcode, CPU# for multi-thread)
> 2. A way to dump memory traces (address, length, possibly the value
> loaded/stored, CPU# for multi-thread)
> 3. A way to dump basic-block entry/exit
> Many of the various research metrics can be gained from these stats.
> #1 and #2 are enough for cache simulators.
> #1 (if post-processed) is enough to get a frequency plot for instruction
> count and type.
> #1 can be used to extrapolate branch-taken statistics for branch
> #3 Can be used for basic block vectors, or to get faster instruction
As I said in my previous mail, you don't need to generate an
instruction trace. For user mode applications, a TB trace is enough
to derive one (of course there are some fine points that can cause
trouble when doing so, such as dynamically generated code or TB
flushing).
As an example, my TB counter requires <30% more time to run
one of the SPEC 2k tests, while a full TB trace (binary format
>2.7 GB) going to a file doubles the run-time, which is very
acceptable. Of course, you then need to process the output
using other programs.
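As an illustration of that kind of post-processing, here is a minimal sketch that expands a TB trace into instruction addresses. The record layout is an assumption made for the example (this mail does not describe the actual binary format), the names tb_record, count_insns and expand_trace are hypothetical, and it assumes fixed 4-byte ARM-mode instructions and that every traced TB ran to completion. It ignores the fine points mentioned above (dynamically generated code, TB flushing).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace record, one per executed TB (assumed layout). */
struct tb_record {
    uint32_t pc;       /* guest address of the first instruction in the TB */
    uint32_t n_insns;  /* number of guest instructions in the TB */
};

#define ARM_INSN_BYTES 4u  /* assumption: ARM mode only, no Thumb */

/* Total number of guest instructions executed according to the trace. */
static uint64_t count_insns(const struct tb_record *trace, size_t n)
{
    uint64_t total = 0;

    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += trace[i].n_insns;
    }
    return total;
}

/*
 * Expand the TB trace into per-instruction addresses.
 * Writes at most cap entries to out and returns how many were written.
 */
static size_t expand_trace(const struct tb_record *trace, size_t n,
                           uint32_t *out, size_t cap)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        for (uint32_t k = 0; k < trace[i].n_insns; k++) {
            if (written == cap) {
                return written;  /* output buffer is full */
            }
            out[written++] = trace[i].pc + k * ARM_INSN_BYTES;
        }
    }
    return written;
}
```

For a trace of three records {0x8000, 3}, {0x9000, 2}, {0x8000, 3}, count_insns returns 8 and expand_trace produces 0x8000, 0x8004, 0x8008, 0x9000, 0x9004, 0x8000, 0x8004, 0x8008. The opcodes themselves would still have to be read back from the guest binary at those addresses.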
Getting memory traces would be more intrusive and would
certainly slow down simulation significantly.
> Pin manages to have their null plugin run very fast; at least one
> of the Spec2k binaries runs faster translated than it does natively.
Too bad they only support x86 now and are not open source.
Anyway, Pin is not the only binary tool that can make programs
faster: Diablo (LTO) was also able to speed up ARM programs.