Re: [Qemu-devel] Instruction counting instrumentation for ARM + initial patch
Mon, 25 May 2009 18:04:09 +0300
On Sat, 2009-05-23 at 15:23 +0200, Laurent Desnogues wrote:
> On Wed, May 20, 2009 at 10:35 PM, Vince Weaver <address@hidden> wrote:
> > The main problem is that adding instrumentation infrastructure will either
> > slow down the common case, or else introduce lots of #ifdefs all over the
> > code. Neither is very attractive.
> I don't think adding command-line enabled options will slow down
> the standard translation in a measurable way, provided, for instance,
> it isn't being checked before running every translated block. If it's
> checked before/after translating a block, then it shouldn't effect
I tried to measure the performance difference between vanilla QEMU and
QEMU with this patch applied but the command-line switch left off. As
suggested above, I couldn't measure any difference. However, to disable
this feature at compile time, I think it should be enough to:
1. define empty macros in place of instr_count_inc() and
instr_count_inc_init(), which conveniently eliminates all calls to them.
2. insert some #ifdefs to disable the framework code (e.g. an #ifdef in
CPUARMState to remove the counters)
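The two steps above could look roughly like the sketch below. This is only an illustration under assumptions, not the code from the patch: CONFIG_INSTR_COUNT, DemoCPUState, demo_decode() and the argument lists of instr_count_inc() and instr_count_inc_init() are hypothetical, and the real functions would emit TCG code instead of bumping a counter directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build switch; the name is an assumption, not taken from the patch. */
#ifndef CONFIG_INSTR_COUNT
#define CONFIG_INSTR_COUNT 1
#endif

#define DEMO_NUM_CLASSES 4

/* Stand-in for CPUARMState, reduced to what matters for this sketch. */
typedef struct DemoCPUState {
    uint32_t regs[16];
#if CONFIG_INSTR_COUNT
    /* Step 2: the counters exist only when the feature is compiled in. */
    uint64_t instr_count[DEMO_NUM_CLASSES];
#endif
} DemoCPUState;

#if CONFIG_INSTR_COUNT
/* Hypothetical signatures; the real ones are defined by the patch. */
static inline void instr_count_inc_init(DemoCPUState *env)
{
    assert(env != NULL);
    for (size_t i = 0; i < DEMO_NUM_CLASSES; i++) {
        env->instr_count[i] = 0;
    }
}

static inline void instr_count_inc(DemoCPUState *env, unsigned cls)
{
    assert(env != NULL && cls < DEMO_NUM_CLASSES);
    env->instr_count[cls]++;
}
#else
/* Step 1: empty macros make every call site compile to nothing. */
#define instr_count_inc_init(env)  ((void)0)
#define instr_count_inc(env, cls)  ((void)0)
#endif

/* Decoder-like call site: it looks the same whether or not counting is built in. */
static void demo_decode(DemoCPUState *env, uint32_t insn)
{
    (void)env;  /* silences the unused-parameter warning in the disabled build */
    /* Use the top two bits as a made-up instruction class (0..3). */
    instr_count_inc(env, insn >> 30);
}
```

The call sites in the decoders stay free of #ifdefs this way; only the definitions and the state structure need them.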
For my small set of workloads, I've measured roughly 10% to 40% overhead
when instruction counting is enabled, which is definitely acceptable
for us. Your mileage may vary.
> > It would be nice if maybe a limited instrumentation architecture could
> > be put into qemu, that could be configured out. It would save the various
> > researchers the problem of everyone re-implementing it differently.
I think this is something that many software developers would be
interested in, too, e.g. for measuring cache utilization. BTW, for
on-line cache simulation, wouldn't it be enough to instrument memory
accesses at TCG level (e.g., tcg_gen_ld8u_i32, ...)?
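To make the idea concrete, here is a rough sketch of the kind of model such a memory-access hook could feed. Everything in it is an assumption for illustration: CacheSim, cache_sim_reset(), cache_sim_access() and the cache geometry (direct-mapped, 32-byte lines, 8 KiB) are made up, and how the hook would actually be wired into the TCG load/store generation is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry: direct-mapped, 32-byte lines, 256 lines (8 KiB). */
#define LINE_BITS 5
#define SET_BITS  8
#define NUM_SETS  (1u << SET_BITS)

typedef struct CacheSim {
    uint32_t tag[NUM_SETS];
    bool     valid[NUM_SETS];
    uint64_t hits;
    uint64_t misses;
} CacheSim;

/* Invalidate every line and clear the statistics. */
static void cache_sim_reset(CacheSim *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
}

/*
 * Hypothetical hook: would be called with the guest address of each
 * load or store that the instrumented code performs.
 */
static void cache_sim_access(CacheSim *c, uint32_t addr)
{
    assert(c != NULL);
    uint32_t line = addr >> LINE_BITS;      /* drop the offset within the line */
    uint32_t set  = line & (NUM_SETS - 1);  /* low bits select the cache line */
    uint32_t tag  = line >> SET_BITS;       /* remaining bits identify the block */

    if (c->valid[set] && c->tag[set] == tag) {
        c->hits++;
    } else {
        c->misses++;
        c->valid[set] = true;               /* fill the line on a miss */
        c->tag[set] = tag;
    }
}
```

The model only needs the access address, which is why instrumenting the memory operations alone might be sufficient for this purpose; instruction fetches would need a separate hook if an instruction cache were to be modelled too.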
> You don't need to generate an instruction trace as I said in my
> previous mail. For user mode applications, a TB trace is enough
> (of course there are some fine points that can cause trouble to
> derive the instruction trace from a TB trace such as dynamically
> generated code, or TB flushing) to derive an instruction trace.
We considered this when designing the patch. However, we decided to
start with the current implementation due to concerns about dynamically
generated code, as you pointed out. And then there's system
emulation. Anyway, the most important thing would be to have the
instr_count_inc() calls in the decoders. It shouldn't be too hard to change
the implementation later with at most trivial modifications to the
Embedded Software Group / Helsinki University of Technology