Re: [Qemu-devel] Re: Performance Monitoring


From: Vince Weaver
Subject: Re: [Qemu-devel] Re: Performance Monitoring
Date: Thu, 22 May 2008 23:38:21 -0400 (EDT)

> I would like to run an OS, say Linux, and take a sample for a small period
> of time (seconds) while some app(s) are running and get a list of opcode
> names and how many times they were executed. I'm not interested in CPI at
> the moment.

What you are trying to do is relatively straightforward, especially if you
are going to be running binaries from a RISC type machine.

The way I'd recommend doing it is to have Qemu output the raw
instruction stream, and then write an external program that
decodes the instructions and counts how many of each kind were
executed.  This is fairly straightforward on a fixed-width arch like
MIPS; it would be very complicated on something like x86.
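
As a rough illustration of the external counting step, here is a minimal
sketch.  It assumes each trace record is a pair of 32-bit words (pc, then
instruction word) in host byte order, which matches the struct in the
helper code further down.  The names trace_rec, mips_major_opcode,
mips_funct and count_opcodes are made up for this example; only the MIPS32
field positions (major opcode in bits 31..26, funct in bits 5..0 for the
SPECIAL group) come from the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed record layout: one (pc, instruction word) pair per executed
 * instruction, both 32-bit, in host byte order. */
struct trace_rec {
    uint32_t addr;
    uint32_t insn;
};

/* MIPS32 major opcode: bits 31..26 of the instruction word. */
static unsigned mips_major_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* For the SPECIAL group (major opcode 0) the operation is selected by
 * the funct field, bits 5..0. */
static unsigned mips_funct(uint32_t insn)
{
    return insn & 0x3f;
}

/* Add n records to two 64-entry histograms: one indexed by major opcode,
 * one indexed by funct for SPECIAL instructions only. */
static void count_opcodes(const struct trace_rec *rec, size_t n,
                          unsigned long major[64],
                          unsigned long special[64])
{
    size_t i;

    /* The on-disk records are 8 bytes each; make sure ours are too. */
    assert(sizeof(struct trace_rec) == 8);

    for (i = 0; i < n; i++) {
        unsigned op = mips_major_opcode(rec[i].insn);

        major[op]++;
        if (op == 0) {
            special[mips_funct(rec[i].insn)]++;
        }
    }
}
```

Turning the numbers into opcode names still needs a lookup table built from
the MIPS manual (for example, major opcode 0x23 is LW and SPECIAL funct
0x08 is JR); other groups such as REGIMM and the coprocessor opcodes would
need their own sub-decoding.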

I have some code I can dig up that does this kind of thing (I used
it to run a branch predictor simulator).  I'll include it at the end of
this e-mail.

>    - Paul mentioned "With either alternative you'll still have issues with
>    exceptions. MMU faults abort a TB early, so will screw up your statistics.
>    One possibility is to terminate a TB on every memory access, like we do for
>    watchpoints." - is this an issue addressed by your patch?

I've actually only tested my method with the userspace (linux-user)
type of emulation; I haven't tested it at all with full-system
simulation.  I'd imagine it would still work.


Here's the code.  It's based on a pre-TCG version of Qemu so you can't use
it on the latest snapshots.  It also only works with MIPS, but the
changes will probably be similar for other architectures.  The code
buffers a large block of values before writing it out (for performance).
To avoid writing huge traces to disk (and they will be huge) you
can write to a named pipe (mkfifo) and have your analysis routine
run at the same time, reading from the same pipe.
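
One detail on the reading side: read() on a pipe can return fewer bytes
than requested, so an analysis program should loop until it has a whole
block of records.  A minimal sketch follows; read_full is a hypothetical
helper name, not something from Qemu.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read up to len bytes from fd, retrying on short reads and EINTR.
 * Returns the number of bytes read (less than len only at end of
 * stream, i.e. when the writer has closed the pipe), or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;

    assert(buf != NULL);

    while (got < len) {
        ssize_t r = read(fd, (char *)buf + got, len - got);

        if (r == 0) {
            break;              /* writer closed its end */
        }
        if (r < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted, just retry */
            }
            return -1;
        }
        got += (size_t)r;
    }
    return (ssize_t)got;
}
```

Assuming the trace file name used by the helper below, the idea would be to
run "mkfifo trace.bpred" in the working directory, start the analysis
program so it opens the pipe for reading, and then start Qemu; creat() on
an existing FIFO just opens it for writing.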

Hopefully if I am doing something horribly wrong with this code, someone
will correct me.  I've been using it for a while now though and have been
getting good results when compared to hw perf counters.


This adds code to dump the pc and instruction word for every executed
instruction:

--- ./target-mips/translate.c   2008-04-23 12:23:55.000000000 -0400
+++ ./target-mips/translate.c   2008-05-22 23:31:13.000000000 -0400
@@ -6696,6 +6696,7 @@
             gen_opc_instr_start[lj] = 1;
         }
         ctx.opcode = ldl_code(ctx.pc);
+        gen_op_dump_brpred(ctx.pc,ctx.opcode);
         decode_opc(env, &ctx);
         ctx.pc += 4;


Add this to "op.c":

void op_dump_brpred(void) {
   helper_dump_brpred(PARAM1,PARAM2);
}

Add this to "helper.c":

static int brpred_fd=-1,brpred_ptr=0;

static char error_message[]="Write error!\n";

struct brpredtype {
   unsigned int addr;
   unsigned int insn;
} __attribute__((__packed__));

#define TRACE_UNITS 4096

static struct brpredtype brpred_buf[TRACE_UNITS];

void helper_dump_brpred(unsigned long address,unsigned long insn) {

     int result;

     /* open the trace file the first time we are called */
     if (brpred_fd<0) {
        brpred_fd=creat("trace.bpred",0666);
     }

     brpred_buf[brpred_ptr].addr=address;
     brpred_buf[brpred_ptr].insn=insn;

     brpred_ptr++;

     /* flush as soon as the buffer is full (>= so we never index past it) */
     if (brpred_ptr>=TRACE_UNITS) {
        brpred_ptr=0;
        result=write(brpred_fd,brpred_buf,
                     TRACE_UNITS*sizeof(struct brpredtype));
        if (result!=TRACE_UNITS*sizeof(struct brpredtype)) {
           write(2,error_message,sizeof(error_message)-1);
        }
     }
}





