From: Sami Kiminki
Subject: [Qemu-devel] [PATCH] Instruction counting instrumentation for ARM, 2nd version
Date: Fri, 12 Jun 2009 13:56:08 +0300

Hi,

Attached is the second version of Timo Töyry's instruction counting
patch for arm-linux-user targets. It applies to QEMU 0.10.4. To try it,
just add -instrcount to your regular qemu-arm command line. See [1] for
a description of how this is implemented.

The patch should now be complete for ARMv6, VFP and Thumb. However,
Thumb2, NEON, and more advanced VFP instrumentation are still missing due
to a lack of public documentation from ARM. The measured overhead of
instruction counting with this approach is between 5% and 35%, and the
overhead when the patch is applied but instruction counting is disabled
is very close to 0%. See below for details.

Instruction counting can be used for choosing the most appropriate CPU
variant for an embedded application. It can also be used as part of a
more detailed performance analysis, as the instruction mix reflects the
performance of the code.

There are also other ways to implement instruction counting (see
responses to our previous patch [1]). As to our approach, I believe this
is the cleanest way to implement this specific instrumentation. However,
from a more general point of view, I'm not so sure. I'd like to ask for
more comments and options.

We had two main reasons for constructing this patch. First, we needed
the functionality. Second, we're considering using QEMU as a platform
for other instrumentation work too, so we thought we would start with
something simple.

What we would really like to see is a more general instrumentation
framework, not just instruction counting and not just for ARM targets.
The reason for this is that many measurements (e.g. instruction
counting) are much easier to do in an emulated/simulated environment
than on real HW, and doing them in QEMU is much faster than using an ISA
interpreter such as Valgrind. Some other instrumentation examples that
come to mind are cache usage efficiency analysis, branch profiling,
and naturally instruction and memory access tracing.

A specific bonus of such a framework would be that it could be used for
the profiling part of possible future optimizations in code translation.
Consider IA-32 EL (Intel's x86 emulator for Itanium), for example.

Anyway, I'd like to know if there is general interest in this kind of
framework, and if so, how we should proceed to get it implemented in
mainline.

Regards,
Sami Kiminki
Embedded Software Group / Helsinki University of Technology


References:
[1] http://lists.gnu.org/archive/html/qemu-devel/2009-05/msg00922.html


Overhead measurements
=====================
(Core 2 Q6600, QEMU compiled with gcc 4.3.3 and benchmarks with gcc
4.3.2 (arm1176jzf-s), average of 5 runs)

Vanilla QEMU:
- dhrystone:          1.709 s
- dhrystone thumb:    2.185 s
- mibench/jpeg/cjpeg: 0.453 s
- mibench/mad:        0.822 s

Patched QEMU, instruction counting disabled:
- dhrystone:          1.700 s (-0.5%)
- dhrystone thumb:    2.172 s (-0.6%)
- mibench/jpeg/cjpeg: 0.456 s (+0.7%)
- mibench/mad:        0.820 s (-0.2%)

Patched QEMU, instruction counting enabled:
- dhrystone:          1.882 s (+10%)
- dhrystone thumb:    2.300 s (+5%)
- mibench/jpeg/cjpeg: 0.513 s (+13%)
- mibench/mad:        1.078 s (+31%)



Attachment: qemu-0.10.4-instrcount.patch

