Re: [Qemu-devel] Profiling Qemu for speed?

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Profiling Qemu for speed?

From:	Ian Rogers
Subject:	Re: [Qemu-devel] Profiling Qemu for speed?
Date:	Mon, 18 Apr 2005 10:51:02 +0100
User-agent:	Mozilla Thunderbird 0.8 (X11/20040913)

There are some code sequences that are quite common, for example comparefollowed by branch. A threaded decoder tends to look like:


... // do some work
load <instruction>
mask out opcode
address_of_decoder = load decoder_lookup<opcode>
goto *address_of_decoder

but if you say compare and branch are common then possibly

compare_instruction:
... // do some work
load <instruction>
mask out opcode
if opcode == branch then goto branch_decoder
address_of_decoder = load decoder_lookup<opcode>
goto *address_of_decoder

is more optimal, as in the branch case you only have one load. So itseems you can use knowledge of common code sequences to remove onememory access. The overall effect of this is going to come down to thebranch prediction hardware too - so a win isn't obvious. If you havelonger code sequences then you can use specialization to create aspeciailized and optimally laid out decoder.

I'm not sure if you can get GCC to generate code sequences like this,but you probably at least need to use the -fprofile-generate and-fprofile-use options

http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

I'm doing this in Java and it appears to be worth upto a 10% speedup inan interpreter. So possibly this would save 10% in the translator.


Sorry if this is obvious. Regards,

Ian Rogers
-- http://www.binarytranslator.org/


Daniel J Guinan wrote:

This conversation, below, is very interesting.  It is precisely this
part of QEMU that fascinates me and potentially holds the most promise
for performance gains.  I have even imagined using a genetic algorithm
to discover optimal block-sizes and instruction re-ordering and
whatnot.  This could be done in order to generate translation tables of
guest instruction sequences and host translated instruction sequences.
Even if  only a handful of very common sequences were translated in
this fashion, the potential speedups are enormous.

Before even discussing the exotic possibilities, however, we need to
figure out what is possible within the framework of the current QEMU
translation system.  Rewiring QEMU to support translating sequences
(blocks of instructions) rather than single instructions may or may not
be necessary.  It should be rather simple to instrument QEMU to keep
track of the most common sequences in order to figure out if there are,
in fact, sequences that show up with a high enough frequency to make
this endeavor worthwhile (I would think the answer would be yes).Then, someone skilled in machine code for the host and guest could take
a stab at hand-coding the translation for the most common couple of
sequences to see how the performance gains come out.

I would love to see some work in this direction and would be willing to
help, although my skills are limited in x86 machine.

-Daniel
One thought would be to have a peephole optimizer that looks back
over
the just translated basic block (or a state machine that matches such
sequences as an on-line algorithm) and match against common, known
primitive sequences, and replaces them with optimized versions.

The kind of profiling you would want to do here is to run, say,
windows
and take a snapshot of the dynamic code cache, and look for common
instruction sequences. Ideally, you could write some software to do
this
automatically.

Anyway, I'm sure there are lots of other ideas laying around.


-- John.
Another thing I've thought about is checking what sequences ofinstructions often appear in x86 programs (such as e.g. "push %ebp;movl %esp, %ebp") and then creating C-functions which emulate such an
antire block, so they can be optimized as a whole by gcc. That wouldgive a similar performance gain on all supported targets, and notjuston the one you created the peephole optimizer for (+ less work todebug).
The only possible downside is that you can't jump to a particularinstruction in such a block (the same goes for several kinds ofpeephole optimizations though). I don't know yet how Qemu exactlykeepstrack of the translations it has already performed, whether itsupportsmultiple existing translations of the same instruction and/or whether
it can already automatically invalidate the old block in case it
turnsout it needs to be splitted and thus re-translated (I guess it should
at least some of these things, since it theory an x86 could jump into
the middle of an instruction in order to reinterpret the bytes asanother instruction stream).
Jonas


Unfortunately it's not that simple. The push instruction may cause an

exception. Whatever optimizations you apply you've got to make sure
that theguest state is still consistent when the exception occurs.
Paul


If we just concatenate the C code of the two procedures, won't gcc
takecare of that for us? Or could scheduling mess this up? Maybe there'saswitch to avoid having it reschedule instructions in a way that sideeffects happen in a different order? (that would still give us theadvantage of CSE and peephole optimizations)
Jonas
_______________________________________________
Qemu-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/qemu-devel

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] Profiling Qemu for speed?, (continued)
- Re: [Qemu-devel] Profiling Qemu for speed?, John R. Hogerhuis, 2005/04/17
  - Re: [Qemu-devel] Profiling Qemu for speed?, Jonas Maebe, 2005/04/17
    - Re: [Qemu-devel] Profiling Qemu for speed?, Paul Brook, 2005/04/17
    - Re: [Qemu-devel] Profiling Qemu for speed?, Jonas Maebe, 2005/04/17
    - Re: [Qemu-devel] Profiling Qemu for speed?, Nathaniel G H, 2005/04/17
    - Re: [Qemu-devel] Profiling Qemu for speed?, John R. Hogerhuis, 2005/04/17
    - Re: [Qemu-devel] Profiling Qemu for speed?, André Braga, 2005/04/17
    - Re: [Qemu-devel] Profiling Qemu for speed?, Karl Magdsick, 2005/04/18
- Re: [Qemu-devel] Profiling Qemu for speed?, Paul Brook, 2005/04/17
- Re: [Qemu-devel] Profiling Qemu for speed?, Daniel J Guinan, 2005/04/18
  - Re: [Qemu-devel] Profiling Qemu for speed?, Ian Rogers <=
    - Re: [Qemu-devel] Profiling Qemu for speed?, Daniel Egger, 2005/04/18
    - Re: [Qemu-devel] Profiling Qemu for speed?, Christian MICHON, 2005/04/18
    - Re: [Qemu-devel] Profiling Qemu for speed?, Ian Rogers, 2005/04/18
    - Re: [Qemu-devel] Profiling Qemu for speed?, Ian Rogers, 2005/04/18
    - Re: [Qemu-devel] Profiling Qemu for speed?, Paul Brook, 2005/04/18
- Re: [Qemu-devel] Profiling Qemu for speed?, Daniel J Guinan, 2005/04/18

Prev by Date: [Qemu-devel] [patch] fix compilation of sdlaudio debugging
Next by Date: [Qemu-devel] [patch] remove S3VGA
Previous by thread: Re: [Qemu-devel] Profiling Qemu for speed?
Next by thread: Re: [Qemu-devel] Profiling Qemu for speed?
Index(es):
- Date
- Thread