qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i38


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
Date: Thu, 18 Jun 2015 10:16:40 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

On 2015-06-18 10:12, Pavel Dovgaluk wrote:
> > From: Aurelien Jarno [mailto:address@hidden
> > On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> > > In icount mode every translation block looks as follows:
> > >
> > > if icount < n then exit
> > > icount -= n
> > > instr1
> > > instr2
> > > ...
> > > instrn
> > > exit
> > >
> > > When one of these instructions initiates an exception, icount should be
> > > restored and adjusted number of instructions should be subtracted from 
> > > icount
> > > instead of initial n.
> > >
> > > tlb_fill function passes retaddr to raise_exception, which allows 
> > > restoring
> > > current instructions in TB and correct icount calculation.
> > >
> > > When exception triggered with other function (e.g. by embedding call to
> > > exception raising helper into TB), then PC is not passed as retaddr and
> > > correct icount is not recovered. In such cases icount will be decreased
> > > by the value equal to the size of TB.
> > 
> > Looking at how icount work, I see it's basically a variable in the CPU
> > state (icount_decr.u16.low), which is already accessed from the TB.
> > Couldn't we adjust it using additional code before generating an
> > exception, when in icount mode.
> > 
> > For example for MIPS, we can add some code before generate_exception
> > which use the value from s->gen_opc_icount[j] to adjust
> > the variable icount_decr.u16.low.
> 
> It is possible, but it will incur additional overhead, because we will 
> have to update icount every time the exception might be generated.
> We'll have to update icount value before and after every helper call, 
> that can cause an exception:
> 
> icount -= n
> ...
> instr_k
> icount += n - k
> helper
> icount -= n - k
> ...
> 
> And this overhead will slowdown the code even if no exception occur.

That's where I might disagree. Retranslation seems a very good idea on
the paper, but in practice it doesn't seems to always bring the
performance improvement it should. In addition it seems to be highly
dependent on the target. Just to give some numbers, on MIPS (as your
patch originally concerns this architecture), 40% of code generation is
actually due to retranslation. The problem is that over the time we have
improved a lot the code generation (liveness analysis, better register
allocation, constant propagation, ...) and thus we have increased the
code generation time. While it clearly has some benefits when this code
is actually executed, it's not the case when the code is simply
retranslated. In short we spend more time to find the CPU state
corresponding to an exception than before.

A simple way to show that is to apply the simple patch below, which
disable retranslation and save the CPU state before each instruction:

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 1d128ee..5238d71 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -19435,6 +19435,7 @@ gen_intermediate_code_internal(MIPSCPU *cpu, 
TranslationBlock *tb,
     LOG_DISAS("\ntb %p idx %d hflags %04x\n", tb, ctx.mem_idx, ctx.hflags);
     gen_tb_start(tb);
     while (ctx.bstate == BS_NONE) {
+        save_cpu_state(&ctx, 1);
         if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
             QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
                 if (bp->pc == ctx.pc) {
diff --git a/translate-all.c b/translate-all.c
index b6b0e1c..3d4c017 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -212,6 +212,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, 
TranslationBlock *tb,
     int64_t ti;
 #endif
 
+    return -1;
+
 #ifdef CONFIG_PROFILER
     ti = profile_getclock();
 #endif

On x86, this patch brings a 5% boot time improvement on MIPS. One of the
reason is that the TCG code generator has a good knowledge about which
TCG ops or helpers can trigger an exception, so it can optimize out part
of the instructions saving the CPU state. I guess that the host CPUs have
also evolved over the time, now being superscalar and out-of-order so
that saving the CPU state can be done "in background". Also it's just a
quick and dirty patch, we can probably even do better.

All of that to say that I am worried for the performances to see more
paths through the retranslation code, especially on MIPS as it seems to
be costly. That said I haven't really look in details at other targets,
nor hosts.

Now to come back about your patches, we might want to simply fix icount
first, even if it has some performance impact, and deal with the
retranslation issue separately, as it concerns more than just icount.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
address@hidden                 http://www.aurel32.net



reply via email to

[Prev in Thread] Current Thread [Next in Thread]