qemu-devel

From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH v3 9/9] tcg: Lower indirect registers in a separate pass
Date: Mon, 25 Jul 2016 21:23:42 +0200
User-agent: Mutt/1.6.0 (2016-04-01)

On 2016-06-23 20:48, Richard Henderson wrote:
> Rather than rely on recursion during the middle of register allocation,
> lower indirect registers to loads and stores off the indirect base into
> plain temps.
> 
> For an x86_64 host, with sufficient registers, this results in identical
> code, modulo the actual register assignments.
> 
> For an i686 host, with insufficient registers, this means that temps can
> be (temporarily) spilled to the stack in order to satisfy an allocation.
> This is as opposed to the possibility of being unable to spill (to free a
> register for the indirect base) in order to perform a spill.
> 
> Signed-off-by: Richard Henderson <address@hidden>
> ---
>  include/qemu/log.h |   1 +
>  tcg/optimize.c     |  31 +-----
>  tcg/tcg.c          | 306 +++++++++++++++++++++++++++++++++++++++++++----------
>  tcg/tcg.h          |   4 +
>  util/log.c         |   5 +-
>  5 files changed, 263 insertions(+), 84 deletions(-)
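
The lowering described in the commit message can be pictured with a minimal sketch. It assumes a hypothetical toy IR: the `Temp` and `Op` types, the `lower_add` and `new_plain_temp` helpers, and the three-op instruction set are illustrative only, not QEMU's actual TCG structures. Each read of an indirect temp is preceded by a load from its base into a fresh plain temp, and each write is followed by a store back, so the register allocator only ever sees plain temps. It is deliberately naive (it assumes the base temp itself is plain) and makes no attempt to avoid redundant loads and stores.

```c
#include <assert.h>

/* Hypothetical toy IR for illustration; not QEMU's TCG structures. */
enum opc { OP_ADD, OP_LD, OP_ST };

typedef struct {
    int indirect;   /* nonzero if the value lives in memory off a base */
    int base;       /* temp holding the base pointer (indirect only) */
    int offset;     /* byte offset from that base (indirect only) */
} Temp;

typedef struct {
    enum opc opc;
    /* OP_ADD: dst, src1, src2
       OP_LD:  dst, base, offset
       OP_ST:  src, base, offset */
    int args[3];
} Op;

#define MAX_TEMPS 64

static Temp temps[MAX_TEMPS];
static int nb_temps;

/* Allocate a fresh temp that the register allocator can handle directly. */
static int new_plain_temp(void)
{
    assert(nb_temps < MAX_TEMPS);
    temps[nb_temps] = (Temp){ 0, 0, 0 };
    return nb_temps++;
}

/* Naively lower one OP_ADD so that it only touches plain temps:
   load each indirect source before the op and store an indirect
   destination after it. 'out' needs room for up to 4 ops.
   Returns the number of ops written. */
static int lower_add(const Op *in, Op *out)
{
    Op op = *in;
    int n = 0;

    for (int i = 1; i <= 2; i++) {
        int t = op.args[i];
        if (temps[t].indirect) {
            int p = new_plain_temp();
            out[n++] = (Op){ OP_LD, { p, temps[t].base, temps[t].offset } };
            op.args[i] = p;
        }
    }

    int dst = op.args[0];
    int need_store = temps[dst].indirect;
    if (need_store) {
        op.args[0] = new_plain_temp();
    }
    out[n++] = op;
    if (need_store) {
        out[n++] = (Op){ OP_ST, { op.args[0], temps[dst].base, temps[dst].offset } };
    }
    return n;
}
```

For example, lowering `t1 = t1 + t2`, where `t1` is an indirect temp at offset 16 from base `t0`, yields `ld t3,t0,16; add t4,t3,t2; st t4,t0,16`. If the allocator runs short of registers, `t3` and `t4` can now be spilled like any other temp, instead of requiring a register for the base in the middle of a spill.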

This patch is a difficult one to review... On the purely technical side,
it does what it is supposed to do and I haven't found any issue, though
it's probably very easy to miss one in this kind of code. I have tested
it with various sparc images and I haven't found any obvious regression
on an x86_64 host.

Now on the less technical side, I really like the idea of being able to
transform the TCG instruction stream more or less in place. Your recent
patches in that direction are great. That said, I am a bit worried that
we loop over the ops many times. We used to have one forward pass (the
optimizer) and one backward pass (liveness analysis). Your patch adds up
to two additional passes (one forward and one backward), which clearly
has a cost. Given that indirect registers bring a significant
performance gain, I think it is worth it. Now I wonder if there is any
way to do the lowering of registers earlier, I mean before the liveness
analysis. This would probably generate plenty of useless ops, but they
would later be removed by the liveness analysis. Maybe you have already
tried that?
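
The suggestion above relies on a backward liveness pass deleting ops whose results are never read. Below is a minimal sketch of that kind of dead-op elimination on a hypothetical toy IR: the `Op` type, the `liveness_dce` helper, and the three-op instruction set are illustrative, not QEMU's TCG liveness code. It assumes no temp is live after the last op, so anything that must survive the block is written back by an explicit store; stores are always kept because they modify memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy IR: OP_ADD is dst, src1, src2; OP_LD is dst, base,
   offset; OP_ST is src, base, offset (offsets are immediates). */
enum opc { OP_ADD, OP_LD, OP_ST };

typedef struct {
    enum opc opc;
    int args[3];
} Op;

#define MAX_TEMPS 64
#define MAX_OPS   64

/* Walk the ops backward, tracking which temps are read later.
   A pure op (OP_ADD, OP_LD) whose output is dead is dropped; an
   OP_ST is always kept. Survivors are compacted in place and the
   new op count is returned. */
static int liveness_dce(Op *ops, int nb_ops)
{
    bool live[MAX_TEMPS] = { false };  /* assumption: nothing live at exit */
    bool keep[MAX_OPS];

    assert(nb_ops <= MAX_OPS);
    for (int i = nb_ops - 1; i >= 0; i--) {
        Op *op = &ops[i];
        switch (op->opc) {
        case OP_ST:
            keep[i] = true;
            live[op->args[0]] = true;       /* stored value */
            live[op->args[1]] = true;       /* base pointer */
            break;
        case OP_LD:
            keep[i] = live[op->args[0]];
            if (keep[i]) {
                live[op->args[0]] = false;  /* defined here */
                live[op->args[1]] = true;   /* base pointer */
            }
            break;
        case OP_ADD:
            keep[i] = live[op->args[0]];
            if (keep[i]) {
                live[op->args[0]] = false;  /* kill dst before marking srcs */
                live[op->args[1]] = true;
                live[op->args[2]] = true;
            }
            break;
        }
    }

    int n = 0;
    for (int i = 0; i < nb_ops; i++) {
        if (keep[i]) {
            ops[n++] = ops[i];
        }
    }
    return n;
}
```

For example, with `t0` as the base pointer, the sequence `ld t3,t0,16; ld t4,t0,24; add t6,t4,t4; add t5,t3,t3; st t5,t0,16` shrinks to three ops: the unused `add t6` is dropped first, which in turn leaves `ld t4` dead in the same backward walk. This is the cascading cleanup an early lowering would lean on, at the price of feeding more ops into the liveness pass.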

I think it also depends on which direction we want to go with TCG:
either plenty of small independent optimization passes, or a limited
number of passes, which means more complex code. Unlike an ahead-of-time
compiler, we have to make a much more difficult trade-off between the
optimization time and the level of optimization.

Nevertheless, I think it's the correct way forward for now, and this
patch fixes real issues on hosts with a limited number of registers.
Maybe just add a note saying there *might* be better ways to do that.

Reviewed-by: Aurelien Jarno <address@hidden>

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
address@hidden                 http://www.aurel32.net


