From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v3 9/9] tcg: Lower indirect registers in a separate pass
Date: Thu, 4 Aug 2016 00:57:20 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1

On 07/26/2016 12:53 AM, Aurelien Jarno wrote:
> Now on the less technical side, I really like the idea of being able to
> transform the TCG instruction stream more or less in place. Your recent
> patches in that direction are great. That said, I am a bit worried that
> we loop many times over the various ops. We used to have one forward
> pass (optimizer) and one backward pass (liveness analysis). Your patch
> adds up to two additional passes (one forward and one backward), which
> clearly has a cost. Given that indirect registers bring a lot of
> performance, I think it is worth it. Now I wonder if there is any way
> to do the lowering of registers earlier, I mean before the liveness
> analysis. That would probably generate plenty of useless ops, but they
> would later be removed by the liveness analysis. Maybe you have already
> tried that?

No, I did not try that, simply because we don't do liveness analysis of memory, and plain memory accesses are exactly what lowering indirect registers earlier would leave us with. Indeed, it would put us right back where we were before introducing them.

We need liveness analysis on the tcg globals in order to know where to add the reads and writes. I see no way around that.

The one place where the code could be improved to remove a pass is to have the indirect lowering pass update liveness at the same time. We need accurate liveness in order to satisfy the asserts in the final code generation pass, so we have to do something. I simply thought it was easier to re-run the original liveness pass rather than complicating the indirect lowering pass.
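To make that concrete, here is a toy sketch of the idea (plain C, not the actual tcg.c code; ToyOp, the op encoding and the printed "ld/st ... env" forms are all invented for illustration). A backward pass over the globals records what is live after each op, and the lowering pass then uses that to load every use from memory while storing a definition back only when something later still needs it:

/* Toy model only: two ops that both write global g0.  Without liveness
 * information every definition of an indirect global would need a store
 * back to memory; with it, the first store can be elided. */
#include <stdbool.h>
#include <stdio.h>

#define NGLOBALS 4
#define NOPS 2

typedef struct {
    int def;        /* index of the global written by this op, or -1 */
    int use[2];     /* indices of the globals read by this op, or -1 */
} ToyOp;

int main(void)
{
    /* op0: g0 = g1 + g2 ; op1: g0 = f(g3) -- op0's result is never read */
    ToyOp ops[NOPS] = {
        { 0, { 1, 2 } },
        { 0, { 3, -1 } },
    };

    /* Globals must be back in memory at the end of the TB. */
    bool live[NGLOBALS];
    for (int g = 0; g < NGLOBALS; g++) {
        live[g] = true;
    }

    /* Backward liveness over the globals: a def kills, a use revives. */
    bool live_after[NOPS][NGLOBALS];
    for (int i = NOPS - 1; i >= 0; i--) {
        for (int g = 0; g < NGLOBALS; g++) {
            live_after[i][g] = live[g];
        }
        if (ops[i].def >= 0) {
            live[ops[i].def] = false;
        }
        for (int j = 0; j < 2; j++) {
            if (ops[i].use[j] >= 0) {
                live[ops[i].use[j]] = true;
            }
        }
    }

    /* Forward lowering: load every use from memory, store a def back
     * only if a later op (or the end of the TB) still needs it. */
    for (int i = 0; i < NOPS; i++) {
        for (int j = 0; j < 2; j++) {
            if (ops[i].use[j] >= 0) {
                printf("  ld  g%d <- env\n", ops[i].use[j]);
            }
        }
        printf("  op%d\n", i);
        if (ops[i].def >= 0) {
            if (live_after[i][ops[i].def]) {
                printf("  st  g%d -> env\n", ops[i].def);
            } else {
                printf("  (st g%d elided: overwritten before being read)\n",
                       ops[i].def);
            }
        }
    }
    return 0;
}

Running it drops the store after op0 because liveness shows g0 is overwritten before being read, which is exactly the information we would not have if the lowering happened before the liveness pass.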

> I think it also depends on which direction we want to go with TCG:
> either plenty of small independent optimization passes, or a limited
> number of passes, which means more complex code. Unlike a regular
> compiler, we have to make a much more difficult trade-off between
> optimization time and the level of optimization.

Indeed. Fewer passes over large amounts of data is better, but I'm not sure we have "large" amounts of data for the average TB. On the other hand, smaller passes can reduce the code size of any one loop so that each fits in icache when one unified pass might not.


r~


