qemu-devel

Re: [Qemu-devel] [RFC][PATCH v0 0/8] Improve register allocator


From: Richard Henderson
Subject: Re: [Qemu-devel] [RFC][PATCH v0 0/8] Improve register allocator
Date: Tue, 24 May 2011 09:07:20 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc14 Thunderbird/3.1.10

On 05/24/2011 04:31 AM, Kirill Batuzov wrote:
> 
> 
> On Mon, 23 May 2011, Aurelien Jarno wrote:
> 
>>
>> Thanks for this patch series. Your approach to solving this issue is
>> really different from mine. I instead added more states to the dead/live
>> tracking, and use them to mark some inputs as dead, even for globals, and
>> to mark some output arguments as needing to be synced. This information
>> is then used directly in the tcg_reg_alloc_* functions to make better use
>> of the available registers. On the other hand, my patch series only tries
>> to really lower the number of spills and doesn't try to make better spill
>> choices.
>>
>> I guess it would be a good idea for me to continue with this approach (I
>> basically just have to fix a few cases where some regs are wrongly copied
>> back to memory), so that we can more easily compare the two approaches.
>> Your last patch is interesting anyway; having some statistics is always
>> useful.
>>
>> In any case, I really think we need a better register allocator before we
>> can do any serious optimization passes like constant or copy propagation;
>> otherwise we end up with a lot of registers in use for no real reason.
>>
> When I started working on this patch series, I first wanted to write a
> better register allocator, something based on linear scan. But TBs
> currently have a quite specific and very simple structure. They have
> globals, which are alive everywhere, and temps, packed into a number of
> nests. Each nest is the result of translating one guest instruction.
> Live ranges of temps in one nest always intersect, while live ranges of
> temps from different nests never intersect. As a result, a more
> sophisticated algorithm applied to this case works very similarly to the
> simple greedy algorithm we have right now.
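To make the nest argument concrete, here is a minimal C sketch under a deliberately simplified model: globals and spilling are ignored, and `struct temp`, `greedy_alloc` and `example_tb_peak` are hypothetical names, not QEMU's tcg_reg_alloc code. Allocating temps greedily in op order, freeing a register once its temp's live range has ended, needs exactly as many registers as the largest nest. That is also the lower bound for any allocator, linear scan included, since every temp of that nest is live at the same time.

```c
#include <assert.h>

#define NUM_REGS 16

/* Hypothetical model of one TCG temp: live from the op that defines it
   (start) to the op that last uses it (end), both inclusive. */
struct temp {
    int start;
    int end;
    int reg;
};

/* Greedy allocation in op order: before each new definition, free the
   registers of temps that are already dead, then take the lowest free
   register. Returns the peak number of registers in use, or -1 if
   NUM_REGS would be exceeded (a real allocator would spill instead). */
static int greedy_alloc(struct temp *t, int n)
{
    int owner[NUM_REGS];
    int in_use = 0, peak = 0;

    for (int r = 0; r < NUM_REGS; r++) {
        owner[r] = -1;
    }
    for (int i = 0; i < n; i++) {
        assert(i == 0 || t[i - 1].start <= t[i].start); /* sorted by start */
        for (int r = 0; r < NUM_REGS; r++) {
            if (owner[r] >= 0 && t[owner[r]].end < t[i].start) {
                owner[r] = -1;          /* live range ended: release reg */
                in_use--;
            }
        }
        int r = 0;
        while (r < NUM_REGS && owner[r] >= 0) {
            r++;
        }
        if (r == NUM_REGS) {
            return -1;
        }
        owner[r] = i;
        t[i].reg = r;
        if (++in_use > peak) {
            peak = in_use;
        }
    }
    return peak;
}

/* Three nests of 3, 2 and 4 temps: live ranges overlap inside a nest,
   never across nests. */
static int example_tb_peak(void)
{
    struct temp t[] = {
        {0, 9, -1},  {1, 8, -1},  {2, 7, -1},                   /* nest 0 */
        {10, 15, -1}, {11, 14, -1},                             /* nest 1 */
        {16, 25, -1}, {17, 24, -1}, {18, 23, -1}, {19, 22, -1}, /* nest 2 */
    };
    return greedy_alloc(t, (int)(sizeof t / sizeof t[0]));
}
```

Here `example_tb_peak()` returns 4, the size of the largest nest. Greedy allocation in start order is optimal for interval-shaped live ranges in general; the nest structure just means there is little left for a smarter allocator to win on register count.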

Something that would be helpful for the RISC hosts would be some
mechanism for adding constants -- or constant fragments, if you like --
to the register allocation mix.

If you have access to a Sparc or PPC host (perhaps emulated under qemu),
have a look at the code generated for an i386, or even an ARM, executable.
You'll see lots of similar constants being created, each with a 2-3 insn
sequence. Have a look at the code generated for a 64-bit target like
Alpha and it'll be a 4-6 insn sequence.
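A rough C model of where such sequence lengths come from on a PPC64 host, assuming the common li / lis+ori / lis+ori+sldi+oris+ori materialization scheme. `ppc64_movi_insns` is a hypothetical name, and QEMU's actual ppc64 backend may pick different sequences in some cases; under this scheme a full 64-bit constant needs at most five instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Count instructions needed to materialize v on a PPC64 host under the
   assumed scheme (a simplified model, not QEMU's backend code). */
static int ppc64_movi_insns(int64_t v)
{
    uint64_t u = (uint64_t)v;

    if (v >= INT16_MIN && v <= INT16_MAX) {
        return 1;                               /* li */
    }
    if (v >= INT32_MIN && v <= INT32_MAX) {
        return (u & 0xffff) ? 2 : 1;            /* lis [+ ori] */
    }
    /* Load the high 32 bits as a sign-extended 32-bit value (the
       uint32 -> int32 conversion wraps on GCC/Clang), shift it into
       place, then OR in the two low halfwords if they are nonzero. */
    int n = ppc64_movi_insns((int32_t)(uint32_t)(u >> 32));
    n += 1;                                     /* sldi 32 */
    n += ((u >> 16) & 0xffff) ? 1 : 0;          /* oris */
    n += (u & 0xffff) ? 1 : 0;                  /* ori */
    assert(n <= 5);
    return n;
}
```

For example, 0x12345678 takes 2 instructions, while 0x123456789abcdef0 takes 5.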

Ideally we'd be able to register-allocate these partial constant loads,
and so collapse similar sequences.  We have tons of registers that are
not being used on these hosts, which seems a shame.


r~


