qemu-devel


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] target-mips: apply workaround for TCG optimizations for MFC1
Date: Tue, 14 Jul 2015 22:56:50 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1


On 14/07/2015 20:37, Aurelien Jarno wrote:
>> > 
>> > I certainly don't have a global view, so much that I didn't think at 
>> > all of the optimizer... Instead, it looks to me like a bug in the 
>> > register allocator.  In particular this code in tcg_reg_alloc_mov:
> That's exactly my point when I said that someone doesn't have a global
> view. I think the fact that we don't check for type when simplifying
> moves in the register allocator is intentional, the same way we simply
> transform the trunc op into a mov op (except on sparc). This is done
> because it's not needed for example on x86 and most architectures,
> given 32-bit instructions do not care about the high part of the
> registers.
> 
> Basically size changing ops are trunc_i64_i32, ext_i32_i64 and
> extu_i32_i64. We can be conservative and implement all of them as real
> instructions in all TCG backends. In that case the mov op never has
> to deal with registers of different size (just like we enforce that at
> the TCG frontend level), and the register allocator and the optimizer
> do not have to deal with this. However that's suboptimal on some
> architectures, that's why on x86 we decided to just replace the
> trunc_i64_i32 by a move. But if we do this simplification it should be
> done everywhere (in that case, including in the qemu_ld op). And
> DOCUMENTED somewhere, given different choices can be made for different
> backends.

I think there are three cases:

1) 64-bit processors that do not have loads with 32-bit addresses, and
do not zero extend on 32-bit operations---possibly because 32-bit
operations do not exist at all.

        => qemu_ld/qemu_st must truncate the address

        ia64, s390, sparc all fall under this group.

2) 64-bit processors that have loads with 32-bit addresses.

        => qemu_ld/qemu_st can use 32-bit addresses to do the
           truncation

        aarch64, I think, falls under this group

3) 64-bit processors that do not have, or do not use, loads with 32-bit
addresses, and automatically zero extend on 32-bit operations

        => qemu_ld/qemu_st could use 64-bit addresses and no truncation

x86 currently falls under 3, because it doesn't use ADDR32, but the
register allocator is breaking case 3 by forcing 64-bit operations when
loading from a global.
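
To make the three cases concrete, here is a minimal C sketch.  It is not
QEMU code: every function name is hypothetical, and it only models a
64-bit host register that is supposed to hold a 32-bit guest address.
The assumed memory contents (junk in the high half of the slot) are there
just to show how a 64-bit load from a global breaks case 3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not QEMU code: "reg" is a 64-bit host register
 * that is supposed to hold a 32-bit guest address. */

/* Case 1: qemu_ld/qemu_st emits an explicit truncation first. */
static uint64_t addr_case1_truncate(uint64_t reg)
{
    return (uint32_t)reg;
}

/* Case 2: the memory access itself only uses the low 32 bits of the
 * register, so no separate instruction is needed. */
static uint64_t addr_case2_addr32(uint64_t reg)
{
    return reg & UINT64_C(0xffffffff);
}

/* Case 3: the full register is used as is; correct only while every
 * producer of "reg" was a zero-extending 32-bit operation. */
static uint64_t addr_case3_trust(uint64_t reg)
{
    return reg;
}

/* A 32-bit load zero extends into the 64-bit register. */
static uint64_t load_global_32(uint64_t slot)
{
    return (uint32_t)slot;
}

/* A 64-bit load of the same slot keeps whatever is in the high half. */
static uint64_t load_global_64(uint64_t slot)
{
    return slot;
}

static void demo(void)
{
    /* Assumed memory contents: low half is the address, high half is junk. */
    uint64_t slot = UINT64_C(0xdeadbeef00001000);

    /* Cases 1 and 2 tolerate the 64-bit load. */
    assert(addr_case1_truncate(load_global_64(slot)) == 0x1000);
    assert(addr_case2_addr32(load_global_64(slot)) == 0x1000);

    /* Case 3 is fine after a 32-bit load... */
    assert(addr_case3_trust(load_global_32(slot)) == 0x1000);
    /* ...but a 64-bit load breaks its invariant. */
    assert(addr_case3_trust(load_global_64(slot)) != 0x1000);
}
```

In this model, moving x86 from case 3 to case 2 corresponds to switching
from addr_case3_trust to addr_case2_addr32, which stays correct no matter
which kind of load produced the register.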

I am not sure if the optimizer could also break this case, or if it is
working by chance.  So, the simplest fix for 2.4 would be to add the
prefix as suggested in the comment and make x86 fall under 2.

If the optimizer is not breaking this case, fixing the register
allocator would be an option, and then the ADDR32 prefix could be reverted.

Even if the prefix was added, modifying the register allocator to use
32-bit loads would still be useful as an optimization, since on x86
32-bit loads are smaller than 64-bit loads.

Paolo


