From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH] target-mips: apply workaround for TCG optimizations for MFC1
Date: Wed, 15 Jul 2015 10:06:33 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

On 2015-07-15 09:31, Paolo Bonzini wrote:
> Ok, I see your point.  If you put it like this :) the fault definitely
> lies in the backends.  What I'm proposing would be in a new
> tcg_reg_alloc_trunc function, and it would require implementing a
> non-noop trunc.

Why not reuse the existing trunc_shr_i64_i32 op? AFAIU, it has been
designed exactly for that.

Actually I think we should implement the following ops as optional but
*real* TCG ops:
- trunc_shr_i64_i32
- extu_i32_i64
- ext_i32_i64

Then each backend can implement the ones it considers necessary. If an
op is not implemented in a backend, it is simply replaced by a mov. This
would also allow us to remove the "remember high bits as garbage" logic
in the optimizer, which I consider a band-aid more than a real fix.
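
To make the intended semantics of the three ops concrete, here is a
minimal reference model in plain C. It is only a sketch of what each op
computes on values, not QEMU code: the function names merely mirror the
op names, and the shift parameter assumes that trunc_shr_i64_i32 takes a
shift count applied before the truncation.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics only; these are hypothetical helpers, not QEMU code. */

/* Shift the 64-bit source right, then keep the low 32 bits. */
static inline uint32_t trunc_shr_i64_i32(uint64_t src, unsigned shift)
{
    assert(shift < 64);
    return (uint32_t)(src >> shift);
}

/* Zero-extend a 32-bit value to 64 bits. */
static inline uint64_t extu_i32_i64(uint32_t src)
{
    return (uint64_t)src;
}

/* Sign-extend a 32-bit value to 64 bits. */
static inline uint64_t ext_i32_i64(uint32_t src)
{
    return (uint64_t)(int64_t)(int32_t)src;
}
```

A backend that does not implement one of these would treat it as a plain
register move, which is only correct if the backend's other 32-bit ops
tolerate whatever is left in the high half.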

Note that we might have multiple choices, for example on x86:

1) implement trunc_shr_i64_i32 and ext_i32_i64
This way we make sure that all 32-bit values are always stored
zero-extended (even if a move has been propagated by the register
allocator or by the optimizer). The extu_i32_i64 can therefore always
be considered as a mov op.

2) implement extu_i32_i64 and ext_i32_i64
We have to guarantee that all 32-bit ops ignore the high part of the
registers (which is currently not the case for qemu_ld/st in user mode),
as they might contain garbage. Given that, we have to properly zero- or
sign-extend the value when converting a 32-bit value into a 64-bit value.
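
The two choices can be compared with a small model in C, where a 64-bit
host register is represented by a uint64_t. The host_reg type and the
opt1_/opt2_ helper names are hypothetical and only illustrate which op
does real work in each scheme; this is a sketch, not what a backend
would actually emit.

```c
#include <stdint.h>

typedef uint64_t host_reg;  /* hypothetical model of a 64-bit host register */

/* Choice 1: trunc clears the high half, so extu can be a plain move. */
static host_reg opt1_trunc(host_reg r) { return (uint32_t)r; }
static host_reg opt1_extu(host_reg r)  { return r; }
static host_reg opt1_ext(host_reg r)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)r;
}

/* Choice 2: trunc is a plain move and leaves garbage in the high half,
 * so both extensions have to do real work. */
static host_reg opt2_trunc(host_reg r) { return r; }
static host_reg opt2_extu(host_reg r)  { return (uint32_t)r; }
static host_reg opt2_ext(host_reg r)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)r;
}
```

Both schemes give the same 64-bit result after an extension. The
difference is what sits in the register in between: with choice 2 the
high half is not cleared, so every 32-bit op in the backend has to
ignore it.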

> I still believe the register allocator can be improved to do 32-bit
> loads, though as an optimization and not as a bugfix:
> 
> > > Even if the prefix was added, modifying the register allocator to use
> > > 32-bit loads would still be useful as an optimization, since on x86
> > > 32-bit loads are smaller than 64-bit loads.
> >
> > AFAIK, that's already the case. The REXW prefix is only emitted for
> > 64-bit ops.
> 
> Yes, but a load from a 64-bit register to a 32-bit destination emits
> REX.W.  From Leon's dump:
> 
>  mov_i32 tmp1,w0.d0  => mov    0xe8(%r14),%rbp
>  mov_i32 tmp0,tmp1
>  mov_i32 t8,tmp0     => mov    %ebp,0x60(%r14)
> 
> Note %rbp as the load destination and %ebp as the source of the store.

Indeed, that's something we might want to improve (and is due to the
fact that we have just replaced trunc_shr_i64_i32 by a move on x86).
Note however that this simplification might be target specific (it is
at least little-endian specific if we don't adjust the address).
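
The endianness caveat can be illustrated with a short C sketch. It
assumes the 64-bit value sits in memory in host byte order and that we
want its low 32 bits using a 32-bit load; host_is_little_endian and
load_low32 are hypothetical names chosen for this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Read the low 32 bits of a 64-bit slot with a 32-bit load.
 * On a little-endian host the low half is at offset 0; on a
 * big-endian host the address has to be adjusted by 4 bytes. */
static uint32_t load_low32(const uint64_t *slot)
{
    size_t offset = host_is_little_endian() ? 0 : 4;
    uint32_t low;
    memcpy(&low, (const unsigned char *)slot + offset, sizeof(low));
    return low;
}
```

Without the offset adjustment, a 32-bit load from the same address
would return the high half on a big-endian host.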

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
address@hidden                 http://www.aurel32.net
