From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH] target-mips: apply workaround for TCG optimizations for MFC1
Date: Tue, 14 Jul 2015 20:37:35 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

On 2015-07-14 20:20, Paolo Bonzini wrote:
> 
> 
> On 14/07/2015 19:09, Aurelien Jarno wrote:
> > On 2015-07-14 17:38, Leon Alrae wrote:
> >> There seems to be an issue when trying to keep a pointer in bottom 32-bits
> >> of a 64-bit floating point register. Load and store instructions accessing
> >> this address for some reason use the whole 64-bit content of floating point
> >> register rather than truncated 32-bit value. The following load uses
> >> incorrect address which leads to a crash if upper 32 bits of $f0 isn't 0:
> >>
> >> 0x00400c60:  mfc1       t8,$f0
> >> 0x00400c64:  lw t9,0(t8)
> >>
> >> It can be reproduced with the following linux userland program when running
> >> on a MIPS32 with CP0.Status.FR=1 (by default mips32r5-generic and
> >> mips32r6-generic CPUs have this bit set in linux-user).
> >>
> >> int main(int argc, char *argv[])
> >> {
> >>     int tmp = 0x11111111;
> >>     /* Set f0 */
> >>     __asm__ ("mtc1  %0, $f0\n"
> >>              "mthc1 %1, $f0\n"
> >>              : : "r" (&tmp), "r" (tmp));
> >>     /* At this point $f0: w:76fff040 d:1111111176fff040 */
> >>     __asm__ ("mfc1 $t8, $f0\n"
> >>              "lw   $t9, 0($t8)\n"); /* <--- crash! */
> >>     return 0;
> >> }
> >>
> >> Running above program in normal (non-singlestep mode) leads to:
> >>
> >> Program received signal SIGSEGV, Segmentation fault.
> >> 0x00005555559f6f37 in static_code_gen_buffer ()
> >> (gdb) x/i 0x00005555559f6f37
> >> => 0x5555559f6f37 <static_code_gen_buffer+78359>:       mov    %gs:0x0(%rbp),%ebp
> >> (gdb) info registers rbp
> >> rbp            0x1111111176fff040       0x1111111176fff040
> >>
> >> The program runs fine in singlestep mode, or with disabled TCG
> >> optimizations. Also, I'm not able to reproduce it in system emulation.
> > 
> > I am able to reproduce the problem, but for me disabling the
> > optimizations doesn't help. That said the problem is just another issue
> > with the "let's assume the target supports move between 32 and 64 bit
> > registers". At some point we should add a paragraph to tcg/README, to
> > define how to handle 32 vs 64 bit registers and what the TCG targets should
> > expect. We had to add special code to handle that for sparc
> > (trunc_shr_i32 instruction), but also code to the optimizer to remember
> > about "garbage" high bits. I am not sure someone has a global view about
> > how all this code interacts.
> 
> I certainly don't have a global view, so much that I didn't think at 
> all of the optimizer... Instead, it looks to me like a bug in the 
> register allocator.  In particular this code in tcg_reg_alloc_mov:

That's exactly my point when I said that no one seems to have a global
view. I think the fact that we don't check the type when simplifying
moves in the register allocator is intentional, in the same way that we
simply transform the trunc op into a mov op (except on sparc). This is
done because the truncation is not needed on x86 and most other
architectures, given that 32-bit instructions do not care about the high
part of the registers.
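
To illustrate the point, here is a minimal sketch in plain C, not QEMU
code. All function names are hypothetical, and a uint64_t stands in for a
64-bit host register. It shows why leaving garbage in the high bits is
harmless for a 32-bit ALU op but not for a consumer that reads the whole
register, such as host address formation:

```c
#include <stdint.h>

/* Hypothetical model: a uint64_t stands in for a 64-bit host register
 * that currently holds a 32-bit guest value in its low half. */

/* Truncation implemented as a plain move: high bits stay as garbage. */
static inline uint64_t trunc_as_mov(uint64_t host_reg)
{
    return host_reg;
}

/* Truncation implemented as a real instruction: high bits are cleared. */
static inline uint64_t trunc_as_real_op(uint64_t host_reg)
{
    return (uint32_t)host_reg;
}

/* A 32-bit add only looks at the low halves, so garbage is harmless. */
static inline uint32_t add32(uint64_t a, uint64_t b)
{
    return (uint32_t)a + (uint32_t)b;
}

/* A consumer that reads the full register, as a 64-bit host memory
 * access does when forming the host address. */
static inline uint64_t host_address(uint64_t guest_base, uint64_t reg)
{
    return guest_base + reg;
}
```

With the value from the report, add32(trunc_as_mov(0x1111111176fff040), 4)
still gives 0x76fff044, while host_address(0, trunc_as_mov(...)) keeps the
0x11111111 in the upper half.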

Basically the size-changing ops are trunc_i64_i32, ext_i32_i64 and
extu_i32_i64. We can be conservative and implement all of them as real
instructions in all TCG backends. In that case the mov op never has
to deal with registers of different sizes (just as we already enforce
at the TCG frontend level), and the register allocator and the optimizer
do not have to deal with this either. However, that's suboptimal on some
architectures, which is why on x86 we decided to just replace
trunc_i64_i32 with a move. But if we do this simplification, it should be
done everywhere (in that case, including in the qemu_ld op). And it
should be DOCUMENTED somewhere, given that different choices can be made
for different backends.
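
For reference, the conservative semantics of the three size-changing ops
can be modelled as below. This is only a sketch in plain C: the functions
borrow the op names but are not the TCG implementation, and they describe
what a backend emitting real instructions would have to compute.

```c
#include <stdint.h>

/* Sketch only: C models of the three size-changing ops, assuming each is
 * implemented as a real instruction rather than simplified to a move. */

/* Keep the low 32 bits, discard the high 32 bits. */
static inline uint32_t trunc_i64_i32(uint64_t v)
{
    return (uint32_t)v;
}

/* Sign-extend: copy bit 31 into bits 63..32. Written with xor/subtract
 * to avoid relying on implementation-defined signed conversions. */
static inline uint64_t ext_i32_i64(uint32_t v)
{
    return ((uint64_t)v ^ 0x80000000ULL) - 0x80000000ULL;
}

/* Zero-extend: bits 63..32 become zero. */
static inline uint64_t extu_i32_i64(uint32_t v)
{
    return (uint64_t)v;
}
```

When trunc_i64_i32 is replaced by a move, the result is only guaranteed
in its low 32 bits, so every consumer has to ignore the high half.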

As for the optimizer, its goal is to predict the value of the registers
by constant folding. It should be seen as another CPU, with its own
rules. For example, TCG internally stores 32-bit constants sign-extended.
The optimizer should follow the same convention.
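
As a rough sketch of what following that convention means when folding,
assume constants are kept in a 64-bit slot. The helper names canon32 and
fold_add_i32 are made up for this example and are not taken from
tcg/optimize.c:

```c
#include <stdint.h>

/* Hypothetical helper: bring a 32-bit constant held in a 64-bit slot
 * into canonical sign-extended form (bit 31 copied into bits 63..32). */
static inline uint64_t canon32(uint64_t v)
{
    return ((v & 0xffffffffULL) ^ 0x80000000ULL) - 0x80000000ULL;
}

/* Hypothetical folding of a 32-bit add of two known constants: compute
 * the result, then re-canonicalize so that the stored value follows the
 * same sign-extended convention as every other 32-bit constant. */
static inline uint64_t fold_add_i32(uint64_t a, uint64_t b)
{
    return canon32(a + b);
}
```

Folding 0x7fffffff + 1 this way stores 0xffffffff80000000 rather than
0x0000000080000000, so comparing two folded 32-bit constants never
depends on stale high bits.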

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
address@hidden                 http://www.aurel32.net


