Re: [Qemu-arm] [Qemu-devel] ARM64 STR Instruction Crash Regression in TCG
From: Richard Henderson
Subject: Re: [Qemu-arm] [Qemu-devel] ARM64 STR Instruction Crash Regression in TCG
Date: Sun, 22 Jul 2018 18:45:53 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
On 07/22/2018 02:31 PM, Richard Henderson wrote:
> On 07/22/2018 01:47 PM, Jason A. Donenfeld wrote:
>> Hello,
>>
>> Gcc 7.3 compiles bash's array_flush's dual assignment using:
>>
>> STP X20, X20, [X20,#0x10]
>>
>> But gcc 8.1 compiles it as:
>>
>> STR Q0, [X20,#0x10]
>>
>> Real processors seem okay, and qemu 2.11 seems okay. But qemu 2.12
>> results in a segfaulting process. I'm pretty sure this is a TCG bug.
>>
>> In the attached tarball, please find kernel and run.sh. Calling
>> ./run.sh will start the kernel with the bad bash executable that tries
>> to execute `config=({1..100000})` and crashes. Also included in there
>> is the actual crashing bash binary, in case you'd like to disassemble
>> a little bit.
>
> Interesting. The test passes on master with --enable-debug, but fails when
> qemu is compiled with optimization...
>
> I'll dig a bit deeper.
The failing sequence is
0x0045ba44: 4e080e80 dup v0.2d, x20
0x0045ba48: 90000340 adrp x0, #0x4c3000
0x0045ba4c: 91098003 add x3, x0, #0x260
0x0045ba50: 92800001 movn x1, #0
0x0045ba54: f9413002 ldr x2, [x0, #0x260]
0x0045ba58: 3d800680 str q0, [x20, #0x10]
...
OP after optimization and liveness analysis:
ld_i32 tmp0,env,$0xffffffffffffffdc dead: 1
movi_i32 tmp1,$0x0
brcond_i32 tmp0,tmp1,lt,$L0 dead: 0 1
---- 000000000045ba44 0000000000000000 0000000000000000
dup_vec v128,e64,tmp2,x20
st_vec v128,e8,tmp2,env,$0x8c0 dead: 0
...
---- 000000000045ba58 0000000000000000 0000000000000000
movi_i64 tmp4,$0x10
add_i64 tmp3,x20,tmp4 dead: 1 2
ld_i64 tmp4,env,$0x8c0
movi_i64 tmp6,$0x8
add_i64 tmp5,tmp3,tmp6 dead: 2
qemu_st_i64 tmp4,tmp3,leq,0 dead: 0 1
ld_i64 tmp4,env,$0x8c8 dead: 1
qemu_st_i64 tmp4,tmp5,leq,0 dead: 0 1
...
0x7fffcd2e678c: vmovq 0xe0(%r14), %xmm0
0x7fffcd2e6795: vpbroadcastq %xmm0, %xmm1
0x7fffcd2e679a: vmovdqu %xmm1, 0x8c0(%r14)
...
0x7fffcd2c0e78: vmovq %xmm0, %r12
0x7fffcd2c0e7d: addq $0x10, %r12
The guest x20 is loaded into xmm0 for the dup at 0x45ba44, and that copy in xmm0
is reused for the store at 0x45ba58. However, if the load at 0x45ba54 misses the
TLB, then the generated code makes a function call, which can clobber xmm0.
With -O0, it just so happens that the function call does not clobber xmm0; with
optimization enabled, the compiler's different code generation does clobber
xmm0.
The fix is to properly treat the xmm registers as call-clobbered, at which point
the saved value is evicted from xmm0 naturally. Patch posted separately.
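
To make the failure mode concrete, here is a minimal toy model in C of a
register allocator that caches a value in a host register across a call. It is
only a sketch and is not QEMU code: ToyAlloc, toy_load, toy_call, toy_read,
toy_scenario and POISON are all hypothetical names invented for illustration.
Register 0 stands in for xmm0, mem_slot stands in for the guest x20 slot in
env, and the call stands in for the TLB-miss function call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model, not QEMU's allocator. */
enum { NB_REGS = 4, NO_REG = -1 };

#define POISON UINT64_C(0xdeadbeefdeadbeef)

typedef struct {
    uint64_t host_reg[NB_REGS];  /* contents of the toy host registers */
    uint64_t mem_slot;           /* canonical home of the value (think x20 in env) */
    int cached_in;               /* host register holding a copy, or NO_REG */
    uint32_t call_clobber_mask;  /* registers the allocator BELIEVES a call destroys */
} ToyAlloc;

/* Load the value from its memory home into a host register and remember it. */
static void toy_load(ToyAlloc *a, int reg)
{
    assert(reg >= 0 && reg < NB_REGS);
    a->host_reg[reg] = a->mem_slot;
    a->cached_in = reg;
}

/* Emit a call: the allocator forgets cached copies only in registers it
 * believes are clobbered, then the callee destroys what it really clobbers. */
static void toy_call(ToyAlloc *a, uint32_t really_clobbered)
{
    if (a->cached_in != NO_REG &&
        (a->call_clobber_mask & (1u << a->cached_in))) {
        a->cached_in = NO_REG;   /* evicted: next use reloads from memory */
    }
    for (int r = 0; r < NB_REGS; r++) {
        if (really_clobbered & (1u << r)) {
            a->host_reg[r] = POISON;
        }
    }
}

/* Use the value: reuse the cached register if the allocator still trusts it. */
static uint64_t toy_read(ToyAlloc *a)
{
    if (a->cached_in == NO_REG) {
        toy_load(a, 0);
    }
    return a->host_reg[a->cached_in];
}

/* Load into register 0, make a call that really clobbers register 0,
 * then read the value back. */
static uint64_t toy_scenario(uint32_t believed_clobber_mask)
{
    ToyAlloc a = { .mem_slot = 0x1234, .cached_in = NO_REG,
                   .call_clobber_mask = believed_clobber_mask };
    toy_load(&a, 0);
    toy_call(&a, 1u << 0);
    return toy_read(&a);
}
```

With toy_scenario(0) the allocator does not know register 0 is clobbered, so
the stale copy is reused and the poisoned value comes back. With
toy_scenario(1u << 0) the copy is dropped at the call and the value is reloaded
from memory, which is the behaviour the fix is after.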
r~