qemu-devel

Hexagon toolchain update vs linux-user signals


From: Richard Henderson
Subject: Hexagon toolchain update vs linux-user signals
Date: Wed, 3 Nov 2021 11:22:03 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0

On 11/3/21 9:31 AM, Alex Bennée wrote:
Could it be a toolchain thing?

Not likely a toolchain problem.  If I can access both of the signals
binaries, I can confirm.

Testing against two signals binaries I see a 4-7% failure rate against the
new binary versus the original pre-toolchain change one. That's not to
say the binary is broken - it could be a subtle change that exacerbated
our existing poor signals support.

   https://transfer.sh/xA2ejk/signals.old (pre-toolchain change)
   https://transfer.sh/vSsn5s/signals

Something in the CI ensures it fails much more reliably, as I can't get
it to pass on a retry.

I've had a closer look at the signals failure, and it really could be a 
toolchain problem.

The SIGSEGV is at

#0  0x00005555557a387f in stb_p (ptr=0x10000, v=0 '\000')
    at /home/richard.henderson/qemu/src/include/qemu/bswap.h:326
#1  0x00005555557a4bc5 in cpu_stb_mmu (env=0x555555e4eb50, addr=0,
    val=0 '\000', oi=0, ra=93824992935986) at ../src/accel/tcg/user-exec.c:359
#2  0x00005555557a5396 in cpu_stb_mmuidx_ra (env=0x555555e4eb50, addr=0,
    val=0, mmu_idx=0, ra=93824992935986)
    at ../src/accel/tcg/ldst_common.c.inc:83
#3  0x00005555557a57e6 in cpu_stb_data_ra (env=0x555555e4eb50, addr=0, val=0,
    ra=93824992935986) at ../src/accel/tcg/ldst_common.c.inc:183
#4  0x00005555555ff6f0 in helper_commit_store (env=0x555555e4eb50, slot_num=1)
    at ../src/target/hexagon/op_helper.c:151
#5  0x0000555555600032 in check_noshuf (env=0x555555e4eb50, slot=0)
    at ../src/target/hexagon/op_helper.c:407
#6  0x00005555556000e4 in mem_load4 (env=0x555555e4eb50, slot=0, vaddr=305000)
    at ../src/target/hexagon/op_helper.c:431
#7  0x00005555556063c0 in helper_L2_loadri_io (env=0x555555e4eb50, RsV=305000,
    siV=0, slot=0) at target/hexagon/helper_funcs_generated.c.inc:1013
#8  0x00007fffe8034f5a in code_gen_buffer ()

which is a store to address 0, which obviously should fail.
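
For anyone reading the backtrace: frame #1 has addr=0 while frame #0 has ptr=0x10000, which is consistent with linux-user forming the host pointer by adding a guest base offset to the guest address. A minimal sketch of that arithmetic, assuming the offset in this run is 0x10000 (inferred from those two frames, not stated anywhere in the thread). The names are illustrative, not QEMU's API:

```python
# Assumption: guest base inferred from frame #0 (ptr=0x10000) and
# frame #1 (addr=0) of the backtrace; not confirmed in the thread.
ASSUMED_GUEST_BASE = 0x10000


def guest_to_host(guest_addr, guest_base=ASSUMED_GUEST_BASE):
    """Model user-mode translation: host pointer = guest base + guest address."""
    return guest_base + guest_addr


# Guest address 0 maps to the very start of the guest address space.
print(hex(guest_to_host(0)))
```

So the host-side pointer looks non-null even though the guest store targets address 0.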

This comes from

IN: nontrivial_free
0x000224c4:  0x78004003 {       R3 = #0x0
0x000224c8:  0xf204d001         P1 = cmp.eq(R4,R16) }
0x000224cc:  0x5c00413e {       if (P1) jump:nt PC+124
0x000224d0:  0x38034000         if (P0) memb(R3+#0x0) = #0x0
0x000224d4:  0x9180c002         R2 = memw(R0+#0x0) }

which is part of the new toolchain's libc. This is quite obviously a store to address 0 if P0 is true, which looks pretty questionable. Presumably P0 is not always set, which is why the program does not always crash. But there doesn't appear to be anything wrong with the qemu translation.
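
To spell out why the second packet stores to 0: instructions within a Hexagon packet read the register values from before that packet, so the store sees the R3 = #0x0 committed by the previous packet. A rough Python model of just these two packets, under stated assumptions: the starting R3 value is made up, the load into R2 is omitted, and the function and variable names are hypothetical.

```python
def run_fragment(r4, r16, p0, r3_before=0x4A000):
    """Model the two packets from nontrivial_free shown above.

    r3_before is a made-up stand-in for whatever R3 held on entry.
    Returns (store_addresses, jump_taken).
    """
    regs = {"R3": r3_before, "R4": r4, "R16": r16}
    preds = {"P0": p0, "P1": False}
    store_addresses = []

    # Packet 1: { R3 = #0x0 ; P1 = cmp.eq(R4,R16) }
    # Both instructions read the pre-packet state, then commit together.
    before = dict(regs)
    regs["R3"] = 0
    preds["P1"] = before["R4"] == before["R16"]

    # Packet 2: { if (P1) jump:nt ... ; if (P0) memb(R3+#0x0) = #0x0 ; ... }
    # The store reads R3 as committed by packet 1, which is 0.
    before = dict(regs)
    jump_taken = preds["P1"]
    if preds["P0"]:
        store_addresses.append(before["R3"] + 0x0)

    return store_addresses, jump_taken


print(run_fragment(r4=1, r16=2, p0=True))   # P0 set: one store, to address 0
print(run_fragment(r4=1, r16=2, p0=False))  # P0 clear: no store at all
```

Whatever R3 held before the first packet never reaches the store, which matches the addr=0 seen in the backtrace.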

I'm suspicious of the new compiler. This looks like some sort of code scheduling bug, where R3=0 got moved ahead of the final use of the previous value in R3.
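
A hypothetical illustration of the kind of reordering I mean. The op names and the pointer value are invented, and the order the compiler should have produced is a guess, since the thread does not show the correct code:

```python
def stores_issued(program, r3_initial):
    """Run a toy two-op program and return the addresses stored through R3."""
    r3 = r3_initial
    stores = []
    for op in program:
        if op == "store_via_r3":
            stores.append(r3)  # store uses whatever R3 holds right now
        elif op == "clear_r3":
            r3 = 0
        else:
            raise ValueError(f"unknown op: {op}")
    return stores


# Guessed intended order: last use of the old R3 value, then R3 = 0.
intended = ["store_via_r3", "clear_r3"]
# Suspected miscompile: R3 = 0 hoisted above that last use.
hoisted = ["clear_r3", "store_via_r3"]

print([hex(a) for a in stores_issued(intended, 0x4A000)])
print([hex(a) for a in stores_issued(hoisted, 0x4A000)])
```

With the clear hoisted, the store lands on address 0 instead of the old pointer.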

In the short term, I recommend dropping the hexagon toolchain update and that Taylor generate a new HVX pull request with the new tests present but disabled in the makefile.


r~


