Re: [Qemu-devel] tcg/tcg.c:1892: tcg fatal error


From: Artyom Tarasenko
Subject: Re: [Qemu-devel] tcg/tcg.c:1892: tcg fatal error
Date: Mon, 11 Apr 2011 19:53:17 +0200

On Mon, Apr 11, 2011 at 5:16 AM, Igor Kovalenko
<address@hidden> wrote:
> On Mon, Apr 11, 2011 at 12:00 AM, Artyom Tarasenko <address@hidden> wrote:
>> On Sun, Apr 10, 2011 at 9:41 PM, Igor Kovalenko
>> <address@hidden> wrote:
>>> On Sun, Apr 10, 2011 at 11:37 PM, Artyom Tarasenko <address@hidden> wrote:
>>>> On Sun, Apr 10, 2011 at 8:52 PM, Igor Kovalenko
>>>> <address@hidden> wrote:
>>>>> On Sun, Apr 10, 2011 at 10:35 PM, Artyom Tarasenko <address@hidden> wrote:
>>>>>> On Sun, Apr 10, 2011 at 7:57 PM, Blue Swirl <address@hidden> wrote:
>>>>>>> On Sun, Apr 10, 2011 at 8:48 PM, Artyom Tarasenko <address@hidden> 
>>>>>>> wrote:
>>>>>>>> On Sun, Apr 10, 2011 at 4:44 PM, Blue Swirl <address@hidden> wrote:
>>>>>>>>> On Sun, Apr 10, 2011 at 5:09 PM, Artyom Tarasenko <address@hidden> 
>>>>>>>>> wrote:
>>>>>>>>>> On Sun, Apr 10, 2011 at 3:24 PM, Aurelien Jarno <address@hidden> 
>>>>>>>>>> wrote:
>>>>>>>>>>> On Sun, Apr 10, 2011 at 02:29:59PM +0200, Artyom Tarasenko wrote:
>>>>>>>>>>>> Trying to boot some proprietary OS I get qemu-system-sparc64 crash 
>>>>>>>>>>>> with a
>>>>>>>>>>>>
>>>>>>>>>>>> tcg/tcg.c:1892: tcg fatal error
>>>>>>>>>>>>
>>>>>>>>>>>> error message.
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like it can be a platform independent bug though, because
>>>>>>>>>>>> when a '-singlestep' option IS present, qemu doesn't crash and 
>>>>>>>>>>>> seems
>>>>>>>>>>>> to translate the code properly.
>>>>>>>>>>>>
>>>>>>>>>>>> (gdb) bt
>>>>>>>>>>>> #0  0x00000032c2e327f5 in raise () from /lib64/libc.so.6
>>>>>>>>>>>> #1  0x00000032c2e33fd5 in abort () from /lib64/libc.so.6
>>>>>>>>>>>> #2  0x000000000051933d in tcg_reg_alloc_call (s=<value optimized 
>>>>>>>>>>>> out>,
>>>>>>>>>>>> def=0x89d340, opc=INDEX_op_call, args=0x10acc98, dead_iargs=3) at
>>>>>>>>>>>> qemu/tcg/tcg.c:1892
>>>>>>>>>>>> #3  0x000000000051a557 in tcg_gen_code_common (s=0x10b8940,
>>>>>>>>>>>> gen_code_buf=0x40338b60 "address@hidden 3\355I\211\256\220") at
>>>>>>>>>>>> qemu/tcg/tcg.c:2099
>>>>>>>>>>>> #4  tcg_gen_code (s=0x10b8940, gen_code_buf=0x40338b60 
>>>>>>>>>>>> "address@hidden
>>>>>>>>>>>> 3\355I\211\256\220") at qemu/tcg/tcg.c:2142
>>>>>>>>>>>> #5  0x00000000004d38f1 in cpu_sparc_gen_code (env=0x10cce10,
>>>>>>>>>>>> tb=0x7fffe91bc218, gen_code_size_ptr=0x7fffffffd9b4) at
>>>>>>>>>>>> qemu/translate-all.c:93
>>>>>>>>>>>> #6  0x00000000004d1fd7 in tb_gen_code (env=0x10cce10, pc=18868776,
>>>>>>>>>>>> cs_base=18868780, flags=15, cflags=0) at qemu/exec.c:989
>>>>>>>>>>>> #7  0x00000000004d4029 in tb_find_slow (env1=<value optimized 
>>>>>>>>>>>> out>) at
>>>>>>>>>>>> qemu/cpu-exec.c:167
>>>>>>>>>>>> #8  tb_find_fast (env1=<value optimized out>) at cpu-exec.c:194
>>>>>>>>>>>> #9  cpu_sparc_exec (env1=<value optimized out>) at 
>>>>>>>>>>>> qemu/cpu-exec.c:556
>>>>>>>>>>>> #10 0x0000000000408868 in tcg_cpu_exec () at qemu/cpus.c:1066
>>>>>>>>>>>> #11 cpu_exec_all () at qemu/cpus.c:1102
>>>>>>>>>>>> #12 0x000000000053c756 in main_loop (argc=<value optimized out>,
>>>>>>>>>>>> argv=<value optimized out>, envp=<value optimized out>) at
>>>>>>>>>>>> qemu/vl.c:1430
>>>>>>>>>>>>
>>>>>>>>>>>> I inspected ts->val_type causing the abort() case and it turned 
>>>>>>>>>>>> out to be 0.
>>>>>>>>>>>>
>>>>>>>>>>>> The last lines of qemu.log (without -singlestep)
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fe9f0:  rdpr  %pstate, %g1
>>>>>>>>>>>> 0x00000000011fe9f4:  wrpr  %g1, 2, %pstate
>>>>>>>>>>>> --------------
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fe9f8:  ldub  [ %o0 ], %o1
>>>>>>>>>>>> 0x00000000011fe9fc:  mov  %o1, %o2
>>>>>>>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>>>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>>>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>>>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>>>>>>>
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> --------------
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fe9f8:  ldub  [ %o0 ], %o1
>>>>>>>>>>>> 0x00000000011fe9fc:  mov  %o1, %o2
>>>>>>>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>>>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>>>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>>>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>>>>>>>
>>>>>>>>>>>> 110521: Data Access MMU Miss (v=0068) pc=00000000011fe9f8
>>>>>>>>>>>> npc=00000000011fe9fc SP=000000000180ae41
>>>>>>>>>>>> pc: 00000000011fe9f8  npc: 00000000011fe9fc
>>>>>>>>>>>>
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>>>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>>>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>>>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>>>>>>> --------------
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fea10:  brz,pn   %o2, 0x11fe9f8
>>>>>>>>>>>> 0x00000000011fea14:  mov  %o2, %o4
>>>>>>>>>>>> --------------
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fea18:  rdpr  %tick, %o5
>>>>>>>>>>>> 0x00000000011fea1c:  cmp  %o2, %o4
>>>>>>>>>>>> 0x00000000011fea20:  be  %icc, 0x11fea18
>>>>>>>>>>>> 0x00000000011fea24:  ldub  [ %o0 ], %o4
>>>>>>>>>>>> --------------
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fea28:  brz,pn   %o4, 0x11fe9f4
>>>>>>>>>>>> 0x00000000011fea2c:  wrpr  %g0, %g1, %pstate
>>>>>>>>>>>> <EOF>
>>>>>>>>>>>>
>>>>>>>>>>>> The crash is 100% reproducible and happens always on the same 
>>>>>>>>>>>> place,
>>>>>>>>>>>> so it's probably a pure TCG issue, not related on getting the
>>>>>>>>>>>> external/timer interrupts.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you need any additional info?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> What would be interesting would be to get the corresponding TCG code
>>>>>>>>>>> from qemu.log (-d op,op_opt).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> OP:
>>>>>>>>>>  ---- 0x11fea28
>>>>>>>>>>  ld_i64 tmp6,regwptr,$0x20
>>>>>>>>>>  movi_i64 cond,$0x0
>>>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>>>  brcond_i64 tmp6,tmp8,ne,$0x0
>>>>>>>>>>  movi_i64 cond,$0x1
>>>>>>>>>>  set_label $0x0
>>>>>>>>>>
>>>>>>>>>>  ---- 0x11fea2c
>>>>>>>>>>  movi_i64 tmp7,$0x0
>>>>>>>>>>  xor_i64 tmp0,tmp7,g1
>>>>>>>>>>  movi_i64 pc,$0x11fea2c
>>>>>>>>>>  movi_i64 tmp8,$compute_psr
>>>>>>>>>>  call tmp8,$0x0,$0
>>>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>>>  brcond_i64 cond,tmp8,eq,$0x1
>>>>>>>>>>  movi_i64 npc,$0x11fe9f4
>>>>>>>>>>  br $0x2
>>>>>>>>>>  set_label $0x1
>>>>>>>>>>  movi_i64 npc,$0x11fea30
>>>>>>>>>>  set_label $0x2
>>>>>>>>>>  movi_i64 tmp8,$wrpstate
>>>>>>>>>>  call tmp8,$0x0,$0,tmp0
>>>>>>>>>>  mov_i64 pc,npc
>>>>>>>>>>  movi_i64 tmp8,$0x4
>>>>>>>>>>  add_i64 npc,npc,tmp8
>>>>>>>>>>  exit_tb $0x0
>>>>>>>>>>
>>>>>>>>>> OP after liveness analysis:
>>>>>>>>>>  ---- 0x11fea28
>>>>>>>>>>  ld_i64 tmp6,regwptr,$0x20
>>>>>>>>>>  movi_i64 cond,$0x0
>>>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>>>  brcond_i64 tmp6,tmp8,ne,$0x0
>>>>>>>>>>  movi_i64 cond,$0x1
>>>>>>>>>>  set_label $0x0
>>>>>>>>>>
>>>>>>>>>>  ---- 0x11fea2c
>>>>>>>>>>  nopn $0x2,$0x2
>>>>>>>>>>  nopn $0x3,$0x68,$0x3
>>>>>>>>>>  movi_i64 pc,$0x11fea2c
>>>>>>>>>>  movi_i64 tmp8,$compute_psr
>>>>>>>>>>  call tmp8,$0x0,$0
>>>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>>>  brcond_i64 cond,tmp8,eq,$0x1
>>>>>>>>>>  movi_i64 npc,$0x11fe9f4
>>>>>>>>>>  br $0x2
>>>>>>>>>>  set_label $0x1
>>>>>>>>>>  movi_i64 npc,$0x11fea30
>>>>>>>>>>  set_label $0x2
>>>>>>>>>>  movi_i64 tmp8,$wrpstate
>>>>>>>>>>  call tmp8,$0x0,$0,tmp0
>>>>>>>>>>  mov_i64 pc,npc
>>>>>>>>>>  movi_i64 tmp8,$0x4
>>>>>>>>>>  add_i64 npc,npc,tmp8
>>>>>>>>>>  exit_tb $0x0
>>>>>>>>>>  end
>>>>>>>>>>
>>>>>>>>>> Does it mean the last block is processed correctly and the crash
>>>>>>>>>> happens on the next instruction which doesn't make it to the log?
>>>>>>>>>> The next instruction would be a
>>>>>>>>>>
>>>>>>>>>> 0x00000000011fea30:  retl
>>>>>>>>>>
>>>>>>>>>> Since it's a branch instruction I guess this would also be a tcg 
>>>>>>>>>> block boundary.
>>>>>>>>>
>>>>>>>>> Because abort() was called from tcg_reg_alloc_call, I'd say 'retl'
>>>>>>>>> (synthetic op for 'jmpl %o8 + 8, %g0') was the problem.
>>>>>>>>
>>>>>>>> Any idea why? retl is not a rare instruction...
>>>>>>>
>>>>>>> Sorry, calls are generated for helpers, so it's not 'jmpl' but the
>>>>>>> call to wrpstate helper.
>>>>>>
>>>>>> And why it doesn't happen in a singlestep mode?
>>>>>> I tried to comment out
>>>>>> cpu_check_irqs(env);
>>>>>> in the helper_wrpstate but it made no difference. The only suspicious
>>>>>> thing left is register bank switching. Is it safe to switch register
>>>>>> banks in the helper function? Shouldn't we end the translation block
>>>>>> before?
>>>>>
>>>>> Not sure if I have seen write to pstate in delay slot, but switching
>>>>> globals with PS_AG appears to be safe.
>>>>> Do you know which bits are changed in the pstate?
>>>>
>>>> Hard to say. With a breakpoint set qemu doesn't crash.
>>>> The breakpoint shows the change from 0x14->0x16.
>>>> So the only difference is that interrupts are getting enabled. No
>>>> register bank change.
>>>> (And now also no cpu_check_irqs(env) call, because I commented it out.)
>>>>
>>>> But given there was a Data Access MMU Miss, I would expect there must
>>>> have beeb a PS_MG switch.
>>>>
>>>> Also the breakpoint makes tcg to cut the translation block before the wrpr:
>>>>
>>>> IN:
>>>> 0x00000000011fea18:  rdpr  %tick, %o5
>>>> 0x00000000011fea1c:  cmp  %o2, %o4
>>>> 0x00000000011fea20:  be  %icc, 0x11fea18
>>>> 0x00000000011fea24:  ldub  [ %o0 ], %o4
>>>> --------------
>>>> IN:
>>>> 0x00000000011fea28:  brz,pn   %o4, 0x11fe9f4
>>>> --------------
>>>> IN:
>>>> 0x00000000011fea2c:  wrpr  %g0, %g1, %pstate
>>>> --------------
>>>> IN:
>>>> 0x00000000011fea30:  retl
>>>> --------------
>>>> IN:
>>>> 0x00000000011fea30:  retl
>>>> 0x00000000011fea34:  sub  %o5, %o3, %o0
>>>>
>>>
>>> You can try enabling DEBUG_PSTATE to see which bits are changed.
>>
>> I put an additional DPRINTF in the helper and it doesn't get executed
>> at 11fea2c. Only at 11fe9f4 (0x16->0x14).
>
> In such cases I would run with -d in_asm,int to have more data to
> compare two runs.
> May the patch attached help a bit to add verbose pstate output.

Can do it, but I'd like to understand first what we are looking for.
How does the main loop work in this case? Is it something like the following?

translate {brz,pn ; wrpr} -> optimize -> execute -> translate {retl ;
...} -> optimize -> execute.

The subject error is a tcg error, so it is happening in one of the two
translate/optimize phases sketched above, right?
So why are we looking at the wrpr helper code?
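
To make the ordering in question concrete, here is a toy model in C. It is
not QEMU code: toy_cpu, toy_run_block and the one-letter phase log are names
invented for illustration, and the real path (tb_find_fast -> tb_gen_code ->
tcg_gen_code in the backtrace) does much more. The only point of the sketch
is that, under this assumed ordering, code generation for a block finishes
before any helper called from that block can run.

```c
#include <assert.h>
#include <string.h>

enum { TOY_MAX_TB = 4, TOY_LOG_LEN = 64 };

/* Hypothetical stand-in for the CPU state plus a tiny translation cache. */
struct toy_cpu {
    unsigned long cached_pc[TOY_MAX_TB]; /* start PCs of translated blocks */
    int n_cached;
    char log[TOY_LOG_LEN];               /* one letter per phase, in order */
    int log_len;
};

/* Append one phase letter to the log, keeping it NUL-terminated. */
static void toy_log(struct toy_cpu *cpu, char phase)
{
    if (cpu->log_len < TOY_LOG_LEN - 1) {
        cpu->log[cpu->log_len++] = phase;
        cpu->log[cpu->log_len] = '\0';
    }
}

/* Return 1 if a block starting at pc has already been translated. */
static int toy_lookup(const struct toy_cpu *cpu, unsigned long pc)
{
    for (int i = 0; i < cpu->n_cached; i++) {
        if (cpu->cached_pc[i] == pc) {
            return 1;
        }
    }
    return 0;
}

/* Run one block: translate it only on a cache miss, then execute it. */
static void toy_run_block(struct toy_cpu *cpu, unsigned long pc)
{
    if (!toy_lookup(cpu, pc)) {
        toy_log(cpu, 'T'); /* guest instructions -> intermediate ops */
        toy_log(cpu, 'O'); /* liveness analysis on those ops */
        toy_log(cpu, 'G'); /* register allocation and host code; a
                              tcg-style abort would happen here */
        if (cpu->n_cached < TOY_MAX_TB) {
            cpu->cached_pc[cpu->n_cached++] = pc;
        }
    }
    toy_log(cpu, 'X');     /* execute host code; helpers run only here */
}
```

Starting from a zeroed toy_cpu, running the blocks at 0x11fea28, 0x11fea30
and 0x11fea28 again leaves "TOGXTOGXX" in the log: each new block is fully
generated ('G') before it executes ('X'), and the cached block goes straight
to 'X'.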

> Do you have public test case?
> It is possible to code this delay slot write test but real issue may
> be corruption elsewhere.

You assume ts->val_type gets corrupted? But then it must happen before
the wrpr helper call, or actually before the translation of the {brz,pn ;
wrpr} block, no?
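
One other reading of the quoted op dump, offered as a guess rather than a
diagnosis: after liveness analysis the "xor_i64 tmp0,tmp7,g1" has become a
nopn, yet "call tmp8,$0x0,$0,tmp0" still reads tmp0 on the far side of the
brcond/set_label ops. tcg/README describes a plain temporary as live only
within a basic block, and a val_type of 0 looks like TEMP_VAL_DEAD, if I
read tcg.h correctly. The toy pass below is not the TCG implementation;
toy_op, toy_liveness and toy_first_dead_read are invented names, and it
models only the assumption "temps are dead at every basic-block end".

```c
#include <assert.h>

enum toy_kind { TOY_OP, TOY_BB_END };
enum { TOY_NTEMPS = 4 };

/* Hypothetical op: at most one temp written and one temp read. */
struct toy_op {
    enum toy_kind kind;
    int out;      /* temp index written, or -1 for none (e.g. a call) */
    int in;       /* temp index read, or -1 for none */
    int removed;  /* set by the pass when the result is judged unused */
};

/* Backward pass: every temp is assumed dead at a basic-block end. */
static void toy_liveness(struct toy_op *ops, int n)
{
    int live[TOY_NTEMPS] = {0};

    for (int i = n - 1; i >= 0; i--) {
        struct toy_op *op = &ops[i];

        if (op->kind == TOY_BB_END) {
            for (int t = 0; t < TOY_NTEMPS; t++) {
                live[t] = 0;
            }
            continue;
        }
        if (op->out >= 0) {
            if (!live[op->out]) {
                op->removed = 1;   /* nobody in this block reads it */
                continue;
            }
            live[op->out] = 0;
        }
        if (op->in >= 0) {
            live[op->in] = 1;
        }
    }
}

/* Forward check: index of the first op reading a temp that holds no
 * value in its basic block, or -1 if every read is backed by a write. */
static int toy_first_dead_read(const struct toy_op *ops, int n)
{
    int valid[TOY_NTEMPS] = {0};

    for (int i = 0; i < n; i++) {
        if (ops[i].kind == TOY_BB_END) {
            for (int t = 0; t < TOY_NTEMPS; t++) {
                valid[t] = 0;
            }
            continue;
        }
        if (ops[i].removed) {
            continue;
        }
        if (ops[i].in >= 0 && !valid[ops[i].in]) {
            return i;
        }
        if (ops[i].out >= 0) {
            valid[ops[i].out] = 1;
        }
    }
    return -1;
}
```

With the sequence {write temp 0; basic-block end; call reading temp 0} the
pass removes the write and the check reports the call at index 2. Without
the basic-block end in between, nothing is removed and the check returns -1.
If this model applies at all, it would place the problem in how the
{brz,pn ; wrpr} block is translated rather than in the helper itself.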

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/


