From: Peter Maydell
Subject: [Qemu-devel] [Bug 823902] Re: multithreaded ARM seg/longjmp causes uninitialized stack frame due to 0d10193870b5a81c3bce13a602a5403c3a55cf6c
Date: Thu, 15 Dec 2011 22:55:24 -0000

** Changed in: qemu
       Status: Fix Committed => Fix Released

You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.

  multithreaded ARM seg/longjmp causes uninitialized stack frame due
  to 0d10193870b5a81c3bce13a602a5403c3a55cf6c

Status in QEMU:
  Fix Released

Bug description:
    I've got an ARM multithreaded test program that I wrote as a gcc
  testcase (attached) that fails on QEMU; Firefox from Ubuntu ARM Maverick
  also fails in the same way.  The failure is either a seg fault or
  '*** longjmp causes uninitialized stack frame ***:
  ./arm-linux-user/qemu-arm terminated', and it fails every time.

  The test works on real hardware - a dual core A9 panda board.  Firefox
  in an ARM maverick chroot also fails in the same way and is fixed in
  the same way.

  On 64bit Oneiric (i7-860 quad core) the backtrace from the seg looks like:
  #0  __sigsetjmp () at ../sysdeps/x86_64/setjmp.S:26
  #1  0x0000000060034cf4 in cpu_arm_exec (env=0x0) at 
  #2  0x0000000060006467 in cpu_loop (env=0x6226d060) at 
  #3  0x0000000060007984 in main (argc=<value optimised out>, argv=<value optimised out>, envp=<value optimised out>) at 

  On 32bit Lucid (core2 duo dual core), when it gives the longjmp error it
  has taken a rather more tortuous route, but it looks like it originally
  took a seg at about the same place:
  #0  pthread_cond_wait ()
      at ../nptl/sysdeps/unix/sysv/linux/i386/i486/pthread_cond_wait.S:123
  #1  0x60000344 in exclusive_idle ()
      at /home/dg/linaro/git/qemu/linux-user/main.c:134
  #2  start_exclusive () at /home/dg/linaro/git/qemu/linux-user/main.c:144
  #3  stop_all_tasks () at /home/dg/linaro/git/qemu/linux-user/main.c:2996
  #4  0x60016491 in force_sig (target_sig=6)
      at /home/dg/linaro/git/qemu/linux-user/signal.c:378
  #5  0x60016f1d in queue_signal (env=0x639ff698, sig=6, info=0xb5610280)
      at /home/dg/linaro/git/qemu/linux-user/signal.c:451
  #6  0x60017375 in host_signal_handler (host_signum=6, info=0xb561031c, 
      puc=0xb561039c) at /home/dg/linaro/git/qemu/linux-user/signal.c:504
  #7  <signal handler called>
  #8  0x600c53d1 in raise ()
  #9  0x6009a133 in abort ()
  #10 0x600a0345 in __libc_message ()
  #11 0x600b977c in __fortify_fail ()
  #12 0x600b9717 in ____longjmp_chk ()
  #13 0x600b9697 in __longjmp_chk ()
  #14 0x6002b478 in cpu_loop_exit (env=0xb5611068)
      at /home/dg/linaro/git/qemu/cpu-exec.c:37
  #15 0x6001d4ff in exception_action (host_signum=11, pinfo=0xb5610c8c, 
      puc=0xb5610d0c) at /home/dg/linaro/git/qemu/user-exec.c:46
  #16 handle_cpu_signal (host_signum=11, pinfo=0xb5610c8c, puc=0xb5610d0c)
      at /home/dg/linaro/git/qemu/user-exec.c:123
  #17 cpu_arm_signal_handler (host_signum=11, pinfo=0xb5610c8c, puc=0xb5610d0c)
      at /home/dg/linaro/git/qemu/user-exec.c:186
  #18 0x600172f6 in host_signal_handler (host_signum=11, info=0xb5610c8c, 
      puc=0xb5610d0c) at /home/dg/linaro/git/qemu/linux-user/signal.c:492
  #19 <signal handler called>
  #20 0x60099ac6 in _setjmp ()
  #21 0x6002b4eb in cpu_arm_exec (env=0x0)
      at /home/dg/linaro/git/qemu/cpu-exec.c:233
  #22 0x600005bc in cpu_loop (env=0x639ff698)
      at /home/dg/linaro/git/qemu/linux-user/main.c:739
  #23 0x60006134 in clone_func (arg=0xbfdcf95c)
      at /home/dg/linaro/git/qemu/linux-user/syscall.c:3953
  #24 0x6008a8d0 in start_thread (arg=0xb5611b70) at pthread_create.c:300
  #25 0x600b7f1e in clone ()

  Things I've tried (with suggestions from Pete Maydell):

  If I remove the 'env = cpu_single_env;'  added by
  0d10193870b5a81c3bce13a602a5403c3a55cf6c (tcg: Reload local variables
  after return from longjmp) the test works reliably (10 out of 10
  passes) on 32bit Lucid and partially (7 out of 10 passes) on 64 bit
  Oneiric (some segs, some hangs).
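
  The hazard here can be sketched in miniature: 0d101... reloads env from
  the global cpu_single_env after setjmp() returns via longjmp(), which is
  only safe if no other thread has rewritten that global in the meantime.
  A minimal single-threaded sketch of the reload pattern, assuming made-up
  names (cpu_state, global_env, run - this is not QEMU code):

```c
#include <setjmp.h>

static jmp_buf env_buf;
static int *global_env;             /* stands in for cpu_single_env */

/* Reload-from-global pattern: the local 'env' may be clobbered by
   longjmp (C99 7.13.2.1), so it is re-read from the global afterwards.
   With multiple threads sharing the global, the reload can fetch
   another thread's pointer instead. */
int run(void)
{
    static int cpu_state = 42;
    int *env = &cpu_state;          /* plain local: indeterminate after longjmp */
    global_env = env;
    if (setjmp(env_buf)) {
        env = global_env;           /* the reload added by 0d101... */
        return *env;
    }
    longjmp(env_buf, 1);            /* simulates cpu_loop_exit() */
}
```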

  If I make cpu_single_env thread-local with __thread and leave 0d101...
  in, then again it works reliably on 32bit Lucid, and is flaky on 64bit
  Oneiric (5/10 passes: 2 hangs, 3 segs).
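
  For reference, the __thread experiment relies on each thread getting
  its own copy of the variable, so another thread's writes cannot be
  picked up by the reload. A minimal sketch, with hypothetical names
  (thread_env, clobber, tls_isolated - not QEMU's):

```c
#include <pthread.h>

/* Hypothetical stand-in for cpu_single_env, made per-thread with
   __thread as in the experiment above. */
static __thread int thread_env = 1;

static void *clobber(void *arg)
{
    thread_env = 99;                /* writes only this thread's copy */
    return NULL;
}

/* Returns the main thread's value after another thread has written
   its own copy; with __thread the main thread's copy is untouched. */
int tls_isolated(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, clobber, NULL) != 0) {
        return -1;
    }
    pthread_join(t, NULL);
    return thread_env;              /* still 1 in the main thread */
}
```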

  I've also tried using a volatile local variable in cpu_exec to hold a
  copy of env and restore that rather than cpu_single_env.  With this
  it's solid on 32bit Lucid and flaky on 64bit Oneiric; these failures on
  64bit Oneiric look like it running off the end of the code buffer (all 0
  code), jumping to non-existent code addresses and a seg in
  With both __thread and the volatile local I still get failures on
  64bit Oneiric; they look mostly like they've run off the end of
  generated code (they're executing out of a buffer of all 0's).
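
  The volatile-local experiment leans on the C guarantee (C99 7.13.2.1)
  that automatic variables declared volatile keep their values across
  longjmp(), whereas plain locals modified after setjmp() are
  indeterminate. A minimal sketch of that guarantee (volatile_survives
  is a made-up name for illustration):

```c
#include <setjmp.h>

static jmp_buf buf;

/* A volatile automatic keeps its value across longjmp (C99 7.13.2.1);
   a plain local modified after setjmp would be indeterminate here. */
int volatile_survives(void)
{
    volatile int saved = 0;
    if (setjmp(buf) == 0) {
        saved = 7;                  /* modified after setjmp */
        longjmp(buf, 1);
    }
    return saved;                   /* guaranteed 7 because it's volatile */
}
```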

  (I also tried some of the 64bit tests on an EC2 Xen Natty VM with
  similar results).

  My guess is I'm hitting multiple bugs here:
    1) The Lucid install is probably too old to hit the compiler bugs for
  which 0d101... is a fix - but it is in itself triggering a new bug on the old
    2) The 64bit Natty and Oneiric installs are new enough to hit the
  compiler bug for which 0d101 is a fix
    3) I'm probably hitting something else as well; my guess is that it
  could be bug 668799, but I'm not clear why it doesn't happen on my 32bit
  Lucid install

