
From: Jan-Simon Möller
Subject: Re: locking for TCG (was: Re: [Qemu-devel] [Bug 668799] Re: qemu-arm segfaults executing msgmerge (gettext))
Date: Wed, 01 Dec 2010 21:15:57 -0000

On Wednesday, 1 December 2010 at 20:40:37, Peter Maydell wrote:
> On 28 November 2010 11:24, Peter Maydell <address@hidden> wrote:
> > 2010/11/28 Brian Harring <address@hidden>:
> >> Additional note... it *looks* like the deadlock potential is there
> >> already in 0.13, it's just heavily exacerbated by this patch- out of
> >> about 600 builds I've seen 2 lockup in the same fashion (rate was far
> >> higher with the patch on this ticket).
> 
> > (I think that running multithreaded programs under user-mode
> > emulation is effectively hitting a lot of the same locking issues
> > that you would get for emulating an MP core in multiple threads.)
> 
> Having looked in a bit more detail at the code, I definitely think
> the locking is insufficient for the TCG TranslationBlock structures.
> In particular:
>  * cpu_exit() updates the linked lists in the TB ->jmp_next[] arrays
>  * cpu_exit() is called:
>    + by other threads, in linux-user mode
>    + by signal handlers (both in linux user mode for taking signals
> and in system mode via qemu_notify_event() when timers expire)
>    + by normal generated code

Thanks for investigating this further!
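
For anyone following along, here is roughly the shape of the problem
(an illustrative sketch only, a simplified stand-in for the real
exec-all.h structures and the list surgery in exec.c; the tagged
pointer encoding of the real lists is left out):

struct tb {
    struct tb *jmp_next[2];   /* per-slot chain of TBs jumping to us */
    struct tb *jmp_first;     /* head of that chain                  */
};

/* Unlinking walks the chain, then rewrites one pointer.  If a signal
 * handler calls cpu_exit() between the walk and the write and edits
 * the same chain, the write can drop a live element or keep a dead
 * one, which matches the corrupted pointers in the trace below. */
static void unlink_jump(struct tb **ptb, struct tb *victim, int n)
{
    while (*ptb != victim)            /* read pointers ...           */
        ptb = &(*ptb)->jmp_next[n];
    /* race window: a handler can edit the chain right here */
    *ptb = victim->jmp_next[n];       /* ... then write one back     */
}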

> At the moment nothing blocks signals when it is modifying the TB
> jmp_next arrays (either via cpu_exit() or tb_add_jump()), so if you're
> unlucky and you take a signal while you're in the middle of modifying
> a jmp_next list you might end up with the list corrupted. This is
> more likely to happen with multithreaded linux-user mode code I
> think, but there's still a possibility there even in pure system mode.
> 
> I'm not sure what the best approach to solving this is. We could
> add "block signals; take mutex" around all the places in the code
> that touch the TB data structures. That seems a bit heavyweight
> and it's also not totally clear to me what the best points in the
> exec.c code to put the locking are; but it would fix the problem.


Adding locks everywhere is probably the "safe but horribly slow" solution.
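
Just to spell out what "locks everywhere" would mean at each place
that touches the TB graph (hypothetical helpers, purely to show the
cost, not a proposed qemu API):

#include <pthread.h>
#include <signal.h>

static pthread_mutex_t tb_lock = PTHREAD_MUTEX_INITIALIZER;

/* block async signals so a handler cannot re-enter the list code,
 * then serialize against other threads */
static void tb_edit_begin(sigset_t *saved)
{
    sigset_t all;
    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, saved);
    pthread_mutex_lock(&tb_lock);
}

static void tb_edit_end(const sigset_t *saved)
{
    pthread_mutex_unlock(&tb_lock);
    pthread_sigmask(SIG_SETMASK, saved, NULL);
}

Every cpu_exit()/tb_add_jump() call site would have to be bracketed
by such a pair; two sigmask syscalls plus a mutex on a hot path is
where the "horribly slow" comes from.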

 
> Alternatively we could try making cpu_exit() not have to actually
> fiddle with the TB graph. To do that you'd need to do one of:
>  * explicit checks for a "should we exit" flag at backwards and
> indirect branches and every few hundred insns. This is extra
> straight-line-code overhead, but on the other hand you get to
> avoid having all your cached next-tb links trashed every time
> something has to call cpu_exit(), so it's not totally obvious that
> it would be a net loss
>  * have cpu_exit() force itself to be running in the thread for that
> virtual CPU by sending a signal, to avoid the "thread has executed
> its way out of the TB" problem that currently requires us to trace
> through the whole TB call graph. Then we could do something
> simpler and atomic/reentrant to stop the cpu rather than chasing
> and editing linked lists
> 
> I think on balance I maybe favour the last one, but I'm not
> sure. Does anybody have an opinion?

Sounds reasonable to me.
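
For reference, the flag-based variant (the first bullet above) could
be as small as this, which is why it is tempting despite the extra
per-branch check (a hypothetical sketch, not actual qemu code):

#include <signal.h>

static volatile sig_atomic_t exit_request;  /* one per vCPU in practice */

/* safe from other threads and from signal handlers: a single
 * atomic store, no linked lists touched */
static void request_cpu_exit(void)
{
    exit_request = 1;
}

/* conceptually emitted at backward/indirect branches and every few
 * hundred guest insns; nonzero means: leave the chained TBs and
 * drop back to the main execution loop */
static inline int check_exit_request(void)
{
    return exit_request != 0;
}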

Brainstorming:
Would per-thread data structures make any sense?


-- JSM

-- 
qemu-arm segfaults executing msgmerge (gettext)
https://bugs.launchpad.net/bugs/668799

Status in QEMU: New

Bug description:
upstream qemu.git revision b45e9c05dbacba8e992f0bffeca04c6379c3ad45

Starting program: /usr/bin/qemu-arm msgmerge-static ar.po anjuta.pot

[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff4bc3ff0 (LWP 26108)]
[New Thread 0x7ffff4b8aff0 (LWP 26109)]
[New Thread 0x7ffff4b51ff0 (LWP 26110)]
[New Thread 0x7ffff4b18ff0 (LWP 26111)]
[New Thread 0x7ffff4adfff0 (LWP 26112)]
[New Thread 0x7ffff4aa6ff0 (LWP 26113)]
[New Thread 0x7ffff4a6dff0 (LWP 26114)]
[New Thread 0x7ffff4a34ff0 (LWP 26115)]
[New Thread 0x7ffff49fbff0 (LWP 26116)]
[New Thread 0x7ffff49c2ff0 (LWP 26117)]
[New Thread 0x7ffff4989ff0 (LWP 26118)]
[New Thread 0x7ffff4950ff0 (LWP 26119)]
[New Thread 0x7ffff4917ff0 (LWP 26120)]
[New Thread 0x7ffff48deff0 (LWP 26121)]
[New Thread 0x7ffff48a5ff0 (LWP 26122)]
[New Thread 0x7ffff486cff0 (LWP 26123)]
[New Thread 0x7ffff4833ff0 (LWP 26124)]
[New Thread 0x7ffff47faff0 (LWP 26125)]
[New Thread 0x7ffff47c1ff0 (LWP 26126)]
[New Thread 0x7ffff4788ff0 (LWP 26127)]
[New Thread 0x7ffff474fff0 (LWP 26128)]
[New Thread 0x7ffff4716ff0 (LWP 26129)]
[New Thread 0x7ffff46ddff0 (LWP 26130)]
.........................
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4aa6ff0 (LWP 26113)]
0x00000000600480d4 in tb_reset_jump_recursive2 (tb=0x7ffff4c63540, n=0)
    at /home/user/git/qemu/exec.c:1333
1333                tb1 = tb1->jmp_next[n1];

(gdb) bt
#0  0x00000000600480d4 in tb_reset_jump_recursive2 (tb=0x7ffff4c63540, n=0)
    at /home/user/git/qemu/exec.c:1333
#1  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63540)
    at /home/user/git/qemu/exec.c:1361
#2  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c634d8, n=0)
    at /home/user/git/qemu/exec.c:1355
#3  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c634d8)
    at /home/user/git/qemu/exec.c:1361
#4  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63470, n=0)
    at /home/user/git/qemu/exec.c:1355
#5  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63470)
    at /home/user/git/qemu/exec.c:1361
#6  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63408, n=1)
    at /home/user/git/qemu/exec.c:1355
#7  0x00000000600481d1 in tb_reset_jump_recursive (tb=0x7ffff4c63408)
    at /home/user/git/qemu/exec.c:1362
#8  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c633a0, n=0)
    at /home/user/git/qemu/exec.c:1355
#9  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c633a0)
    at /home/user/git/qemu/exec.c:1361
#10 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63338, n=0)
    at /home/user/git/qemu/exec.c:1355
#11 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63338)
    at /home/user/git/qemu/exec.c:1361
#12 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c632d0, n=0)
    at /home/user/git/qemu/exec.c:1355
#13 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c632d0)
    at /home/user/git/qemu/exec.c:1361
#14 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63268, n=1)
    at /home/user/git/qemu/exec.c:1355
#15 0x00000000600481d1 in tb_reset_jump_recursive (tb=0x7ffff4c63268)
    at /home/user/git/qemu/exec.c:1362
#16 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63200, n=0)
    at /home/user/git/qemu/exec.c:1355
#17 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63200)
    at /home/user/git/qemu/exec.c:1361
#18 0x00000000600487c5 in cpu_unlink_tb (env=0x62385400)
    at /home/user/git/qemu/exec.c:1617
#19 0x00000000600488e8 in cpu_exit (env=0x62385400)
    at /home/user/git/qemu/exec.c:1662
#20 0x0000000060000798 in start_exclusive ()
    at /home/user/git/qemu/linux-user/main.c:152
#21 0x0000000060000a4b in do_kernel_trap (env=0x62359940)
    at /home/user/git/qemu/linux-user/main.c:493
#22 0x00000000600023f3 in cpu_loop (env=0x62359940)
    at /home/user/git/qemu/linux-user/main.c:797
#23 0x00000000600123df in clone_func (arg=0x7ffffffd76e0)
    at /home/user/git/qemu/linux-user/syscall.c:3561
#24 0x00000000600b382d in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#25 0x00000000600f1809 in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#26 0x0000000000000000 in ?? ()
(gdb) 



It's interesting to see this:
#0  0x00000000600480d4 in tb_reset_jump_recursive2 (tb=0x7ffff4c63540, n=0)
    at /home/user/git/qemu/exec.c:1333
        tb1 = 0x0                  <<<<<<<<<<
        tb_next = 0xf4c63610       <<<<<<<<<<
        ptb = 0x60341c91           <<<<<<<<<<
        n1 = 0
#1  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63540)
    at /home/user/git/qemu/exec.c:1361
No locals.
#2  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c634d8, n=0)
    at /home/user/git/qemu/exec.c:1355
        tb1 = 0x7ffff4c634d8       <<<<<<<<<<<
        tb_next = 0x7ffff4c63540   <<<<<<<<<<<
        ptb = 0x7ffff4c63860       <<<<<<<<<<<
        n1 = 0
#3  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c634d8)
    at /home/user/git/qemu/exec.c:1361
No locals.
#4  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63470, n=0)
    at /home/user/git/qemu/exec.c:1355
        tb1 = 0x7ffff4c63470
        tb_next = 0x7ffff4c634d8
        ptb = 0x7ffff4c63530
        n1 = 0
#5  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63470)
    at /home/user/git/qemu/exec.c:1361




