[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking re
From: |
Alex Bennée |
Subject: |
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime |
Date: |
Wed, 05 Jul 2017 21:10:05 +0100 |
User-agent: |
mu4e 0.9.19; emacs 25.2.50.3 |
Peter Maydell <address@hidden> writes:
> On 5 July 2017 at 20:30, Alex Bennée <address@hidden> wrote:
>>
>> Peter Maydell <address@hidden> writes:
>>
>>> On 5 July 2017 at 17:01, Alex Bennée <address@hidden> wrote:
>>>> An interesting bug was reported on #qemu today. It was bisected to
>>>> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
>>>> with taskset -c 0. Originally the fingers where pointed at mttcg but it
>>>> occurs in both single and multi-threaded modes.
>>>>
>>>> I think the problem is qemu_system_reset_request() is certainly racy
>>>> when resetting a running CPU. AFAICT:
>>>>
>>>> - Guest resets board, writing to some hw address (e.g.
>>>> arm_sysctl_write)
>>>> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>>> - We exit iowrite and drop the BQL
>>>> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>>> - we start writing new values to CPU env while still in TCG code
>>>> - CHAOS!
>>>>
>>>> The general solution for this is to ensure these sort of tasks are done
>>>> with safe work in the CPUs context when we know nothing else is running.
>>>> It seems this is probably best done by modifying
>>>> qemu_system_reset_request to queue work up on current_cpu and execute it
>>>> as safe work - I don't think the vl.c thread should ever be messing
>>>> about with calling cpu_reset directly.
>>>
>>> My first thought is that qemu_system_reset() should absolutely
>>> stop every CPU (or other runnable thing like a DMA agent) in the
>>> system.
>>
>> Are all these reset calls system wide though?
>
> It's called 'system_reset' because it resets the entire system...
>
>> After all with PCSI you
>> can bring individual cores up and down. I appreciate the vexpress stuff
>> pre-dates those well defined semantics though.
>
> It's individual core reset that's a more ad-hoc afterthought,
> really.
>
>> vm_stop certainly tries to deal with things gracefully as well as send
>> qapi events, drain IO queues and the rest of it. My only concern is it
>> handles two cases - external vm_stops and those from the current CPU.
>>
>> I think it may be cleaner for CPU originated halts to use the
>> async_safe_run_on_cpu() mechanism.
>
> System reset already has an async component to it -- you call
> qemu_system_reset_request(), which just says "schedule a system
> reset as soon as convenient". qemu_system_reset() is the thing
> that runs later and actually does the job (from the io thread,
> not the CPU thread).
>
> Looking more closely at the vl.c code, it looks like it
> calls pause_all_vcpus() before calling qemu_system_reset():
> shouldn't that be pausing all the TCG CPUs?
Hmm it should - but it doesn't seem to have in this backtrace:
#0 0x000055555593fdd3 in arm_cpu_reset (s=0x5555569abb90) at
/home/alex/lsrc/qemu/qemu.git/target/arm/cpu.c:119
#1 0x0000555555bcc74a in cpu_reset (cpu=0x5555569abb90) at qom/cpu.c:268
#2 0x000055555589d82a in do_cpu_reset (opaque=0x5555569abb90) at
/home/alex/lsrc/qemu/qemu.git/hw/arm/boot.c:570
#3 0x0000555555a257e4 in qemu_devices_reset () at hw/core/reset.c:69
#4 0x00005555559697a8 in qemu_system_reset (reason=SHUTDOWN_CAUSE_GUEST_RESET)
at vl.c:1713
#5 0x0000555555969c0d in main_loop_should_exit () at vl.c:1885
#6 0x0000555555969cda in main_loop () at vl.c:1922
#7 0x0000555555971aca in main (argc=16, argv=0x7fffffffd918,
envp=0x7fffffffd9a0) at vl.c:4749
Thread 4 (Thread 0x7fff731ff700 (LWP 10098)):
#0 0x00007fffdf4f5a15 in do_futex_wait (private=0, abstime=0x7fff731fc670,
expected=0, futex_word=0x7fff64cbb5b8) at
../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 0x00007fffdf4f5a15 in do_futex_wait (address@hidden, address@hidden) at
sem_waitcommon.c:111
#2 0x00007fffdf4f5adf in __new_sem_wait_slow (sem=0x7fff64cbb5b8,
abstime=0x7fff731fc670) at sem_waitcommon.c:181
#3 0x00007fffdf4f5b92 in sem_timedwait (sem=<optimised out>,
abstime=<optimised out>) at sem_timedwait.c:36
#4 0x0000555555d27488 in qemu_sem_timedwait (sem=0x7fff64cbb5b8, ms=10000) at
util/qemu-thread-posix.c:271
#5 0x0000555555d20aad in worker_thread (opaque=0x7fff64cbb550) at
util/thread-pool.c:92
#6 0x00007fffdf4ed6ba in start_thread (arg=0x7fff731ff700) at
pthread_create.c:333
#7 0x00007fffdf2233dd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 3 (Thread 0x7fff7ebff700 (LWP 10097)):
#0 0x00007fffdf4f630a in __lll_unlock_wake () at
../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
#1 0x00007fffdf4f14ff in __GI___pthread_mutex_unlock (decr=1,
mutex=0x55555641ae20 <qemu_global_mutex>) at pthread_mutex_unlock.c:55
#2 0x00007fffdf4f14ff in __GI___pthread_mutex_unlock (mutex=0x55555641ae20
<qemu_global_mutex>) at pthread_mutex_unlock.c:314
#3 0x0000555555d27091 in qemu_mutex_unlock (mutex=0x55555641ae20
<qemu_global_mutex>) at util/qemu-thread-posix.c:88
#4 0x00005555557aa911 in qemu_mutex_unlock_iothread () at
/home/alex/lsrc/qemu/qemu.git/cpus.c:1589
#5 0x00005555557d791a in io_writex (env=0x5555569b3e20,
iotlbentry=0x5555569c61d0, val=3230662656, addr=268435620,
retaddr=140736397128697, size=4) at
/home/alex/lsrc/qemu/qemu.git/accel/tcg/cputlb.c:800
#6 0x00005555557daabd in io_writel (env=0x5555569b3e20, mmu_idx=3, index=0,
val=3230662656, addr=268435620, retaddr=140736397128697) at
/home/alex/lsrc/qemu/qemu.git/softmmu_template.h:265
#7 0x00005555557dadab in helper_le_stl_mmu (env=0x5555569b3e20,
addr=268435620, val=3230662656, oi=35, retaddr=140736397128697) at
/home/alex/lsrc/qemu/qemu.git/softmmu_template.h:300
#8 0x00007fffbef533f9 in code_gen_buffer ()
#9 0x00005555557e3c46 in cpu_tb_exec (cpu=0x5555569abb90, itb=0x7fffbef53280
<code_gen_buffer+1327702>) at
/home/alex/lsrc/qemu/qemu.git/accel/tcg/cpu-exec.c:166
#10 0x00005555557e4976 in cpu_loop_exec_tb (cpu=0x5555569abb90,
tb=0x7fffbef53280 <code_gen_buffer+1327702>, last_tb=0x7fff7ebfc638,
tb_exit=0x7fff7ebfc634) at
/home/alex/lsrc/qemu/qemu.git/accel/tcg/cpu-exec.c:574
#11 0x00005555557e4c74 in cpu_exec (cpu=0x5555569abb90) at
/home/alex/lsrc/qemu/qemu.git/accel/tcg/cpu-exec.c:673
#12 0x00005555557aa132 in tcg_cpu_exec (cpu=0x5555569abb90) at
/home/alex/lsrc/qemu/qemu.git/cpus.c:1270
#13 0x00005555557aa359 in qemu_tcg_rr_cpu_thread_fn (arg=0x5555569abb90) at
/home/alex/lsrc/qemu/qemu.git/cpus.c:1365
#14 0x00007fffdf4ed6ba in start_thread (arg=0x7fff7ebff700) at
pthread_create.c:333
#15 0x00007fffdf2233dd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 2 (Thread 0x7fffcf610700 (LWP 10095)):
#0 0x00007fffdf21d499 in syscall () at
../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x0000555555d275d8 in qemu_futex_wait (f=0x5555568528c0 <rcu_gp_event>,
val=4294967295) at /home/alex/lsrc/qemu/qemu.git/include/qemu/futex.h:26
#2 0x0000555555d276e3 in qemu_event_wait (ev=0x5555568528c0 <rcu_gp_event>) at
util/qemu-thread-posix.c:415
#3 0x0000555555d3ea6c in wait_for_readers () at util/rcu.c:131
#4 0x0000555555d3eb25 in synchronize_rcu () at util/rcu.c:162
#5 0x0000555555d3ecb4 in call_rcu_thread (opaque=0x0) at util/rcu.c:256
#6 0x00007fffdf4ed6ba in start_thread (arg=0x7fffcf610700) at
pthread_create.c:333
#7 0x00007fffdf2233dd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 1 (Thread 0x7ffff7edff80 (LWP 10091)):
#0 0x000055555593fdd3 in arm_cpu_reset (s=0x5555569abb90) at
/home/alex/lsrc/qemu/qemu.git/target/arm/cpu.c:119
#1 0x0000555555bcc74a in cpu_reset (cpu=0x5555569abb90) at qom/cpu.c:268
#2 0x000055555589d82a in do_cpu_reset (opaque=0x5555569abb90) at
/home/alex/lsrc/qemu/qemu.git/hw/arm/boot.c:570
#3 0x0000555555a257e4 in qemu_devices_reset () at hw/core/reset.c:69
#4 0x00005555559697a8 in qemu_system_reset (reason=SHUTDOWN_CAUSE_GUEST_RESET)
at vl.c:1713
#5 0x0000555555969c0d in main_loop_should_exit () at vl.c:1885
#6 0x0000555555969cda in main_loop () at vl.c:1922
#7 0x0000555555971aca in main (argc=16, argv=0x7fffffffd918,
envp=0x7fffffffd9a0) at vl.c:4749
>
> thanks
> -- PMM
--
Alex Bennée