Re: [Qemu-devel] another locking issue in current dataplane code?


From: Christian Borntraeger
Subject: Re: [Qemu-devel] another locking issue in current dataplane code?
Date: Tue, 08 Jul 2014 10:38:39 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 08/07/14 09:43, Ming Lei wrote:
> On Tue, Jul 8, 2014 at 3:19 PM, Christian Borntraeger
> <address@hidden> wrote:
>> Ping.
>>
>> Has anyone seen a similar hang on x86?
>>
>>
>>
>> On 07/07/14 13:58, Christian Borntraeger wrote:
>>> Folks,
>>>
>>> with current 2.1-rc0 (
>>> +  dataplane: do not free VirtQueueElement in vring_push()
>>> +  virtio-blk: avoid dataplane VirtIOBlockReq early free
>>> + some not-yet-ready s390 patches for migration
>>> )
>>>
>>> I am still having issues with dataplane during managedsave (without dataplane everything seems to work fine):
>>>
>>> With 1 CPU and 1 disk (and some workload, e.g. a simple dd on the disk) I get:
>>>
>>>
>>> Thread 3 (Thread 0x3fff90fd910 (LWP 27218)):
>>> #0  0x000003fffcdb7ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
>>> #1  0x000003fffcdbac0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
>>> #2  0x000003fffcdb399a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>> #3  0x00000000801fff06 in qemu_cond_wait (cond=<optimized out>, address@hidden <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
>>> #4  0x00000000800472f4 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:843
>>> #5  qemu_kvm_cpu_thread_fn (arg=0x809ad6b0) at /home/cborntra/REPOS/qemu/cpus.c:879
>>> #6  0x000003fffcdaf412 in start_thread () from /lib64/libpthread.so.0
>>> #7  0x000003fffba350ae in thread_start () from /lib64/libc.so.6
>>>
>>> Thread 2 (Thread 0x3fff88fd910 (LWP 27219)):
>>> #0  0x000003fffba2a8e0 in ppoll () from /lib64/libc.so.6
>>> #1  0x00000000801af250 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
>>> #2  qemu_poll_ns (address@hidden, address@hidden, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:314
>>> #3  0x00000000801b0702 in aio_poll (ctx=0x807f2230, address@hidden) at /home/cborntra/REPOS/qemu/aio-posix.c:221
>>> #4  0x00000000800be3c4 in iothread_run (opaque=0x807f20d8) at /home/cborntra/REPOS/qemu/iothread.c:41
>>> #5  0x000003fffcdaf412 in start_thread () from /lib64/libpthread.so.0
>>> #6  0x000003fffba350ae in thread_start () from /lib64/libc.so.6
>>>
>>> Thread 1 (Thread 0x3fff9c529b0 (LWP 27215)):
>>> #0  0x000003fffcdb38f0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>> #1  0x00000000801fff06 in qemu_cond_wait (address@hidden, address@hidden) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
>>> #2  0x0000000080212906 in rfifolock_lock (address@hidden) at /home/cborntra/REPOS/qemu/util/rfifolock.c:59
>>> #3  0x000000008019e536 in aio_context_acquire (address@hidden) at /home/cborntra/REPOS/qemu/async.c:295
>>> #4  0x00000000801a34e6 in bdrv_drain_all () at /home/cborntra/REPOS/qemu/block.c:1907
>>> #5  0x0000000080048e24 in do_vm_stop (state=RUN_STATE_PAUSED) at /home/cborntra/REPOS/qemu/cpus.c:538
>>> #6  vm_stop (address@hidden) at /home/cborntra/REPOS/qemu/cpus.c:1221
>>> #7  0x00000000800e6338 in qmp_stop (address@hidden) at /home/cborntra/REPOS/qemu/qmp.c:98
>>> #8  0x00000000800e1314 in qmp_marshal_input_stop (mon=<optimized out>, qdict=<optimized out>, ret=<optimized out>) at qmp-marshal.c:2806
>>> #9  0x000000008004b91a in qmp_call_cmd (cmd=<optimized out>, params=0x8096cf50, mon=0x8080b8a0) at /home/cborntra/REPOS/qemu/monitor.c:5038
>>> #10 handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /home/cborntra/REPOS/qemu/monitor.c:5104
>>> #11 0x00000000801faf16 in json_message_process_token (lexer=0x8080b7c0, token=0x808f2610, type=<optimized out>, x=<optimized out>, y=6) at /home/cborntra/REPOS/qemu/qobject/json-streamer.c:87
>>> #12 0x0000000080212bac in json_lexer_feed_char (address@hidden, ch=<optimized out>, address@hidden) at /home/cborntra/REPOS/qemu/qobject/json-lexer.c:303
>>> #13 0x0000000080212cfe in json_lexer_feed (lexer=0x8080b7c0, buffer=<optimized out>, size=<optimized out>) at /home/cborntra/REPOS/qemu/qobject/json-lexer.c:356
>>> #14 0x00000000801fb10e in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at /home/cborntra/REPOS/qemu/qobject/json-streamer.c:110
>>> #15 0x0000000080049f28 in monitor_control_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /home/cborntra/REPOS/qemu/monitor.c:5125
>>> #16 0x00000000800c8636 in qemu_chr_be_write (len=1, buf=0x3ffffa9e010 "}[B\377\373\251\372\b", s=0x807f5af0) at /home/cborntra/REPOS/qemu/qemu-char.c:213
>>> #17 tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x807f5af0) at /home/cborntra/REPOS/qemu/qemu-char.c:2690
>>> #18 0x000003fffcc9f05a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>> #19 0x00000000801ae3e0 in glib_pollfds_poll () at /home/cborntra/REPOS/qemu/main-loop.c:190
>>> #20 os_host_main_loop_wait (timeout=<optimized out>) at /home/cborntra/REPOS/qemu/main-loop.c:235
>>> #21 main_loop_wait (nonblocking=<optimized out>) at /home/cborntra/REPOS/qemu/main-loop.c:484
>>> #22 0x00000000800169e2 in main_loop () at /home/cborntra/REPOS/qemu/vl.c:2024
>>> #23 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:4551
>>>
>>> Now, if aio_poll never returns, we have a deadlock here.
>>> To me it looks like aio_poll can be called from iothread_run even if there are no outstanding requests.
>>> Opinions?
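
For context, here is a rough sketch of the iothread main loop being discussed, reconstructed from frames #3/#4 of thread 2 above rather than quoted from the QEMU 2.1 iothread.c, so variable and field names are approximate:

    static void *iothread_run(void *opaque)
    {
        IOThread *iothread = opaque;

        /* ... thread-id handshake with the creating thread elided ... */

        while (!iothread->stopping) {
            /* Takes the AioContext's RFifoLock -- the same lock that
             * thread 1 is waiting for in aio_context_acquire() /
             * rfifolock_lock(). */
            aio_context_acquire(iothread->ctx);

            /* Thread 2 is stuck inside this blocking aio_poll() -> ppoll().
             * If no event or request completion ever wakes it up, the loop
             * never reaches aio_context_release() below, and
             * bdrv_drain_all() in the main thread waits forever. */
            while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
                /* progress was made, poll again */
            }

            aio_context_release(iothread->ctx);
        }
        return NULL;
    }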
> 
> I have sent out a patch to fix this issue, titled
> "virtio-blk: data-plane: fix save/set .complete_request in start".
> 
> Please try this patch to see if it fixes your issue.

Yes, I have seen that patch. Unfortunately it does not make a difference for 
the managedsave case.



