From: Laszlo Ersek (Red Hat)
Subject: Re: [Qemu-devel] [Bug 1224444] Re: virtio-serial loses writes when used over virtio-mmio
Date: Tue, 17 Sep 2013 16:54:14 -0000

On 09/17/13 11:51, Richard Jones wrote:
>> There's at least three cases here I guess (KVM + eventfd, KVM without
>> eventfd (enforceable eg. with the "ioeventfd" property for virtio
>> devices), and TCG). We're probably talking about the third case.
> 
> To clarify on this point: I have reproduced this bug on two different ARM
> machines, one using KVM and one using TCG.
> 
> In both cases they are ./configure'd without any special ioeventfd-related
> options, which appears to mean CONFIG_EVENTFD=y (in both cases).
> 
> In both cases I'm using a single vCPU.
> 

I think I have a theory now; it's quite convoluted.

The problem is a deadlock in ppoll() that is *masked* by unrelated file
descriptor traffic in all of the apparently working cases.

I wrote some ad-hoc debug patches, and this is the log leading up to the
hang:

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:0
  qemu_poll_ns: timeout=4281013151888
  poll entry #0 fd 3
  poll entry #1 fd 5
  poll entry #2 fd 0
  poll entry #3 fd 11
  poll entry #4 fd 4
  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
  opened the socket, sock = 3
  udevadm settle
  libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [21734ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  sent magic GUESTFS_LAUNCH_FLAG
  main_loop waiting for next request
  libguestfs: data written OK
  <HANG>

Setup call tree for the backend (ie. the UNIX domain socket):

   1  qemu_chr_open_socket() [qemu-char.c]
   2    unix_connect_opts() [util/qemu-sockets.c]
   3      qemu_socket()
   4      connect()
   5    qemu_chr_open_socket_fd() [qemu-char.c]
   6      io_channel_from_socket()
   7        g_io_channel_unix_new()
   8      tcp_chr_connect()
   9        io_add_watch_poll()
  10          g_source_new()
  11          g_source_attach()

This part connects to libguestfs's UNIX domain socket (the new socket
file descriptor, returned on line 3, is fd 6), and it registers a few
callbacks. Notably, the above doesn't try to add fd 6 to the set of
polled file descriptors.


Then, the setup call tree for the frontend (the virtio-serial port) is
as follows:

  12  virtconsole_initfn() [hw/char/virtio-console.c]
  13    qemu_chr_add_handlers() [qemu-char.c]

This reaches into the chardev (ie. the backend referenced by the
frontend, label "channel0"), and sets further callbacks.


The following seems to lead up to the hang:

  14 os_host_main_loop_wait() [main-loop.c]
  15   glib_pollfds_fill()
  16     g_main_context_prepare()
  17       io_watch_poll_prepare() [qemu-char.c]
  18         chr_can_read() [hw/char/virtio-console.c]
  19           virtio_serial_guest_ready() [hw/char/virtio-serial-bus.c]
  20
  21             if (use_multiport(port->vser) && !port->guest_connected) {
  22                 return 0;
  23             }
  24
  25             virtqueue_get_avail_bytes()
  26         g_io_create_watch() // conditionally
  27   qemu_poll_ns() [qemu-timer.c]
  28     ppoll()

Line 15: glib_pollfds_fill() prepares the array of file descriptors for
polling. As a first step,

Line 16: it calls g_main_context_prepare(). This GLib function runs the
"prepare" callbacks for the GSources attached to the main context.

The GSource for fd 6 has been allocated on line 10 above, and its
"prepare" callback has been set to io_watch_poll_prepare() there. It is
called on line 17.

Line 17: io_watch_poll_prepare() is a crucial function. It decides
whether fd 6 (the backend fd) will be added to the set of pollfds or
not.

It checks whether the frontend has become writeable (ie. it must have
been unwriteable up to now, but it must be writeable now). If so, a
(persistent) watch is created (on line 26), which is the action that
includes fd 6 in the set of pollfds after all. If there is no change in
the status of the frontend, the watch is not changed.

io_watch_poll_prepare() checks for the writeability of the frontend (ie.
virtio serial port) by the "fd_can_read" callback. This has been set to
chr_can_read() on line 13, inside virtconsole_initfn().

So, the frontend-writeability check happens in chr_can_read(), which
simply calls:

Line 19: virtio_serial_guest_ready(). This function *normally* checks
for the available room in the virtqueue (the guest receives serial port
data from the host by submitting "receive requests" that must be filled
in by the host); see line 25.

However, virtio_serial_guest_ready() first verifies whether the guest
has connected to the virtio-serial port at all. If not, then the
function will report the frontend unwriteable (lines 21-23).


Now, right before the hang, the guest hasn't yet connected to the
virtio-serial port. Therefore line 22 fires (= virtio-serial port is
unwriteable), which in turn results in *no* watch being created for the
backend. Consequently, the UNIX domain socket (fd 6) is not added to the
set of pollfds:

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:0
  qemu_poll_ns: timeout=4281013151888
  poll entry #0 fd 3
  poll entry #1 fd 5
  poll entry #2 fd 0
  poll entry #3 fd 11
  poll entry #4 fd 4

At this point the IO thread is blocked in ppoll().

Then, the guest connects to the serial port, and sends data.

  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
  opened the socket, sock = 3
  udevadm settle

As discussed before, this guest-to-host transfer is handled by the VCPU
thread, and the data is written to fd 6 (the UNIX domain socket). The
host-side libguestfs component reads it, and answers.

  libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [21734ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  sent magic GUESTFS_LAUNCH_FLAG
  main_loop waiting for next request  # note, this is a libguestfs message!
  libguestfs: data written OK
  <HANG>

Unfortunately, ppoll() is not watching out for fd 6 at all, hence this
deadlocks.


What about the successful cases though? A good proportion of the
attempts succeed.

This is explained by the fact that *other* file descriptors can break
the IO-thread out of ppoll().

- The most common example is KVM *with* eventfd support. The KVM eventfd
  (the host notifier) is part of the pollfd set, and whenever the guest
  sends some data, KVM kicks the eventfd, and ppoll() returns. This
  masks the problem universally.

- In the other two cases, we either have KVM *without* eventfd support,
  or TCG. Rich reproduced the hang under both, and he's seen successful
  (actually: masked deadlock) cases as well, on both.

  In these setups the file descriptor traffic that masks the problem is
  not from a KVM eventfd, hence the wakeup is quite random. There is
  sometimes a perceivable pause between ppoll() going to sleep and
  waking up. At other times there's no other fd traffic, and the
  deadlock persists.

In my testing on Rich's ARM box, the unrelated fd that breaks the IO-thread
out of ppoll() is the eventfd that belongs to the AIO thread pool. It is
fd 11, and it is allocated in:

  0x00512d3c in event_notifier_init (e=0x1f4ad80, active=0) at 
util/event_notifier-posix.c:34
  34          ret = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
  (gdb) n
  39          if (ret >= 0) {
  (gdb) print ret
  $2 = 11
  (gdb) where
  #0  event_notifier_init (e=0x1f4ad80, active=0) at 
util/event_notifier-posix.c:39
  #1  0x00300b4c in thread_pool_init_one (pool=0x1f4ad80, ctx=0x1f39e18) at 
thread-pool.c:295
  #2  0x00300c6c in thread_pool_new (ctx=0x1f39e18) at thread-pool.c:313
  #3  0x0001686c in aio_get_thread_pool (ctx=0x1f39e18) at async.c:238
  #4  0x0006f9d8 in paio_submit (bs=0x1f4a020, fd=10, sector_num=0, 
qiov=0xbe88c164, nb_sectors=4,
      cb=0x389d8 <bdrv_co_io_em_complete>, opaque=0xb6505ec4, type=1) at 
block/raw-posix.c:799
  #5  0x0006fb84 in raw_aio_submit (bs=0x1f4a020, sector_num=0, 
qiov=0xbe88c164, nb_sectors=4,
      cb=0x389d8 <bdrv_co_io_em_complete>, opaque=0xb6505ec4, type=1) at 
block/raw-posix.c:828
  #6  0x0006fc28 in raw_aio_readv (bs=0x1f4a020, sector_num=0, qiov=0xbe88c164, 
nb_sectors=4,
      cb=0x389d8 <bdrv_co_io_em_complete>, opaque=0xb6505ec4) at 
block/raw-posix.c:836
  #7  0x00038b2c in bdrv_co_io_em (bs=0x1f4a020, sector_num=0, nb_sectors=4, 
iov=0xbe88c164, is_write=false)
      at block.c:3957
  #8  0x00038bf0 in bdrv_co_readv_em (bs=0x1f4a020, sector_num=0, nb_sectors=4, 
iov=0xbe88c164) at block.c:3974
  #9  0x000349d4 in bdrv_co_do_readv (bs=0x1f4a020, sector_num=0, nb_sectors=4, 
qiov=0xbe88c164, flags=(unknown: 0))
      at block.c:2619
  #10 0x00033804 in bdrv_rw_co_entry (opaque=0xbe88c0e8) at block.c:2236
  #11 0x0009ba54 in coroutine_trampoline (i0=32811528, i1=0) at 
coroutine-ucontext.c:118
  #12 0x492fd160 in setcontext () from /usr/lib/libc.so.6
  #13 0x492fd160 in setcontext () from /usr/lib/libc.so.6
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)

And the transitory hang looks like:

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:0
  qemu_poll_ns: timeout=4281193192443
  poll entry #0 fd 3
  poll entry #1 fd 5
  poll entry #2 fd 0
  poll entry #3 fd 11
  poll entry #4 fd 4

Again, at this point the IO-thread is blocked in ppoll(),

  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
  opened the socket, sock = 3
  udevadm settle

the guest transferred out some data,

  libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [20921ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  sent magic GUESTFS_LAUNCH_FLAG
  main_loop waiting for next request
  libguestfs: data written OK

and the host-side libguestfs has responded. The IO-thread is blocked in
ppoll(), guaranteed, and it doesn't notice that fd 6 has become ready
for reading.

However, the (completely unrelated) AIO thread-pool eventfd is kicked at
that point, and poll returns:

  ppoll(): 1, errno=Success
  poll entry #0 fd 3 events 25 revents 0
  poll entry #1 fd 5 events 1 revents 0
  poll entry #2 fd 0 events 1 revents 0
  poll entry #3 fd 11 events 1 revents 1
  poll entry #4 fd 4 events 1 revents 0

Which in turn allows the IO-thread to run os_host_main_loop_wait()
again, and *now* we're seeing the activation of fd 6 (its frontend, the
virtio-serial port has been connected by the guest in the meantime and
is now writeable):

  io_watch_poll_prepare: chardev:channel0 was_active:0 now_active:1
  qemu_poll_ns: timeout=0
  poll entry #0 fd 3
  poll entry #1 fd 5
  poll entry #2 fd 0
  poll entry #3 fd 11
  poll entry #4 fd 4
  poll entry #5 fd 6

And stuff works as expected from here on.


The VCPU thread needs to interrupt the IO-thread's ppoll() call
explicitly.

Basically, when the chardev's attached frontend (in this case, the
virtio serial port) experiences a change that would cause it to report
writeability in io_watch_poll_prepare() -- lines 17-18 --, it must
interrupt ppoll().

The following call tree seems relevant, but I'm not sure if it would be
appropriate. When the guest message

  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'

is printed, the following call chain is executed in the VCPU thread:

  #0  qemu_chr_fe_set_open (chr=0xf22190, fe_open=1) at qemu-char.c:3404
  #1  0x001079dc in set_guest_connected (port=0x1134f00, guest_connected=1) at 
hw/char/virtio-console.c:83
  #2  0x003cfd94 in handle_control_message (vser=0x1124360, buf=0xb50005f8, 
len=8)
      at /home/rjones/d/qemu/hw/char/virtio-serial-bus.c:379
  #3  0x003d0020 in control_out (vdev=0x1124360, vq=0x11246b0) at 
/home/rjones/d/qemu/hw/char/virtio-serial-bus.c:416
  #4  0x0044afe4 in virtio_queue_notify_vq (vq=0x11246b0) at 
/home/rjones/d/qemu/hw/virtio/virtio.c:720
  #5  0x0044b054 in virtio_queue_notify (vdev=0x1124360, n=3) at 
/home/rjones/d/qemu/hw/virtio/virtio.c:726
  #6  0x00271f30 in virtio_mmio_write (opaque=0x11278c8, offset=80, value=3, 
size=4) at hw/virtio/virtio-mmio.c:264
  #7  0x00456aac in memory_region_write_accessor (mr=0x1128ba8, addr=80, 
value=0xb5972b08, size=4, shift=0,
      mask=4294967295) at /home/rjones/d/qemu/memory.c:440
  #8  0x00456c90 in access_with_adjusted_size (addr=80, value=0xb5972b08, 
size=4, access_size_min=1,
      access_size_max=4, access=0x4569d0 <memory_region_write_accessor>, 
mr=0x1128ba8)
      at /home/rjones/d/qemu/memory.c:477
  #9  0x0045955c in memory_region_dispatch_write (mr=0x1128ba8, addr=80, 
data=3, size=4)
      at /home/rjones/d/qemu/memory.c:984
  #10 0x0045cee0 in io_mem_write (mr=0x1128ba8, addr=80, val=3, size=4) at 
/home/rjones/d/qemu/memory.c:1748
  #11 0x0035d8dc in address_space_rw (as=0xa982f8 <address_space_memory>, 
addr=471008336, buf=0xb6f3d028 "\003",
      len=4, is_write=true) at /home/rjones/d/qemu/exec.c:1954
  #12 0x0035ddf0 in cpu_physical_memory_rw (addr=471008336, buf=0xb6f3d028 
"\003", len=4, is_write=1)
      at /home/rjones/d/qemu/exec.c:2033
  #13 0x00453000 in kvm_cpu_exec (cpu=0x1097020) at 
/home/rjones/d/qemu/kvm-all.c:1665
  #14 0x0034ca94 in qemu_kvm_cpu_thread_fn (arg=0x1097020) at 
/home/rjones/d/qemu/cpus.c:802
  #15 0x494c6bc0 in start_thread () from /usr/lib/libpthread.so.0

Unfortunately, the leaf (ie. qemu_chr_fe_set_open()) doesn't do anything
here; the only chardev that sets the "chr_set_fe_open" callback is the
spicevmc backend.

I think the "socket" chardev might want to implement "chr_set_fe_open",
kicking a (new) global eventfd, sending some signal to the IO-thread, or
interrupting ppoll() in some other way. A new global eventfd just for
this purpose seems quite the kludge, but it shouldn't be hard to
implement. It needs no handler at all.

Thanks
Laszlo

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1224444

Title:
  virtio-serial loses writes when used over virtio-mmio

Status in QEMU:
  New

Bug description:
  virtio-serial appears to lose writes, but only when used on top of
  virtio-mmio.  The scenario is this:

  /home/rjones/d/qemu/arm-softmmu/qemu-system-arm \
      -global virtio-blk-device.scsi=off \
      -nodefconfig \
      -nodefaults \
      -nographic \
      -M vexpress-a15 \
      -machine accel=kvm:tcg \
      -m 500 \
      -no-reboot \
      -kernel /home/rjones/d/libguestfs/tmp/.guestfs-1001/kernel.27944 \
      -dtb /home/rjones/d/libguestfs/tmp/.guestfs-1001/dtb.27944 \
      -initrd /home/rjones/d/libguestfs/tmp/.guestfs-1001/initrd.27944 \
      -device virtio-scsi-device,id=scsi \
      -drive file=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/scratch.1,cache=unsafe,format=raw,id=hd0,if=none \
      -device scsi-hd,drive=hd0 \
      -drive file=/home/rjones/d/libguestfs/tmp/.guestfs-1001/root.27944,snapshot=on,id=appliance,cache=unsafe,if=none \
      -device scsi-hd,drive=appliance \
      -device virtio-serial-device \
      -serial stdio \
      -chardev socket,path=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/guestfsd.sock,id=channel0 \
      -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
      -append 'panic=1 mem=500M console=ttyAMA0 udevtimeout=600 no_timer_check 
acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 
guestfs_verbose=1 TERM=xterm-256color'

  After the guest starts up, a daemon writes 4 bytes to a virtio-serial
  socket.  The host side reads these 4 bytes correctly and writes a 64
  byte message.  The guest never sees this message.

  I enabled virtio-mmio debugging, and this is what is printed (## = my
  comment):

  ## guest opens the socket:
  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x3
  opened the socket, sock = 3
  udevadm settle
  ## guest writes 4 bytes to the socket:
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x5
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  virtio_mmio: virtio_mmio setting IRQ 0
  sent magic GUESTFS_LAUNCH_FLAG
  ## host reads 4 bytes successfully:
  main_loop libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [14605ms] appliance is up
  Guest launched OK.
  ## host writes 64 bytes to socket:
  libguestfs: writing the data to the socket (size = 64)
  waiting for next request
  libguestfs: data written OK
  ## hangs here forever with guest in read() call never receiving any data

  I am using qemu from git today (2d1fe1873a984).

  Some notes:

  - It's not 100% reproducible.  Sometimes everything works fine, although it fails "often" (eg > 2/3rds of the time).
  - KVM is being used.
  - We've long used virtio-serial on x86 and have never seen anything like this.

  This is what the output looks like when it doesn't fail:

  trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0'
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x3
  opened the socket, sock = 3
  udevadm settle
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x5
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  virtio_mmio: virtio_mmio setting IRQ 0
  sent magic GUESTlibguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [14710ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  FS_LAUNCH_FLAG
  main_loop waiting for next request
  libguestfs: data written OK
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x2
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x2
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  virtio_mmio: virtio_mmio setting IRQ 0
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x0
  virtio_mmio: virtio_mmio setting IRQ 0
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  [... more virtio-mmio lines omitted ...]
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x0
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  virtio_mmio: virtio_mmio setting IRQ 0
  guestfsd: main_loop: new request, len 0x3c
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x4
  0000: 20 00 f5 f5 00 00 00 04 00 00 00 d2 00 00 00 00 | ...............|
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x2
  virtio_mmio: virtio_mmio setting IRQ 1

  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  virtio_mmio: virtio_mmio setting IRQ 0
  0010: 00 12 34 00 00 00 00 00 00 00 00 00 00 00 00 00 |..4.............|
  0020: 00 00 00 00 00 00 00 00 00 00 00 08 2f 64 65 76 |............/dev|
  0030: 2f 73 64 61 00 00 00 03 6d 62 72 00             |/sda....mbr.    |
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x2
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  virtio_mmio: virtio_mmio setting IRQ 0
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x2
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  virtio_mmio: virtio_mmio setting IRQ 0

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1224444/+subscriptions


