From: Jiachen Zhang
Subject: Re: [External] Re: [RFC PATCH 0/9] Support for Virtio-fs daemon crash reconnection
Date: Fri, 18 Dec 2020 17:39:34 +0800



On Wed, Dec 16, 2020 at 11:36 PM Marc-André Lureau <marcandre.lureau@gmail.com> wrote:
Hi

On Tue, Dec 15, 2020 at 8:22 PM Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> wrote:
Hi, all

This patchset implements virtio-fs crash reconnection. The reconnection
of virtiofsd is completely transparent to the guest: no remount in the
guest is needed, and even inflight requests are handled normally after
reconnection. We look forward to any comments.

Thanks,
Jiachen


OVERVIEW:

To support virtio-fs crash reconnection, we need to support the recovery
of 1) inflight FUSE requests, and 2) virtiofsd internal status information.

Fortunately, QEMU's vhost-user reconnection framework already supports
inflight I/O tracking by using VHOST_USER_GET_INFLIGHT_FD and
VHOST_USER_SET_INFLIGHT_FD (see 5ad204bf2 and 5f9ff1eff for details).
As the FUSE requests are transferred by virtqueue I/O requests, by using
the vhost-user inflight I/O tracking, we can recover the inflight FUSE
requests.

To support virtiofsd internal status recovery, we introduce 4 new
vhost-user message types. As shown in the following diagram, two of them
are used to persist shared lo_maps and opened fds to QEMU; the other two
are used to restore the status when reconnecting.

                               VHOST_USER_SLAVE_SHM
                               VHOST_USER_SLAVE_FD
    +--------------+       Persist       +--------------------+
    |              <---------------------+                    |
    |     QEMU     |                     |  Virtio-fs Daemon  |
    |              +--------------------->                    |
    +--------------+       Restore       +--------------------+
            VHOST_USER_SET_SHM
            VHOST_USER_SET_FD

Although the 4 newly added message types are meant to support virtiofsd
reconnection in this patchset, they might be useful in other situations,
so we tried to keep them general when adding them to the vhost-related
source files. VHOST_USER_SLAVE_SHM and VHOST_USER_SET_SHM can be used for
memory sharing between a vhost-user daemon and QEMU, while
VHOST_USER_SLAVE_FD and VHOST_USER_SET_FD would be useful if we want to
share opened fds between the QEMU process and a vhost-user daemon process.

Before adding new messages to the already complex vhost-user protocol, can we evaluate other options?

The first thing that came to my mind is that the memory state could be saved to disk or in a POSIX shared memory object.

Eventually, the protocol could just pass around the fds, without giving shared memory any special treatment.
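
As a rough illustration of the "save the state to a file and only hand out its fd" idea, here is a minimal Python sketch. The record layout (a generation counter plus an entry count) and the function names are hypothetical and have nothing to do with the real lo_map layout; the path could equally point to /dev/shm for a memory-backed file.

```python
import mmap
import os
import struct

# Hypothetical record layout: 64-bit generation counter, 32-bit entry count.
STATE_FORMAT = "<QI"
STATE_SIZE = struct.calcsize(STATE_FORMAT)


def open_state_region(path):
    """Open (or create) the backing file and make sure it is large enough."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.ftruncate(fd, STATE_SIZE)  # keeps existing content if already this size
    return fd


def save_state(fd, generation, nentries):
    """Write the state through a shared mapping so it outlives the process."""
    with mmap.mmap(fd, STATE_SIZE) as region:
        region[:STATE_SIZE] = struct.pack(STATE_FORMAT, generation, nentries)


def load_state(fd):
    """Read the state back, e.g. in a restarted daemon that got the fd."""
    with mmap.mmap(fd, STATE_SIZE) as region:
        return struct.unpack(STATE_FORMAT, region[:STATE_SIZE])
```

A restarted daemon only needs the fd (or the path) to call load_state(); nothing in this sketch depends on whether the fd refers to a regular file or to shared memory.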

Then I remember systemd has a pretty good API & protocol for this sort of thing: sd_notify(3) (afaik, it is quite easy to implement a minimal handler)

You can store fds with FDSTORE=1 (with an optional associated FDNAME). sd_listen_fds() & others to get them back (note: passed by inheritance only I think). systemd seems to not make shm a special case either, just treat it like an opened fd to restore.

If we consider backend processes are going to be managed by libvirt or even a systemd service, is it a better alternative? sd_notify() offers a number of interesting features as well to monitor services.
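
To make the FDSTORE idea concrete, here is a minimal Python sketch of the message format only: an fd is attached via SCM_RIGHTS to a datagram carrying "FDSTORE=1" and an optional "FDNAME=". The function names are hypothetical, and the receiver is a toy stand-in for the service manager. With real systemd the socket comes from $NOTIFY_SOCKET and the fds are handed back at restart through sd_listen_fds(), not through a receive call like this one.

```python
import os
import socket


def store_fd(notify_sock, fd, name):
    """Send one fd with sd_notify-style FDSTORE=1 / FDNAME= fields."""
    message = f"FDSTORE=1\nFDNAME={name}".encode()
    socket.send_fds(notify_sock, [message], [fd])  # Python 3.9+


def receive_stored_fd(notify_sock):
    """Toy manager side: parse the fields and take ownership of the fd."""
    data, fds, _flags, _addr = socket.recv_fds(notify_sock, 4096, 1)
    fields = dict(line.split("=", 1) for line in data.decode().splitlines())
    if fields.get("FDSTORE") != "1" or len(fds) != 1:
        for fd in fds:
            os.close(fd)
        raise ValueError("not an FDSTORE message")
    # Assumption: unnamed fds get the name "stored", as systemd documents.
    return fields.get("FDNAME", "stored"), fds[0]
```

The sketch can be tried with socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM): the fd that comes out on the receiving side refers to the same open file description as the one stored, which is why shared memory needs no special case here.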



Thanks for the suggestions. Actually, we chose to save all state information to QEMU because a virtiofsd has the same lifecycle as its QEMU master. However, saving things to a file does avoid communication with QEMU, and we would no longer need to increase the complexity of the vhost-user protocol. The suggestion to save fds to systemd is also very reasonable if we don't consider the lifecycle issues. We will try it.

All the best,
Jiachen




USAGE and NOTES:

- The commits are rebased onto a recent QEMU master commit b4d939133dca0fa2b.

- ",reconnect=1" should be added to the "-chardev socket" of vhost-user-fs-pci
in the QEMU command line, for example:

    qemu-system-x86_64 ... \
    -chardev socket,id=char0,path=/tmp/vhostqemu,reconnect=1 \
    -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs \
    ...

- We add new virtiofsd options to enable or disable crash reconnection.
Some features are not supported together with crash reconnection, so pass
the following options to virtiofsd to enable reconnection:

    virtiofsd ... -o reconnect -o no_mount_ns -o no_flock -o no_posix_lock \
    -o no_xattr ...

- The reasons why virtiofsd-side locking, extended attributes, and mount
namespaces are not supported are explained in the commit message of the 6th
patch (virtiofsd: Add two new options for crash reconnection).

- The 9th patch is a workaround that does not affect overall correctness.
We remove the qsort-related code because we found that when resubmit_num is
larger than 64, seccomp kills the virtiofsd process.

- Supporting the DAX version of virtiofsd should be possible and requires
almost no additional changes to this patchset.


Jiachen Zhang (9):
  vhost-user-fs: Add support for reconnection of vhost-user-fs backend
  vhost: Add vhost-user message types for sending shared memory and file
    fds
  vhost-user-fs: Support virtiofsd crash reconnection
  libvhost-user: Add vhost-user message types for sending shared memory
    and file fds
  virtiofsd: Convert the struct lo_map array to a more flatten layout
  virtiofsd: Add two new options for crash reconnection
  virtiofsd: Persist/restore lo_map and opened fds to/from QEMU
  virtiofsd: Ensure crash consistency after reconnection
  virtiofsd: (work around) Comment qsort in inflight I/O tracking

 contrib/libvhost-user/libvhost-user.c | 106 +++-
 contrib/libvhost-user/libvhost-user.h |  70 +++
 docs/interop/vhost-user.rst           |  41 ++
 hw/virtio/vhost-user-fs.c             | 334 ++++++++++-
 hw/virtio/vhost-user.c                | 123 ++++
 hw/virtio/vhost.c                     |  42 ++
 include/hw/virtio/vhost-backend.h     |   6 +
 include/hw/virtio/vhost-user-fs.h     |  16 +-
 include/hw/virtio/vhost.h             |  42 ++
 tools/virtiofsd/fuse_lowlevel.c       |  24 +-
 tools/virtiofsd/fuse_virtio.c         |  44 ++
 tools/virtiofsd/fuse_virtio.h         |   1 +
 tools/virtiofsd/helper.c              |   9 +
 tools/virtiofsd/passthrough_helpers.h |   2 +-
 tools/virtiofsd/passthrough_ll.c      | 830 ++++++++++++++++++--------
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 16 files changed, 1413 insertions(+), 278 deletions(-)

--
2.20.1




--
Marc-André Lureau
