qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Use of unshare(CLONE_FS) in virtiofsd


From: Florian Weimer
Subject: Use of unshare(CLONE_FS) in virtiofsd
Date: Fri, 04 Nov 2022 08:50:45 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

I've got a proposed extension for glibc's pthread_create which allows
the creation of threads with a dedicated current working
directory/umask/chroot:

  [PATCH 0/2] Introduce per-thread file system properties on Linux
  <https://sourceware.org/pipermail/libc-alpha/2022-October/142640.html>

I expect that glibc integration will work around the seccomp issue
mentioned in a comment (also brought up by the Samba people for their
use) because glibc will perform the unshare directly during the clone
system call, and not via a separate system call.

I see that unshare(CLONE_FS) was introduced in this commit:

commit bdfd66788349acc43cd3f1298718ad491663cfcc
Author: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Date:   Thu Feb 27 14:59:27 2020 +0900

    virtiofsd: Fix xattr operations
    
    Current virtiofsd has problems about xattr operations and
    they does not work properly for directory/symlink/special file.
    
    The fundamental cause is that virtiofsd uses openat() + f...xattr()
    systemcalls for xattr operation but we should not open symlink/special
    file in the daemon. Therefore the function is restricted.
    
    Fix this problem by:
     1. during setup of each thread, call unshare(CLONE_FS)
     2. in xattr operations (i.e. lo_getxattr), if inode is not a regular
        file or directory, use fchdir(proc_loot_fd) + ...xattr() +
        fchdir(root.fd) instead of openat() + f...xattr()
    
        (Note: for a regular file/directory openat() + f...xattr()
         is still used for performance reason)
    
    With this patch, xfstests generic/062 passes on virtiofs.
    
    This fix is suggested by Miklos Szeredi and Stefan Hajnoczi.
    The original discussion can be found here:
      https://www.redhat.com/archives/virtio-fs/2019-October/msg00046.html
    
    Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
    Message-Id: <20200227055927.24566-3-misono.tomohiro@jp.fujitsu.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Now the question has come up on the libc-coord list why the *at
interfaces are not used in such cases:

  <https://www.openwall.com/lists/libc-coord/2022/10/24/3>

Clearly the kernel lacks support for fgetxattrat today.  The usual
recommendation for emulating it is to use openat with O_PATH, and then
use getxattr on the virtual /proc/self/fd path.  This needs an
additional system call (openat, getxattr, close instead of fchdir,
getxattr), but it avoids the unshare(CLONE_FS) call behind libc's back.
The directory entries in /proc/self/fd present as symbolic links, but
are not implemented as such by the kernel: there is no separate pathname
lookup for already-open O_PATH descriptors, so there is no race.

Thoughts?

Thanks,
Florian




reply via email to

[Prev in Thread] Current Thread [Next in Thread]