[PULL 05/22] util/userfaultfd: Support /dev/userfaultfd
From: Xxx Xx
Subject: [PULL 05/22] util/userfaultfd: Support /dev/userfaultfd
Date: Mon, 13 Feb 2023 03:28:54 +0100
From: Peter Xu <peterx@redhat.com>
Teach QEMU to use /dev/userfaultfd when it exists, and fall back to the
system call if the device is either absent or cannot be opened due to
insufficient permission.

First, as long as the app has permission to access /dev/userfaultfd, it
always has the ability to trap kernel faults, which is what QEMU mostly
wants.

Meanwhile, in some contexts (e.g. containers) the userfaultfd syscall can be
forbidden, so the device node can be the major way to use postcopy in a
restricted environment with a strict seccomp setup.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
util/userfaultfd.c | 32 ++++++++++++++++++++++++++++++++
util/trace-events | 1 +
2 files changed, 33 insertions(+)
diff --git a/util/userfaultfd.c b/util/userfaultfd.c
index 4953b3137d..fdff4867e8 100644
--- a/util/userfaultfd.c
+++ b/util/userfaultfd.c
@@ -18,10 +18,42 @@
 #include <poll.h>
 #include <sys/syscall.h>
 #include <sys/ioctl.h>
+#include <fcntl.h>
+
+typedef enum {
+    UFFD_UNINITIALIZED = 0,
+    UFFD_USE_DEV_PATH,
+    UFFD_USE_SYSCALL,
+} uffd_open_mode;

 int uffd_open(int flags)
 {
 #if defined(__NR_userfaultfd)
+    static uffd_open_mode open_mode;
+    static int uffd_dev;
+
+    /* Detect how to create the uffd descriptor on the first call */
+    if (open_mode == UFFD_UNINITIALIZED) {
+        /*
+         * Make /dev/userfaultfd the default approach because it has better
+         * permission controls, and it also allows trapping kernel faults
+         * without any privilege requirement (e.g. CAP_SYS_PTRACE).
+         */
+        uffd_dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
+        if (uffd_dev >= 0) {
+            open_mode = UFFD_USE_DEV_PATH;
+        } else {
+            /* Fall back to the system call */
+            open_mode = UFFD_USE_SYSCALL;
+        }
+        trace_uffd_detect_open_mode(open_mode);
+    }
+
+    if (open_mode == UFFD_USE_DEV_PATH) {
+        assert(uffd_dev >= 0);
+        return ioctl(uffd_dev, USERFAULTFD_IOC_NEW, flags);
+    }
+
     return syscall(__NR_userfaultfd, flags);
 #else
     return -EINVAL;
diff --git a/util/trace-events b/util/trace-events
index c8f53d7d9f..16f78d8fe5 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -93,6 +93,7 @@ qemu_vfio_region_info(const char *desc, uint64_t region_ofs, uint64_t region_siz
 qemu_vfio_pci_map_bar(int index, uint64_t region_ofs, uint64_t region_size, int ofs, void *host) "map region bar#%d addr 0x%"PRIx64" size 0x%"PRIx64" ofs 0x%x host %p"
#userfaultfd.c
+uffd_detect_open_mode(int mode) "%d"
uffd_query_features_nosys(int err) "errno: %i"
uffd_query_features_api_failed(int err) "errno: %i"
uffd_create_fd_nosys(int err) "errno: %i"
--
2.39.1
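
For readers who want to try the detection logic outside of QEMU, the
following is a minimal standalone sketch of the same "device node first,
syscall second" idea. It is not the QEMU implementation: the names
sketch_open_mode, sketch_pick_mode() and sketch_uffd_open() are hypothetical,
the device descriptor is reopened on every call instead of being cached in a
static variable, and the ioctl path is only compiled in when the installed
kernel headers define USERFAULTFD_IOC_NEW (assumed to require v6.1 or newer
headers).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#ifdef __linux__
#include <linux/userfaultfd.h>
#endif

/* Hypothetical names for this sketch; they only mirror the patch. */
typedef enum {
    SKETCH_USE_DEV_PATH,
    SKETCH_USE_SYSCALL,
} sketch_open_mode;

/* Decision step: a valid descriptor for the device node selects the ioctl path. */
sketch_open_mode sketch_pick_mode(int dev_fd)
{
    return dev_fd >= 0 ? SKETCH_USE_DEV_PATH : SKETCH_USE_SYSCALL;
}

/* Returns a new userfaultfd descriptor, or -1 with errno set. */
int sketch_uffd_open(int flags)
{
#ifdef USERFAULTFD_IOC_NEW
    int dev_fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);

    if (sketch_pick_mode(dev_fd) == SKETCH_USE_DEV_PATH) {
        int uffd = ioctl(dev_fd, USERFAULTFD_IOC_NEW, flags);
        int saved_errno = errno;

        /* The device descriptor is only needed to create the uffd. */
        close(dev_fd);
        errno = saved_errno;
        return uffd;
    }
#endif
#ifdef __NR_userfaultfd
    /* Device node missing or not accessible: fall back to the system call. */
    return (int)syscall(__NR_userfaultfd, flags);
#else
    (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would typically invoke sketch_uffd_open(O_CLOEXEC | O_NONBLOCK) and
treat a negative return value as "userfaultfd is not available here". Caching
the device descriptor, as the patch does, avoids repeating the open() on
every call.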