From: Laurent Vivier
Subject: Re: [Qemu-devel] [PATCH] linux-user: add option to intercept execve() syscalls
Date: Thu, 21 Jan 2016 00:34:55 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0
Hi Petros,
On 18/01/2016 05:33, Petros Angelatos wrote:
> From: Petros Angelatos <address@hidden>
>
> In order for one to use QEMU user mode emulation under a chroot, it is
> required to use binfmt_misc. This can be avoided by QEMU never doing a
> raw execve() to the host system.
Is there a reason not to use binfmt_misc when we are already able to
chroot?
Moreover, binfmt_misc can execute binaries that cannot be read; I don't
think that is possible with a userspace solution. binfmt_misc can also
take credentials and security tokens from the binary rather than from
the interpreter (see [1]), which is useful for running commands like "sudo".
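For reference, both behaviours correspond to flags in the binfmt_misc registration rule: 'O' makes the kernel open the binary and hand it to the interpreter, and 'C' (which implies 'O') computes credentials from the binary rather than the interpreter. Below is a minimal sketch in C of composing such a rule. The helper name format_binfmt_rule is hypothetical, and the magic and mask values are left to the caller rather than filled in for any real architecture.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a binfmt_misc registration rule of the form
 *   :name:type:offset:magic:mask:interpreter:flags
 * using type 'M' (match on magic bytes) and an empty offset.
 * Every field value is supplied by the caller; this helper is
 * illustrative and not part of QEMU.
 * Returns the number of characters written, or -1 if 'out' is too small.
 */
static int format_binfmt_rule(char *out, size_t outlen,
                              const char *name, const char *magic,
                              const char *mask, const char *interpreter,
                              const char *flags)
{
    int n;

    assert(out != NULL && outlen > 0);

    n = snprintf(out, outlen, ":%s:M::%s:%s:%s:%s",
                 name, magic, mask, interpreter, flags);
    if (n < 0 || (size_t)n >= outlen) {
        return -1; /* formatting error or truncated output */
    }
    return n;
}
```

The resulting string is what root would write to /proc/sys/fs/binfmt_misc/register; passing "OC" as the flags requests both behaviours described above.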
With this solution, you can't mix several interpreters in your chroot. I
believe LXC has templates that create (Ubuntu) containers in which some
binaries are statically linked native ones, to handle syscalls (like
netlink) that qemu linux-user does not support.
That said, I have nothing against the idea, but I don't understand what
it is useful for. Do you have some use cases?
I think it is better to use kernel mechanisms when they are available...
Laurent
[1] https://patchwork.ozlabs.org/patch/215941/
https://www.kernel.org/doc/Documentation/binfmt_misc.txt
> Introduce a new option, -execve=path, that sets the absolute path to the
> QEMU interpreter and enables execve() interception. When a guest process
> tries to call execve(), qemu_execve() is called instead.
>
> qemu_execve() will prepend the interpreter set with -execve, similar to
> what binfmt_misc would do, and then pass the modified execve() to the
> host.
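The argument layout described here can be sketched standalone. This is a minimal illustration assuming the same ordering the patch uses (emulator path, "-0", the argv[0] to present, then the program and its arguments). The function name build_emulated_argv is hypothetical, and the fallback for an empty argv is an assumption that the patch itself does not make.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build the argument vector for re-executing a program through the
 * emulator. 'interp' and 'interp_arg' come from a "#!" line and may be
 * NULL. The returned array is heap-allocated and NULL-terminated; it
 * only borrows the string pointers, so the caller frees the array alone.
 */
static char **build_emulated_argv(const char *emulator, const char *filename,
                                  char *const argv[],
                                  const char *interp, const char *interp_arg)
{
    size_t argc = 0, n = 0, i;
    const char **out;

    assert(emulator != NULL && filename != NULL && argv != NULL);

    while (argv[argc] != NULL) {
        argc++;
    }

    /* Worst case: emulator, "-0", interp, interp, interp_arg, filename,
     * the remaining arguments, and the terminating NULL. */
    out = calloc(argc + 7, sizeof(*out));
    if (out == NULL) {
        return NULL;
    }

    out[n++] = emulator;
    out[n++] = "-0";
    if (interp != NULL) {
        out[n++] = interp;      /* argv[0] presented to the interpreter */
        out[n++] = interp;      /* program the emulator actually loads */
        if (interp_arg != NULL) {
            out[n++] = interp_arg;
        }
    } else {
        /* Assumption: fall back to the filename when argv is empty. */
        out[n++] = argc > 0 ? argv[0] : filename;
    }
    out[n++] = filename;
    for (i = 1; i < argc; i++) {
        out[n++] = argv[i];
    }
    out[n] = NULL;

    return (char **)out;
}
```

For a plain binary the result is emulator, "-0", argv[0], filename, followed by the remaining arguments; for a script the interpreter (and its optional argument) is placed ahead of the script path, much as binfmt_script would do in the kernel.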
>
> It is necessary to parse hashbang scripts in that function otherwise
> the kernel will try to run the interpreter of a script without QEMU and
> get an invalid exec format error.
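To make the hashbang step concrete, here is a minimal standalone sketch in C of splitting a "#!" line into an interpreter path and a single optional argument. It follows the same convention as the kernel's binfmt_script (everything after the path is one argument), but the function name parse_shebang and its interface are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a "#!" line in place. 'line' must be a writable, NUL-terminated
 * buffer holding the start of the file. On success returns 0, points
 * *interp at the interpreter path and *arg at the optional argument
 * (NULL if there is none). Returns -1 if the buffer does not start with
 * "#!" or names no interpreter.
 */
static int parse_shebang(char *line, char **interp, char **arg)
{
    char *cp;

    assert(line != NULL && interp != NULL && arg != NULL);
    *interp = NULL;
    *arg = NULL;

    if (line[0] != '#' || line[1] != '!') {
        return -1;
    }

    /* Only the first line matters. */
    line[strcspn(line, "\n")] = '\0';

    /* Strip trailing blanks. */
    cp = line + strlen(line);
    while (cp > line + 2 && (cp[-1] == ' ' || cp[-1] == '\t')) {
        *--cp = '\0';
    }

    /* Skip blanks between "#!" and the interpreter path. */
    cp = line + 2 + strspn(line + 2, " \t");
    if (*cp == '\0') {
        return -1; /* no interpreter name found */
    }
    *interp = cp;

    /* The path ends at the first blank; the rest is one argument. */
    cp += strcspn(cp, " \t");
    if (*cp != '\0') {
        *cp++ = '\0';
        cp += strspn(cp, " \t");
        if (*cp != '\0') {
            *arg = cp;
        }
    }
    return 0;
}
```

A caller would read the first bytes of the file into a buffer, terminate it with a NUL, and pass it in; for "#!/bin/sh -e" the function yields "/bin/sh" and "-e".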
>
> Signed-off-by: Petros Angelatos <address@hidden>
> ---
> linux-user/main.c | 8 ++++
> linux-user/qemu.h | 1 +
> linux-user/syscall.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 119 insertions(+), 1 deletion(-)
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index ee12035..5951279 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -79,6 +79,7 @@ static void usage(int exitcode);
>
> static const char *interp_prefix = CONFIG_QEMU_INTERP_PREFIX;
> const char *qemu_uname_release;
> +const char *qemu_execve_path;
>
> /* XXX: on x86 MAP_GROWSDOWN only works if ESP <= address + 32, so
> we allocate a bigger stack. Need a better solution, for example
> @@ -3828,6 +3829,11 @@ static void handle_arg_guest_base(const char *arg)
> have_guest_base = 1;
> }
>
> +static void handle_arg_execve(const char *arg)
> +{
> +    qemu_execve_path = strdup(arg);
> +}
> +
> static void handle_arg_reserved_va(const char *arg)
> {
> char *p;
> @@ -3913,6 +3919,8 @@ static const struct qemu_argument arg_table[] = {
> "uname", "set qemu uname release string to 'uname'"},
> {"B", "QEMU_GUEST_BASE", true, handle_arg_guest_base,
> "address", "set guest_base address to 'address'"},
> +    {"execve", "QEMU_EXECVE", true, handle_arg_execve,
> +     "path", "use interpreter at 'path' when a process calls execve()"},
> {"R", "QEMU_RESERVED_VA", true, handle_arg_reserved_va,
> "size", "reserve 'size' bytes for guest virtual address space"},
> {"d", "QEMU_LOG", true, handle_arg_log,
> diff --git a/linux-user/qemu.h b/linux-user/qemu.h
> index bd90cc3..0d9b058 100644
> --- a/linux-user/qemu.h
> +++ b/linux-user/qemu.h
> @@ -140,6 +140,7 @@ void init_task_state(TaskState *ts);
> void task_settid(TaskState *);
> void stop_all_tasks(void);
> extern const char *qemu_uname_release;
> +extern const char *qemu_execve_path;
> extern unsigned long mmap_min_addr;
>
> /* ??? See if we can avoid exposing so much of the loader internals. */
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 0cbace4..d0b5442 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -5854,6 +5854,109 @@ static target_timer_t get_timer_id(abi_long arg)
> return timerid;
> }
>
> +#define BINPRM_BUF_SIZE 128
> +
> +/* qemu_execve() Must return target values and target errnos. */
> +static abi_long qemu_execve(char *filename, char *argv[],
> +                            char *envp[])
> +{
> +    char *i_arg = NULL, *i_name = NULL;
> +    char **new_argp;
> +    int argc, fd, ret, i, offset = 3;
> +    char *cp;
> +    char buf[BINPRM_BUF_SIZE];
> +
> +    for (argc = 0; argv[argc] != NULL; argc++) {
> +        /* nothing */ ;
> +    }
> +
> +    fd = open(filename, O_RDONLY);
> +    if (fd == -1) {
> +        return -ENOENT;
> +    }
> +
> +    ret = read(fd, buf, BINPRM_BUF_SIZE);
> +    if (ret == -1) {
> +        close(fd);
> +        return -ENOENT;
> +    }
> +
> +    close(fd);
> +
> +    /* adapted from the kernel
> +     * https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_script.c
> +     */
> +    if ((buf[0] == '#') && (buf[1] == '!')) {
> +        /*
> +         * This section does the #! interpretation.
> +         * Sorta complicated, but hopefully it will work. -TYT
> +         */
> +
> +        buf[BINPRM_BUF_SIZE - 1] = '\0';
> +        cp = strchr(buf, '\n');
> +        if (cp == NULL) {
> +            cp = buf+BINPRM_BUF_SIZE-1;
> +        }
> +        *cp = '\0';
> +        while (cp > buf) {
> +            cp--;
> +            if ((*cp == ' ') || (*cp == '\t')) {
> +                *cp = '\0';
> +            } else {
> +                break;
> +            }
> +        }
> +        for (cp = buf+2; (*cp == ' ') || (*cp == '\t'); cp++) {
> +            /* nothing */ ;
> +        }
> +        if (*cp == '\0') {
> +            return -ENOEXEC; /* No interpreter name found */
> +        }
> +        i_name = cp;
> +        i_arg = NULL;
> +        for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++) {
> +            /* nothing */ ;
> +        }
> +        while ((*cp == ' ') || (*cp == '\t')) {
> +            *cp++ = '\0';
> +        }
> +        if (*cp) {
> +            i_arg = cp;
> +        }
> +
> +        if (i_arg) {
> +            offset = 5;
> +        } else {
> +            offset = 4;
> +        }
> +    }
> +
> +    new_argp = alloca((argc + offset + 1) * sizeof(void *));
> +
> +    /* Copy the original arguments with offset */
> +    for (i = 0; i < argc; i++) {
> +        new_argp[i + offset] = argv[i];
> +    }
> +
> +    new_argp[0] = strdup(qemu_execve_path);
> +    new_argp[1] = strdup("-0");
> +    new_argp[offset] = filename;
> +    new_argp[argc + offset] = NULL;
> +
> +    if (i_name) {
> +        new_argp[2] = i_name;
> +        new_argp[3] = i_name;
> +
> +        if (i_arg) {
> +            new_argp[4] = i_arg;
> +        }
> +    } else {
> +        new_argp[2] = argv[0];
> +    }
> +
> +    return get_errno(execve(qemu_execve_path, new_argp, envp));
> +}
> +
> /* do_syscall() should always have a single exit point at the end so
> that actions, such as logging of syscall results, can be performed.
> All errnos that do_syscall() returns must be -TARGET_<errcode>. */
> @@ -6113,7 +6216,13 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
>
> if (!(p = lock_user_string(arg1)))
> goto execve_efault;
> - ret = get_errno(execve(p, argp, envp));
> +
> + if (qemu_execve_path && *qemu_execve_path) {
> + ret = get_errno(qemu_execve(p, argp, envp));
> + } else {
> + ret = get_errno(execve(p, argp, envp));
> + }
> +
> unlock_user(p, arg1, 0);
>
> goto execve_end;
>