qemu-devel


From: Laurent Vivier
Subject: Re: [Qemu-devel] [PATCH] linux-user: add option to intercept execve() syscalls
Date: Thu, 21 Jan 2016 00:34:55 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

Hi Petros,

On 18/01/2016 05:33, Petros Angelatos wrote:
> From: Petros Angelatos <address@hidden>
> 
> To use QEMU user mode emulation inside a chroot, one currently has to use
> binfmt_misc. This can be avoided if QEMU never does a raw execve() on the
> host system.

Is there any reason not to use binfmt_misc when we are able to
chroot?

Moreover, binfmt_misc can execute binaries that are executable but not
readable; I don't think that is possible with a userspace solution.
binfmt_misc also allows the credentials and security token to be computed
from the binary rather than from the interpreter (see [1]), which is
useful for running setuid commands like "sudo".

With this solution, you can't mix several interpreters in your chroot,
because every execve() goes through the interpreter given with -execve.
I believe LXC has templates that create (Ubuntu) containers in which some
binaries are statically linked native ones, used to handle syscalls (like
netlink) that qemu linux-user does not support.
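This is where the kernel mechanism helps: binfmt_misc only hands a binary to QEMU when its header matches the registered magic/mask (which typically covers e_machine), so native binaries still execute directly, whereas the patch prepends the -execve interpreter to every execve(). A minimal sketch of the equivalent header test, assuming a hypothetical helper is_native_elf() that is not part of the patch:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return 1 if `hdr` (the first `len` bytes of a file)
 * is an ELF image whose e_machine equals `host_machine`, i.e. a binary that
 * could be run natively rather than through QEMU. e_machine is at byte
 * offset 18 in both Elf32_Ehdr and Elf64_Ehdr; its byte order comes from
 * e_ident[EI_DATA], so the check does not depend on host endianness. */
static int is_native_elf(const unsigned char *hdr, size_t len,
                         uint16_t host_machine)
{
    uint16_t machine;

    assert(hdr != NULL);
    /* Need e_ident (16 bytes), e_type (2) and e_machine (2). */
    if (len < EI_NIDENT + 4 || memcmp(hdr, ELFMAG, SELFMAG) != 0) {
        return 0;
    }
    switch (hdr[EI_DATA]) {
    case ELFDATA2LSB:
        machine = (uint16_t)(hdr[18] | (hdr[19] << 8));
        break;
    case ELFDATA2MSB:
        machine = (uint16_t)((hdr[18] << 8) | hdr[19]);
        break;
    default:
        return 0; /* unknown data encoding */
    }
    return machine == host_machine;
}
```

For instance, a 20-byte little-endian header with e_machine 0x28 (EM_ARM) is reported as native only when host_machine is EM_ARM; a "#!" script or a short read is never treated as a native ELF binary.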

That said, I have nothing against the idea, but I don't understand what
it would be useful for. Do you have some use cases?

I think it is better to use kernel mechanisms when they are available.

Laurent
[1] https://patchwork.ozlabs.org/patch/215941/
    https://www.kernel.org/doc/Documentation/binfmt_misc.txt

> Introduce a new option, -execve path (or the QEMU_EXECVE environment
> variable), that sets the absolute path to the QEMU interpreter and
> enables execve() interception. When a guest process calls execve(),
> qemu_execve() is called instead.
> 
> qemu_execve() prepends the interpreter set with -execve, similar to
> what binfmt_misc would do, and then passes the modified argument list
> to the host's execve().
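Reading the patch, for an ordinary (non-script) binary the rewritten argument vector is the interpreter path, "-0" followed by the original argv[0], the file name, then the remaining arguments. A minimal sketch of that layout using a heap-allocated array; build_qemu_argv() is a hypothetical name, and falling back to the file name when the guest passes an empty argv is an assumption (with an empty argv the patch would end up with only the interpreter path and "-0" before the terminating NULL).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: build the argument vector passed to the host
 * execve() for a non-script binary:
 *   { qemu_path, "-0", argv[0], filename, argv[1], ..., NULL }
 * QEMU's "-0" option sets the guest program's argv[0].
 * Returns a malloc'd array (free the array, not its strings) or NULL. */
static char **build_qemu_argv(const char *qemu_path, const char *filename,
                              char *const argv[])
{
    size_t argc = 0, rest, i;
    char **out;

    assert(qemu_path != NULL && filename != NULL && argv != NULL);
    while (argv[argc] != NULL) {
        argc++;
    }
    rest = argc > 0 ? argc - 1 : 0; /* arguments after argv[0] */

    out = malloc((4 + rest + 1) * sizeof(*out));
    if (out == NULL) {
        return NULL;
    }
    out[0] = (char *)qemu_path;
    out[1] = (char *)"-0";
    /* Assumption: use filename as argv[0] if the guest passed an empty argv. */
    out[2] = argc > 0 ? argv[0] : (char *)filename;
    out[3] = (char *)filename;
    for (i = 0; i < rest; i++) {
        out[4 + i] = argv[1 + i];
    }
    out[4 + rest] = NULL;
    return out;
}
```

The result would be handed to execve(qemu_path, new_argv, envp). For a script, the patch instead puts the interpreter name (twice) and its optional argument where argv[0] goes, which is why it uses offsets 4 and 5.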
> 
> Hashbang scripts must be parsed in that function; otherwise the kernel
> would try to run the script's interpreter without QEMU and fail with an
> "Exec format error" (ENOEXEC).
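The "#!" handling the patch adapts from the kernel's fs/binfmt_script.c can be sketched as a standalone function. This is a minimal sketch, not the patch's code: parse_shebang() is a hypothetical name, and unlike the patch it NUL-terminates at the number of bytes actually read (the patch terminates at buf[BINPRM_BUF_SIZE - 1] and inspects buf[0] and buf[1] even after a short read).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: parse a "#!interpreter [arg]" first line the way
 * fs/binfmt_script.c does. `buf` holds the `len` bytes actually read and
 * must have room for one extra byte; it is modified in place.
 * Returns 0 and sets *name (and *arg, or NULL) on success, or -1 if the
 * data is not a script or names no interpreter. */
static int parse_shebang(char *buf, size_t len, char **name, char **arg)
{
    char *cp, *end;

    assert(buf != NULL && name != NULL && arg != NULL);
    *name = NULL;
    *arg = NULL;
    if (len < 2 || buf[0] != '#' || buf[1] != '!') {
        return -1;
    }

    /* Terminate at the bytes actually read, then cut at the first newline. */
    buf[len] = '\0';
    end = strchr(buf, '\n');
    if (end == NULL) {
        end = buf + strlen(buf);
    }
    *end = '\0';

    /* Strip trailing blanks, never touching the "#!" itself. */
    while (end > buf + 2 && (end[-1] == ' ' || end[-1] == '\t')) {
        *--end = '\0';
    }

    /* Skip blanks after "#!"; an empty remainder means no interpreter. */
    cp = buf + 2;
    while (*cp == ' ' || *cp == '\t') {
        cp++;
    }
    if (*cp == '\0') {
        return -1;
    }
    *name = cp;

    /* The interpreter name ends at the first blank... */
    while (*cp != '\0' && *cp != ' ' && *cp != '\t') {
        cp++;
    }
    while (*cp == ' ' || *cp == '\t') {
        *cp++ = '\0';
    }
    /* ...and, as in the kernel, the rest of the line is a single argument. */
    if (*cp != '\0') {
        *arg = cp;
    }
    return 0;
}
```

A caller would read with something like n = read(fd, buf, sizeof(buf) - 1) and, if n > 0, call parse_shebang(buf, (size_t)n, &name, &arg).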
> 
> Signed-off-by: Petros Angelatos <address@hidden>
> ---
>  linux-user/main.c    |   8 ++++
>  linux-user/qemu.h    |   1 +
>  linux-user/syscall.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 119 insertions(+), 1 deletion(-)
> 
> diff --git a/linux-user/main.c b/linux-user/main.c
> index ee12035..5951279 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -79,6 +79,7 @@ static void usage(int exitcode);
>  
>  static const char *interp_prefix = CONFIG_QEMU_INTERP_PREFIX;
>  const char *qemu_uname_release;
> +const char *qemu_execve_path;
>  
>  /* XXX: on x86 MAP_GROWSDOWN only works if ESP <= address + 32, so
>     we allocate a bigger stack. Need a better solution, for example
> @@ -3828,6 +3829,11 @@ static void handle_arg_guest_base(const char *arg)
>      have_guest_base = 1;
>  }
>  
> +static void handle_arg_execve(const char *arg)
> +{
> +    qemu_execve_path = strdup(arg);
> +}
> +
>  static void handle_arg_reserved_va(const char *arg)
>  {
>      char *p;
> @@ -3913,6 +3919,8 @@ static const struct qemu_argument arg_table[] = {
>       "uname",      "set qemu uname release string to 'uname'"},
>      {"B",          "QEMU_GUEST_BASE",  true,  handle_arg_guest_base,
>       "address",    "set guest_base address to 'address'"},
> +    {"execve",     "QEMU_EXECVE",      true,   handle_arg_execve,
>      "path",       "use interpreter at 'path' when a process calls execve()"},
>      {"R",          "QEMU_RESERVED_VA", true,  handle_arg_reserved_va,
>       "size",       "reserve 'size' bytes for guest virtual address space"},
>      {"d",          "QEMU_LOG",         true,  handle_arg_log,
> diff --git a/linux-user/qemu.h b/linux-user/qemu.h
> index bd90cc3..0d9b058 100644
> --- a/linux-user/qemu.h
> +++ b/linux-user/qemu.h
> @@ -140,6 +140,7 @@ void init_task_state(TaskState *ts);
>  void task_settid(TaskState *);
>  void stop_all_tasks(void);
>  extern const char *qemu_uname_release;
> +extern const char *qemu_execve_path;
>  extern unsigned long mmap_min_addr;
>  
>  /* ??? See if we can avoid exposing so much of the loader internals.  */
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 0cbace4..d0b5442 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -5854,6 +5854,109 @@ static target_timer_t get_timer_id(abi_long arg)
>      return timerid;
>  }
>  
> +#define BINPRM_BUF_SIZE 128
> +
> +/* qemu_execve() Must return target values and target errnos. */
> +static abi_long qemu_execve(char *filename, char *argv[],
> +                  char *envp[])
> +{
> +    char *i_arg = NULL, *i_name = NULL;
> +    char **new_argp;
> +    int argc, fd, ret, i, offset = 3;
> +    char *cp;
> +    char buf[BINPRM_BUF_SIZE];
> +
> +    for (argc = 0; argv[argc] != NULL; argc++) {
> +        /* nothing */ ;
> +    }
> +
> +    fd = open(filename, O_RDONLY);
> +    if (fd == -1) {
> +        return -ENOENT;
> +    }
> +
> +    ret = read(fd, buf, BINPRM_BUF_SIZE);
> +    if (ret == -1) {
> +        close(fd);
> +        return -ENOENT;
> +    }
> +
> +    close(fd);
> +
> +    /* adapted from the kernel
> +     * https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_script.c
> +     */
> +    if ((buf[0] == '#') && (buf[1] == '!')) {
> +        /*
> +         * This section does the #! interpretation.
> +         * Sorta complicated, but hopefully it will work.  -TYT
> +         */
> +
> +        buf[BINPRM_BUF_SIZE - 1] = '\0';
> +        cp = strchr(buf, '\n');
> +        if (cp == NULL) {
> +            cp = buf+BINPRM_BUF_SIZE-1;
> +        }
> +        *cp = '\0';
> +        while (cp > buf) {
> +            cp--;
> +            if ((*cp == ' ') || (*cp == '\t')) {
> +                *cp = '\0';
> +            } else {
> +                break;
> +            }
> +        }
> +        for (cp = buf+2; (*cp == ' ') || (*cp == '\t'); cp++) {
> +            /* nothing */ ;
> +        }
> +        if (*cp == '\0') {
> +            return -ENOEXEC; /* No interpreter name found */
> +        }
> +        i_name = cp;
> +        i_arg = NULL;
> +        for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++) {
> +            /* nothing */ ;
> +        }
> +        while ((*cp == ' ') || (*cp == '\t')) {
> +            *cp++ = '\0';
> +        }
> +        if (*cp) {
> +            i_arg = cp;
> +        }
> +
> +        if (i_arg) {
> +            offset = 5;
> +        } else {
> +            offset = 4;
> +        }
> +    }
> +
> +    new_argp = alloca((argc + offset + 1) * sizeof(void *));
> +
> +    /* Copy the original arguments with offset */
> +    for (i = 0; i < argc; i++) {
> +        new_argp[i + offset] = argv[i];
> +    }
> +
> +    new_argp[0] = strdup(qemu_execve_path);
> +    new_argp[1] = strdup("-0");
> +    new_argp[offset] = filename;
> +    new_argp[argc + offset] = NULL;
> +
> +    if (i_name) {
> +        new_argp[2] = i_name;
> +        new_argp[3] = i_name;
> +
> +        if (i_arg) {
> +            new_argp[4] = i_arg;
> +        }
> +    } else {
> +        new_argp[2] = argv[0];
> +    }
> +
> +    return get_errno(execve(qemu_execve_path, new_argp, envp));
> +}
> +
>  /* do_syscall() should always have a single exit point at the end so
>     that actions, such as logging of syscall results, can be performed.
>     All errnos that do_syscall() returns must be -TARGET_<errcode>. */
> @@ -6113,7 +6216,13 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
>  
>              if (!(p = lock_user_string(arg1)))
>                  goto execve_efault;
> -            ret = get_errno(execve(p, argp, envp));
> +
> +            if (qemu_execve_path && *qemu_execve_path) {
> +                ret = get_errno(qemu_execve(p, argp, envp));
> +            } else {
> +                ret = get_errno(execve(p, argp, envp));
> +            }
> +
>              unlock_user(p, arg1, 0);
>  
>              goto execve_end;
> 


