From: Stefan Hajnoczi
Subject: Re: [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
Date: Thu, 23 Jul 2020 13:28:50 +0100

On Wed, Jul 22, 2020 at 06:58:20PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> > is required to create namespaces.
> > 
> > Introduce a weaker sandbox that is sufficient in container environments
> > because the container runtime already sets up namespaces. Use chroot to
> > restrict path traversal to the shared directory.
> > 
> > virtiofsd loses the following:
> > 
> > 1. Mount namespace. The process chroots to the shared directory but
> >    leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> >    syscalls.
> 
> OK, I'm guessing the behaviour of what happens if the host adds another
> mount afterwards might be different?

Running inside a container with -o chroot:
 - The container has its own mount namespace, so mounts added on the
   host afterwards do not appear in it, modulo shared subtrees (see
   mount(8)).

Running outside a container with -o chroot:
 - Path traversal can only reach mounts that are made within the shared
   directory tree. Technically other mounts are still there but it is
   not possible to reach them via file system paths.
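
To make the second case concrete, here is a minimal Python sketch (not
virtiofsd's actual C code). enter_chroot shows the two calls involved and
needs CAP_SYS_CHROOT. resolve_in_chroot is a hypothetical, purely lexical
model of where a client-supplied path ends up afterwards; it ignores
symlinks, and /srv/share in the note below is an assumed example path.

```python
import os
import posixpath


def enter_chroot(shared_dir: str) -> None:
    """Confine path resolution to shared_dir (requires CAP_SYS_CHROOT)."""
    os.chroot(shared_dir)
    # Move the working directory inside the new root so that relative
    # paths cannot start from a directory outside it.
    os.chdir("/")


def resolve_in_chroot(shared_dir: str, client_path: str) -> str:
    """Lexical model: host path that client_path names after the chroot.

    ".." at the root stays at the root, so the result is always under
    shared_dir. Symlinks are not modelled.
    """
    inside = posixpath.normpath("/" + client_path)
    return posixpath.join(shared_dir, inside.lstrip("/"))
```

With this model, resolve_in_chroot("/srv/share", "../../etc/passwd")
yields "/srv/share/etc/passwd": the ".." components are clamped at the new
root, so the host's own /etc cannot be named by any path even though its
mount still exists.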

> > 2. Pid namespace. This should be fine because virtiofsd is the only
> >    process running in the container.
> 
> Is it? Aren't QEMU and any other vhost-user processes also in the
> same container?

No. QEMU, virtiofsd, and other vhost-user processes should run in their
own containers. Application container images are typically designed to
run a single program per container. It's technically possible to launch
multiple programs but that is considered bad practice for application
containers.

Kubernetes:
Containers in a pod do not share a single pid namespace by default.
They do share a single network namespace, so they can communicate via
UNIX domain sockets.
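
One way to check this on a given setup is to compare namespace
identifiers from inside each container. The following is a small Python
sketch that assumes a Linux /proc is mounted; namespace_ids is a
hypothetical helper name, not part of any tool mentioned here.

```python
import os


def namespace_ids(kinds=("pid", "net", "mnt")):
    """Return this process's namespace identifiers, keyed by kind.

    Each value is the target of /proc/self/ns/<kind>, a string such as
    "pid:[4026531836]". Two processes are in the same namespace exactly
    when these strings are equal.
    """
    return {kind: os.readlink(f"/proc/self/ns/{kind}") for kind in kinds}


if __name__ == "__main__":
    for kind, ident in namespace_ids().items():
        print(kind, ident)
```

Run in two containers of the same pod, the "net" lines should match and,
with the default settings described above, the "pid" lines should differ.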

> > 3. Network namespace. This should be fine because seccomp already
> >    rejects the connect(2) syscall, but an additional layer of security
> >    is lost. Container runtime-specific network security policies can be
> >    used to drop network traffic (except for the vhost-user UNIX domain
> >    socket).
> 
> Should this be tied to the same flag - this feels different from the
> chroot specific problem.

Good point. Daniel Berrange has suggested another command-line syntax
that makes sandbox configuration more modular. I'll try to implement
something like that.

Stefan
