[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd
From: |
Chao Peng |
Subject: |
Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd |
Date: |
Mon, 26 Sep 2022 22:23:30 +0800 |
On Fri, Sep 23, 2022 at 04:19:46PM +0100, Fuad Tabba wrote:
> > Regarding pKVM's use case, with the shim approach I believe this can be
> > done by
> > allowing userspace mmap() the "hidden" memfd, but with a ton of restrictions
> > piled on top.
> >
> > My first thought was to make the uAPI a set of KVM ioctls so that KVM could
> > tightly
> > tightly control usage without taking on too much complexity in the kernel,
> > but
> > working through things, routing the behavior through the shim itself might
> > not be
> > all that horrific.
> >
> > IIRC, we discarded the idea of allowing userspace to map the "private" fd
> > because
> > things got too complex, but with the shim it doesn't seem _that_ bad.
> >
> > E.g. on the memfd side:
> >
> > 1. The entire memfd must be mapped, and at most one mapping is allowed,
> > i.e.
> > mapping is all or nothing.
> >
> > 2. Acquiring a reference via get_pfn() is disallowed if there's a mapping
> > for
> > the restricted memfd.
> >
> > 3. Add notifier hooks to allow downstream users to further restrict
> > things.
> >
> > 4. Disallow splitting VMAs, e.g. to force userspace to munmap()
> > everything in
> > one shot.
> >
> > 5. Require that there are no outstanding references at munmap(). Or if
> > this
> > can't be guaranteed by userspace, maybe add some way for userspace to
> > wait
> > until it's ok to convert to private? E.g. so that get_pfn() doesn't
> > need
> > to do an expensive check every time.
> >
> > static int memfd_restricted_mmap(struct file *file, struct vm_area_struct
> > *vma)
> > {
> > if (vma->vm_pgoff)
> > return -EINVAL;
> >
> > if ((vma->vm_end - vma->vm_start) != <file size>)
> > return -EINVAL;
> >
> > mutex_lock(&data->lock);
> >
> > if (data->has_mapping) {
> > r = -EINVAL;
> > goto err;
> > }
> > list_for_each_entry(notifier, &data->notifiers, list) {
> > r = notifier->ops->mmap_start(notifier, ...);
> > if (r)
> > goto abort;
> > }
> >
> > notifier->ops->mmap_end(notifier, ...);
> > mutex_unlock(&data->lock);
> > return 0;
> >
> > abort:
> > list_for_each_entry_continue_reverse(notifier &data->notifiers,
> > list)
> > notifier->ops->mmap_abort(notifier, ...);
> > err:
> > mutex_unlock(&data->lock);
> > return r;
> > }
> >
> > static void memfd_restricted_close(struct vm_area_struct *vma)
> > {
> > mutex_lock(...);
> >
> > /*
> > * Destroy the memfd and disable all future accesses if there are
> > * outstanding refcounts (or other unsatisfied restrictions?).
> > */
> > if (<outstanding references> || ???)
> > memfd_restricted_destroy(...);
> > else
> > data->has_mapping = false;
> >
> > mutex_unlock(...);
> > }
> >
> > static int memfd_restricted_may_split(struct vm_area_struct *area,
> > unsigned long addr)
> > {
> > return -EINVAL;
> > }
> >
> > static int memfd_restricted_mapping_mremap(struct vm_area_struct *new_vma)
> > {
> > return -EINVAL;
> > }
> >
> > Then on the KVM side, its mmap_start() + mmap_end() sequence would:
> >
> > 1. Not be supported for TDX or SEV-SNP because they don't allow adding
> > non-zero
> > memory into the guest (after pre-boot phase).
> >
> > 2. Be mutually exclusive with shared<=>private conversions, and is
> > allowed if
> > and only if the entire gfn range of the associated memslot is shared.
>
> In general I think that this would work with pKVM. However, limiting
> private<->shared conversions to the granularity of a whole memslot
> might be difficult to handle in pKVM, since the guest doesn't have the
> concept of memslots. For example, in pKVM right now, when a guest
> shares back its restricted DMA pool with the host it does so at the
> page-level. pKVM would also need a way to make an fd accessible again
> when shared back, which I think isn't possible with this patch.
But does pKVM really want to mmap/munmap a new region at the page-level,
that can cause VMA fragmentation if the conversion is frequent as I see.
Even with a KVM ioctl for mapping as mentioned below, I think there will
be the same issue.
>
> You were initially considering a KVM ioctl for mapping, which might be
> better suited for this since KVM knows which pages are shared and
> which ones are private. So routing things through KVM might simplify
> things and allow it to enforce all the necessary restrictions (e.g.,
> private memory cannot be mapped). What do you think?
>
> Thanks,
> /fuad
- [PATCH v8 0/8] KVM: mm: fd-based approach for supporting KVM, Chao Peng, 2022/09/15
- [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, Chao Peng, 2022/09/15
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, David Hildenbrand, 2022/09/19
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, Fuad Tabba, 2022/09/23
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd,
Chao Peng <=
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, Fuad Tabba, 2022/09/26
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, Sean Christopherson, 2022/09/27
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, Fuad Tabba, 2022/09/30
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, Kirill A . Shutemov, 2022/09/22
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, David Hildenbrand, 2022/09/26
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, Kirill A. Shutemov, 2022/09/26
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, David Hildenbrand, 2022/09/26
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, Sean Christopherson, 2022/09/27
- Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, Kirill A. Shutemov, 2022/09/28
RE: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd, Wang, Wei W, 2022/09/22