qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM


From: Kirill A. Shutemov
Subject: Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM
Date: Thu, 13 Apr 2023 19:04:05 +0300

On Wed, Apr 12, 2023 at 06:07:28PM -0700, Sean Christopherson wrote:
> On Wed, Jan 25, 2023, Kirill A. Shutemov wrote:
> > On Wed, Jan 25, 2023 at 12:20:26AM +0000, Sean Christopherson wrote:
> > > On Tue, Jan 24, 2023, Liam Merwick wrote:
> > > > On 14/01/2023 00:37, Sean Christopherson wrote:
> > > > > On Fri, Dec 02, 2022, Chao Peng wrote:
> > > > > > This patch series implements KVM guest private memory for 
> > > > > > confidential
> > > > > > computing scenarios like Intel TDX[1]. If a TDX host accesses
> > > > > > TDX-protected guest memory, machine check can happen which can 
> > > > > > further
> > > > > > crash the running host system, this is terrible for multi-tenant
> > > > > > configurations. The host accesses include those from KVM userspace 
> > > > > > like
> > > > > > QEMU. This series addresses KVM userspace induced crash by 
> > > > > > introducing
> > > > > > new mm and KVM interfaces so KVM userspace can still manage guest 
> > > > > > memory
> > > > > > via a fd-based approach, but it can never access the guest memory
> > > > > > content.
> > > > > > 
> > > > > > The patch series touches both core mm and KVM code. I appreciate
> > > > > > Andrew/Hugh and Paolo/Sean can review and pick these patches. Any 
> > > > > > other
> > > > > > reviews are always welcome.
> > > > > >    - 01: mm change, target for mm tree
> > > > > >    - 02-09: KVM change, target for KVM tree
> > > > > 
> > > > > A version with all of my feedback, plus reworked versions of Vishal's 
> > > > > selftest,
> > > > > is available here:
> > > > > 
> > > > >    git@github.com:sean-jc/linux.git x86/upm_base_support
> > > > > 
> > > > > It compiles and passes the selftest, but it's otherwise barely 
> > > > > tested.  There are
> > > > > a few todos (2 I think?) and many of the commits need changelogs, 
> > > > > i.e. it's still
> > > > > a WIP.
> > > > > 
> > > > 
> > > > When running LTP (https://github.com/linux-test-project/ltp) on the v10
> > > > bits (and also with Sean's branch above) I encounter the following NULL
> > > > pointer dereference with testcases/kernel/syscalls/madvise/madvise01
> > > > (100% reproducible).
> > > > 
> > > > It appears that in restrictedmem_error_page()
> > > > inode->i_mapping->private_data is NULL in the
> > > > list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) but I
> > > > don't know why.
> > > 
> > > Kirill, can you take a look?  Or pass the buck to someone who can? :-)
> > 
> > The patch below should help.
> > 
> > diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
> > index 15c52301eeb9..39ada985c7c0 100644
> > --- a/mm/restrictedmem.c
> > +++ b/mm/restrictedmem.c
> > @@ -307,14 +307,29 @@ void restrictedmem_error_page(struct page *page, 
> > struct address_space *mapping)
> >  
> >     spin_lock(&sb->s_inode_list_lock);
> >     list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
> > -           struct restrictedmem *rm = inode->i_mapping->private_data;
> >             struct restrictedmem_notifier *notifier;
> > -           struct file *memfd = rm->memfd;
> > +           struct restrictedmem *rm;
> >             unsigned long index;
> > +           struct file *memfd;
> >  
> > -           if (memfd->f_mapping != mapping)
> > +           if (atomic_read(&inode->i_count))
> 
> Kirill, should this be
> 
>               if (!atomic_read(&inode->i_count))
>                       continue;
> 
> i.e. skip unreferenced inodes, not skip referenced inodes?

Ouch. Yes.

But looking at other instances of s_inodes usage, I think we can drop the
check altogether. inode cannot be completely free until it is removed from
s_inodes list.

While there, replace list_for_each_entry_safe() with
list_for_each_entry() as we don't remove anything from the list.

diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
index 55e99e6c09a1..8e8a4420d3d1 100644
--- a/mm/restrictedmem.c
+++ b/mm/restrictedmem.c
@@ -194,22 +194,19 @@ static int restricted_error_remove_page(struct 
address_space *mapping,
                                        struct page *page)
 {
        struct super_block *sb = restrictedmem_mnt->mnt_sb;
-       struct inode *inode, *next;
+       struct inode *inode;
        pgoff_t start, end;
 
        start = page->index;
        end = start + thp_nr_pages(page);
 
        spin_lock(&sb->s_inode_list_lock);
-       list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+       list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
                struct restrictedmem_notifier *notifier;
                struct restrictedmem *rm;
                unsigned long index;
                struct file *memfd;
 
-               if (atomic_read(&inode->i_count))
-                       continue;
-
                spin_lock(&inode->i_lock);
                if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
                        spin_unlock(&inode->i_lock);
-- 
  Kiryl Shutsemau / Kirill A. Shutemov



reply via email to

[Prev in Thread] Current Thread [Next in Thread]