From: david.dai
Subject: Re: [PATCH] hw/misc: Add a virtual pci device to dynamically attach memory to QEMU
Date: Sat, 9 Oct 2021 17:42:33 +0800
On Thu, Sep 30, 2021 at 12:33:30PM +0200, David Hildenbrand (david@redhat.com)
wrote:
>
>
> On 30.09.21 11:40, david.dai wrote:
> > On Wed, Sep 29, 2021 at 11:30:53AM +0200, David Hildenbrand
> > (david@redhat.com) wrote:
> > >
> > > On 27.09.21 14:28, david.dai wrote:
> > > > On Mon, Sep 27, 2021 at 11:07:43AM +0200, David Hildenbrand
> > > > (david@redhat.com) wrote:
> > > > >
> > > > >
> > > > > On 27.09.21 10:27, Stefan Hajnoczi wrote:
> > > > > > On Sun, Sep 26, 2021 at 10:16:14AM +0800, David Dai wrote:
> > > > > > > Add a virtual PCI device to QEMU. The device is used to
> > > > > > > dynamically attach memory to a VM, so that a driver in the
> > > > > > > guest can request host memory on the fly without help from
> > > > > > > virtualization management software such as libvirt/manager.
> > > > > > > The attached memory is
> > > > >
> > > > > We do have virtio-mem to dynamically attach memory to a VM. It could
> > > > > be extended by a mechanism for the VM to request more/less memory,
> > > > > that's already a planned feature. But yeah, virtio-mem memory is
> > > > > exposed as ordinary system RAM, not only via a BAR to mostly be
> > > > > managed by user space completely.
> > >
> > > There is a virtio-pmem spec proposal to expose the memory region via a PCI
> > > BAR. We could do something similar for virtio-mem, however, we would have
> > > to
> > > wire that new model up differently in QEMU (it's no longer a "memory
> > > device"
> > > like a DIMM then).
> > >
> > > > >
> > > >
> > > > I wish virtio-mem could solve our problem, but it is a dynamic
> > > > allocation mechanism for system RAM in virtualization. In
> > > > heterogeneous computing environments, the attached memory usually
> > > > comes from a computing device and should be managed separately.
> > > > We don't want the Linux MM to control it.
> > >
> > > If that heterogeneous memory has a dedicated node (which usually is
> > > the case IIRC), and you let the Linux kernel manage it (dax/kmem), you
> > > can bind the memory backend of virtio-mem to that special NUMA node. So
> > > all memory managed by that virtio-mem device would come from that
> > > heterogeneous memory.
> > >
> >
> > Yes, CXL type 2 and type 3 devices expose memory to the host as a
> > dedicated node. The node is marked as soft_reserved_memory, and dax/kmem
> > can take over the node to create a dax device. This dax device can be
> > regarded as the memory backend of virtio-mem.
> >
> > I'm not sure whether a dax device can be opened by multiple VMs or host
> > applications.
>
> virtio-mem currently relies on having a single sparse memory region (anon
> mmap, mmaped file, mmaped huge pages, mmap shmem) per VM. Although we can
> share memory with other processes, sharing with other VMs is not intended.
> Instead of actually mmaping parts dynamically (which can be quite
> expensive), virtio-mem relies on punching holes into the backend and
> dynamically allocating memory/file blocks/... on access.
>
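The allocate-on-access and discard behaviour described above can be sketched
as follows. This is an illustration only, not virtio-mem or QEMU code: it
assumes Linux, a private anonymous mapping, and madvise(MADV_DONTNEED) as the
discard call. The names REGION and touch_and_discard are hypothetical, and the
message itself only says "punching holes" without naming the call used.

```python
import mmap

PAGE = mmap.PAGESIZE
REGION = 16 * PAGE  # stand-in for the single large sparse region of one VM


def touch_and_discard():
    """Touch one page of a sparse mapping, discard it, and read it again."""
    # Reserving the region does not allocate backing memory up front.
    region = mmap.mmap(-1, REGION, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        # First write to the page: the kernel allocates it on access.
        region[PAGE:PAGE + 4] = b"data"
        before = bytes(region[PAGE:PAGE + 4])
        # "Punch a hole": hand the page back to the kernel.
        region.madvise(mmap.MADV_DONTNEED, PAGE, PAGE)
        # Reading again gets a freshly allocated page, not the old contents.
        after = bytes(region[PAGE:PAGE + 4])
        return before, after
    finally:
        region.close()


before, after = touch_and_discard()
print(before, after)
```

The mapping stays in place the whole time; only the backing pages come and go,
which is why no dynamic mmap/munmap of sub-ranges is needed in this scheme.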
> So the easy way to make it work is:
>
> a) Exposing the CXL memory to the buddy via dax/kmem, resulting in device
> memory getting managed by the buddy on a separate NUMA node.
>
Do you mean the Linux kernel buddy system? How can we guarantee that other
applications don't allocate memory from it?
>
> b) (optional) allocate huge pages on that separate NUMA node.
> c) Use ordinary memory-device-ram or memory-device-memfd (for huge pages),
> *binding* the memory backend to that special NUMA node.
>
"-object memory-backend/device-ram or memory-device-memfd, id=mem0, size=768G"
How do I bind the backend memory to a NUMA node?
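
A minimal sketch of what such a binding might look like, assuming QEMU's
memory-backend-ram object with its host-nodes and policy properties, and a
virtio-mem-pci device with memdev and requested-size. The helper names, the
IDs mem0/vmem0 and host node 1 are hypothetical; 768G is the size from the
line above. It only builds the two options, not a complete QEMU command line.

```python
def backend_arg(backend_id, size, host_node, kind="memory-backend-ram"):
    """Build the -object option for a memory backend bound to one host NUMA node."""
    props = [
        kind,
        f"id={backend_id}",
        f"size={size}",
        f"host-nodes={host_node}",  # host NUMA node to allocate from (assumed: 1)
        "policy=bind",              # restrict allocations to host-nodes
    ]
    return ["-object", ",".join(props)]


def virtio_mem_arg(dev_id, backend_id, requested_size="0"):
    """Build the -device option that puts a virtio-mem device on that backend."""
    props = [
        "virtio-mem-pci",
        f"id={dev_id}",
        f"memdev={backend_id}",
        f"requested-size={requested_size}",
    ]
    return ["-device", ",".join(props)]


args = backend_arg("mem0", "768G", host_node=1) + virtio_mem_arg("vmem0", "mem0")
print(" ".join(args))
```

For huge pages, the same helper could be called with kind="memory-backend-memfd";
any further properties would need checking against the QEMU documentation.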
>
> This will dynamically allocate memory from that special NUMA node, resulting
> in the virtio-mem device completely being backed by that device memory,
> being able to dynamically resize the memory allocation.
>
>
> Exposing an actual devdax to the virtio-mem device, shared by multiple VMs
> isn't really what we want and won't work without major design changes. Also,
> I'm not so sure it's a very clean design: exposing memory belonging to other
> VMs to unrelated QEMU processes. This sounds like a serious security hole:
> if you managed to escalate to the QEMU process from inside the VM, you can
> access unrelated VM memory quite happily. You want an abstraction
> in-between, that makes sure each VM/QEMU process only sees private memory:
> for example, the buddy via dax/kmem.
>
Hi David,
Thanks for your suggestion, and sorry for the delayed reply; I was on a long
vacation.
How does the current virtio-mem dynamically attach memory to the guest? Via
page faults?
Thanks,
David
> --
> Thanks,
>
> David / dhildenb
>
>