From: David Hildenbrand
Subject: Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
Date: Tue, 7 Mar 2023 13:46:36 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0

On 07.03.23 11:51, Igor Mammedov wrote:
On Thu, 16 Feb 2023 12:47:51 +0100
David Hildenbrand <david@redhat.com> wrote:

Having multiple devices, some filtering memslots and some not filtering
memslots, messes up the "used_memslot" accounting. If we have a device
that filters out fewer memory sections after a device that filters out more,
we're in trouble, because our memslot checks stop working reliably.
For example, hotplugging a device that filters out fewer memslots might end
up passing the checks based on max vs. used memslots, but can run out of
memslots when getting notified about all memory sections.

a hypothetical example of such a case would be appreciated
(I don't really get how the above can happen; perhaps a more detailed
explanation would help)

Thanks for asking! AFAICT, it's mostly about first hot-adding a vhost device
that filters (and messes up used_memslots), and then messing with memslots that
get filtered out:

$ sudo rmmod vhost
$ sudo modprobe vhost max_mem_regions=4

// startup guest with virtio-net device
...

// hotplug an NVDIMM, resulting in used_memslots=4
echo "object_add memory-backend-ram,id=mem0,size=128M" | sudo nc -U /var/tmp/mon_src; echo ""
echo "device_add nvdimm,id=nvdimm0,memdev=mem0" | sudo nc -U /var/tmp/mon_src

// hotplug vhost-user device that overwrites "used_memslots=3"
echo "device_add 
vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs,bus=root" | sudo nc -U 
/var/tmp/mon_src

// hotplug another NVDIMM
echo "object_add memory-backend-ram,id=mem1,size=128M" | sudo nc -U /var/tmp/mon_src; 
echo ""
echo "device_add pc-dimm,id=nvdimm1,memdev=mem1" | sudo nc -U /var/tmp/mon_src

// vhost will fail to update the memslots
vhost_set_mem_table failed: Argument list too long (7)


So we tricked used_memslots into being smaller than it actually has to be,
because we're ignoring the memslots filtered out by the vhost-user device.
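
To make the under-counting concrete, here is a minimal, self-contained toy model
(not the actual QEMU code; the real counterpart is the shared used_memslots
tracking in hw/virtio/vhost.c, and everything else below, struct section,
device_commit(), is made up purely for illustration):

/* Toy model of the problem: a single shared "used_memslots" counter that is
 * overwritten by whichever vhost device committed its view last. */
#include <stdbool.h>
#include <stdio.h>

struct section { bool has_fd; };     /* fd-backed (shared) RAM vs. anonymous RAM */

static unsigned used_memslots;       /* shared across all vhost devices */

/* Each device recounts the sections it cares about and overwrites the counter. */
static void device_commit(const struct section *s, int n, bool filters_anon_ram)
{
    unsigned count = 0;
    for (int i = 0; i < n; i++) {
        if (filters_anon_ram && !s[i].has_fd) {
            continue;                /* vhost-user skips RAM without an fd */
        }
        count++;
    }
    used_memslots = count;           /* last writer wins */
}

int main(void)
{
    /* Four RAM sections; the last one is anonymous (memory-backend-ram). */
    struct section sections[4] = { {true}, {true}, {true}, {false} };

    device_commit(sections, 4, false);  /* kernel vhost (virtio-net): counts 4 */
    printf("used_memslots=%u\n", used_memslots);

    device_commit(sections, 4, true);   /* vhost-user-fs: counts only 3 */
    printf("used_memslots=%u\n", used_memslots);

    /* With max_mem_regions=4, a "used < max" hotplug check now passes (3 < 4),
     * although the kernel vhost device already consumes 4 slots; the next RAM
     * section pushes it to 5 and vhost_set_mem_table fails with E2BIG. */
    return 0;
}

Compiled and run, this prints used_memslots=4 and then used_memslots=3, which is
exactly the state the reproducer above ends up in right before the failing hotplug.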


Now, I think this is all far from relevant in practice as of now, and would
usually indicate a user error already (memory that's not shared with
vhost-user?).

It might get more relevant when virtio-mem dynamically adds/removes memslots
and relies on precise tracking of used vs. free memslots.


But maybe I should just ignore that case and live a happy life instead; it's
certainly hard to even trigger right now :)

Further, it will be helpful in memory device context in the near future
to know that a RAM memory region section will consume a memslot, and be
accounted for in the used vs. free memslots, such that we can implement
reservation of memslots for memory devices properly. Whether a device
filters this out and would theoretically still have a free memslot is
then hidden internally, making overall vhost memslot accounting easier.

Let's filter the memslots when creating the vhost memory array,
accounting all RAM && !ROM memory regions as "used_memslots" even if
vhost_user isn't interested in anonymous RAM regions, because it needs
an fd.

When a device actually filters out regions (which should happen rarely
in practice), we might detect a layout change although only filtered
regions changed. We won't bother about optimizing that for now.

Note: we cannot simply filter out the regions and count them as
"filtered" to add them to "used", because filtered regions could get
merged and result in a smaller effective number of memslots. Further,
we won't touch the hmp/qmp virtio introspection output.
What output exactly are you talking about?

hw/virtio/virtio-qmp.c:qmp_x_query_virtio_status

Prints hdev->n_mem_sections and hdev->n_tmp_sections. I won't be
touching that (debug) output.


PS:
If we drop vhost_dev::mem, the bulk of this patch would go away

Yes, but unfortunately we can't drop it, I think.
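
To spell out what "filter when building the vhost memory array" means, here is a
simplified sketch of that idea (this is not the actual patch code;
backend_wants_section() is just a placeholder for the vhost-user "needs an fd"
check, and the toy_* types are invented for the example):

#include <stdbool.h>
#include <stdint.h>

struct toy_section { bool is_ram, is_rom, has_fd; uint64_t gpa, size; };
struct toy_region  { uint64_t gpa, size; };

/* Every RAM && !ROM section counts as a used memslot, filtered or not. */
unsigned count_used_memslots(const struct toy_section *s, int n)
{
    unsigned used = 0;
    for (int i = 0; i < n; i++) {
        if (s[i].is_ram && !s[i].is_rom) {
            used++;
        }
    }
    return used;
}

/* The per-backend filter is applied only when building the table that is
 * handed to the backend (i.e., what ends up in vhost_dev::mem). */
int build_vhost_memory(const struct toy_section *s, int n,
                       bool (*backend_wants_section)(const struct toy_section *),
                       struct toy_region *out)
{
    int nregions = 0;
    for (int i = 0; i < n; i++) {
        if (!s[i].is_ram || s[i].is_rom) {
            continue;
        }
        if (backend_wants_section && !backend_wants_section(&s[i])) {
            continue;   /* e.g. anonymous RAM for a vhost-user backend */
        }
        out[nregions++] = (struct toy_region){ s[i].gpa, s[i].size };
    }
    return nregions;
}

The idea being: the used-memslot accounting only looks at RAM && !ROM, while
vhost_dev::mem only ever contains what survives the backend filter.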


side questions:
do we have MemorySection merging on qemu's kvm side?

Yes, we properly merge in flatview_simplify(). It's all about handling holes
in huge pages IIUC.
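
If it helps, the kind of merging done there can be illustrated with a toy
version; the real logic (flatview_simplify()/can_merge() in memory.c, IIRC)
compares more attributes, while this sketch, with invented toy_* names, only
checks "same region, contiguous":

#include <stdbool.h>
#include <stdint.h>

struct toy_range {
    int      mr_id;        /* stands in for the MemoryRegion pointer */
    uint64_t gpa;          /* guest-physical start */
    uint64_t offset;       /* offset into the region */
    uint64_t size;
};

bool toy_can_merge(const struct toy_range *a, const struct toy_range *b)
{
    return a->mr_id == b->mr_id &&
           a->gpa + a->size == b->gpa &&
           a->offset + a->size == b->offset;
}

/* Collapse mergeable neighbours in place, returning the new count. */
int toy_simplify(struct toy_range *r, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (out > 0 && toy_can_merge(&r[out - 1], &r[i])) {
            r[out - 1].size += r[i].size;   /* extend the previous range */
        } else {
            r[out++] = r[i];
        }
    }
    return out;
}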

also, does KVM merge memslots?

No, for good reasons not. Mapping more than we're instructed to map via a
notifier sounds kind-of hacky already. But I guess there is no easy way around
it (e.g., if mapping that part of memory doesn't work, we'd have to bounce the
reads/writes through QEMU instead).

--
Thanks,

David / dhildenb



