From: David Hildenbrand
Subject: Re: [PATCH][RESEND v3 1/3] hapvdimm: add a virtual DIMM device for memory hot-add protocols
Date: Tue, 28 Feb 2023 23:12:25 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0

On 28.02.23 22:27, Maciej S. Szmigiero wrote:
On 28.02.2023 16:02, David Hildenbrand wrote:

That was more or less the approach that v1 of this driver took:
The QEMU manager inserted virtual DIMMs (Hyper-V DM memory devices,
whatever one calls them) explicitly via the machine hotplug handler
(using the device_add command).

At that time you said [1] that:
1) I dislike that an external entity has to do vDIMM adaptions /
ballooning adaptions when rebooting or when wanting to resize a guest.

because:
Once you have the current approach upstream (vDIMMs, ballooning),
there is no easy way to change that later (requires deprecating, etc.).

That's why this version hides these vDIMMs.

Note that I don't have really strong feelings about letting the user hotplug 
devices. My comment was in general about user interactions when adding/removing 
memory or when rebooting the VM. As soon as you use individual memory blocks 
and/or devices, we end up with a similar user experience as we have already 
with DIMMs + virtio-balloon (bad IMHO).

Hiding the devices internally might make it a little bit easier to use, but 
it's still the same underlying concept: to add more memory you have to figure 
out whether to deflate the balloon or whether to add a new memory backend.

Well, the logic here is pretty simple: deflate the balloon first
(including deflating it by zero bytes if not inflated), then, if any
memory size remains to be added, hot-add the remainder.
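To make that concrete, here is a minimal sketch of that manager-side logic (grow_guest(), balloon_deflate() and hot_add_memory() are hypothetical names standing in for the commands a manager would issue; this is not code from this series):

#include <stdint.h>

/* Hypothetical stand-ins for the commands the manager would issue. */
void balloon_deflate(uint64_t bytes);
void hot_add_memory(uint64_t bytes);

/* Grow the guest by "request" bytes: deflate the balloon first,
 * then hot-add whatever is still missing. */
static void grow_guest(uint64_t request, uint64_t ballooned)
{
    uint64_t deflate = request < ballooned ? request : ballooned;

    balloon_deflate(deflate);               /* may well be 0 bytes */
    if (request > deflate) {
        hot_add_memory(request - deflate);  /* plug the remainder */
    }
}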


Yes, but if you have 1 GiB deflated and want to add 2 GiB, things are already getting more involved if you get what I mean.

I was going through the exact same model back when I was designing virtio-mem, and eventually added a way where you can just tell QEMU the requested size and be done with it.

We can't get rid of ballooning altogether because otherwise going
below the boot memory size wouldn't be possible.

Right, more on that below.


What memory backends will remain when we reboot?

In this driver version, none will remain inserted
(virtio-mem also seems to unplug all blocks unconditionally when the
VM is rebooted).


There is a very important difference: virtio-mem only temporarily unplugs that memory. As the guest boots up, it re-adds the requested amount of memory without any user interaction. That was added for two main reasons:

(a) We can easily defragment the virtio-mem device that way.
(b) If the rebooted guest doesn't load the virtio-mem driver, it
    wouldn't be able to make use of that memory. Like, rebooting into
    Windows right now ;)

So if you hotplugged some memory using virtio-mem and reboot, that memory will automatically be re-added.

In version 1, all memory backends were re-inserted once the guest
re-connected to the DM protocol after a reboot.

As I wrote in my response to Daniel moments ago, there are some issues
with automatic re-insertion if the guest never re-connects to the DM
protocol - that's why I've removed this functionality from this
driver version.

I think we might be able to do better, but that's just my idea of how it could look. I'll describe it below.

[...]

However, I'm not sure what exactly is gained by this approach.

These sub-devices still need to implement the TYPE_MEMORY_DEVICE interface

No, they wouldn't unless I am missing something. Only the hv-balloon device 
would be a TYPE_MEMORY_DEVICE.
In the case of virtio-mem, if one wants to add even more memory than the
"current" backing memory device allows, there's always the possibility of
adding yet another virtio-mem-pci device with an additional backing
memory device.

We could, but that's not the way I envision virtio-mem. The thing is, already when starting QEMU we have to make decisions about the maximum VM size when setting the maxmem option. Consequently, we cannot grow a VM without limit; we already have to plan ahead to some degree.

So my goal with virtio-mem is the following (it already works; we just have to work on reducing metadata and on memory overcommit handling -- mostly internal optimizations):

qemu-kvm ... \
-m 4G,maxmem=1048G \
-object memory-backend-ram,id=mem0,size=1T, ... \
-device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=0

So we can grow the guest up to 1T if we like. There is no way we could add more memory to that VM, because we're already hitting the limit of maxmem.
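For illustration, resizing such a device at runtime is then just a property update on the device, e.g. via HMP (values made up):

(qemu) qom-set vmem0 requested-size 16G
(qemu) qom-set vmem0 requested-size 0

virtio-mem then plugs or unplugs memory blocks within mem0 until the requested size is reached.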

It gets more complicated with multiple NUMA nodes, NVDIMMs, etc., but the main goal is to make the maximum size ridiculously large (while optimizing it internally!) so that one doesn't even have to worry about adding a new device.

I think the same model would work for hv as well, at least with my limited knowledge about it ;)


If there were just the main hv-balloon device (implementing
TYPE_MEMORY_DEVICE), this would not be possible, since one can't
have multiple DM VMBus devices.

Hence, intermediate sub-devices are necessary (each one implementing
TYPE_MEMORY_DEVICE), which do not sit on the VMBus, in order to allow
adding new backing memory devices (as virtio-mem allows).

Not necessarily, I think, as discussed.


so they are accounted for properly (the alternative would be to patch
the relevant QEMU code all over the place - that's probably why
virtio-mem also implements this interface instead).

Please elaborate, I don't understand what you are trying to say here. Memory 
devices provide hooks, and the hooks exist for a reason -- because memory 
devices are no longer simple DIMMs/NVDIMMs. And virtio-mem + virtio-pmem were
responsible for adding some of these hooks.

I was referring to the necessity of implementing TYPE_MEMORY_DEVICE at
all in the hv-balloon driver - if it didn't implement this interface, it
couldn't benefit from the logic in hw/mem/memory-device.c, so that logic
would need to be open-coded inside the driver, and every QEMU call into
the functions provided by that file would need to be patched to account
for the memory provided by this driver.

Ah, yes, one device has to be a memory device. I was just asking if you really need multiple ones.
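For reference, the hooks in question are roughly the callbacks of MemoryDeviceClass in include/hw/mem/memory-device.h (paraphrased; the exact members vary between QEMU versions):

struct MemoryDeviceClass {
    InterfaceClass parent_class;

    uint64_t (*get_addr)(const MemoryDeviceState *md);
    void (*set_addr)(MemoryDeviceState *md, uint64_t addr, Error **errp);
    uint64_t (*get_plugged_size)(const MemoryDeviceState *md, Error **errp);
    uint64_t (*get_min_alignment)(const MemoryDeviceState *md);
    MemoryRegion *(*get_memory_region)(MemoryDeviceState *md, Error **errp);
    void (*fill_device_info)(const MemoryDeviceState *md,
                             MemoryDeviceInfo *info);
};

A device implementing these gets address assignment, plugged-size accounting and "info memory-devices" reporting from hw/mem/memory-device.c without having to touch the rest of QEMU.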




One still needs some QMP command to add a raw memory backend to
the chosen "container" hv-balloon sub-device.

If you go with multiple memory backends, yes.


Since now the QEMU manager (user) is aware of the presence of these
"container" sub-devices, and has to manage them, changing the QEMU
interface in the future is more complex (as you said in [1]).

Can you elaborate? Yes, when you design the feature around "multiple memory 
backends", you'll have to have an interface to add such. Well, and to query them 
during migration. And, maybe also to detect when to remove some (migration)?


As I wrote above, multiple backing memory devices are necessary so the
guest can be expanded above the initially provided backing memory device,
much like virtio-mem already allows.

And then you have to either:
1) Let the hv-balloon driver transparently manage the lifetime of these
sub-devices, like this version of the patch set does, OR:

2) Make the QEMU manager (user) insert and remove these sub-devices
explicitly, like the version 1 of this driver did.

Let me raise this idea:

qemu-kvm ... \
-m 4G,maxmem=1048G \
-object memory-backend-ram,id=mem0,size=1T, ... \
-device hv-balloon,id=vmem0,memdev=mem0

We'd do the same internal optimizations as we're doing (and the ones I am working on) for virtio-mem.

The above would result in a VM with 4G. With virtio-mem, we resize devices; with the balloon, you resize the logical VM size.

So the single user interface would be the existing balloon command. Note that we set the logical VM size here, not the size of the balloon.

info balloon -> 4G
balloon 2G [will inflate]
info balloon -> 2G
balloon 128G [will deflate, then hotplug]
info balloon -> 128G
balloon 8G [will deflate]
info balloon -> 8G
...

How memory is added (deflate first, then expose some new memory via the memdev, ...) is left to the hv-balloon device; the user doesn't have to bother. We set the logical VM size and hv-balloon will do its thing to eventually reach that goal.
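A rough sketch of what that device-internal policy could look like ("target" being the logical VM size set via the balloon command; the helpers are hypothetical and purely illustrative, not code from this series):

#include <stdint.h>

/* Hypothetical helpers, for illustration only. */
void inflate_balloon(uint64_t bytes);
void deflate_balloon(uint64_t bytes);
void expose_via_memdev(uint64_t bytes);

/* React to a new logical VM size ("target"); "current" is what the
 * guest has right now (boot memory + hot-added - ballooned). */
static void hv_balloon_retarget(uint64_t target, uint64_t current,
                                uint64_t ballooned)
{
    if (target < current) {
        inflate_balloon(current - target);
    } else if (target > current) {
        uint64_t missing = target - current;
        uint64_t deflate = missing < ballooned ? missing : ballooned;

        deflate_balloon(deflate);                  /* deflate first */
        if (missing > deflate) {
            expose_via_memdev(missing - deflate);  /* then hot-add */
        }
    }
}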

Reboot? Logically unplug all memory, then re-add it after the guest has booted up.

The only thing we can't do is the following: when going below 4G, we cannot resize boot memory.


But I recall that that's *exactly* how the HV version I played with ~2 years ago worked: always start up with some initial memory ("startup memory"). After the VM is up for some seconds, we either add more memory (requested > startup) or request the VM to inflate memory (requested < startup).


Even migration could eventually be fairly simple, because virtio-mem already solved it to some degree. The only catch is that, for boot memory, we'd also have to detect discarded ranges. But that would be something to think about in the future.

--
Thanks,

David / dhildenb



