qemu-devel

Re: NVME hotplug support ?


From: Damien Hedde
Subject: Re: NVME hotplug support ?
Date: Mon, 5 Feb 2024 14:33:30 +0100


On 1/29/24 16:35, Hannes Reinecke wrote:
On 1/29/24 14:13, Damien Hedde wrote:


On 1/24/24 08:47, Hannes Reinecke wrote:
On 1/24/24 07:52, Philippe Mathieu-Daudé wrote:
Hi Hannes,

[+Markus as QOM/QDev rubber duck]

On 23/1/24 13:40, Hannes Reinecke wrote:
On 1/23/24 11:59, Damien Hedde wrote:
Hi all,

We are looking into hotplugging nvme devices, which is currently not possible:
when nvme was introduced 2 years ago, the feature was disabled.
commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
Author: Klaus Jensen
Date:   Tue Jul 6 10:48:40 2021 +0200

    hw/nvme: mark nvme-subsys non-hotpluggable
    We currently lack the infrastructure to handle subsystem hotplugging, so
    disable it.

Does anyone know what's lacking, or have any tips or ideas about what we should develop to add the support?

Problem is that the object model is messed up. In qemu namespaces are attached to controllers, which in turn are children of the PCI device.
There are subsystems, but these just reference the controller.

So if you hotunplug the PCI device you detach/destroy the controller and detach the namespaces from the controller. But if you hotplug the PCI device again the NVMe controller will be attached to the PCI device, but the namespaces will still be detached.

Klaus said he was going to fix that, and I dimly remember some patches
floating around. But apparently it never went anywhere.

Fundamental problem is that the NVMe hierarchy as per spec is incompatible with the qemu object model; qemu requires a strict
tree model where every object has exactly _one_ parent.

The modelling problem is not clear to me.
Do you have an example of how the NVMe hierarchy should be?

Sure.

As per NVMe spec we have this hierarchy:

      --->  subsys ---
     |                |
     |                V
controller      namespaces

There can be several controllers, and several
namespaces.
The initiator (i.e. the linux 'nvme' driver) connects
to a controller, queries the subsystem for the attached
namespaces, and presents each namespace as a block device.

For Qemu we have the problem that every device _must_ be
a direct descendant of the parent (expressed by the fact
that each 'parent' object is embedded in the device object).

So if we were to present a NVMe PCI device, the controller
must be derived from the PCI device:

pci -> controller

but now we have to express the NVMe hierarchy, too:

pci -> ctrl1 -> subsys1 -> namespace1

which actually works.
We can easily attach several namespaces:

pci -> ctrl1 -> subsys1 -> namespace2

For a single controller and a single subsystem.
However, as mentioned above, there can be _several_
controllers attached to the same subsystem.
So we can express the second controller:

pci -> ctrl2

but we cannot attach the controller to 'subsys1'
as then 'subsys1' would need to be derived from
'ctrl2', and not (as it is now) from 'ctrl1'.
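
As a rough illustration of the constraint described above, here is a small Python sketch. It is not QEMU code: the `Node` class, its `attach` method, and the `pci1`/`pci2` names are made up for this example and only mimic the "exactly one parent" rule.

```python
class Node:
    """Hypothetical stand-in for a device object with exactly one parent."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def attach(self, child):
        # Mirror the strict-tree rule: a child may have only one parent.
        if child.parent is not None:
            raise ValueError(
                f"{child.name} already has parent {child.parent.name}"
            )
        child.parent = self
        self.children.append(child)
        return child


# First controller: pci -> ctrl1 -> subsys1 -> namespace1, namespace2
pci1 = Node("pci1")
ctrl1 = pci1.attach(Node("ctrl1"))
subsys1 = ctrl1.attach(Node("subsys1"))
subsys1.attach(Node("namespace1"))
subsys1.attach(Node("namespace2"))

# Second controller: pci -> ctrl2
pci2 = Node("pci2")
ctrl2 = pci2.attach(Node("ctrl2"))

# Sharing subsys1 with ctrl2 would give it a second parent, which fails.
try:
    ctrl2.attach(subsys1)
    shared = True
except ValueError as err:
    shared = False
    error_message = str(err)
```

In this toy model the second `attach` is rejected, which is the same dead end as trying to derive 'subsys1' from both 'ctrl1' and 'ctrl2'.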

The most logical step would be to make 'subsystems'
their own entity, independent of any controllers.
But then the block devices (which are derived from
the namespaces) could not be traced back
to the PCI device, and a PCI hotplug would not
'automatically' disconnect the nvme block devices.

Plus the subsystem would be independent from the NVMe
PCI devices, so you could have a subsystem with
no controllers attached. And one would wonder who
should be responsible for cleaning that up.
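
The trade-off can be sketched in Python, assuming the subsystem becomes a standalone object that controllers merely reference. This is not QEMU code: `Subsystem`, `Controller`, and `unplug` are hypothetical names for this example, not existing QOM types or functions.

```python
class Subsystem:
    """Hypothetical standalone subsystem that owns the namespaces."""

    def __init__(self, name):
        self.name = name
        self.namespaces = []
        self.controllers = []  # references only, not ownership


class Controller:
    """Hypothetical controller that only references a subsystem."""

    def __init__(self, name, subsys):
        self.name = name
        self.subsys = subsys
        subsys.controllers.append(self)

    def unplug(self):
        # Dropping the reference does not touch the namespaces.
        self.subsys.controllers.remove(self)
        self.subsys = None


subsys1 = Subsystem("subsys1")
subsys1.namespaces += ["namespace1", "namespace2"]

ctrl1 = Controller("ctrl1", subsys1)
ctrl2 = Controller("ctrl2", subsys1)  # sharing a subsystem now works

ctrl1.unplug()
ctrl2.unplug()

# The subsystem and its namespaces outlive every controller.
orphaned = not subsys1.controllers and bool(subsys1.namespaces)
```

Sharing works in this model, but after both controllers are unplugged the subsystem still holds its namespaces with nothing pointing at it, which is the cleanup question raised above.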


Thanks for the details !

My use case is the simple one with no nvme subsystem/namespaces:
- hotplug a pci nvme device (nvme controller) as in the following CLI (which automatically puts the drive into a default namespace)

./qemu-system-aarch64 -nographic -M virt \
    -drive file=nvme0.disk,if=none,id=nvme-drive0 \
    -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0

In the simple tree approach, where subsystems and namespaces are not shared by controllers, could we delete the whole nvme hierarchy under the controller while unplugging it?

In your first message, you said
  > So if you hotunplug the PCI device you detach/destroy the controller
  > and detach the namespaces from the controller.
  > But if you hotplug the PCI device again the NVMe controller will be
  > attached to the PCI device, but the namespaces will still be detached.

Do you mean that if we unplug the pci device we HAVE to keep some nvme objects so that we can recover them if we plug the device back? Or just that it's hard to unplug nvme objects if they are not real qom children of the pci device?

Key point for trying out PCI hotplug with qemu is to attach the PCI device to its own PCI root port. Cf. the mail from Klaus Jensen for details.

Cheers,

Hannes

Thanks a lot to both of you. I missed that.

Damien







