qemu-block

Re: NVME hotplug support ?


From: Klaus Jensen
Subject: Re: NVME hotplug support ?
Date: Mon, 29 Jan 2024 14:37:28 +0100

On Jan 29 14:13, Damien Hedde wrote:
> 
> 
> On 1/24/24 08:47, Hannes Reinecke wrote:
> > On 1/24/24 07:52, Philippe Mathieu-Daudé wrote:
> > > Hi Hannes,
> > > 
> > > [+Markus as QOM/QDev rubber duck]
> > > 
> > > On 23/1/24 13:40, Hannes Reinecke wrote:
> > > > On 1/23/24 11:59, Damien Hedde wrote:
> > > > > Hi all,
> > > > > 
> > > > > We are currently looking into hotplugging nvme devices and
> > > > > it is currently not possible:
> > > > > When nvme was introduced 2 years ago, the feature was disabled.
> > > > > > commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
> > > > > > Author: Klaus Jensen
> > > > > > Date:   Tue Jul 6 10:48:40 2021 +0200
> > > > > > 
> > > > > > >     hw/nvme: mark nvme-subsys non-hotpluggable
> > > > > > >     We currently lack the infrastructure to handle subsystem
> > > > > > >     hotplugging, so disable it.
> > > > > 
> > > > > > Does anyone know what's lacking, or have any tips/ideas on
> > > > > > what we should develop to add the support?
> > > > > 
> > > > Problem is that the object model is messed up. In qemu
> > > > namespaces are attached to controllers, which in turn are
> > > > children of the PCI device.
> > > > There are subsystems, but these just reference the controller.
> > > > 
> > > > So if you hotunplug the PCI device you detach/destroy the
> > > > controller and detach the namespaces from the controller.
> > > > > But if you hotplug the PCI device again the NVMe controller will
> > > > > be attached to the PCI device, but the namespaces will still be
> > > > > detached.
> > > > 
> > > > Klaus said he was going to fix that, and I dimly remember some patches
> > > > floating around. But apparently it never went anywhere.
> > > > 
> > > > Fundamental problem is that the NVMe hierarchy as per spec is
> > > > incompatible with the qemu object model; qemu requires a strict
> > > > tree model where every object has exactly _one_ parent.
> > > 
> > > The modelling problem is not clear to me.
> > > Do you have an example of how the NVMe hierarchy should be?
> > > 
> > Sure.
> > 
> > As per NVMe spec we have this hierarchy:
> > 
> >       --->  subsys ---
> >      |                |
> >      |                V
> > controller      namespaces
> > 
> > There can be several controllers, and several
> > namespaces.
> > The initiator (ie the linux 'nvme' driver) connects
> > to a controller, queries the subsystem for the attached
> > namespaces, and presents each namespace as a block device.
> > 
> > For Qemu we have the problem that every device _must_ be
> > a direct descendant of the parent (expressed by the fact
> > that each 'parent' object is embedded in the device object).
> > 
> > So if we were to present a NVMe PCI device, the controller
> > must be derived from the PCI device:
> > 
> > pci -> controller
> > 
> > but now we have to express the NVMe hierarchy, too:
> > 
> > pci -> ctrl1 -> subsys1 -> namespace1
> > 
> > which actually works.
> > We can easily attach several namespaces:
> > 
> > pci -> ctrl1 -> subsys1 -> namespace2
> > 
> > For a single controller and a single subsystem.
> > However, as mentioned above, there can be _several_
> > controllers attached to the same subsystem.
> > So we can express the second controller:
> > 
> > pci -> ctrl2
> > 
> > but we cannot attach the controller to 'subsys1'
> > as then 'subsys1' would need to be derived from
> > 'ctrl2', and not (as it is now) from 'ctrl1'.
> > 
> > The most logical step would be to make 'subsystems'
> > their own entity, independent of any controllers.
> > But then the block devices (which are derived from
> > the namespaces) could not be traced back
> > to the PCI device, and a PCI hotplug would not
> > 'automatically' disconnect the nvme block devices.
> > 
> > Plus the subsystem would be independent from the NVMe
> > PCI devices, so you could have a subsystem with
> > no controllers attached. And one would wonder who
> > should be responsible for cleaning that up.
> > 
> 
> Thanks for the details !
> 
> My use case is the simple one with no nvme subsystem/namespaces:
> - hotplug a pci nvme device (nvme controller) as in the following CLI (which
> automatically puts the drive into a default namespace)
> 
> ./qemu-system-aarch64 -nographic -M virt \
>    -drive file=nvme0.disk,if=none,id=nvme-drive0 \
>    -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0
> 

AFAIK, you just need a pci root port to plug the device into.

  -drive file=nvme0.disk,if=none,id=nvme-drive0 \
  -device "pcie-root-port,id=pcie_root_port0,chassis=1,slot=0" \
  -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0

Then, you can use the qemu monitor to `device_del nvmedev0` and add it
with `device_add nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0`. The
"drive" (blockdev) will stick around after the device_del.

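For reference, the steps above can be collected into a small shell
sketch. This is not a tested recipe: the helper names are made up for
illustration, and the explicit bus= property is an assumption that the
controller has to sit behind the root port for the unplug/replug cycle
to work (the command line above does not pass it). The functions only
print the options and monitor commands; nothing is started.

```shell
# IDs taken from the example in this thread.
DRIVE_ID=nvme-drive0
PORT_ID=pcie_root_port0
DEV_ID=nvmedev0

# Hypothetical helper: print the QEMU options, one per line.
# Assumption: bus= places the nvme controller behind the root port.
nvme_hotplug_args() {
    printf '%s\n' \
        "-drive" "file=nvme0.disk,if=none,id=${DRIVE_ID}" \
        "-device" "pcie-root-port,id=${PORT_ID},chassis=1,slot=0" \
        "-device" "nvme,serial=nvme0,id=${DEV_ID},drive=${DRIVE_ID},bus=${PORT_ID}"
}

# Hypothetical helper: monitor command that unplugs the controller.
# The drive (blockdev) is expected to stay around afterwards.
nvme_unplug_cmd() {
    printf 'device_del %s\n' "${DEV_ID}"
}

# Hypothetical helper: monitor command that plugs the controller back in,
# reusing the same drive and (by assumption) the same root port.
nvme_replug_cmd() {
    printf 'device_add nvme,serial=nvme0,id=%s,drive=%s,bus=%s\n' \
        "${DEV_ID}" "${DRIVE_ID}" "${PORT_ID}"
}
```

Since none of the option values contain spaces, something like
`./qemu-system-aarch64 -nographic -M virt $(nvme_hotplug_args)` should
expand to the intended command line; the two monitor commands can then
be pasted into the qemu monitor in order.
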