From: Kevin Wolf
Subject: Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
Date: Tue, 20 Feb 2024 15:53:45 +0100

On 20.02.2024 at 15:29, Kevin Wolf wrote:
> On 20.02.2024 at 13:24, Akihiko Odaki wrote:
> > nvme_sriov_pre_write_ctrl() used to directly inspect the SR-IOV
> > configuration to determine the number of VFs being disabled by an
> > SR-IOV configuration write, but the logic was flawed and resulted in
> > out-of-bounds memory access.
> > 
> > It assumed PCI_SRIOV_NUM_VF always holds the number of currently
> > enabled VFs, but it actually doesn't in the following cases:
> > - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never
> >   been set.
> > - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> > - VFs were only partially enabled because of a realization failure.
> > 
> > Interpreting the SR-IOV configuration is the responsibility of
> > pcie_sriov, and it does so correctly, so use pcie_sriov_num_vfs(),
> > which it provides, to get the number of enabled VFs before and after
> > an SR-IOV configuration write.
> > 
> > Cc: qemu-stable@nongnu.org
> > Fixes: CVE-2024-26328
> > Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > ---
> >  hw/nvme/ctrl.c | 26 ++++++++------------------
> >  1 file changed, 8 insertions(+), 18 deletions(-)
> > 
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index f026245d1e9e..7a56e7b79b4d 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
> >      nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
> >  }
> >  
> > -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> > -                                      uint32_t val, int len)
> > +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
> >  {
> >      NvmeCtrl *n = NVME(dev);
> >      NvmeSecCtrlEntry *sctrl;
> > -    uint16_t sriov_cap = dev->exp.sriov_cap;
> > -    uint32_t off = address - sriov_cap;
> > -    int i, num_vfs;
> > +    int i;
> >  
> > -    if (!sriov_cap) {
> > -        return;
> > -    }
> > -
> > -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
> > -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> > -            for (i = 0; i < num_vfs; i++) {
> > -                sctrl = &n->sec_ctrl_list.sec[i];
> > -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > -            }
> > -        }
> > +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> > +        sctrl = &n->sec_ctrl_list.sec[i];
> > +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> >      }
> >  }
> 
> Maybe I'm missing something, but if the concern is that 'i' could run
> beyond the end of the array, I don't see anything that limits
> pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
> has. register_vfs() seems to just take whatever 16-bit value the guest
> wrote without imposing additional restrictions.
> 
> If there is some mechanism that makes register_vfs() fail if we exceed
> the limit, maybe an assertion with a comment would be in order because
> it doesn't seem obvious. I couldn't find any code that enforces it,
> sriov_max_vfs only ever seems to be used as a hint for the guest.
> 
> If not, do we need another check that fails gracefully in the error
> case?
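
To make the quoted rework concrete, here is a minimal sketch of how a
config write handler can capture the enabled-VF count before and after
the write, as the commit message describes. The caller's name and the
surrounding calls are assumptions based on hw/nvme, since that part of
the hunk isn't quoted above:

    static void nvme_pci_write_config(PCIDevice *dev, uint32_t address,
                                      uint32_t val, int len)
    {
        /* Number of VFs actually enabled before the guest's write. */
        uint16_t old_num_vfs = pcie_sriov_num_vfs(dev);

        /* Let the core PCI/SR-IOV code interpret the write. */
        pci_default_write_config(dev, address, val, len);

        /* Disable the secondary controllers of any VFs that were just
         * disabled, i.e. indices [new count, old count). */
        nvme_sriov_post_write_config(dev, old_num_vfs);
    }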

Ok, I see now that patch 2 fixes this. But then the commit message is
wrong because it implies that this patch is the only thing you need to
fix the problem with nvme. You can't say "Fixes: CVE-2024-26328" if half
of the fix is still missing.

Maybe asserting old_num_vfs < n->sec_ctrl_list.numcntl would still be a
good idea. But looking at this one, it seems to me that numcntl isn't
completely correct either:

    list->numcntl = cpu_to_le16(max_vfs);

Both list->numcntl and max_vfs are uint8_t, so I think this will always
be 0 on big-endian hosts?
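
As a standalone illustration of the truncation (modeling cpu_to_le16()
by hand for a big-endian host; this is not QEMU code, just a
demonstration):

    #include <stdint.h>
    #include <stdio.h>

    /* What cpu_to_le16() does on a big-endian host: byte-swap, which
     * moves a small value into the high byte. */
    static uint16_t cpu_to_le16_on_be(uint16_t v)
    {
        return (uint16_t)((v << 8) | (v >> 8));
    }

    int main(void)
    {
        uint8_t max_vfs = 127;
        /* Assigning the swapped 16-bit result to a uint8_t field keeps
         * only the low byte, which is 0 for any max_vfs <= 255. */
        uint8_t numcntl = (uint8_t)cpu_to_le16_on_be(max_vfs);
        printf("numcntl = %u\n", numcntl);   /* prints 0 */
        return 0;
    }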

Kevin



