Re: [Qemu-devel] [PATCH 1/5] nvme: PCI/e configuration from specification


From: Gersner
Subject: Re: [Qemu-devel] [PATCH 1/5] nvme: PCI/e configuration from specification
Date: Wed, 12 Sep 2018 22:53:25 +0300

Hi Daniel,

Sorry for the long round-trips; we had a busy month. We have implemented
all the changes and are waiting for one final clarification.

Should the new patches be posted on this thread or a new one?

Thanks for your time.

Gersner.

On Thu, Aug 30, 2018 at 6:45 PM Daniel Verkamp <address@hidden> wrote:

> Hi Shimi,
>
> On Sun, Aug 26, 2018 at 2:50 PM Gersner <address@hidden> wrote:
> >
> > Hi Daniel,
> > Thanks for taking a look. Comments are inline.
> >
> > Gersner.
> >
> > On Sun, Jul 15, 2018 at 9:21 AM Daniel Verkamp <address@hidden> wrote:
> >>
> >> On Fri, Jun 22, 2018 at 4:22 AM, Shimi Gersner <address@hidden>
> wrote:
> >> > PCI/e configuration currently does not meet the specification.
> >> >
> >> > Patch includes various configuration changes to support the specification:
> >> > - BAR2 to return zero when read and CMD.IOSE is not set.
> >> > - Expose NVME configuration through IO space (Optional).
> >> > - PCI Power Management v1.2.
> >> > - PCIe Function Level Reset.
> >> > - Disable QEMU's default use of PCIe Link Status (DLLLA).
> >> > - PCIe missing AOC compliance flag.
> >> > - Mask PCIe End-to-End TLP as RO (Unspecified by specification).
> >> [...]
> >> >      n->num_namespaces = 1;
> >> >      n->num_queues = 64;
> >> > -    n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
> >> > +    n->reg_size = pow2ceil(0x2000 + 2 * (n->num_queues + 1) * 4);
> >>
> >> The existing math looks OK to me (maybe already 4 bytes larger than
> >> necessary?). The controller registers should be 0x1000 bytes for the
> >> fixed register set, then 2 * 4 bytes for each doorbell pair (+ 1 is
> >> for the admin queue, and CAP.DSTRD is set to 0, so there is no extra
> >> padding between queues). The result is also rounded up to a power of
> >> two, so the result will typically be 8 KB.  What is the rationale for
> >> this change?
> >
> > You are correct, this change shouldn't be here. It was made while running
> > the NVMe compliance suite, which failed here.
>
> If the NVMe compliance test requires it, that is probably sufficient
> reason to change it - although it would be interesting to hear their
> rationale.
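
Just to spell the arithmetic out (assuming num_queues is still 64, with
pow2ceil() rounding up to the next power of two):

    0x1004 + 2 * (64 + 1) * 4 = 0x120c  ->  pow2ceil() = 0x2000 (8 KiB)
    0x2000 + 2 * (64 + 1) * 4 = 0x2208  ->  pow2ceil() = 0x4000 (16 KiB)

So the effect of the change is to grow BAR0 from 8 KiB to 16 KiB, which seems
to be what the compliance suite is after given the MLBAR discussion below.
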
>
> > The specification states that bits 0 to 13 are RO, which implicitly suggests
> > that the minimal size of this BAR should be at least 16K, as this is the
> > standard way to determine the BAR size on PCI (AFAIK). On the other hand, it
> > doesn't fit very well with the memory map laid out in section 3.1 (Register
> > Definition) of the NVMe 1.3b spec. Any idea?
>
> You're right, the MLBAR section of the NVMe spec does seem to indicate
> that up to bit 13 should be reserved/RO.  This is also probably a good
> enough reason to make the change, as long as this reason is provided.
>
> >
> > Additionally it does look like it has an extra 4 bytes.
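
For completeness, the BAR sizing I had in mind is the usual PCI probe. A
sketch only: pci_cfg_read32/pci_cfg_write32 are placeholder names for whatever
config-space accessors the software has, and this ignores the upper half of
the 64-bit BAR:

    uint32_t orig = pci_cfg_read32(dev, PCI_BASE_ADDRESS_0);   /* MLBAR, offset 0x10 */
    pci_cfg_write32(dev, PCI_BASE_ADDRESS_0, 0xffffffff);      /* size probe */
    uint32_t probe = pci_cfg_read32(dev, PCI_BASE_ADDRESS_0);
    pci_cfg_write32(dev, PCI_BASE_ADDRESS_0, orig);            /* restore */

    /* Bits 3:0 are the memory BAR type bits; the rest encode the size.
     * If bits 13:0 are read-only zero, the lowest writable address bit is
     * bit 14, so the decoded region is at least 1 << 14 = 16 KiB. */
    uint32_t size = ~(probe & ~0xfu) + 1;

which is why a 14-bit reserved/RO field in MLBAR reads to me as "the BAR must
be at least 16K".
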
> >
> >>
> >>
> >> >      n->ns_size = bs_size / (uint64_t)n->num_namespaces;
> >> >
> >> >      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
> >> > @@ -1245,6 +1295,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> >> >      pci_register_bar(&n->parent_obj, 0,
> >> >          PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> >> >          &n->iomem);
> >> > +
> >> > +    // Expose the NVME memory through Address Space IO (Optional by spec)
> >> > +    pci_register_bar(&n->parent_obj, 2, PCI_BASE_ADDRESS_SPACE_IO, &n->iomem);
> >>
> >> This looks like it will register the whole 4KB+ NVMe register set
> >> (n->iomem) as an I/O space BAR, but this doesn't match the spec (see
> >> NVMe 1.3c section 3.2, "Index/Data Pair registers"). The I/O BAR (if
> >> implemented) should just contain two 4-byte registers, Index and Data,
> >> that can be used to indirectly access the NVMe register set.  (Just
> >> for curiosity, do you know of any software that uses this feature? It
> >> could be useful for testing the implementation.)
> >
> > Very nice catch. We indeed totally misinterpreted the specification here.
> >
> > We use the compliance suite for sanity checks, but it doesn't actually use
> > this feature, it just verifies the configuration of the registers.
> >
> > We will roll this feature back, as it is optional.
>
> This is probably the right move; I don't know of any real hardware
> devices that implement it (and therefore I doubt any software will
> require it).  However, it should not be too difficult to add an
> implementation of this feature; if you are interested, take a look at
> the ahci_idp_read/ahci_idp_write functions - the NVMe index/data pair
> should be very similar.
>
Nice, it does seem easy to implement. We decided to go without it due to the
lack of real-world software we could test the new facility against.
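
For the record, in case we (or anyone) revisit this later, here is a rough,
untested sketch of what it might look like, modeled on
ahci_idp_read/ahci_idp_write as you suggested. nvme_idp_* and the idp_index
field are made-up names, and this assumes the existing
nvme_mmio_read()/nvme_mmio_write() handlers:

    /* BAR2 would be an 8-byte I/O region: a 4-byte Index register followed
     * by a 4-byte Data register (NVMe 1.3 section 3.2). */
    #define NVME_IDP_INDEX  0x0
    #define NVME_IDP_DATA   0x4

    static uint64_t nvme_idp_read(void *opaque, hwaddr addr, unsigned size)
    {
        NvmeCtrl *n = opaque;

        switch (addr) {
        case NVME_IDP_INDEX:
            return n->idp_index;
        case NVME_IDP_DATA:
            /* Indirect access into the regular register set at the stored
             * (dword-aligned) offset. */
            return nvme_mmio_read(opaque, n->idp_index, size);
        default:
            return 0;
        }
    }

    static void nvme_idp_write(void *opaque, hwaddr addr, uint64_t data,
                               unsigned size)
    {
        NvmeCtrl *n = opaque;

        switch (addr) {
        case NVME_IDP_INDEX:
            /* Dword-aligned offset; a real implementation would also mask
             * this to the size of the register set. */
            n->idp_index = (uint32_t)data & ~3U;
            break;
        case NVME_IDP_DATA:
            nvme_mmio_write(opaque, n->idp_index, data, size);
            break;
        }
    }

    static const MemoryRegionOps nvme_idp_ops = {
        .read = nvme_idp_read,
        .write = nvme_idp_write,
        .endianness = DEVICE_LITTLE_ENDIAN,
    };

BAR2 would then be registered as an 8-byte PCI_BASE_ADDRESS_SPACE_IO region
backed by nvme_idp_ops instead of reusing n->iomem.
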


>
> > Question - I'm having a hard time determining this from the specification:
> > as this is optional, how do device drivers determine whether it is
> > available? Is it simply the CMD.IOSE flag from PCI config space? And if
> > so, what is the standard way in QEMU to disable the flag completely?
>
> Based on the wording in NVMe 1.3 section 3.2, it seems like the only
> indication of support for Index/Data Pair is whether BAR2 is filled
> out.  I believe PCI defines unused BARs to be all 0s, so software
> should be able to use this to determine whether the feature is
> provided by a particular NVMe PCIe device, and removing the
> pci_register_bar() call should be enough to accomplish this from the
> QEMU side.
>
Agreed, that is what I understood from the spec as well. However, for some
reason the compliance tests
<https://github.com/nvmecompliance/tnvme/blob/30a19c6829b571b2c7bdf854daf0351bfe038032/GrpPciRegisters/allPciRegs_r10b.cpp#L382>
assume that IOSE==1 means that the Index/Data pair is enabled. They probably
had a good reason. I'm not even sure who turns the IOSE flag on; I found no
relevant code in QEMU.
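
For what it's worth, the driver-side check I would expect is simply "is BAR2
populated at all", e.g. something like this in a Linux driver (sketch only,
nvme_use_idp() is a made-up name):

    /* An unimplemented BAR reads back as all zeros when sized, so its
     * resource length is 0. */
    if (pci_resource_len(pdev, 2) &&
        (pci_resource_flags(pdev, 2) & IORESOURCE_IO)) {
        nvme_use_idp(pdev);            /* Index/Data pair available */
    } else {
        /* fall back to the regular MMIO register set */
    }

As for who sets IOSE: as far as I can tell it is the guest firmware/OS that
sets PCI_COMMAND_IO when it enables the device (e.g. via pci_enable_device()
on Linux), not QEMU itself, which would explain why I found no code for it on
the QEMU side.
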


>
> Thanks,
> -- Daniel
>

