qemu-block

From: Daniel Verkamp
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] nvme: PCI/e configuration from specification
Date: Thu, 30 Aug 2018 08:45:05 -0700

Hi Shimi,

On Sun, Aug 26, 2018 at 2:50 PM Gersner <address@hidden> wrote:
>
> Hi Daniel,
> Thanks for taking a look. Comments are inline.
>
> Gersner.
>
> On Sun, Jul 15, 2018 at 9:21 AM Daniel Verkamp <address@hidden> wrote:
>>
>> On Fri, Jun 22, 2018 at 4:22 AM, Shimi Gersner <address@hidden> wrote:
>> > PCI/e configuration currently does not meet the specification.
>> >
>> > Patch includes various configuration changes to support specifications
>> > - BAR2 to return zero when read and CMD.IOSE is not set.
>> > - Expose NVME configuration through IO space (Optional).
>> > - PCI Power Management v1.2.
>> > - PCIe Function Level Reset.
>> > - Disable QEMU's default use of PCIe Link Status (DLLLA).
>> > - PCIe missing AOC compliance flag.
>> > - Mask PCIe End-to-End TLP as RO (Unspecified by specification).
>> [...]
>> >      n->num_namespaces = 1;
>> >      n->num_queues = 64;
>> > -    n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
>> > +    n->reg_size = pow2ceil(0x2000 + 2 * (n->num_queues + 1) * 4);
>>
>> The existing math looks OK to me (maybe already 4 bytes larger than
>> necessary?). The controller registers should be 0x1000 bytes for the
>> fixed register set, then 2 * 4 bytes for each doorbell pair (+ 1 is
>> for the admin queue, and CAP.DSTRD is set to 0, so there is no extra
>> padding between queues). The result is also rounded up to a power of
>> two, so the result will typically be 8 KB.  What is the rationale for
>> this change?
>
> You are correct, this change shouldn't be here. It was made during tests
> against the NVMe compliance suite, which failed here.

If the NVMe compliance test requires it, that is probably sufficient
reason to change it - although it would be interesting to hear their
rationale.
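
For reference, the existing math works out roughly like this (a standalone
sketch, not QEMU code; pow2ceil below is a local stand-in for QEMU's helper):

    #include <inttypes.h>
    #include <stdio.h>

    /* Local stand-in for QEMU's pow2ceil(): round up to a power of two. */
    static uint64_t pow2ceil(uint64_t value)
    {
        uint64_t n = 1;
        while (n < value) {
            n <<= 1;
        }
        return n;
    }

    int main(void)
    {
        unsigned num_queues = 64;

        /* 0x1000 bytes of fixed controller registers, then one 2 * 4-byte
         * doorbell pair per queue (+1 for the admin queue); CAP.DSTRD == 0,
         * so there is no extra stride between doorbells. */
        uint64_t reg_size = pow2ceil(0x1000 + 2 * (num_queues + 1) * 4);

        /* 0x1000 + 0x208 = 0x1208, rounded up to 0x2000 -> 8 KiB */
        printf("reg_size = %" PRIu64 " bytes\n", reg_size);
        return 0;
    }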

> The specification states that bits 0 to 13 are RO, which implicitly suggests
> that the minimal size of this BAR should be at least 16K, as this is the
> standard way to determine the BAR size on PCI (AFAIK). On the other hand, it
> doesn't fit very well with the memory map laid out in section 3.1 (Register
> Definition) of the 1.3b NVMe spec. Any idea?

You're right, the MLBAR section of the NVMe spec does seem to indicate
that up to bit 13 should be reserved/RO.  This is also probably a good
enough reason to make the change, as long as this reason is provided.
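
(For anyone following along, the 16K figure falls out of the standard PCI BAR
sizing procedure: software writes all 1s to the BAR, reads it back, and the
read-only low bits determine the size. A purely illustrative sketch:)

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* After writing all 1s to the BAR, bits 13:0 read back as 0 because
         * they are read-only, so the lowest writable bit is bit 14. */
        uint32_t readback = 0xFFFFC000;

        uint32_t mem_mask = readback & ~0xFu; /* strip the attribute bits */
        uint32_t size = ~mem_mask + 1;        /* lowest set bit -> BAR size */

        printf("minimum BAR size = %u KiB\n", size / 1024); /* 16 KiB */
        return 0;
    }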

>
> Additionally it does look like it has an extra 4 bytes.
>
>>
>>
>> >      n->ns_size = bs_size / (uint64_t)n->num_namespaces;
>> >
>> >      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
>> > @@ -1245,6 +1295,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
>> > **errp)
>> >      pci_register_bar(&n->parent_obj, 0,
>> >          PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
>> >          &n->iomem);
>> > +
>> > +    // Expose the NVME memory through Address Space IO (Optional by spec)
>> > +    pci_register_bar(&n->parent_obj, 2, PCI_BASE_ADDRESS_SPACE_IO, 
>> > &n->iomem);
>>
>> This looks like it will register the whole 4KB+ NVMe register set
>> (n->iomem) as an I/O space BAR, but this doesn't match the spec (see
>> NVMe 1.3c section 3.2, "Index/Data Pair registers"). The I/O BAR (if
>> implemented) should just contain two 4-byte registers, Index and Data,
>> that can be used to indirectly access the NVMe register set.  (Just
>> for curiosity, do you know of any software that uses this feature? It
>> could be useful for testing the implementation.)
>
> Very nice catch. We indeed totally misinterpreted the specification here.
>
> We use the compliance suite for sanity checking, but it doesn't actually use
> this feature, just verifies the configuration of the registers.
>
> We will go with rolling back this feature as this is optional.

This is probably the right move; I don't know of any real hardware
devices that implement it (and therefore I doubt any software will
require it).  However, it should not be too difficult to add an
implementation of this feature; if you are interested, take a look at
the ahci_idp_read/ahci_idp_write functions - the NVMe index/data pair
should be very similar.
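
If it helps, the shape of it would be roughly the following (a standalone toy
model, not QEMU code; all names below are made up for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* Toy model of an Index/Data pair: two 4-byte I/O registers that give
     * indirect access to a larger register set.  Writing the Index register
     * selects an offset; reads/writes of the Data register are forwarded to
     * that offset of the backing registers. */
    typedef struct {
        uint32_t index;            /* currently selected offset */
        uint32_t regs[0x1000 / 4]; /* stand-in for the controller registers */
    } ToyCtrl;

    static void idp_write(ToyCtrl *c, uint32_t addr, uint32_t val)
    {
        if (addr == 0) {                              /* Index register */
            c->index = (val & ~0x3u) % sizeof(c->regs);
        } else {                                      /* Data register */
            c->regs[c->index / 4] = val;
        }
    }

    static uint32_t idp_read(ToyCtrl *c, uint32_t addr)
    {
        if (addr == 0) {
            return c->index;
        }
        return c->regs[c->index / 4];
    }

    int main(void)
    {
        ToyCtrl c = {0};

        idp_write(&c, 0, 0x1C);  /* select offset 0x1C */
        idp_write(&c, 4, 0x1);   /* write through the Data register */
        printf("regs[0x1C] = 0x%x\n", idp_read(&c, 4));
        return 0;
    }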

> Question - I'm having a hard time determining this from the specification - as
> this is optional, how do device drivers determine whether it is available? Is
> it simply the CMD.IOSE flag from PCI? And if so, what is the standard way in
> QEMU to disable the flag completely?

Based on the wording in NVMe 1.3 section 3.2, it seems like the only
indication of support for Index/Data Pair is whether BAR2 is filled
out.  I believe PCI defines unused BARs to be all 0s, so software
should be able to use this to determine whether the feature is
provided by a particular NVMe PCIe device, and removing the
pci_register_bar() call should be enough to accomplish this from the
QEMU side.
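
A rough sketch of the check a driver could do (the config-space accessor below
is a hypothetical stub, just to show the idea):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PCI_BAR2_OFFSET 0x18  /* config space offset of BAR2 */

    /* Hypothetical stand-in for a config-space read; a real driver would go
     * through its bus accessors.  Here BAR2 reads back as 0, i.e. the BAR is
     * not implemented. */
    static uint32_t read_config_dword(unsigned offset)
    {
        (void)offset;
        return 0;
    }

    /* If BAR2 is all zeros, the optional Index/Data Pair registers are not
     * implemented and software must use the regular MMIO register set. */
    static bool has_index_data_pair(void)
    {
        uint32_t bar2 = read_config_dword(PCI_BAR2_OFFSET);
        return (bar2 & ~0x3u) != 0;  /* ignore the low type/indicator bits */
    }

    int main(void)
    {
        printf("Index/Data Pair supported: %s\n",
               has_index_data_pair() ? "yes" : "no");
        return 0;
    }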

Thanks,
-- Daniel


