qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 2/2] nvme: Add physical writes/reads from OCP log


From: Joel Granados
Subject: Re: [PATCH v3 2/2] nvme: Add physical writes/reads from OCP log
Date: Thu, 17 Nov 2022 16:20:59 +0100

On Thu, Nov 17, 2022 at 08:30:46AM +0100, Klaus Jensen wrote:
> On Nov 16 18:14, Joel Granados wrote:
> > In order to evaluate write amplification factor (WAF) within the storage
> > stack it is important to know the number of bytes written to the
> > controller. The existing SMART log value of Data Units Written is too
> > coarse (given in units of 500 Kb) and so we add the SMART health
> > information extended from the OCP specification (given in units of bytes)
> > 
> > We add a controller argument (ocp) that toggles on/off the SMART log
> > extended structure.  To accommodate different vendor specific specifications
> > like OCP, we add a multiplexing function (nvme_vendor_specific_log) which
> > will route to the different log functions based on arguments and log ids.
> > We only return the OCP extended SMART log when the command is 0xC0 and ocp
> > has been turned on in the args.
> > 
> > Though we add the whole nvme SMART log extended structure, we only populate
> > the physical_media_units_{read,written}, log_page_version and
> > log_page_uuid.
> > 
> > Signed-off-by: Joel Granados <j.granados@samsung.com>
> > ---
> >  docs/system/devices/nvme.rst |  7 +++++
> >  hw/nvme/ctrl.c               | 55 ++++++++++++++++++++++++++++++++++++
> >  hw/nvme/nvme.h               |  1 +
> >  include/block/nvme.h         | 36 +++++++++++++++++++++++
> >  4 files changed, 99 insertions(+)
> > 
> > diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
> > index 30f841ef62..1cc5e52c00 100644
> > --- a/docs/system/devices/nvme.rst
> > +++ b/docs/system/devices/nvme.rst
> > @@ -53,6 +53,13 @@ parameters.
> >    Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
> >    previously used.
> >  
> > +``ocp`` (default: ``off``)
> > +  The Open Compute Project defines the Datacenter NVMe SSD Specification 
> > that
> > +  sits on top of NVMe. It describes additional commands and NVMe behaviors
> > +  specific for the Datacenter. When this option is ``on`` OCP features 
> > such as
> > +  the SMART / Health information extended log become available in the
> > +  controller.
> > +
> >  Additional Namespaces
> >  ---------------------
> >  
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index bf291f7ffe..c7215a4ed1 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -4455,6 +4455,41 @@ static void nvme_set_blk_stats(NvmeNamespace *ns, 
> > struct nvme_stats *stats)
> >      stats->write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
> >  }
> >  
> > +static uint16_t nvme_ocp_extended_smart_info(NvmeCtrl *n, uint8_t rae,
> > +                                             uint32_t buf_len, uint64_t 
> > off,
> > +                                             NvmeRequest *req)
> > +{
> > +    NvmeNamespace *ns = NULL;
> > +    NvmeSmartLogExtended smart_l = { 0 };
> > +    struct nvme_stats stats = { 0 };
> > +    uint32_t trans_len;
> > +
> > +    if (off >= sizeof(smart_l)) {
> > +        return NVME_INVALID_FIELD | NVME_DNR;
> > +    }
> > +
> > +    /* accumulate all stats from all namespaces */
> > +    for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> > +        ns = nvme_ns(n, i);
> > +        if (ns) {
> > +            nvme_set_blk_stats(ns, &stats);
> > +        }
> > +    }
> > +
> > +    smart_l.physical_media_units_written[0] = 
> > cpu_to_le32(stats.units_written);
> > +    smart_l.physical_media_units_read[0] = cpu_to_le32(stats.units_read);
> 
> These are uint64s, so should be cpu_to_le64().
Good catch

> 
> > +    smart_l.log_page_version = 0x0003;
> > +    smart_l.log_page_uuid[0] = 0xA4F2BFEA2810AFC5;
> > +    smart_l.log_page_uuid[1] = 0xAFD514C97C6F4F9C;
> 
> Technically the field is called the "Log Page GUID", not the UUID.
Yep, that was all me. My brain just automatically completed UUID. Sorry
about that.
> Perhaps this is a bit of Microsoft leaking in, or it is to differentiate
> it from the UUID Index functionality, who knows.
> 
> It looks like you byte swapped the two 64 bit parts, but not the
> individual bytes. It's super confusing when the spec just says "shall be
Individual bytes are also swapped. I actually tested this with nvme cli
and it successfully checks these 128 bytes. I got inspired by nvme-cli
(plugins/ocp/ocp-nvme.c) where the opc number appears.

> set to VALUE". Is that VALUE already in little endian? Sigh.
The spec says "All values in logs shall be little endian format". Which
still does not say if the value that appears in the pdf is in little or
big endian. Yes a bit confusing, see my final comment

> 
> Anyway, I think it is fair to assume that, so just make
> log_page_uuid/guid a uint8_t 16-array and do something like:
> 
>       static const uint8_t uuid[16] = {
>               0xAF, 0xD5, 0x14, 0xC9, 0x7C, 0x6F, 0x4F, 0x9C,
>               0xA4, 0xF2, 0xBF, 0xEA, 0x28, 0x10, 0xAF, 0xC5,
>       };

If I use this order the nvme-cli command will fail.
The order should be this one to be consistent with nvme-cli:

{
0xC5, 0xAF, 0x10, 0x28, 0xEA, 0xBF, 0xF2, 0xA4,
0x9C, 0x4F, 0x6F, 0x7C, 0xC9, 0x14, 0xD5, 0xAF
}

This means that nvme-cli interpreted the value that appeared on the spec
pdf as big endian and then changed it to little endian following the
spec. Another thing that could have been done is to interpret what appears in
the PDF as little endian just push it through without swapping it.
Is there something in the spec that can give clarity as to what endianess
the values in the pdf should be by default?

I see two options here:
1. We go with the interpretation of nvme-cli (big endian in pdf, little
   endian in code)
2. or we change nvme-cli (little endian in pdf, little endian in code)

What do you think?


> 
>       memcpy(smart_l.log_page_guid, uuid, sizeof(smart_l.log_page_guid));


Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]