From: Dmitry Fomichev
Subject: RE: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set
Date: Fri, 6 Nov 2020 23:07:49 +0000

> -----Original Message-----
> From: Niklas Cassel <Niklas.Cassel@wdc.com>
> Sent: Friday, November 6, 2020 6:59 AM
> To: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
> Cc: Keith Busch <kbusch@kernel.org>; Klaus Jensen
> <k.jensen@samsung.com>; Kevin Wolf <kwolf@redhat.com>; Philippe
> Mathieu-Daudé <philmd@redhat.com>; Max Reitz <mreitz@redhat.com>;
> Maxim Levitsky <mlevitsk@redhat.com>; Fam Zheng <fam@euphon.net>;
> Alistair Francis <Alistair.Francis@wdc.com>; Matias Bjorling
> <Matias.Bjorling@wdc.com>; Damien Le Moal <Damien.LeMoal@wdc.com>;
> qemu-block@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace
> Command Set
> 
> On Thu, Nov 05, 2020 at 11:53:38AM +0900, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise NVM Command Set
> > when "zoned" device property is not set (default) and Zoned
> > Namespace Command Set otherwise.
> >
> > Define values and structures that are needed to support Zoned
> > Namespace Command Set (NVMe TP 4053) in PCI NVMe controller
> > emulator. Define trace events where needed in newly introduced code.
> >
> > In order to improve scalability, all open, closed and full zones
> > are organized in separate linked lists. Consequently, almost all
> > zone operations don't require scanning of the entire zone array
> > (which potentially can be quite large) - it is only necessary to
> > enumerate one or more zone lists.
> >
> > Handlers for three new NVMe commands introduced in Zoned Namespace
> > Command Set specification are added, namely for Zone Management
> > Receive, Zone Management Send and Zone Append.
> >
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> >
> > Read/Write command handler is modified to only allow writes at the
> > write pointer if the namespace is zoned. For Zone Append command,
> > writes implicitly happen at the write pointer and the starting write
> > pointer value is returned as the result of the command. Write Zeroes
> > handler is modified to add zoned checks that are identical to those
> > done as a part of Write flow.
> >
> > Subsequent commits in this series add ZDE support and checks for
> > active and open zone limits.
> >
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
> > Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
> > Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> > Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
> > Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
> > Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> > Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
> > ---
> >  hw/block/nvme-ns.h    |  54 +++
> >  hw/block/nvme.h       |   8 +
> >  hw/block/nvme-ns.c    | 173 ++++++++
> >  hw/block/nvme.c       | 971 +++++++++++++++++++++++++++++++++++++++++-
> >  hw/block/trace-events |  18 +-
> >  5 files changed, 1209 insertions(+), 15 deletions(-)
> >
> 
> (snip)
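
For context, the open, closed and full zone lists that the commit message
describes are per-state lists hung off the namespace. A rough sketch of
their shape (the nvme-ns.h hunk itself is snipped from this excerpt, so
the exact field names below are an approximation of the final code):

    typedef struct NvmeZone {
        NvmeZoneDescr   d;       /* zone descriptor as reported to the host */
        uint64_t        w_ptr;   /* internal write pointer */
        QTAILQ_ENTRY(NvmeZone) entry;
    } NvmeZone;

    /* In NvmeNamespace: one list per zone state, so state transitions and
     * limit checks walk only the zones in the relevant state instead of
     * scanning the entire zone_array. */
    QTAILQ_HEAD(, NvmeZone) exp_open_zones;
    QTAILQ_HEAD(, NvmeZone) imp_open_zones;
    QTAILQ_HEAD(, NvmeZone) closed_zones;
    QTAILQ_HEAD(, NvmeZone) full_zones;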
> 
> > +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
> > +{
> > +    NvmeCmd *cmd = (NvmeCmd *)&req->cmd;
> > +    NvmeNamespace *ns = req->ns;
> > +    /* cdw12 is zero-based number of dwords to return. Convert to bytes */
> > +    uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2;
> > +    uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> > +    uint32_t zone_idx, zra, zrasf, partial;
> > +    uint64_t max_zones, nr_zones = 0;
> > +    uint16_t ret;
> > +    uint64_t slba;
> > +    NvmeZoneDescr *z;
> > +    NvmeZone *zs;
> > +    NvmeZoneReportHeader *header;
> > +    void *buf, *buf_p;
> > +    size_t zone_entry_sz;
> > +
> > +    req->status = NVME_SUCCESS;
> > +
> > +    ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx);
> > +    if (ret) {
> > +        return ret;
> > +    }
> > +
> > +    zra = dw13 & 0xff;
> > +    if (zra != NVME_ZONE_REPORT) {
> > +        return NVME_INVALID_FIELD | NVME_DNR;
> > +    }
> > +
> > +    zrasf = (dw13 >> 8) & 0xff;
> > +    if (zrasf > NVME_ZONE_REPORT_OFFLINE) {
> > +        return NVME_INVALID_FIELD | NVME_DNR;
> > +    }
> > +
> > +    if (data_size < sizeof(NvmeZoneReportHeader)) {
> > +        return NVME_INVALID_FIELD | NVME_DNR;
> > +    }
> > +
> > +    ret = nvme_map_dptr(n, data_size, req);
> 
> The nvme_map_dptr() call was not here in the v8 patch set.
> 
> On v7 I commented that you were missing a call to nvme_check_mdts().
> I think you still need to call nvme_check_mdts in order to verify
> that data_size < mdts, no?

Ugh, I've added nvme_map_dptr() instead of nvme_check_mdts() :o
Will send the corrected version shortly...

Cheers,
DF
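
For reference, the corrected version would presumably swap the
nvme_map_dptr() call for an MDTS check along these lines (a sketch
assuming the nvme_check_mdts() helper already used by the other I/O
handlers in hw/block/nvme.c; the later nvme_dma() call performs the
mapping itself):

    ret = nvme_check_mdts(n, data_size);
    if (ret) {
        trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
        return ret;
    }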

> 
> This function already has a call to nvme_dma(). nvme_dma() already
> calls nvme_map_dptr().
> If you use nvme_dma(), you cannot also call nvme_map_dptr() yourself.
> 
> It will call nvme_map_addr() (which calls qemu_sglist_add()) on the
> same buffer twice, causing qsg->size to be twice what the user
> sent in, which will cause the:
> 
>         if (unlikely(residual)) {
> 
> check in nvme_dma() to fail.
> 
> 
> Looking at nvme_read()/nvme_write(), they use nvme_map_dptr()
> (without any call to nvme_dma()) and then use dma_blk_read() or
> dma_blk_write(). (And they both call nvme_check_mdts().)
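
That read/write pattern looks roughly like this (paraphrased and
simplified from the hw/block/nvme.c handlers of this era; the real code
uses goto-based error paths):

    status = nvme_check_mdts(n, data_size);     /* enforce MDTS first */
    if (status) {
        trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
        return status;
    }

    status = nvme_map_dptr(n, data_size, req);  /* map the PRP/SGL exactly once */
    if (status) {
        return status;
    }

    /* ...then dma_blk_read()/dma_blk_write() on the mapped qsg/iov... */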
> 
> 
> Kind regards,
> Niklas
> 
> > +    if (ret) {
> > +        return ret;
> > +    }
> > +
> > +    partial = (dw13 >> 16) & 0x01;
> > +
> > +    zone_entry_sz = sizeof(NvmeZoneDescr);
> > +
> > +    max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
> > +    buf = g_malloc0(data_size);
> > +
> > +    header = (NvmeZoneReportHeader *)buf;
> > +    buf_p = buf + sizeof(NvmeZoneReportHeader);
> > +
> > +    while (zone_idx < ns->num_zones && nr_zones < max_zones) {
> > +        zs = &ns->zone_array[zone_idx];
> > +
> > +        if (!nvme_zone_matches_filter(zrasf, zs)) {
> > +            zone_idx++;
> > +            continue;
> > +        }
> > +
> > +        z = (NvmeZoneDescr *)buf_p;
> > +        buf_p += sizeof(NvmeZoneDescr);
> > +        nr_zones++;
> > +
> > +        z->zt = zs->d.zt;
> > +        z->zs = zs->d.zs;
> > +        z->zcap = cpu_to_le64(zs->d.zcap);
> > +        z->zslba = cpu_to_le64(zs->d.zslba);
> > +        z->za = zs->d.za;
> > +
> > +        if (nvme_wp_is_valid(zs)) {
> > +            z->wp = cpu_to_le64(zs->d.wp);
> > +        } else {
> > +            z->wp = cpu_to_le64(~0ULL);
> > +        }
> > +
> > +        zone_idx++;
> > +    }
> > +
> > +    if (!partial) {
> > +        for (; zone_idx < ns->num_zones; zone_idx++) {
> > +            zs = &ns->zone_array[zone_idx];
> > +            if (nvme_zone_matches_filter(zrasf, zs)) {
> > +                nr_zones++;
> > +            }
> > +        }
> > +    }
> > +    header->nr_zones = cpu_to_le64(nr_zones);
> > +
> > +    ret = nvme_dma(n, (uint8_t *)buf, data_size,
> > +                   DMA_DIRECTION_FROM_DEVICE, req);
> > +
> > +    g_free(buf);
> > +
> > +    return ret;
> > +}
> > +
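
As a worked example of the sizing in the code above: with a 4 KiB report
buffer, max_zones = (4096 - 64) / 64 = 63 zone descriptors fit after the
report header (assuming the 64-byte NvmeZoneReportHeader and 64-byte
NvmeZoneDescr sizes defined by the ZNS specification).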


