qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Outreachy project task: Adding QEMU block layer APIs resembling Linu


From: Stefan Hajnoczi
Subject: Re: Outreachy project task: Adding QEMU block layer APIs resembling Linux ZBD ioctls.
Date: Wed, 1 Jun 2022 12:43:17 +0100

On Wed, 1 Jun 2022 at 06:47, Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
>
> On 6/1/22 11:57, Sam Li wrote:
> > Hi Stefan,
> >
> > Stefan Hajnoczi <stefanha@gmail.com> 于2022年5月30日周一 19:19写道:
> >
> >
> >>
> >> On Mon, 30 May 2022 at 06:09, Sam Li <faithilikerun@gmail.com> wrote:
> >>>
> >>> Hi everyone,
> >>> I'm Sam Li, working on the Outreachy project which is to add zoned
> >>> device support to QEMU's virtio-blk emulation.
> >>>
> >>> For the first goal, adding QEMU block layer APIs resembling Linux ZBD
> >>> ioctls, I think the naive approach would be to introduce a new stable
> >>> struct zbd_zone descriptor for the library function interface. More
> >>> specifically, what I'd like to add to the BlockDriver struct are:
> >>> 1. zbd_info as zone block device information: includes numbers of
> >>> zones, size of logical blocks, and physical blocks.
> >>> 2. zbd_zone_type and zbd_zone_state
> >>> 3. zbd_dev_model: host-managed zbd, host-aware zbd
> >>> With those basic structs, we can start to implement new functions as
> >>> bdrv*() APIs for BLOCK*ZONE ioctls.
> >>>
> >>> I'll start to finish this task based on the above description. If
> >>> there is any problem or something I may miss in the design, please let
> >>> me know.
> >>
> >> Hi Sam,
> >> Can you propose function prototypes for the new BlockDriver callbacks
> >> needed for zoned devices?
> >
> > I have made some modifications based on Damien's device in design part
> > 1 and added the function prototypes in design part 2. If there is any
> > problem or part I missed, please let me know.
> >
> > Design of Block Layer APIs in BlockDriver:
> > 1. introduce a new stable struct zbd_zone descriptor for the library
> > function interface.
> >   a. zbd_info as zone block device information: includes numbers of
> > zones, size of blocks, write granularity in byte(minimal write size
> > and alignment
> >     - write granularity: 512e SMRs: writes in units of physical block
> > size, 4096 bytes; NVMe ZNS write granularity is equal to the block
> > size.
> >     - zone descriptor: start, length, capacity, write pointer, zone type
> >   b. zbd_zone_type
> >     - zone type: conventional, sequential write required, sequential
> > write preferred
> >   c. zbd_dev_model: host-managed zbd, host-aware zbd
>
> This explanation is a little hard to understand. It seems to be mixing up
> device level information and per-zone information. I think it would be a
> lot simpler to write a struct definition to directly illustrate what you
> are planning.
>
> It is something like this ?
>
> struct zbd_zone {
>         enum zone_type  type;
>         enum zone_cond  cond;
>         uint64_t        start;
>         uint32_t        length;
>         uint32_t        cap;
>         uint64_t        wp;
> };
>
> strcut zbd_dev {
>         enum zone_model model;
>         uint32_t        block_size;
>         uint32_t        write_granularity;
>         uint32_t        nr_zones
>         struct zbd_zone *zones; /* array of zones */
> };
>
> If yes, then my comments are as follows.
>
> For the device struct: It may be good to have also the maximum number of
> open zones and the maximum number of active zones.
>
> For the zone struct: You may need to add a read-write lock per zone to be
> able to write lock zones to ensure a sequential write pattern (virtio
> devices can be multi-queue and so writes may be coming in from different
> contexts) and to correctly emulate zone append operations with an atomic
> update of the wp field.
>
> These need to be integrated into the generic block driver interface in
> include/block/block_int-common.h or include/block/block-common.h.

QEMU's block layer has a few ways of exposing information about block devices:

    int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
    ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs,
Error **errp);

These fetch information from the BlockDriver and are good when a small
amount of data is reported occassionally and consumed by the caller.

For data that is continuously accessed or that could be large, it may
be necessary for the data to reside inside BlockDriverState so that it
can be accessed in place (without copying):

    void (*bdrv_refresh_limits)(BlockDriverState *bs, Error **errp);

QEMU uses this for the BlockLimits struct (BlockDriverState::bl) that
is continuously accessed by the block layer while processing I/O
requests. The "refresh" function updates the data in case the
underlying storage device has changed somehow. If no update function
is necessary then data can simply be populated during .bdrv_open() and
no new BlockDriver callback needs to be added.

So in the simplest case BlockDriverState can be extended with a struct
zbd_dev field that is populated during .bdrv_open(). If the
BlockDriver doesn't support zones then the zbd_dev.nr_zones field is 0
or the model field indicates that this is not a zoned storage device.

However, a BlockBackend (not BlockDriverState!) API will be needed to
expose this data to users like the hw/block/virtio-blk.c emulation
code or the qemu-io-cmds.c utility that can be used for testing. A
BlockBackend has a root pointer to a BlockDriverState graph (for
example, qcow2 on top of file-posix). It will be necessary to
propagate zoned storage information from the leaf BlockDriverState all
the way up to the BlockBackend. In simple cases the BB root points
directly to the file-posix BDS that has Linux ZBD support but the
design needs to account for additional BDS graph nodes.

In order to make progress on this interface I suggest looking at the
virtio-blk spec extension for zoned storage and thinking what the
BlockBackend API should look like that hw/block/virtio-blk.c will use.
Then it may be easier to decide how to report zone information from
BlockDriverState.

Stefan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]