From: Daniel P. Berrangé
Subject: Re: [RFC v6 2/4] virtio-blk: add zoned storage emulation for zoned devices
Date: Mon, 30 Jan 2023 18:30:16 +0000
User-agent: Mutt/2.2.9 (2022-11-12)

On Mon, Jan 30, 2023 at 10:17:48AM -0500, Stefan Hajnoczi wrote:
> On Mon, 30 Jan 2023 at 07:33, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Sun, Jan 29, 2023 at 06:39:49PM +0800, Sam Li wrote:
> > > This patch extends virtio-blk emulation to handle zoned device commands
> > > by calling the new block layer APIs to perform zoned device I/O on
> > > behalf of the guest. It supports Report Zone, four zone operations (open,
> > > close, finish, reset), and Append Zone.
> > >
> > > The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
> > > support zoned block devices. Regular block devices (conventional zones)
> > > will not have it set.
> > >
> > > The guest OS can use blktests and fio to test those commands on zoned
> > > devices.
> > > Furthermore, using zonefs to test zone append write is also supported.
> > >
> > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > > ---
> > >  hw/block/virtio-blk-common.c |   2 +
> > >  hw/block/virtio-blk.c        | 394 +++++++++++++++++++++++++++++++++++
> > >  2 files changed, 396 insertions(+)
> > >
> >
> > > @@ -949,6 +1311,30 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> > >          blkcfg.write_zeroes_may_unmap = 1;
> > >          virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
> > >      }
> > > +    if (bs->bl.zoned != BLK_Z_NONE) {
> > > +        switch (bs->bl.zoned) {
> > > +        case BLK_Z_HM:
> > > +            blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
> > > +            break;
> > > +        case BLK_Z_HA:
> > > +            blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
> > > +            break;
> > > +        default:
> > > +            g_assert_not_reached();
> > > +        }
> > > +
> > > +        virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
> > > +                     bs->bl.zone_size / 512);
> > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
> > > +                     bs->bl.max_active_zones);
> > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
> > > +                     bs->bl.max_open_zones);
> > > +        virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
> > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
> > > +                     bs->bl.max_append_sectors);
> >
> > So these are all ABI sensitive frontend device settings, but they are
> > not exposed as tunables on the virtio-blk device, instead they are
> > implicitly set from the backend.
> >
> > We have done this kind of thing before in QEMU, but several times it
> > has bitten QEMU maintainers/users, as having a backend affect the
> > frontend ABI is not typical. It wouldn't be immediately obvious
> > when starting QEMU on a target host that the live migration would
> > be breaking ABI if the target host wasn't using a zoned device with
> > the exact same settings.
> >
> > This also limits mgmt flexibility across live migration, if the
> > mgmt app wants/needs to change the storage backend. E.g. maybe they
> > need to evacuate the host for an emergency, but don't have spare
> > hosts with the same kind of storage. It might be desirable to migrate
> > and switch to a plain block device or raw/qcow2 file, rather than
> > let the VM die.
> >
> > Can we make these virtio settings be explicitly controlled on the
> > virtio-blk device?  If not specified explicitly they could be
> > auto-populated from the backend for ease of use, but if specified
> > then simply validate that the backend is a match. libvirt would then
> > make sure these are always explicitly set on the frontend.
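
A minimal sketch of what that auto-populate-or-validate logic could look
like for one of the fields, assuming a hypothetical zone_sectors property
added to s->conf; the property name and helper are illustrative and not
part of the posted patch:

  /* Hypothetical helper, called at device realize time.  A value of 0
   * (the default) means "take whatever the backend reports"; any other
   * value must match the backend, so the guest-visible ABI can no
   * longer change silently when the backend changes. */
  static void virtio_blk_validate_zone_sectors(VirtIOBlock *s, Error **errp)
  {
      BlockDriverState *bs = blk_bs(s->blk);
      uint32_t backend_zone_sectors = bs->bl.zone_size / 512;

      if (s->conf.zone_sectors == 0) {
          /* auto-populate from the backend for ease of use */
          s->conf.zone_sectors = backend_zone_sectors;
      } else if (s->conf.zone_sectors != backend_zone_sectors) {
          error_setg(errp, "zone_sectors %u does not match backend value %u",
                     s->conf.zone_sectors, backend_zone_sectors);
      }
  }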
> 
> I think this is a good idea, especially if we streamline the
> file-posix.c driver by merging --blockdev zoned_host_device into
> --blockdev host_device. It won't be obvious from the command-line
> whether this is a zoned or non-zoned device. There should be a
> --device virtio-blk-pci,drive=drive0,zoned=on option that fails when
> drive0 isn't zoned. It should probably be on/off/auto where auto is
> the default and doesn't check anything, on requires a zoned device,
> and off requires a non-zoned device. That will prevent accidental
> migration between zoned/non-zoned devices.
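
A rough sketch of how such an on/off/auto check could look at realize
time, assuming a hypothetical OnOffAuto "zoned" property in s->conf and
the series' bs->bl.zoned limit; the names are illustrative only:

  /* Hypothetical realize-time check for a zoned=on/off/auto property. */
  static void virtio_blk_check_zoned_model(VirtIOBlock *s, Error **errp)
  {
      bool backend_zoned = blk_bs(s->blk)->bl.zoned != BLK_Z_NONE;

      switch (s->conf.zoned) {
      case ON_OFF_AUTO_AUTO:
          /* default: accept whatever the backend provides */
          break;
      case ON_OFF_AUTO_ON:
          if (!backend_zoned) {
              error_setg(errp, "zoned=on requires a zoned block backend");
          }
          break;
      case ON_OFF_AUTO_OFF:
          if (backend_zoned) {
              error_setg(errp, "zoned=off requires a non-zoned block backend");
          }
          break;
      default:
          g_assert_not_reached();
      }
  }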
> 
> I want to point out that virtio-blk doesn't have checks for the disk
> size or other details, so what you're suggesting for zone_sectors, etc
> is stricter than what QEMU does today. Since the virtio-blk parameters
> you're proposing are optional, I think it doesn't hurt though.

Yeah, it is slightly different from how some of the other parameters
are handled. I guess you could say that with disk capacity, matching
size is a fairly obvious constraint/expectation to manage, and also
long-standing.

With disk capacity, you can add the 'raw' driver on top of any block
driver stack, to apply an arbitrary offset+size, to make the storage
smaller than it otherwise is on disk. Conceptually that could have
been done on the frontend device(s) too, but I guess it made more
sense to do it in the block layer to give consistent enforcement
of the limits across frontends. It is fuzzy whether such a use of
the 'raw' driver is really considered backend config, as opposed to
frontend config, but to me it feels like frontend config.
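
For example, something along these lines (path and sizes purely
illustrative) exposes only a fixed window of the underlying image to
the frontend:

  -blockdev driver=file,filename=/path/to/disk.img,node-name=file0 \
  -blockdev driver=raw,file=file0,offset=1048576,size=10737418240,node-name=limited0 \
  -device virtio-blk-pci,drive=limited0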

You could possibly come up with the concept of a 'zoned' format driver
that can be layered on top of a block driver stack to add zoned I/O
constraints for the sake of compatibility, where none otherwise exist
in the physical storage. Possibly useful if multiple frontends all
support zoned storage, to avoid duplicating the constraints across
all of them?
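
Purely illustrative of where such a filter could sit in the stack; no
such driver exists today, and the driver name, its options and the
zoned= device option are made up / only proposed in this thread:

  -blockdev driver=file,filename=/path/to/disk.img,node-name=file0 \
  -blockdev driver=zoned,file=file0,zone-size=268435456,max-open-zones=128,node-name=zoned0 \
  -device virtio-blk-pci,drive=zoned0,zoned=on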

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



