Re: Zoned storage support in libvirt


From: Stefan Hajnoczi
Subject: Re: Zoned storage support in libvirt
Date: Mon, 30 Jan 2023 14:29:07 -0500

On Mon, Jan 30, 2023 at 12:53:22PM +0000, Daniel P. Berrangé wrote:
> On Mon, Jan 30, 2023 at 09:30:40PM +0900, Damien Le Moal wrote:
> > On 1/30/23 21:21, Daniel P. Berrangé wrote:
> > > On Wed, Jan 11, 2023 at 10:24:30AM -0500, Stefan Hajnoczi wrote:
> > >> On Tue, Jan 10, 2023 at 03:29:47PM +0000, Daniel P. Berrangé wrote:
> > >>> On Tue, Jan 10, 2023 at 10:19:51AM -0500, Stefan Hajnoczi wrote:
> > >>>> Hi Peter,
> > >>>> Zoned storage support
> > >>>> (https://zonedstorage.io/docs/introduction/zoned-storage) is being added
> > >>>> to QEMU. Given a zoned host block device, the QEMU syntax will look like
> > >>>> this:
> > >>>>
> > >>>>   --blockdev zoned_host_device,node-name=drive0,filename=/dev/$BDEV,...
> > >>>>   --device virtio-blk-pci,drive=drive0
> > >>>>
> > >>>> Note that regular --blockdev host_device will not work.
> > >>>>
> > >>>> For now the virtio-blk device is the only one that supports zoned
> > >>>> blockdevs.
> > >>>
> > >>> Does the virtio-blk device's exposed guest ABI differ at all
> > >>> when connected to zoned_host_device instead of host_device ?
> > >>
> > >> Yes. There is a VIRTIO feature bit, some configuration space fields,
> > >> etc. virtio-blk-pci detects when the blockdev is zoned and enables the
> > >> feature bit.
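
For reference, the mechanism is roughly this shape (a sketch only, not the actual
QEMU code; the feature bit number and names below are illustrative):

  /* Sketch: the frontend only advertises the zoned feature when the
   * attached backend reports a zoned model. */
  #include <stdbool.h>
  #include <stdint.h>

  #define SKETCH_VIRTIO_BLK_F_ZONED 17   /* illustrative bit number */

  struct backend_limits {
      bool zoned;                 /* backend reports a zoned model */
      uint32_t zone_sectors;      /* zone size in 512-byte sectors */
      uint32_t max_open_zones;
      uint32_t max_active_zones;
  };

  static uint64_t blk_advertised_features(uint64_t features,
                                          const struct backend_limits *bl)
  {
      if (bl->zoned) {
          /* feature bit plus the zoned fields in the config space */
          features |= 1ull << SKETCH_VIRTIO_BLK_F_ZONED;
      }
      return features;
  }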
> > > 
> > > I get a general sense of unease when frontend device ABI sensitive
> > > features  get secretly toggled based on features exposed by the
> > > backend.
> > > 
> > > When trying to validate ABI compatibility of guest configs, libvirt
> > > would generally compare frontend properties to look for differences.
> > > 
> > > There are a small set of cases where backends affect frontend
> > > features, but it is not that common to see.
> > > 
> > > Consider what happens if we have a guest running on zoned storage,
> > > and we need to evacuate the host to a machine without zoned
> > > storage available. Could we replace the storage backend on the
> > > target host with a raw/qcow2 backend but keep pretending it is
> > > zoned storage to the guest? The guest would continue batching its
> > > I/O ops for the zoned storage, which would be redundant
> > > for raw/qcow2, but presumably should still work. If this is possible
> > > it would suggest the need to have explicit settings for zoned storage
> > > on the virtio-blk frontend. QEMU would "merely" validate that these
> > > settings are turned on if the host storage is zoned too.
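
To make that suggestion concrete, the validation could be as small as something
like this (a sketch only, with illustrative names; not QEMU's actual realize
code):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Explicit zone properties on the frontend, checked against the backend. */
  struct zoned_props {
      bool zoned;
      uint32_t zone_sectors;
      uint32_t max_open_zones;
      uint32_t max_active_zones;
  };

  static int validate_zoned_props(const struct zoned_props *dev,
                                  const struct zoned_props *backend)
  {
      if (!dev->zoned) {
          return 0;
      }
      if (backend->zoned && dev->zone_sectors != backend->zone_sectors) {
          fprintf(stderr, "zone_sectors=%u does not match backend zone size %u\n",
                  dev->zone_sectors, backend->zone_sectors);
          return -1;
      }
      /* zoned=on with a non-zoned backend is fine in this model: the guest
       * just honours constraints the backend does not require. */
      return 0;
  }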
> > > 
> > >>>> This brings to mind a few questions:
> > >>>>
> > >>>> 1. Does libvirt need domain XML syntax for zoned storage? Alternatively,
> > >>>>    it could probe /sys/block/$BDEV/queue/zoned and generate the correct
> > >>>>    QEMU command-line arguments for zoned devices when the contents of
> > >>>>    the file are not "none" (a sketch of such a probe follows below).
> > >>>>
> > >>>> 2. Should QEMU --blockdev host_device detect zoned devices so that
> > >>>>    --blockdev zoned_host_device is not necessary? That way libvirt would
> > >>>>    automatically support zoned storage without any domain XML syntax or
> > >>>>    libvirt code changes.
> > >>>>
> > >>>>    The drawbacks I see when QEMU detects zoned storage automatically:
> > >>>>    - You can't easily tell if a blockdev is zoned from the command-line.
> > >>>>    - It's possible to mismatch zoned and non-zoned devices across live
> > >>>>      migration.
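
A minimal sketch of the probe mentioned in question 1 (it only depends on the
sysfs attribute named above; this is not libvirt code):

  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  /* Returns true if /sys/block/<bdev>/queue/zoned reads as something
   * other than "none", i.e. the kernel reports a zoned model. */
  static bool bdev_is_zoned(const char *bdev)
  {
      char path[256], model[32] = "";
      FILE *f;

      snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", bdev);
      f = fopen(path, "r");
      if (!f) {
          return false;          /* attribute missing: treat as not zoned */
      }
      if (fgets(model, sizeof(model), f)) {
          model[strcspn(model, "\n")] = '\0';
      }
      fclose(f);
      return model[0] != '\0' && strcmp(model, "none") != 0;
  }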
> > >>>
> > >>> What happens with existing QEMU impls if you use --blockdev host_device
> > >>> pointing to a /dev/$BDEV that is a zoned device ?  If it succeeds and
> > >>> works correctly, then we likely need to continue to support that. This
> > >>> would push towards needing a new XML element.
> > >>
> > >> Pointing host_device at a zoned device doesn't result in useful behavior
> > >> because the guest is unaware that this is a zoned device. The guest
> > >> won't be able to access the device correctly (i.e. sequential writes
> > >> only). Write requests will fail eventually.
> > >>
> > >> I would consider zoned devices totally unsupported in QEMU today and we
> > >> don't need to worry about preserving any kind of backwards compatibility
> > >> with --blockdev host_device,filename=/dev/my_zoned_device.
> > > 
> > > So I guess I'm not so worried about host_device vs zoned_host_device,
> > > if we have explicit settings for controlled zoned behaviour on the
> > > virtio-blk frontend.
> > > 
> > > I feel like we should have something explicit somewhere though, as this
> > > is a pretty significant difference in the storage stack that I think
> > > mgmt apps should be aware of, since it has implications for how you manage
> > > the VMs on an ongoing basis.
> > > 
> > > We could still have it "do what I mean" by default though, e.g. the
> > > virtio-blk setting defaults could imply "match the host", so we
> > > effectively get a tri-state (zoned=on/off/auto).
> > 
> > What would zoned=on mean ? If the backend is not zoned, virtio will expose a
> > regular block device to the guest as it should.
> 
> Sorry, I should have expanded further; I didn't mean that alone. It would
> also need to expose the related settings of the virtio-blk device:
> 
> > +        virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
> > +                     bs->bl.zone_size / 512);
> > +        virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
> > +                     bs->bl.max_active_zones);
> > +        virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
> > +                     bs->bl.max_open_zones);
> > +        virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
> > +        virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
> > +                     bs->bl.max_append_sectors);
> 
> so e.g.
> 
>    -device virtio-blk,zoned=on,zone_sectors=NN,max_active_zones=NN,max_open_zones=NN....
> 
> 
> So the guest would be honouring these zone constraints, even though they
> are not required by a raw/qcow2 file.
> 
> in this world
> 
>  -device virtio-blk,zoned=on
> 
> would be a shorthand to say: get the rest of the tunables from the backend
> device, or error if the backend doesn't support them.
> 
>  -device virtio-blk,zoned=auto
> 
> would be a short hand to say "do the right thing" regardless of whether the
> backend is zoned or non-zoned.
> 
> > For zoned=auto, same, I am not sure what that would achieve. If the backend is
> > zoned, it will be seen as zoned by the guest. If the backend is a regular disk,
> > it will be exposed as a regular disk. So what would this option achieve ?
> > 
> > And for zoned=off, I guess you would want to ignore a backend drive if it is zoned ?
> 
> It would explicitly report an error, since IIUC from Stefan's reply, this
> scenario would eventually end in I/O failures.

What you've described sounds good to me:
1. By default it exposes the device, no questions asked.
2. Management tools like libvirt can explicitly request
   zoned=on/off, zone_sectors=..., etc. to prevent misconfiguration.

Best of both worlds.
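
For concreteness, the zoned=on/off/auto semantics discussed above might resolve
roughly like this (a sketch under those assumptions; the property and names are
illustrative, not an existing QEMU interface):

  #include <stdbool.h>

  enum zoned_mode { ZONED_AUTO, ZONED_ON, ZONED_OFF };

  static int resolve_zoned_mode(enum zoned_mode mode, bool backend_zoned,
                                bool have_explicit_limits, const char **errp)
  {
      switch (mode) {
      case ZONED_AUTO:
          /* "do the right thing": follow whatever the backend is */
          return 0;
      case ZONED_ON:
          /* take zone_sectors/max_*_zones from the backend, or require
           * them as explicit frontend properties */
          if (!backend_zoned && !have_explicit_limits) {
              *errp = "zoned=on but the backend is not zoned and no zone limits were given";
              return -1;
          }
          return 0;
      case ZONED_OFF:
          /* a zoned backend behind zoned=off would eventually end in
           * I/O failures, so report an error up front */
          if (backend_zoned) {
              *errp = "zoned=off but the backend is a zoned device";
              return -1;
          }
          return 0;
      }
      return 0;
  }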

Stefan
