From: Stefan Hajnoczi
Subject: Re: [RFC v4 11/11] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint
Date: Tue, 30 Aug 2022 16:16:24 -0400

On Thu, Aug 25, 2022 at 09:43:16AM +0200, David Hildenbrand wrote:
> On 23.08.22 21:22, Stefan Hajnoczi wrote:
> > On Tue, Aug 23, 2022 at 10:01:59AM +0200, David Hildenbrand wrote:
> >> On 23.08.22 00:24, Stefan Hajnoczi wrote:
> >>> Register guest RAM using BlockRAMRegistrar and set the
> >>> BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
> >>> accesses in I/O requests.
> >>>
> >>> This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
> >>> on DMA mapping/unmapping.
> >>
> >> Can you explain why we're monitoring the RAMRegistrar to hook into "guest
> >> RAM" rather than taking the usual MemoryListener path?
> > 
> > The requirements are similar to VFIO, which uses RAMBlockNotifier. We
> 
> Only VFIO NVMe uses RAMBlockNotifier. Ordinary VFIO uses the MemoryListener.
> 
> Maybe the difference is that ordinary VFIO has to replicate the actual
> guest physical memory layout, while VFIO NVMe is only interested in
> possible guest RAM within guest physical memory.
> 
> > need to learn about all guest RAM because that's where I/O buffers are
> > located.
> > 
> > Do you think RAMBlockNotifier should be avoided?
> 
> I assume it depends on the use case. For saying "this might be used for
> I/O", I guess it's good enough.
> 
> > 
> >> What will BDRV_REQ_REGISTERED_BUF actually do? Pin all guest memory in
> >> the worst case, as io_uring fixed buffers would? (I hope not.)
> > 
> > BDRV_REQ_REGISTERED_BUF is a hint that no bounce buffer is necessary
> > because the I/O buffer is located in memory that was previously
> > registered with bdrv_register_buf().
> > 
> > The RAMBlockNotifier calls bdrv_register_buf() to let the libblkio
> > driver know about RAM. Some libblkio drivers ignore this hint, io_uring
> > may use the fixed buffers feature, vhost-user sends the shared memory
> > file descriptors to the vhost device server, and VFIO/vhost may pin
> > pages.
> > 
> > So the blkio block driver doesn't add anything new; it's the union of
> > the VFIO/vhost/vhost-user/etc. memory requirements.
> 
> The issue is if the backend pins memory inside any of these regions.
> Then you're instantly incompatible with anything that relies on sparse
> RAMBlocks, such as memory ballooning or virtio-mem, and have to properly
> fence it.
> 
> In that case, you'd have to successfully trigger
> ram_block_discard_disable(true) first, before pinning. Who would do that
> conditionally, just as, e.g., VFIO does?
> 
> io_uring fixed buffers would be one such example that pins memory and is
> problematic. vfio (unless on s390x) is another example, as you point out.

Okay, I think libblkio needs to expose a bool property called
"mem-regions-pinned" so QEMU knows whether or not the registered buffers
will be pinned.

Then the QEMU BlockDriver can do:

  if (mem_regions_pinned) {
      if (ram_block_discard_disable(true) < 0) {
          ...fail to open block device...
      }
  }

Does that sound right?

Is "pinned" the best word to describe this or is there a more general
characteristic we are looking for?
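To make that concrete, here is a rough sketch of what the QEMU blkio
BlockDriver's open code could do. It assumes the hypothetical
"mem-regions-pinned" property is read via libblkio's existing
blkio_get_bool() getter; the property itself and the s->blkio field name
are illustrative, not final:

  bool mem_regions_pinned = false;

  /* "mem-regions-pinned" is the proposed property, not yet in libblkio */
  if (blkio_get_bool(s->blkio, "mem-regions-pinned",
                     &mem_regions_pinned) < 0) {
      /* Older libblkio without the property: assume the worst case */
      mem_regions_pinned = true;
  }

  if (mem_regions_pinned) {
      /*
       * Pinning conflicts with sparse RAMBlocks (memory ballooning,
       * virtio-mem), so refuse to open unless discard can be disabled.
       */
      if (ram_block_discard_disable(true) < 0) {
          error_setg(errp, "memory pinning conflicts with RAM discard "
                     "(e.g. ballooning)");
          return -EBUSY;
      }
  }

The matching ram_block_discard_disable(false) would go in the close path
so the restriction is lifted again when the device goes away.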

> 
> This has to be treated with care. Another thing to consider is that
> different backends might only support a limited number of such regions.
> I assume there is a way for QEMU to query this limit upfront? It might
> be required for memory hot(un)plug to figure out how many memory slots
> we actually have (for ordinary DIMMs, and if we ever want to make this
> compatible to virtio-mem, it might be required as well when the backend
> pins memory).

Yes, libblkio reports the maximum number of blkio_mem_regions supported
by the device. The property is called "max-mem-regions".

The QEMU BlockDriver currently doesn't use this information. Are there
any QEMU APIs that should be called to propagate this value?
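For reference, fetching the limit through the libblkio property API would
look something like this (a sketch; s->blkio is illustrative, and where
the value should be propagated is exactly the open question above):

  int max_mem_regions;
  int ret;

  ret = blkio_get_int(s->blkio, "max-mem-regions", &max_mem_regions);
  if (ret < 0) {
      error_setg(errp, "failed to query max-mem-regions");
      return ret;
  }

  /*
   * Open question: memory hot(un)plug would need this limit to know how
   * many memory slots are available, similar to the memory slot limit
   * that vhost backends report.
   */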

Stefan
