
[Qemu-devel] [RFC PATCH 00/19] block: Support for 512b-on-4k emulation


From: Kevin Wolf
Subject: [Qemu-devel] [RFC PATCH 00/19] block: Support for 512b-on-4k emulation
Date: Fri, 6 Dec 2013 18:22:41 +0100

This patch series adds code to the block layer that allows performing
I/O requests at smaller granularities than required by the host backend
(most importantly, O_DIRECT restrictions). It achieves this for reads
by rounding the request to host-side block boundaries, and for writes
by performing a read-modify-write cycle (and serialising requests
touching the same block so that the RMW doesn't write back stale data).
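
To illustrate the rounding and the read-modify-write cycle described
above, here is a minimal standalone sketch. It is not code from this
series: struct toy_disk and the toy_*() helpers are hypothetical, the
backend is an in-memory buffer that rejects unaligned I/O, and the
serialisation of concurrent requests is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round x down/up to a multiple of align; align must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return x & ~(align - 1);
}

static uint64_t align_up(uint64_t x, uint64_t align)
{
    return align_down(x + align - 1, align);
}

/* Hypothetical backend: an in-memory "disk" that rejects unaligned I/O. */
struct toy_disk {
    uint8_t *data;
    uint64_t size;   /* multiple of align */
    uint64_t align;  /* host block size, e.g. 4096 */
};

static int toy_aligned_rw(struct toy_disk *d, uint64_t off, uint8_t *buf,
                          uint64_t len, int is_write)
{
    if (off % d->align || len % d->align || off + len > d->size) {
        return -1;
    }
    if (is_write) {
        memcpy(d->data + off, buf, len);
    } else {
        memcpy(buf, d->data + off, len);
    }
    return 0;
}

/* Read: fetch the covering aligned range, then copy out the requested part. */
static int toy_pread(struct toy_disk *d, uint64_t off, void *buf, uint64_t len)
{
    uint64_t start = align_down(off, d->align);
    uint64_t end = align_up(off + len, d->align);
    uint8_t *bounce;
    int ret;

    if (len == 0) {
        return 0;
    }
    bounce = malloc(end - start);
    if (!bounce) {
        return -1;
    }
    ret = toy_aligned_rw(d, start, bounce, end - start, 0);
    if (ret == 0) {
        memcpy(buf, bounce + (off - start), len);
    }
    free(bounce);
    return ret;
}

/* Write: read-modify-write of the covering aligned range. */
static int toy_pwrite(struct toy_disk *d, uint64_t off, const void *buf,
                      uint64_t len)
{
    uint64_t start = align_down(off, d->align);
    uint64_t end = align_up(off + len, d->align);
    uint8_t *bounce;
    int ret;

    if (len == 0) {
        return 0;
    }
    bounce = malloc(end - start);
    if (!bounce) {
        return -1;
    }
    ret = toy_aligned_rw(d, start, bounce, end - start, 0);      /* read */
    if (ret == 0) {
        memcpy(bounce + (off - start), buf, len);                /* modify */
        ret = toy_aligned_rw(d, start, bounce, end - start, 1);  /* write */
    }
    free(bounce);
    return ret;
}
```

The sketch reads the whole covering range for simplicity; only the
first and last host blocks strictly need to be read. Without
serialisation, two such writes into the same host block could each
write back the other's old data.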

Originally I intended to reuse a lot of code from Paolo's previous
patch series. However, because I tried to integrate pread/pwrite, which
already do a very similar thing (except that they don't consider
concurrency), and because I wanted to implement zero-copy, most of this
series ended up being new code.

Zero-copy is possible in a common case because while XFS defaults to a
4k sector size and therefore 4k on-disk O_DIRECT alignment for 512e
disks, it still only has a 512 byte memory alignment requirement.
(Unfortunately the XFS_IOC_DIOINFO ioctl claims 4k even for memory, but
we know that the value is wrong and can probe it.)
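
One way such a probe could work is sketched below. This is an
illustration under assumptions, not the probing code from this series:
the try_read callback stands in for a read on an O_DIRECT file
descriptor, toy_direct_read() is a fake backend used only to exercise
the probe, and starting at 512 bytes is an assumed lower bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 0 if a read into buf succeeds, non-zero otherwise. */
typedef int (*try_read_fn)(void *opaque, void *buf, size_t len);

/*
 * Find the smallest power-of-two buffer alignment in [512, max_align]
 * that the backend accepts. Returns 0 if none works.
 */
static size_t probe_mem_align(try_read_fn try_read, void *opaque,
                              size_t max_align, size_t len)
{
    uint8_t *raw, *base;
    size_t align, result = 0;

    assert(max_align >= 512 && (max_align & (max_align - 1)) == 0);

    /* Over-allocate so that base can be rounded up to max_align. */
    raw = malloc(len + 2 * max_align);
    if (!raw) {
        return 0;
    }
    base = (uint8_t *)(((uintptr_t)raw + max_align - 1) &
                       ~(uintptr_t)(max_align - 1));

    for (align = 512; align <= max_align; align <<= 1) {
        /*
         * base is max_align-aligned, so for align < max_align the
         * pointer base + align is aligned to align but not to 2 * align.
         */
        if (try_read(opaque, base + align, len) == 0) {
            result = align;
            break;
        }
    }
    free(raw);
    return result;
}

/* Fake backend for demonstration: demands *opaque bytes of alignment. */
static int toy_direct_read(void *opaque, void *buf, size_t len)
{
    size_t required = *(size_t *)opaque;

    (void)len;
    return ((uintptr_t)buf % required) ? -1 : 0;
}
```

With a real file the callback would issue the read and treat an
alignment error (typically EINVAL) as failure; the on-disk offset and
length alignment would need a separate probe.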


This series does not cover 4k guests on a 512 byte host, and I'm not
sure yet what to do with this case. Paolo's series contained a patch to
protect against "torn reads" (i.e. reads running in parallel with
writes, which return old data for one half of a sector and new data for
the other half) by serialising requests if the guest block size was
greater than the host block size.
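
The serialisation condition described above might look roughly like the
following sketch. It is an assumption-laden illustration, not the
patch from Paolo's series: struct toy_req and toy_must_serialise() are
hypothetical, and both the widening of requests to guest block
boundaries and the exemption for read/read pairs are my own choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_req {
    uint64_t offset;
    uint64_t bytes;
    bool is_write;
};

/*
 * Decide whether two in-flight requests must be serialised to avoid a
 * torn read. guest_bs must be a power of two.
 */
static bool toy_must_serialise(const struct toy_req *a,
                               const struct toy_req *b,
                               uint64_t guest_bs, uint64_t host_bs)
{
    uint64_t a_start, a_end, b_start, b_end;

    assert(guest_bs != 0 && (guest_bs & (guest_bs - 1)) == 0);

    /* The host updates at least one whole guest block at a time. */
    if (guest_bs <= host_bs) {
        return false;
    }
    /* Two reads cannot tear each other (assumption of this sketch). */
    if (!a->is_write && !b->is_write) {
        return false;
    }

    /* Widen both requests to guest block boundaries, then test overlap. */
    a_start = a->offset & ~(guest_bs - 1);
    a_end = (a->offset + a->bytes + guest_bs - 1) & ~(guest_bs - 1);
    b_start = b->offset & ~(guest_bs - 1);
    b_end = (b->offset + b->bytes + guest_bs - 1) & ~(guest_bs - 1);

    return a_start < b_end && b_start < a_end;
}
```

Note that the function takes a single host_bs argument, which is
exactly the assumption questioned in the following paragraphs.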

One problem with this approach is that it assumes that a single host
block size even exists and can be compared against on the top level.
Different backing files can be stored on different storage, though, with
different block sizes.

Another problem is that block drivers can split requests internally
(imagine a qcow2 image with 512 byte clusters), which would have to be
detected as well.
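
As a rough illustration of such internal splitting, the sketch below
cuts a request at cluster boundaries. It is not taken from the qcow2
driver: struct toy_piece and toy_split_at_clusters() are hypothetical
names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_piece {
    uint64_t offset;
    uint64_t bytes;
};

/*
 * Split [offset, offset + bytes) at cluster boundaries. Stores up to
 * max_out pieces in out and returns the total number of pieces.
 */
static size_t toy_split_at_clusters(uint64_t offset, uint64_t bytes,
                                    uint64_t cluster_size,
                                    struct toy_piece *out, size_t max_out)
{
    size_t n = 0;

    assert(cluster_size != 0);

    while (bytes > 0) {
        /* Bytes left until the end of the current cluster. */
        uint64_t room = cluster_size - offset % cluster_size;
        uint64_t chunk = bytes < room ? bytes : room;

        if (n < max_out) {
            out[n].offset = offset;
            out[n].bytes = chunk;
        }
        n++;
        offset += chunk;
        bytes -= chunk;
    }
    return n;
}
```

With 512 byte clusters, a single aligned 4k guest write turns into
eight separate 512 byte pieces, each of which may reach the host
independently.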

Finally, it's unclear what to do with cache modes using the kernel page
cache. Technically, these have a required alignment of 1 byte, which is
always smaller than the guest alignment. We always have to expect short
writes, so we can't say "it's always the granularity of the request".
However, serialising _every_ request certainly doesn't seem reasonable;
we've never done it, and we've never got any bug reports.

Other non-file protocols may have the same problem.

(And all of this is ignoring that with multiple users of the block
device - e.g. guest device, NBD server, block jobs - there isn't even a
single guest block size, but it must be passed per request if done
properly.)


Anyway, I'm hoping for a review of this series in order to get
512b-on-4k merged soon, and for some help/discussion on the 4k-on-512b
case.

Kevin Wolf (17):
  qemu_memalign: Allow small alignments
  block: Detect unaligned length in bdrv_qiov_is_aligned()
  block: Don't use guest sector size for qemu_blockalign()
  block: Introduce bdrv_aligned_preadv()
  block: Introduce bdrv_co_do_preadv()
  block: Introduce bdrv_aligned_pwritev()
  block: write: Handle COR dependency after I/O throttling
  block: Introduce bdrv_co_do_pwritev()
  block: Switch BdrvTrackedRequest to byte granularity
  block: Allow waiting for overlapping requests between begin/end
  block: Make zero-after-EOF work with larger alignment
  block: Generalise and optimise COR serialisation
  block: Make overlap range for serialisation dynamic
  block: Align requests in bdrv_co_do_pwritev()
  block: Change coroutine wrapper to byte granularity
  block: Make bdrv_pread() a bdrv_prwv_co() wrapper
  block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper

Paolo Bonzini (2):
  block: rename buffer_alignment to guest_block_size
  raw: Probe required direct I/O alignment

 block.c                   | 572 ++++++++++++++++++++++++++++++----------------
 block/backup.c            |   7 +-
 block/raw-posix.c         | 102 +++++++--
 block/raw-win32.c         |  41 ++++
 hw/block/virtio-blk.c     |   2 +-
 hw/ide/core.c             |   2 +-
 hw/scsi/scsi-disk.c       |   2 +-
 hw/scsi/scsi-generic.c    |   2 +-
 include/block/block.h     |   3 +-
 include/block/block_int.h |  24 +-
 util/oslib-posix.c        |   5 +
 11 files changed, 539 insertions(+), 223 deletions(-)

-- 
1.8.1.4



