
From: Paolo Bonzini
Subject: [Qemu-devel] [PATCH 00/17] Support mismatched host and guest logical block sizes
Date: Tue, 13 Dec 2011 13:37:03 +0100

Running with mismatched host and guest logical block sizes is going
to become more important as 4k-sector disks become more widespread,
because we still need a disk with 512-byte sectors to boot from.

Mismatched block sizes have two problems:

1) with cache=none or with non-raw protocols, you simply cannot do output
at 512-byte granularity.  You need to do read-modify-write cycles, like
"hybrid" 512b-logical/4k-physical disks do.  (Note that, among the
protocols, only iSCSI actually supports 4k logical blocks.)

2) when host block size < guest block size, guests issue 4k-aligned
I/O and expect it to be atomic.  This problem cannot really be solved
completely, because power or I/O failures could leave a partially-written
block ("torn page").  However, at least you can serialize reads against
overlapping writes, which guarantees correctness as long as shutdown is
clean and there are no I/O errors.
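
To make problem 1 concrete, here is a minimal Python sketch of a
read-modify-write cycle.  It is not code from this series: the names
(rmw_write, HOST_BLOCK, GUEST_BLOCK) and the in-memory bytearray standing
in for the host disk are assumptions made for illustration only.

```python
HOST_BLOCK = 4096   # assumed host logical block size
GUEST_BLOCK = 512   # assumed guest logical block size


def rmw_write(disk: bytearray, offset: int, data: bytes) -> bool:
    """Write guest data using only whole host-block transfers.

    `disk` is a hypothetical stand-in for the host device; its length is
    assumed to be a multiple of HOST_BLOCK.  Returns True if a
    read-modify-write cycle was needed, False if the request was already
    aligned to the host block size.
    """
    if offset % GUEST_BLOCK or len(data) % GUEST_BLOCK:
        raise ValueError("request is not aligned to the guest block size")
    if not data:
        return False

    # Smallest host-block-aligned range that covers the request.
    start = offset // HOST_BLOCK * HOST_BLOCK
    end = (offset + len(data) + HOST_BLOCK - 1) // HOST_BLOCK * HOST_BLOCK

    if start == offset and end == offset + len(data):
        disk[start:end] = data              # aligned: plain write
        return False

    bounce = bytearray(disk[start:end])     # read the covering host blocks
    rel = offset - start
    bounce[rel:rel + len(data)] = data      # modify the guest sectors
    disk[start:end] = bounce                # write whole host blocks back
    return True
```

Two such cycles touching the same host block must not interleave, or
one of them writes back stale data for the other's sectors.  That is
why read-modify-write implies serializing writes.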

Read-modify-write cycles are of course slower, and they require
serializing writes, which makes the situation even worse.  However, the
performance impact of emulating 512-byte sectors is within noise when
partitions are aligned.  File system blocks are usually 4k or bigger, and
OSes tend to use 4k-aligned buffers.  So when partitions are aligned, no
misaligned I/O is sent and no bounce buffer is necessary either.
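
The alignment argument is just arithmetic, sketched below in Python.
The function name and parameters are hypothetical, and the start sectors
used in the usage note (63 for a legacy DOS-style partition, 2048 for a
1 MiB-aligned one) are illustrative values rather than numbers taken from
the benchmark.

```python
def needs_rmw(partition_start_sector: int,
              fs_block_index: int,
              fs_block_size: int = 4096,
              guest_sector: int = 512,
              host_block: int = 4096) -> bool:
    """Return True if writing one file system block needs read-modify-write.

    The partition start is given in guest sectors.  The write is a whole
    number of host blocks only if both its byte offset on the disk and its
    length are multiples of the host block size.
    """
    offset = partition_start_sector * guest_sector + fs_block_index * fs_block_size
    return offset % host_block != 0 or fs_block_size % host_block != 0
```

With the defaults, needs_rmw(2048, 10) is False, while needs_rmw(63, 10)
is True: a partition starting at sector 63 shifts every 4k file system
block so that it straddles two host blocks.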

The situation is very different if partitions are misaligned or if the
guest is using O_DIRECT with a 512-byte aligned buffer.  I benchmarked
only the former using iozone on a RHEL6 guest (2GB memory, 20GB ext4
partition with the whole 4k-sector disk assigned to the guest).  Graphs
aren't really pretty, but two points are more or less discernible (also
more or less obvious):

- the overhead for writes is 5-10% larger than for reads;

- for larger file sizes the penalty is smaller, probably because
the I/O scheduler can work better (with almost no penalty for reads);
for smaller file sizes, up to 1M or even more for some scenarios,
misalignment worsened performance by 10-25%.

The series is structured as follows.

Patches 1 to 6 clean up the handling of flag bits, so that non-raw
protocols can always request read-modify-write operation (even when
cache != none).

Patches 7 to 11 distinguish host and guest block sizes in the

Patches 12 to 15 reuse the request tracking mechanism to implement
RMW and to avoid torn pages.
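
As a rough illustration of waiting only for overlapping writes at an
arbitrary granularity, the Python sketch below rounds each request out
to the chosen granularity before testing for overlap.  It is not the
block layer's request tracking code: the function names and the
(offset, length) tuple representation are assumptions.

```python
def round_to_granularity(offset: int, length: int, granularity: int):
    """Expand a byte range to the enclosing granularity-aligned range."""
    start = offset // granularity * granularity
    end = (offset + length + granularity - 1) // granularity * granularity
    return start, end


def overlapping_writes(new_req, in_flight, granularity: int):
    """Return the in-flight writes a new request would have to wait for.

    Requests are hypothetical (offset, length) tuples in bytes.  Two
    requests conflict if their rounded ranges intersect, so 512-byte
    writes to different sectors of the same 4k block still conflict when
    the granularity is 4096.
    """
    new_start, new_end = round_to_granularity(*new_req, granularity)
    conflicts = []
    for req in in_flight:
        start, end = round_to_granularity(*req, granularity)
        if start < new_end and new_start < end:
            conflicts.append(req)
    return conflicts
```

Requests that do not share a block at the chosen granularity return an
empty list and can proceed in parallel.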

Patch 16 passes down the host block size as physical block size so
that hopefully guest OSes try to align partitions.

Patch 17 adds an option to qemu-io that lets you test these scenarios
even without a 4k-sector disk.

Paolo Bonzini (17):
  block: do not rely on open_flags for bdrv_is_snapshot
  block: store actual flags in bs->open_flags
  block: pass protocol flags up to the format
  block: non-raw protocols never cache
  block: remove enable_write_cache
  block: move flag bits together
  raw: remove the aligned_buf
  block: rename buffer_alignment to guest_block_size
  block: add host_block_size
  raw: probe host_block_size
  iscsi: save host block size
  block: allow waiting only for overlapping writes
  block: allow waiting at arbitrary granularity
  block: protect against "torn reads" for guest_block_size > host_block_size
  block: align and serialize I/O when guest_block_size < host_block_size
  block: default physical block size to host block size
  qemu-io: add blocksize argument to open

 Makefile.objs     |    4 +-
 block.c           |  313 ++++++++++++++++++++++++++++++++++++++++++++++-------
 block.h           |   17 +---
 block/curl.c      |    1 +
 block/iscsi.c     |    2 +
 block/nbd.c       |    1 +
 block/raw-posix.c |   97 ++++++++++-------
 block/raw-win32.c |   42 +++++++
 block/rbd.c       |    1 +
 block/sheepdog.c  |    1 +
 block/vdi.c       |    1 +
 block_int.h       |   25 ++---
 hw/ide/core.c     |    2 +-
 hw/scsi-disk.c    |    2 +-
 hw/scsi-generic.c |    2 +-
 hw/virtio-blk.c   |    2 +-
 qemu-io.c         |   33 +++++-
 trace-events      |    1 +
 18 files changed, 429 insertions(+), 118 deletions(-)

