[Qemu-devel] [PATCH 00/17] Support mismatched host and guest logical block sizes
Tue, 13 Dec 2011 13:37:03 +0100
Running with mismatched host and guest logical block sizes is going
to become more important as 4k-sector disks become more widespread.
This is because we still need a disk with 512-byte sectors to boot from.
Mismatched block sizes have two problems:
1) with cache=none or with non-raw protocols, you simply cannot do output
at 512-byte granularity. You need to do read-modify-write cycles, as "hybrid"
512b-logical/4k-physical disks do. (Note that only the iSCSI protocol
actually supports 4k logical blocks.)
2) when host block size < guest block size, guests issue 4k-aligned
I/O and expect it to be atomic. This problem cannot really be solved
completely, because power or I/O failures could leave a partially-written
block ("torn page"). However, at least you can serialize reads against
overlapping writes, which guarantees correctness as long as shutdown is
clean and there are no I/O errors.
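To make the read-modify-write cycle from point 1 concrete, here is a minimal Python sketch. It is not code from this series: `rmw_write`, `HOST_BLOCK`, `GUEST_SECTOR` and the in-memory `bytearray` standing in for the host disk are all assumptions made for illustration. It also ignores the serialization against concurrent requests that a real implementation needs.

```python
HOST_BLOCK = 4096    # assumed host logical block size, in bytes
GUEST_SECTOR = 512   # assumed guest logical block size, in bytes


def rmw_write(disk: bytearray, offset: int, data: bytes) -> int:
    """Write `data` at byte `offset`, touching `disk` only in whole host blocks.

    Returns the number of host blocks that were read and written back.
    """
    if offset % GUEST_SECTOR or len(data) % GUEST_SECTOR:
        raise ValueError("request must be aligned to the guest sector size")
    if not data:
        return 0

    # Widen the request to host block boundaries.
    start = offset - offset % HOST_BLOCK
    end = -(-(offset + len(data)) // HOST_BLOCK) * HOST_BLOCK
    if end > len(disk):
        raise ValueError("request extends past the end of the disk")

    bounce = bytearray(disk[start:end])                 # read
    rel = offset - start
    bounce[rel:rel + len(data)] = data                  # modify
    disk[start:end] = bounce                            # write
    return (end - start) // HOST_BLOCK
```

With an 8192-byte disk, a single 512-byte write at offset 512 rewrites one host block, while a 1024-byte write at offset 3584 straddles a boundary and rewrites two.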
Read-modify-write cycles are of course slower, and they require serializing
writes, which makes the situation even worse. However, the performance
impact of emulating 512-byte sectors is within noise when partitions are
aligned. File system blocks are usually 4k or bigger, and OSes tend
to use 4k-aligned buffers. So when partitions are aligned, no misaligned
I/O is sent and no bounce buffer is necessary either.
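The alignment argument can be checked with a little arithmetic. The sketch below is illustrative only: `needs_rmw` and its parameters are hypothetical names, and the two start sectors are assumptions chosen as examples (63 for the legacy DOS layout, 2048 for a 1 MiB-aligned partition).

```python
def needs_rmw(partition_start_sector: int, fs_block_index: int,
              fs_block_size: int = 4096, host_block: int = 4096,
              guest_sector: int = 512) -> bool:
    """Return True if writing one file system block requires read-modify-write.

    The write is RMW-free only if it starts on a host block boundary and
    covers a whole number of host blocks.
    """
    byte_offset = (partition_start_sector * guest_sector
                   + fs_block_index * fs_block_size)
    return byte_offset % host_block != 0 or fs_block_size % host_block != 0


# Partition starting at sector 2048: every 4k file system block is aligned.
aligned = any(needs_rmw(2048, i) for i in range(1000))

# Partition starting at sector 63: every 4k file system block is misaligned.
misaligned = all(needs_rmw(63, i) for i in range(1000))
```

Here `aligned` ends up `False` and `misaligned` ends up `True`: with an aligned partition no request needs a bounce buffer, while with a start at sector 63 every file system block straddles two host blocks.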
The situation is very different if partitions are misaligned or if the
guest is using O_DIRECT with a 512-byte aligned buffer. I benchmarked
only the former using iozone on a RHEL6 guest (2GB memory, 20GB ext4
partition with the whole 4k-sector disk assigned to the guest). Graphs
aren't really pretty, but two points are more or less discernible (also
more or less obvious):
- writes incur a larger overhead than reads by 5-10%;
- for larger file sizes the penalty is smaller, probably because
the I/O scheduler can work better (with almost no penalty for reads);
for smaller file sizes, up to 1M or even more for some scenarios,
misalignment worsened performance by 10-25%.
The series is structured as follows.
Patches 1 to 6 clean up the handling of flag bits, so that non-raw
protocols can always request read-modify-write operation (even when
cache != none).
Patches 7 to 11 distinguish host and guest block sizes in the
Patches 12 to 15 reuse the request tracking mechanism to implement
RMW and to avoid torn pages.
Patch 16 passes down the host block size as physical block size so
that hopefully guest OSes try to align partitions.
Patch 17 adds an option to qemu-io that lets you test these scenarios
even without a 4k-sector disk.
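As a rough illustration of what patches 12 to 15 need from request tracking, the sketch below widens each request to a chosen granularity and then tests for overlap. It is an assumption about the general technique, not the code in this series: `round_to_granularity` and `overlaps` are hypothetical helpers, and requests are plain `(offset, length)` tuples in bytes.

```python
def round_to_granularity(offset: int, length: int, granularity: int):
    """Widen a byte range to the enclosing granularity-aligned range."""
    start = offset - offset % granularity
    end = -(-(offset + length) // granularity) * granularity
    return start, end


def overlaps(req_a, req_b, granularity: int) -> bool:
    """Return True if two (offset, length) requests touch a common block
    once both are widened to `granularity`."""
    a_start, a_end = round_to_granularity(req_a[0], req_a[1], granularity)
    b_start, b_end = round_to_granularity(req_b[0], req_b[1], granularity)
    return a_start < b_end and b_start < a_end


# Two adjacent 512-byte writes do not overlap at 512-byte granularity ...
independent = overlaps((0, 512), (512, 512), 512)
# ... but they share a 4k block, so an RMW implementation must serialize them.
must_serialize = overlaps((0, 512), (512, 512), 4096)
```

A request that overlaps an in-flight write at the chosen granularity would have to wait for it. That is what keeps two read-modify-write cycles on the same host block from clobbering each other, and what keeps a read from seeing half of a larger guest block.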
Paolo Bonzini (17):
  block: do not rely on open_flags for bdrv_is_snapshot
  block: store actual flags in bs->open_flags
  block: pass protocol flags up to the format
  block: non-raw protocols never cache
  block: remove enable_write_cache
  block: move flag bits together
  raw: remove the aligned_buf
  block: rename buffer_alignment to guest_block_size
  block: add host_block_size
  raw: probe host_block_size
  iscsi: save host block size
  block: allow waiting only for overlapping writes
  block: allow waiting at arbitrary granularity
  block: protect against "torn reads" for guest_block_size > host_block_size
  block: align and serialize I/O when guest_block_size < host_block_size
  block: default physical block size to host block size
  qemu-io: add blocksize argument to open
 Makefile.objs     |    4 +-
 block.c           |  313 ++++++++++++++++++++++++++++++++++++++++++++++-------
 block.h           |   17 +---
 block/curl.c      |    1 +
 block/iscsi.c     |    2 +
 block/nbd.c       |    1 +
 block/raw-posix.c |   97 ++++++++++-------
 block/raw-win32.c |   42 +++++++
 block/rbd.c       |    1 +
 block/sheepdog.c  |    1 +
 block/vdi.c       |    1 +
 block_int.h       |   25 ++---
 hw/ide/core.c     |    2 +-
 hw/scsi-disk.c    |    2 +-
 hw/scsi-generic.c |    2 +-
 hw/virtio-blk.c   |    2 +-
 qemu-io.c         |   33 +++++-
 trace-events      |    1 +
 18 files changed, 429 insertions(+), 118 deletions(-)