
[Qemu-devel] Summary: [PATCH RFC 0/5] disk deadlines


From: Denis V. Lunev
Subject: [Qemu-devel] Summary: [PATCH RFC 0/5] disk deadlines
Date: Thu, 10 Sep 2015 22:29:22 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0

On 09/08/2015 11:00 AM, Denis V. Lunev wrote:
Description of the problem:
A client and a server interact via the Network File System (NFS) or other
network storage such as Ceph. The server contains an image of a Virtual
Machine (VM) with Linux inside. The disk is exposed to the VM as SATA or IDE.
The VM is started on the client as usual. In the case of a network outage,
requests from the virtual disk cannot be completed in predictable time.
If such a request is, for example, an ext3/4 journal write, the guest will
reset the controller and restart the request the first time this happens. On
the next such event the guest will remount the victim filesystem read-only.
From the end-user point of view this looks like a fatal crash requiring a
manual reboot.

To avoid this situation, this patchset introduces a per-drive option
"disk-deadlines=on|off", which is unset by default. If the option is enabled,
all disk requests become tracked. If requests are not completed in time,
countermeasures are applied (see below). The timeout is configurable; the
default was chosen based on observations.

Test description to reproduce the problem:
1) configure and start NFS server:
$sudo /etc/init.d/nfs-kernel-server restart
2) put a Virtual Machine image with a preinstalled operating system on the server
3) on the client, mount the server folder that contains the Virtual Machine image:
$sudo mount -t nfs -O uid=1000,iocharset=utf-8 server_ip:/path/to/folder/on/server /path/to/folder/on/client
4) start the Virtual Machine with QEMU on the client (for example):
$qemu-system-x86_64 -enable-kvm -vga std -balloon virtio -monitor stdio \
  -drive file=/path/to/folder/on/client/vdisk.img,media=disk,if=ide,disk-deadlines=on \
  -boot d -m 12288
5) inside the VM run the following command:
$dd if=/dev/urandom of=testfile bs=10M count=300
AND stop the server (or disconnect the network) by running:
$sudo /etc/init.d/nfs-kernel-server stop
6) inside the VM periodically run:
$dmesg
and check error messages.

One of the following error messages can be observed (only the main lines are shown):
1) After the server restarts, the guest OS continues to run as usual with
the following messages in dmesg:
   a) [ 1108.131474] nfs: server 10.30.23.163 not responding, still trying
      [ 1203.164903] INFO: task qemu-system-x86:3256 blocked for more than 120 seconds

   b) [ 581.184311] ata1.00: qc timeout (cmd 0xe7)
      [ 581.184321] ata1.00: FLUSH failed Emask 0x4
      [ 581.744271] ata1: soft resetting link
      [ 581.900346] ata1.01: NODEV after polling detection
      [ 581.900877] ata1.00: configured for MWDMA2
      [ 581.900879] ata1.00: retrying FLUSH 0xe7 Emask 0x4
      [ 581.901203] ata1.00: device reported invalid CHS sector 0
      [ 581.901213] ata1: EH complete
2) The guest OS remounts its filesystem read-only:
"remounting filesystem read-only"
3) The guest OS does not respond at all, even after the server is restarted.

Tested on:
Virtual Machine - Linux 3.11.0 SMP x86_64 Ubuntu 13.10 saucy;
client -  Linux 3.11.10 SMP x86_64, Ubuntu 13.10 saucy;
server - Linux 3.13.0 SMP x86_64, Ubuntu 14.04.1 LTS.

How does the given solution work?

If the disk-deadlines option is enabled for a drive, the completion time of
that drive's requests is tracked. The method is as follows (assume from here
on that the option is enabled).

Every drive has its own red-black tree for keeping track of its requests.
The expiration time of a request is the key, and the cookie (the id of the
request) is stored in the corresponding node. Assume that every request has
8 seconds to be completed. If a request is not completed in time for some
reason (server crash or something else), the drive's timer fires and the
corresponding callback requests to stop the Virtual Machine (VM).
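
For illustration, here is a minimal standalone sketch of this per-drive
tracking (all names here are illustrative, not the ones used in the patches;
a simple sorted list stands in for the red-black tree, and stopping the VM is
only a placeholder):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define DEADLINE_NS (8LL * 1000000000LL)    /* 8 seconds per request */

typedef struct Request {
    uint64_t cookie;        /* id of the in-flight request */
    int64_t expire_ns;      /* absolute expiration time (the key) */
    struct Request *next;
} Request;

typedef struct {
    Request *head;          /* kept sorted by expire_ns */
    bool vm_stopped;
} DiskDeadlines;

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Called when a request is submitted to the drive. */
static void deadlines_track(DiskDeadlines *dd, uint64_t cookie)
{
    Request *r = malloc(sizeof(*r));
    Request **p = &dd->head;

    r->cookie = cookie;
    r->expire_ns = now_ns() + DEADLINE_NS;
    while (*p && (*p)->expire_ns <= r->expire_ns) {
        p = &(*p)->next;
    }
    r->next = *p;
    *p = r;
}

/* Called from the drive's timer callback. */
static void deadlines_check(DiskDeadlines *dd)
{
    if (!dd->vm_stopped && dd->head && dd->head->expire_ns < now_ns()) {
        dd->vm_stopped = true;    /* placeholder for stopping the VM */
        printf("request %llu is overdue, stopping the VM\n",
               (unsigned long long)dd->head->cookie);
    }
}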

The VM remains stopped until all requests from the disk that caused the stop
are completed. Furthermore, if there are other disks with 'disk-deadlines=on'
whose requests are still waiting to be completed, the VM is not restarted:
it waits for the completion of all "late" requests from all disks.

In addition, all requests which caused the VM to stop (or those that simply
were not completed in time) can be printed with the "info disk-deadlines"
QEMU monitor command as follows:
$(qemu) info disk-deadlines

    disk_id  type       size total_time        start_time
.--------------------------------------------------------
   ide0-hd1 FLUSH         0b 46.403s     22232930059574ns
   ide0-hd1 FLUSH         0b 57.591s     22451499241285ns
   ide0-hd1 FLUSH         0b 103.482s    22574100547397ns

This set is sent in the hope that it might be useful.

Signed-off-by: Raushaniya Maksudova <address@hidden>
Signed-off-by: Denis V. Lunev <address@hidden>
CC: Stefan Hajnoczi <address@hidden>
CC: Kevin Wolf <address@hidden>

Raushaniya Maksudova (5):
   add QEMU style defines for __sync_add_and_fetch
   disk_deadlines: add request to resume Virtual Machine
   disk_deadlines: add disk-deadlines option per drive
   disk_deadlines: add control of requests time expiration
   disk_deadlines: add info disk-deadlines option

  block/Makefile.objs            |   1 +
  block/accounting.c             |   8 ++
  block/disk-deadlines.c         | 280 +++++++++++++++++++++++++++++++++++++++++
  blockdev.c                     |  20 +++
  hmp.c                          |  37 ++++++
  hmp.h                          |   1 +
  include/block/accounting.h     |   2 +
  include/block/disk-deadlines.h |  48 +++++++
  include/qemu/atomic.h          |   3 +
  include/sysemu/sysemu.h        |   1 +
  monitor.c                      |   7 ++
  qapi-schema.json               |  33 +++++
  stubs/vm-stop.c                |   5 +
  vl.c                           |  18 +++
  14 files changed, 464 insertions(+)
  create mode 100644 block/disk-deadlines.c
  create mode 100644 include/block/disk-deadlines.h


Discussion summary:
- the idea itself is OK
- there are some technical faults, like using a Linux-specific API for synchronization
- libvirt should be notified when a deadline expires
- the deadline timeout should be configurable
- deadlines should not be added to guest statistics. A new architectural and configuration approach is necessary.

There are 2 main options:
- filter driver
- the code could be embedded into the current main block code

- another question is how to configure deadlines. With the block layer the approach is clear: this would be an option of the driver. On the other hand, we could add an 'io-timeout' option to the generic block driver code and avoid any further options (-1 would mean the default timeout, 0 would mean no timeout), as in the sketch below. Something like this.
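
As an illustration of that second variant, the io-timeout convention described
above could be interpreted roughly like this (hypothetical helper, written only
to show the -1/0 semantics, not code from the patches):

#include <stdint.h>

/* -1 selects the built-in default, 0 disables tracking, >0 is the timeout. */
static int64_t io_timeout_to_ns(int64_t io_timeout_s)
{
    if (io_timeout_s < 0) {
        return 8LL * 1000000000LL;   /* default used in this RFC: 8 seconds */
    }
    if (io_timeout_s == 0) {
        return 0;                    /* no timeout: requests are not tracked */
    }
    return io_timeout_s * 1000000000LL;
}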

I will spend the next couple of days analysing the better architectural approach, but any suggestions are welcome. At the moment I tend to think about integration into the generic code. On the other hand, we could implement it as a filter but expose it through a simple option in the generic code to avoid unnecessary complexity for the end user. Though maybe I am a bit confused and puzzled here.

Den


