[Qemu-devel] Re: [RFC] KVM Fault Tolerance: Kemari for KVM
From: Avi Kivity
Date: Sun, 15 Nov 2009 12:35:25 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.4pre) Gecko/20091014 Fedora/3.0-2.8.b4.fc11 Thunderbird/3.0b4
On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote:
> Kemari runs paired virtual machines in an active-passive configuration
> and achieves whole-system replication by continuously copying the
> state of the system (dirty pages and the state of the virtual devices)
> from the active node to the passive node. An interesting implication
> of this is that during normal operation only the active node is
> actually executing code.
Can you characterize the performance impact for various workloads? I
assume you are running continuously in log-dirty mode. Doesn't this
make memory-intensive workloads suffer?
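To make the concern concrete, here is a toy model of log-dirty mode. It is only a sketch under stated assumptions: the first write to a page in each epoch is assumed to cost one write-protection fault, and every dirty page is assumed to be copied at the next synchronization point. `DirtyLog` and `run` are hypothetical names for illustration, not KVM or qemu interfaces.

```python
class DirtyLog:
    """Toy model: every page starts each epoch write-protected (assumption)."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.dirty = set()   # pages written since the last sync point
        self.faults = 0      # write-protection faults taken so far

    def write(self, page):
        if not 0 <= page < self.num_pages:
            raise IndexError(page)
        if page not in self.dirty:
            # First write in this epoch: trap, mark dirty, drop protection.
            self.faults += 1
            self.dirty.add(page)

    def sync(self):
        """Return the pages to send and re-protect all of them."""
        to_send = sorted(self.dirty)
        self.dirty.clear()
        return to_send


def run(workload, num_pages, writes_per_epoch):
    """Replay a sequence of page writes; return (faults, pages_copied)."""
    log = DirtyLog(num_pages)
    copied = 0
    for i, page in enumerate(workload, 1):
        log.write(page)
        if i % writes_per_epoch == 0:
            copied += len(log.sync())
    copied += len(log.sync())  # flush whatever is left at the end
    return log.faults, copied
```

With 1000 writes and a synchronization point every 100 writes, a workload that keeps hitting the same 4 pages gives `run([i % 4 for i in range(1000)], 512, 100) == (40, 40)`, while one that sweeps across 512 pages gives `(1000, 1000)`: in this model the cost scales with the number of distinct pages touched per epoch, not with the number of writes.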
> The synchronization process can be broken down as follows:
> - Event tapping: On KVM all I/O generates a VMEXIT that is
>   synchronously handled by the Linux kernel monitor i.e. KVM (it is
>   worth noting that this applies to virtio devices too, because they
>   use MMIO and PIO just like a regular PCI device).
Some I/O (virtio-based) is asynchronous, but you still have well-known
tap points within qemu.
> - Notification to qemu: Taking a page from live migration's
>   playbook, the synchronization process is user-space driven, which
>   means that qemu needs to be woken up at each synchronization
>   point. That is already the case for qemu-emulated devices, but we
>   also have in-kernel emulators. To compound the problem, even for
>   user-space emulated devices accesses to coalesced MMIO areas can
>   not be detected. As a consequence we need a mechanism to
>   communicate KVM-handled events to qemu.
Do you mean the ioapic, pic, and lapic? Perhaps it's best to start with
those in userspace (-no-kvm-irqchip).
Why is access to those chips considered a synchronization point?
> - Virtual machine synchronization: All the dirty pages since the
>   last synchronization point and the state of the virtual devices are
>   sent to the fallback node from the user-space qemu process. For this
>   the existing savevm infrastructure and KVM's dirty page tracking
>   capabilities can be reused. Regarding in-kernel devices, with the
>   likely advent of in-kernel virtio backends we need a generic way
>   to access their state from user-space, for which, again, the kvm_run
>   shared memory area could be used.
I wonder if you can pipeline dirty memory synchronization. That is,
write-protect those pages that are dirty, start copying them to the
other side, and continue execution, copying a page immediately if the
guest faults on it again.
How many pages do you copy per synchronization point for reasonably
difficult workloads?
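The pipelining idea above can be sketched as a small simulation. This is only an illustration under assumptions, not how Kemari or KVM implements it: memory is modelled as a list with one value per page, "sending" a page means copying it into a local `replica` list, and `PipelinedSync` and its methods are hypothetical names.

```python
from collections import deque


class PipelinedSync:
    """Toy model: guest keeps running while last epoch's dirty pages are copied."""

    def __init__(self, memory):
        self.memory = list(memory)    # guest pages on the active node
        self.replica = list(memory)   # what the passive node has received
        self.dirty = set()            # pages written in the current epoch
        self.pending = deque()        # protected pages queued for copying
        self.protected = set()        # pages still write-protected

    def guest_write(self, page, value):
        if page in self.protected:
            # Fault on a protected page: ship the old contents first.
            self._send(page)
        self.memory[page] = value
        self.dirty.add(page)

    def sync_point(self):
        """Protect the dirty pages and queue them; the guest resumes at once."""
        self.drain()  # the previous epoch must be complete before a new one
        self.pending = deque(sorted(self.dirty))
        self.protected = set(self.pending)
        self.dirty.clear()

    def copy_one(self):
        """Background copier: send one queued page; return False when idle."""
        while self.pending:
            page = self.pending.popleft()
            if page in self.protected:  # skip pages already sent on a fault
                self._send(page)
                return True
        return False

    def drain(self):
        while self.copy_one():
            pass

    def _send(self, page):
        self.replica[page] = self.memory[page]
        self.protected.discard(page)
```

For example, after writing pages 0 and 2 and calling `sync_point()`, a further `guest_write(2, 9)` first sends the old contents of page 2, so once `drain()` finishes the replica holds exactly the memory image as of the synchronization point, and the new value of page 2 goes out with the next epoch.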
--
error compiling committee.c: too many arguments to function