Hello everyone,
I've been thinking about the current design of the fsfreeze feature used
by libvirt.
It currently relies on a userland agent in the guest talking to qemu
over some vmchannel communication. The guest agent walks the
filesystems in the guest and calls the fsfreeze ioctl on them.
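To make the agent approach concrete, here is a minimal sketch of what
such a walk could look like. It is not the code of any actual guest
agent: the function names are made up for illustration, and only the
FIFREEZE/FITHAW request numbers come from <linux/fs.h> (as encoded on
x86 and most other architectures).

```python
import fcntl
import os

# ioctl request numbers from <linux/fs.h>: _IOWR('X', 119, int) and
# _IOWR('X', 120, int). These values hold for x86 and most other
# architectures; a few encode the direction bits differently.
FIFREEZE = 0xC0045877
FITHAW = 0xC0045878


def block_backed_mounts(mounts_text):
    """Return mount points whose source is a /dev/ node, in mount order.

    mounts_text is the content of /proc/mounts. Pseudo filesystems
    (proc, sysfs, tmpfs, ...) are skipped, and a device mounted more
    than once is listed only once, since a freeze acts on the whole
    filesystem. Octal escapes in mount points are not decoded here.
    """
    seen = set()
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith("/dev/"):
            continue
        device, mount_point = fields[0], fields[1]
        if device not in seen:
            seen.add(device)
            result.append(mount_point)
    return result


def _fs_ioctl(mount_point, request):
    # The ioctl is issued on an open descriptor inside the filesystem;
    # opening the mount point directory is enough. Needs CAP_SYS_ADMIN.
    fd = os.open(mount_point, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, request, 0)
    finally:
        os.close(fd)


def freeze(mount_point):
    _fs_ioctl(mount_point, FIFREEZE)


def thaw(mount_point):
    _fs_ioctl(mount_point, FITHAW)
```

An agent built this way would read /proc/mounts, call freeze() on each
returned mount point before the snapshot, and call thaw() on each of
them once the host reports that the image has been saved.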
Fsfreeze is an optional feature; it's not required to take safe
snapshots. After fsfreeze (whether or not it is available) QEMU must
still block all I/O on all qemu block devices before the image is
saved, to allow safe snapshotting of non-Linux guests. If a VM is then
restarted from the snapshot, the situation is identical to a fault
tolerance failover with NFS or DRBD in a highly available
configuration. Fsfreeze just provides some further (minor) benefit on
top of that (which probably won't be available for non-Linux guests
any time soon).
The benefits this optional fsfreeze feature provides to the snapshot
are:
1) more peace of mind by not relying on the kernel journal replay code
when snapshotting journaled/cow filesystems like ext4/btrfs/xfs
2) all outstanding dirty cache is flushed, which reduces the chances
of running into userland journaling data replay bugs if userland is
restarted on the snapshot
3) allows safe live snapshotting of non-journaled filesystems like
vfat/ext2 on Linux (not so common, and vfat on non-Linux guests won't
benefit)
4) allows mounting the snapshotted image read-only without requiring
metadata journal replay
The problem is that having a daemon in guest userland is not my
preference, considering the same can be done with a virtio-fsfreeze.ko
kernel module in the guest without requiring any userland modification
to the guest (and no interprocess communication through vmchannel
or a similar mechanism).
This means a kernel upgrade in the guest that adds the
virtio-fsfreeze.ko virtio paravirt driver would be enough to be able
to provide fsfreeze during snapshots.
A virtio-fsfreeze.ko would certainly be more developer friendly: you
could just build the kernel and even boot it with -kernel bzImage
(after building it with VIRTIO_FSFREEZE=y). Then it'd just work
without any daemon or vmchannel or any other change to the guest
userland.
I could see some advantage in not having to modify qemu if libvirt were
talking directly to the guest agent, so as to keep any knowledge of
FSFREEZE out of qemu. But it's not even like that: I see FSFREEZE guest
agent patches floating around. So if qemu has to be modified and be
aware of the fsfreeze feature in the userland guest agent (rather than
just being asked to block all I/O, which doesn't require any guest
knowledge and would let it remain agnostic about fsfreeze), I think
it'd be better if the fsfreeze qemu code just went into a virtio
backend.
There is also an advantage in reliability, as there's no more need to
worry about mlocking the memory of the userland guest agent, making
sure no library calls any I/O function so that the agent is still able
to thaw the filesystems later, or making sure the oom killer or a
wrong kill -9 $RANDOM doesn't kill the agent by mistake while the I/O
is blocked and the copy is in progress. The guest kernel is a more
reliable and natural place to call fsfreeze, through a virtio-fsfreeze
guest driver, without having to spend time worrying about the
reliability of the guest-agent feature. It'd surely also waste less
memory in the guest (not that the agent takes much memory, but the few
kbytes of .text of a kernel module for this would surely take a
fraction of the mlocked RAM the agent would take; the RAM saving is
the least interesting part of course).
If there were no hypervisor behind the kernel, only userland could
start a fsfreeze, so we shouldn't be fooled into thinking userland is
the best place from which to start a fsfreeze invocation. It's most
certainly not, but on the host (without virt) there's nothing else
that could possibly ask for it. Here, however, we have a hypervisor
behind the guest kernel that asks for it, so starting the fsfreeze
through a virtio-fsfreeze.ko kernel module loaded into the guest
kernel (or linked into the guest kernel) sounds like a cleaner and
more reliable solution (maybe simpler too).
It'd certainly be a friendlier solution for developers to test or
run: libvirt would talk only with qemu, and qemu would talk only
with the guest kernel, without requiring any modification to the guest
userland. My feeling is that what feels much simpler for developers to
use usually tends to be the better solution (not guaranteed), and to
me a virtio-fsfreeze.ko solution would look much simpler to use.
There are drawbacks, like the fact that respinning an update to the
fsfreeze code would then require an upgrade of the guest kernel
instead of a package update. But there are advantages too in terms of
coverage, as an updated kernel would also run on top of an older guest
userland that may not have an agent package to install through a
repository.
In any case, if the virtio-fsfreeze.ko doesn't register with the qemu
virtio-fsfreeze backend, the qemu monitor command should still just
work and allow snapshotting by only blocking all I/O, which is
more than enough for a non-buggy guest capable of fault tolerance
against power loss.
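The ordering I have in mind can be sketched roughly as follows. This
is only an illustration of the sequencing, not qemu code: the function
and all five callables are hypothetical names, and resuming I/O before
thawing the guest is my assumption rather than a settled design.

```python
def snapshot(guest_freeze, block_io, save_image, resume_io, guest_thaw):
    """Run one snapshot; return True if the guest was frozen first.

    All five arguments are hypothetical callables standing in for
    qemu internals. guest_freeze returns False when no
    virtio-fsfreeze driver has registered in the guest.
    """
    frozen = guest_freeze()   # optional step, may be unavailable
    try:
        block_io()            # mandatory, with or without fsfreeze
        try:
            save_image()
        finally:
            resume_io()       # unblock I/O even if the save failed
    finally:
        if frozen:
            guest_thaw()      # only thaw what was actually frozen
    return frozen
```

With a guest that has no driver loaded, guest_freeze() simply reports
False and the snapshot degrades to the plain block-all-I/O case.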
I understand an agent may be needed for other features, but I think
that whenever a feature can be implemented well without guest userland
support, it shouldn't require it. To me, requiring modifications to
the guest userland looks like the least transparent and most intrusive
possible way to implement a libvirt feature, so it should be used only
when it has advantages, and I see mostly disadvantages here.