We observe that in Fedora 29, qemu-img fully zeroes the disk before imaging
it. Given the disk size, the whole process now takes 35 minutes instead of
50 seconds, which causes the ironic-python-agent operation to time out. The
Fedora 27 qemu-img doesn't do that.

Known issue; Nir and Rich have posted a previous thread on the topic, and
the conclusion is that we need to make qemu-img smarter about NOT requesting
pre-zeroing of devices where that is more expensive than just zeroing as we
go.
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg01182.html

Yes, we should be careful to avoid the fallback in this case.
However, how could this ever go from 50 seconds for writing the whole
image to 35 minutes?! Even if you end up writing the whole image twice
because you write zeros first and then overwrite them everywhere with
data, shouldn't the maximum be doubling the time, i.e. 100 seconds?
Why is the write_zeroes fallback _that_ slow? It will also hit guests
that request write_zeroes, so I feel this is worth investigating a bit
more nevertheless.

Can you check with strace which operation actually succeeds in writing
zeros to /dev/sda? The first thing we try is fallocate with
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. This should always be fast,
so I suppose this fails in your case. The next thing is BLKZEROOUT,
which I think can do a fallback in the kernel. Does this return success?
Otherwise we have another fallback mechanism inside of QEMU, which would
use normal pwrite calls with a zeroed buffer.
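
For reference, the order of attempts described above can be sketched as a
small standalone C helper. This is only an illustration under assumptions,
not QEMU's actual code: the names zero_range, zero_with_pwrite and
enum zero_method are made up for this example, and the 64 KiB buffer size is
arbitrary. It tries each mechanism in turn, prints the errno of every attempt
that fails, and reports which one finally succeeded, which is roughly the
information the strace run should reveal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/falloc.h>   /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <linux/fs.h>       /* BLKZEROOUT */

/* Hypothetical result type: which mechanism ended up zeroing the range. */
enum zero_method {
    ZERO_FAILED = -1,
    ZERO_PUNCH_HOLE,
    ZERO_BLKZEROOUT,
    ZERO_PWRITE
};

/* Last resort: plain pwrite calls with a zeroed buffer. */
static int zero_with_pwrite(int fd, off_t offset, off_t len)
{
    static const char buf[64 * 1024];   /* static storage, so all zeros */

    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t n = pwrite(fd, buf, chunk, offset);

        if (n < 0 && errno == EINTR)
            continue;                    /* interrupted, retry */
        if (n <= 0)
            return -1;                   /* error or no progress */
        offset += n;
        len -= n;
    }
    return 0;
}

/* Try the three mechanisms in the order described in the mail above. */
enum zero_method zero_range(int fd, off_t offset, off_t len)
{
    assert(fd >= 0 && offset >= 0 && len > 0);

    /* 1. Punch a hole without changing the file size. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) == 0)
        return ZERO_PUNCH_HOLE;
    fprintf(stderr, "fallocate(PUNCH_HOLE): %s\n", strerror(errno));

    /* 2. BLKZEROOUT takes { start, length } in bytes; block devices only. */
    uint64_t range[2] = { (uint64_t)offset, (uint64_t)len };
    if (ioctl(fd, BLKZEROOUT, range) == 0)
        return ZERO_BLKZEROOUT;
    fprintf(stderr, "ioctl(BLKZEROOUT): %s\n", strerror(errno));

    /* 3. Write zeros by hand. */
    if (zero_with_pwrite(fd, offset, len) == 0)
        return ZERO_PWRITE;
    fprintf(stderr, "pwrite: %s\n", strerror(errno));

    return ZERO_FAILED;
}
```

Calling zero_range() on a file descriptor opened O_RDWR and timing it per
method would show whether the slow path is BLKZEROOUT or the pwrite loop.
Note that pointing it at a real device such as /dev/sda destroys the data in
that range, so a scratch disk or a regular file is the safer target.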