-    ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-                                m->nb_clusters * s->cluster_size,
+    ret = bdrv_co_pwrite_zeroes(s->data_file, start, len,
                                 BDRV_REQ_NO_FALLBACK);
Good point. If we weren't using BDRV_REQ_NO_FALLBACK, then avoiding a
pre-zero pass over the middle is essential. But since we are insisting that
the pre-zero pass be fast or else immediately fail, the time spent in
pre-zeroing should not be a concern. Do you have benchmark numbers showing
otherwise?
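
For illustration, here is a minimal sketch of the arithmetic I assume is
behind the new start/len arguments, i.e. zeroing only the gaps around the
data write instead of the whole allocation. The names are made up for the
example, not the actual qcow2 ones:

#include <stdint.h>

/* Illustrative helper, not qemu code: given a freshly allocated,
 * cluster-aligned region and the data write that lands inside it,
 * compute the head and tail gaps that still need zeroing. The middle
 * is covered by the data write itself, so pre-zeroing it would be
 * redundant work. */
struct zero_range {
    uint64_t offset;
    uint64_t bytes;
};

static void unwritten_gaps(uint64_t alloc_offset, uint64_t alloc_bytes,
                           uint64_t data_offset, uint64_t data_bytes,
                           struct zero_range *head, struct zero_range *tail)
{
    head->offset = alloc_offset;
    head->bytes  = data_offset - alloc_offset;

    tail->offset = data_offset + data_bytes;
    tail->bytes  = alloc_offset + alloc_bytes - tail->offset;
}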
I stumbled across this behaviour (write_zeroes for 2 MB, then overwriting
almost everything) in the context of a different bug, and it just didn't
make much sense to me. Is there really a file system where skipping the
zero pass and just overwriting the area introduces fragmentation?
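
To make the pattern concrete, this is roughly what that strace amounted to
at the syscall level, assuming the file-posix backend turns write_zeroes
into fallocate(FALLOC_FL_ZERO_RANGE). The offsets and the 2 MB size are
illustrative, and error checking is omitted for brevity:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static char data[2 * 1024 * 1024 - 4096];
    int fd = open("test.img", O_RDWR | O_CREAT, 0644);

    memset(data, 'x', sizeof(data));

    /* First zero the whole freshly allocated 2 MB area ... */
    fallocate(fd, FALLOC_FL_ZERO_RANGE, 0, 2 * 1024 * 1024);

    /* ... then immediately overwrite almost all of it with data. */
    pwrite(fd, data, sizeof(data), 0);

    close(fd);
    return 0;
}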
I'm not insisting on making this change because the behaviour is
harmless, if odd; but if we think that writing twice to some blocks is an
optimisation, maybe we should actually measure and document this.
Anyway, let's talk about the reported bug that made me look at the
strace showing this behaviour, because I feel it supports my last
point. It's a bit messy, but here it is:
https://bugzilla.redhat.com/show_bug.cgi?id=1666864
So initially, bad performance on a fragmented image file was reported.
Not much to do there, but then in comment 16, QA reported a performance
regression in this case between 4.0 and 4.2. That regression was caused
by c8bb23cbdbe, i.e. the commit that introduced handle_alloc_space().
Turns out that BDRV_REQ_NO_FALLBACK doesn't always guarantee that it's
_really_ fast: fallocate(FALLOC_FL_ZERO_RANGE) causes some kind of flush
on XFS, while buffered writes don't. So with the old code, qemu-img
convert to a file on a very full filesystem (which will cause
fragmentation) was much faster writing a zero buffer than using
write_zeroes, because it didn't flush the result.
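
If we want numbers, a rough standalone comparison along these lines (my
own sketch, not from the bug report; it assumes a filesystem that
supports FALLOC_FL_ZERO_RANGE, and error checking is omitted) should
reproduce the difference on XFS:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define CHUNK   (2 * 1024 * 1024)
#define CHUNKS  512                 /* 1 GiB total */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    static char zeroes[CHUNK];      /* zero-filled buffer */
    double t;
    int fd, i;

    /* Variant 1: zero each chunk via fallocate(), which is what
     * write_zeroes maps to on file-posix. On XFS this reportedly
     * implies some kind of flush per call. */
    fd = open("zr.img", O_RDWR | O_CREAT | O_TRUNC, 0644);
    t = now();
    for (i = 0; i < CHUNKS; i++) {
        fallocate(fd, FALLOC_FL_ZERO_RANGE, (off_t)i * CHUNK, CHUNK);
    }
    printf("FALLOC_FL_ZERO_RANGE: %.3fs\n", now() - t);
    close(fd);

    /* Variant 2: plain buffered writes of a zero buffer, which stay
     * in the page cache and are not flushed per request. */
    fd = open("buf.img", O_RDWR | O_CREAT | O_TRUNC, 0644);
    t = now();
    for (i = 0; i < CHUNKS; i++) {
        pwrite(fd, zeroes, CHUNK, (off_t)i * CHUNK);
    }
    printf("buffered zero writes: %.3fs\n", now() - t);
    close(fd);
    return 0;
}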