From: phoeagon
Subject: Re: [Qemu-devel] [PATCH v4] block/vdi: Use bdrv_flush after metadata updates
Date: Sat, 09 May 2015 03:59:39 +0000
Thanks. Dbench does not logically allocate new disk space all the time, because it's an FS-level benchmark that creates files and deletes them. Therefore it also depends on the guest FS: a btrfs guest FS, say, allocates about 1.8x the space that an EXT4 one does, due to its COW nature. It does cause the FS to allocate some space during about 1/3 of the test duration, I think. But that does not soften the impact much, because a FS often writes in strides rather than consecutively, which causes write amplification at allocation time.

So I tested it with qemu-img convert from a 400M raw file:

zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -t unsafe -O vdi /run/shm/rand 1.vdi
real    0m0.402s
user    0m0.206s
sys     0m0.202s
zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -t writeback -O vdi /run/shm/rand 1.vdi
real    0m8.678s
user    0m0.169s
sys     0m0.500s
zheq-PC sdb # time qemu-img convert -f raw -t writeback -O vdi /run/shm/rand 1.vdi
real    0m4.320s
user    0m0.148s
sys     0m0.471s
zheq-PC sdb # time qemu-img convert -f raw -t unsafe -O vdi /run/shm/rand 1.vdi
real    0m0.489s
user    0m0.173s
sys     0m0.325s
zheq-PC sdb # time qemu-img convert -f raw -O vdi /run/shm/rand 1.vdi
real    0m0.515s
user    0m0.168s
sys     0m0.357s
zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -O vdi /run/shm/rand 1.vdi
real    0m0.431s
user    0m0.192s
sys     0m0.248s

Although 400M is not a giant file, it does show the trend. As you can see, when there are heavy allocation needs and no extra buffering from a virtualized host, throughput drops by about 50% (4.320s unpatched vs. 8.678s patched in writeback mode). But it still has no effect on "unsafe" mode, as predicted. Also, I believe that wanting to use a half-converted image is rarely the use case, while host crashes and power loss are not so unimaginable.

It looks like qemu-img convert uses "unsafe" as its default as well, so even novice "qemu-img convert" users are unlikely to notice any performance degradation.

I have not yet tried a guest OS installation on top, but I guess a new flag for a one-time faster OS installation is not likely to be useful, and "cache=unsafe" already does the trick.

On Sat, May 9, 2015 at 5:26 AM Stefan Weil <address@hidden> wrote:
On 08.05.2015 at 15:55, Kevin Wolf wrote:
> On 08.05.2015 at 15:14, Max Reitz wrote:
>> On 07.05.2015 17:16, Zhe Qiu wrote:
>>> In reference to b0ad5a45...078a458e, metadata writes to
>>> qcow2/cow/qcow/vpc/vmdk are all synced prior to succeeding writes.
>>>
>>> Only when the write is successful is bdrv_flush called.
>>>
>>> Signed-off-by: Zhe Qiu <address@hidden>
>>> ---
>>> block/vdi.c | 3 +++
>>> 1 file changed, 3 insertions(+)
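(For readers without the patch body at hand, the change amounts to roughly the following sketch of the VDI write path. This is a paraphrase, not the literal hunk; the exact context and variable names in block/vdi.c may differ.)

    /* Sketch, not the literal hunk: after the block map sectors that
     * cover a fresh allocation have been written back, flush the image
     * file so the metadata reaches stable storage before the guest
     * write is reported as complete. */
    ret = bdrv_write(bs->file, offset, base, n_sectors);
    if (ret >= 0) {
        /* As the commit message says: flush only if the write succeeded. */
        ret = bdrv_flush(bs->file);
    }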
>> I missed Kevin's arguments before, but I think that adding this is
>> more correct than not having it; and when thinking about speed, this
>> is vdi, a format supported for compatibility.
> If you use it only as a convert target, you probably care more about
> speed than about leaks in case of a host crash.
>
>> So if we wanted to optimize it, we'd probably have to cache multiple
>> allocations, do them at once and then flush afterwards (like the
>> metadata cache we have in qcow2?)
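(Concretely, such a cache might look like the sketch below; all names are invented, since VDI has no such structure today. The idea is to batch dirty block map sectors and pay for one write plus one flush per batch instead of per allocation. As Kevin replies below, this widens the window between data and metadata reaching the disk.)

    /* Invented types and names; a sketch of batched bmap writeback. */
    typedef struct VdiBmapCache {
        uint32_t dirty_first;   /* first dirty bmap sector in the batch */
        uint32_t dirty_last;    /* last dirty bmap sector in the batch */
        bool dirty;             /* any pending metadata updates? */
    } VdiBmapCache;

    /* Write back a whole batch of block map updates, then flush once. */
    static int vdi_bmap_cache_writeback(BlockDriverState *bs, VdiBmapCache *c)
    {
        int ret = 0;
        if (c->dirty) {
            /* ... bdrv_write() bmap sectors dirty_first..dirty_last ... */
            ret = bdrv_flush(bs->file);
            c->dirty = false;
        }
        return ret;
    }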
> That would defeat the purpose of this patch which aims at having
> metadata and data written out almost at the same time. On the other
> hand, fully avoiding the problem instead of just making the window
> smaller would require a journal, which VDI just doesn't have.
>
> I'm not convinced of this patch, but I'll defer to Stefan Weil as the
> VDI maintainer.
>
> Kevin
Thanks for asking. I share your concerns regarding reduced performance
caused by bdrv_flush. Conversions to VDI will take longer (how much?),
and installation of an OS onto a new VDI disk image will also be slower,
because those are the typical scenarios where disk usage grows.
@phoeagon: Did the benchmark you used allocate additional disk
storage? If not, or if it only allocated once and then spent some time
on already allocated blocks, that benchmark was not valid for this case.
On the other hand, I don't see a need for the flushing, because that
kind of failure (power failure) and its consequences seem acceptable
for typical VDI usage, namely either image conversion or tests with
existing images.
That's why I'd prefer not to use bdrv_flush here. Could we make
bdrv_flush optional (either generally or for cases like this one), so
that both people who prefer speed and people who want bdrv_flush
to decrease the likelihood of inconsistencies can be satisfied?
Stefan
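(One possible shape for that opt-out, sketched with an invented setting; "flush_metadata" is hypothetical, not an existing QEMU option. Note also that cache=unsafe already opens the image with BDRV_O_NO_FLUSH, which makes bdrv_flush() a no-op inside the block layer; that is why the "unsafe" numbers above are unaffected by the patch.)

    /* Hypothetical per-image opt-out for the metadata flush;
     * "flush_metadata" is an invented field, not an existing option. */
    if (s->flush_metadata) {
        /* Under cache=unsafe (BDRV_O_NO_FLUSH) this call already
         * returns without syncing, so speed-focused users lose
         * nothing either way. */
        ret = bdrv_flush(bs->file);
    }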