From: phoeagon
Subject: Re: [Qemu-devel] [PATCH v4] block/vdi: Use bdrv_flush after metadata updates
Date: Sat, 09 May 2015 03:59:39 +0000
Thanks. Dbench does not logically allocate new disk space all the time, because it's an FS-level benchmark that creates files and deletes them. Therefore it also depends on the guest FS: a btrfs guest FS, say, allocates about 1.8x the space that an EXT4 one does, due to its COW nature. It does cause the FS to allocate some space during about 1/3 of the test duration, I think. But that does not soften the impact much, because a FS often writes in strides rather than consecutively, which causes write amplification at allocation time.

So I tested it with qemu-img convert from a 400M raw file:

zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -t unsafe -O vdi /run/shm/rand 1.vdi
real    0m0.402s
user    0m0.206s
sys     0m0.202s
zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -t writeback -O vdi /run/shm/rand 1.vdi
real    0m8.678s
user    0m0.169s
sys     0m0.500s
zheq-PC sdb # time qemu-img convert -f raw -t writeback -O vdi /run/shm/rand 1.vdi
real    0m4.320s
user    0m0.148s
sys     0m0.471s
zheq-PC sdb # time qemu-img convert -f raw -t unsafe -O vdi /run/shm/rand 1.vdi
real    0m0.489s
user    0m0.173s
sys     0m0.325s
zheq-PC sdb # time qemu-img convert -f raw -O vdi /run/shm/rand 1.vdi
real    0m0.515s
user    0m0.168s
sys     0m0.357s
zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -O vdi /run/shm/rand 1.vdi
real    0m0.431s
user    0m0.192s
sys     0m0.248s

Although 400M is not a giant file, it does show the trend. As you can see, when there are heavy allocation needs and no extra buffering from a virtualized host, throughput drops by about 50% (4.320s unpatched vs. 8.678s patched in writeback mode). But it still has no effect on "unsafe" mode, as predicted. Also, I believe that wanting to use a half-converted image is rarely the use case, while host crashes and power loss are not so unimaginable.

It looks like qemu-img convert uses "unsafe" as its default as well, so even novice "qemu-img convert" users are unlikely to notice any performance degradation.

I have not yet tried a guest OS installation on top, but I guess a new flag for a one-time faster OS installation is not likely to be useful, and "cache=unsafe" already does the trick.

On Sat, May 9, 2015 at 5:26 AM Stefan Weil <address@hidden> wrote:
On 08.05.2015 at 15:55, Kevin Wolf wrote:
> On 08.05.2015 at 15:14, Max Reitz wrote:
>> On 07.05.2015 17:16, Zhe Qiu wrote:
>>> In reference to b0ad5a45...078a458e, metadata writes to
>>> qcow2/cow/qcow/vpc/vmdk are all synced prior to succeeding writes.
>>>
>>> Only when the write is successful is bdrv_flush called.
>>>
>>> Signed-off-by: Zhe Qiu <address@hidden>
>>> ---
>>> block/vdi.c | 3 +++
>>> 1 file changed, 3 insertions(+)
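(For readers without the patch body at hand, the change amounts to roughly the following sketch of the VDI write path. This is a paraphrase, not the literal hunk; the exact context and variable names in block/vdi.c may differ.)

    /* Sketch, not the literal hunk: after the block map sectors that
     * cover a fresh allocation have been written back, flush the image
     * file so the metadata reaches stable storage before the guest
     * write is reported as complete. */
    ret = bdrv_write(bs->file, offset, base, n_sectors);
    if (ret >= 0) {
        /* As the commit message says: flush only if the write succeeded. */
        ret = bdrv_flush(bs->file);
    }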
>> I missed Kevin's arguments before, but I think that adding this is
>> more correct than not having it; and when thinking about speed, this
>> is vdi, a format supported for compatibility.
> If you use it only as a convert target, you probably care more about
> speed than about leaks in case of a host crash.
>
>> So if we wanted to optimize it, we'd probably have to cache multiple
>> allocations, do them at once and then flush afterwards (like the
>> metadata cache we have in qcow2?)
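(Concretely, such a cache might look like the sketch below; all names are invented, since VDI has no such structure today. The idea is to batch dirty block map sectors and pay for one write plus one flush per batch instead of per allocation. As Kevin replies below, this widens the window between data and metadata reaching the disk.)

    /* Invented types and names; a sketch of batched bmap writeback. */
    typedef struct VdiBmapCache {
        uint32_t dirty_first;   /* first dirty bmap sector in the batch */
        uint32_t dirty_last;    /* last dirty bmap sector in the batch */
        bool dirty;             /* any pending metadata updates? */
    } VdiBmapCache;

    /* Write back a whole batch of block map updates, then flush once. */
    static int vdi_bmap_cache_writeback(BlockDriverState *bs, VdiBmapCache *c)
    {
        int ret = 0;
        if (c->dirty) {
            /* ... bdrv_write() bmap sectors dirty_first..dirty_last ... */
            ret = bdrv_flush(bs->file);
            c->dirty = false;
        }
        return ret;
    }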
> That would defeat the purpose of this patch which aims at having
> metadata and data written out almost at the same time. On the other
> hand, fully avoiding the problem instead of just making the window
> smaller would require a journal, which VDI just doesn't have.
>
> I'm not convinced of this patch, but I'll defer to Stefan Weil as the
> VDI maintainer.
>
> Kevin
Thanks for asking. I share your concerns regarding reduced performance
caused by bdrv_flush. Conversions to VDI will take longer (how much?),
and installation of an OS onto a new VDI disk image will also be slower,
because those are the typical scenarios where disk usage grows.
@phoeagon: Did the benchmark you used allocate additional disk
storage? If not, or if it only allocated once and then spent some time
on already allocated blocks, that benchmark was not valid for this case.
On the other hand, I don't see a need for the flushing, because that
kind of failure (power failure) and its consequences seem acceptable
for typical VDI usage, namely either image conversion or tests with
existing images.
That's why I'd prefer not to use bdrv_flush here. Could we make
bdrv_flush optional (either generally or for cases like this one), so
that both people who prefer speed and people who want bdrv_flush
to decrease the likelihood of inconsistencies can be satisfied?
Stefan
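(One possible shape for that opt-out, sketched with an invented setting; "flush_metadata" is hypothetical, not an existing QEMU option. Note also that cache=unsafe already opens the image with BDRV_O_NO_FLUSH, which makes bdrv_flush() a no-op inside the block layer; that is why the "unsafe" numbers above are unaffected by the patch.)

    /* Hypothetical per-image opt-out for the metadata flush;
     * "flush_metadata" is an invented field, not an existing option. */
    if (s->flush_metadata) {
        /* Under cache=unsafe (BDRV_O_NO_FLUSH) this call already
         * returns without syncing, so speed-focused users lose
         * nothing either way. */
        ret = bdrv_flush(bs->file);
    }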