From: Xingbo Wu
Subject: Re: [Qemu-devel] disk image: self-organized format or raw file
Date: Thu, 14 Aug 2014 16:53:12 -0400

>> >> The main trick of QED was to introduce a dirty flag, which made it
>> >> possible to call fdatasync() less often because it was okay for image
>> >> metadata to become inconsistent. After a crash, you then have to
>> >> repair the image.
>> >>
>> >
>> > I'm very curious about this dirty flag trick. I was surprised when I
>> > observed very fast 'sync write' performance on QED.
>> > If it skips the fdatasync when processing the device 'flush' command
>> > from the guest, it effectively cheats the guest, as data can be lost.
>> > Am I correct?
>> > Does the repair make sure that all the data written before the last
>> > successful 'flush' can be recovered?
>> > To my understanding, the 'flush' command in the guest asks for
>> > persistence: data has to be persistent on host storage after a flush,
>> > except for images opened in 'cache=unsafe' mode.
>> >
>>
>> I have some different ideas; please correct me if I am mistaken.
>> The trick may not cause true consistency issues. The relaxed write
>> ordering (fewer fdatasync calls) seems to be safe.
>> The analysis is described here:
>> [http://lists.nongnu.org/archive/html/qemu-devel/2010-09/msg00515.html].
>
> Yes, specifically point 3. Without the dirty flag, you would have to
> ensure that the file size is updated first and then the L2 table entry
> is written. (This would still allow cluster leaks that cannot be
> reclaimed, but at least no data corruption.)
>
>> In my opinion, the reason why the ordering is irrelevant is that any
>> uninitialized block could exist in a block device anyway.
>> Unordered L1 updates and allocating L2 writes are also safe because
>> uninitialized blocks in a file are always zero or beyond the EOF.
>
> Yes. This holds true because QED (unlike qcow2) cannot be used directly
> on block devices. This is a real limitation.
>
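To convince myself of the "uninitialized blocks read as zero" property, I
wrote a tiny standalone test (plain POSIX, nothing QED-specific): a hole in
a sparse file reads back as zeros, and a read past EOF returns no data.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    int fd = open("sparse.img", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Write one block at offset 1 MiB, leaving a hole before it. */
    memset(buf, 0xab, sizeof(buf));
    pwrite(fd, buf, sizeof(buf), 1 << 20);

    /* Reading inside the hole yields zeros... */
    ssize_t n = pread(fd, buf, sizeof(buf), 0);
    printf("hole read: %zd bytes, first byte 0x%02x\n", n,
           (unsigned char)buf[0]);

    /* ...and reading past EOF yields no data at all. */
    n = pread(fd, buf, sizeof(buf), 2 << 20);
    printf("past-EOF read: %zd bytes\n", n);

    close(fd);
    unlink("sparse.img");
    return 0;
}

Neither property holds for a raw block device, which reinforces your point
about the limitation.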

I don't know much about best practices in virtualization; could you give
me some examples? Thanks.
Do some products provide (automatically?) resizable Logical Volumes and
put one qcow2 image on each LV?
Also, does anyone use a whole physical disk to hold just a single qcow2
image for some special purpose?
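
Coming back to the dirty flag itself, here is a minimal sketch of the
write path as I understand it from the 2010 thread. The names (qed_image,
QED_F_NEED_CHECK, the header layout) are illustrative only, not the actual
QEMU code:

#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define QED_F_NEED_CHECK 0x1

struct qed_image {
    int fd;
    uint64_t features;       /* in-memory copy of the header features */
    bool need_check_synced;  /* has the dirty flag been persisted yet? */
};

/* Persist the dirty flag once, before the first allocating write. */
static int qed_mark_dirty(struct qed_image *s)
{
    if (s->need_check_synced) {
        return 0;                    /* already on disk, nothing to do */
    }
    s->features |= QED_F_NEED_CHECK;
    if (pwrite(s->fd, &s->features, sizeof(s->features), 0) < 0) {
        return -1;
    }
    if (fdatasync(s->fd) < 0) {      /* the only sync on the write path */
        return -1;
    }
    s->need_check_synced = true;
    return 0;
}

static int qed_alloc_write(struct qed_image *s, uint64_t l2_offset,
                           const void *data, size_t len,
                           uint64_t cluster_offset)
{
    if (qed_mark_dirty(s) < 0) {
        return -1;
    }
    /* Data cluster and L2 entry can now be written in either order,
     * with no fdatasync() in between: after a crash, the repair pass
     * sorts out any half-finished allocation. */
    pwrite(s->fd, data, len, cluster_offset);
    pwrite(s->fd, &cluster_offset, sizeof(cluster_offset), l2_offset);
    return 0;
}

/* The guest's flush command still maps to a real fdatasync(), so the
 * persistence guarantee is kept. */
static int qed_co_flush(struct qed_image *s)
{
    return fdatasync(s->fd);
}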

>> Any unsuccessful write of the L1/L2/data would cause the loss of that
>> data. However, at that point the guest cannot have returned from its
>> last 'flush', so the guest won't have a consistency issue with its
>> data.
>> The repair process (qed-check.c) doesn't recover data; it only scans
>> the metadata so that new requests can be processed. The 'check' can be
>> considered a normal part of bdrv_open().
>>
>> BTW, filesystems heavily use this kind of 'trick' to improve
>> performance. A sync write can return as an indication that the data
>> has been persistently written, while the data may only have been
>> committed to the journal. Scanning and recovering from the journal is
>> considered a normal job of the filesystem.
>
> But this is not a journal. It is something like fsck in ext2 times.
>
> I believe qcow2 could be optimised a bit more if we added a journal to
> it, but currently qcow2 performance isn't a problem urgent enough that I
> could easily find the time to implement it. (We've discussed it several
> times in the past.)
>
> Kevin
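
If I understand the fsck analogy correctly, the open path would then look
roughly like this (continuing the sketch above; repair_l2_tables is a
hypothetical helper that walks L1/L2 and clears any entry pointing past
the file size):

/* hypothetical: walk L1/L2, invalidate entries beyond file_size */
static void repair_l2_tables(struct qed_image *s, uint64_t file_size);

static int qed_open_check(struct qed_image *s)
{
    if (!(s->features & QED_F_NEED_CHECK)) {
        return 0;               /* clean shutdown, nothing to repair */
    }

    uint64_t file_size = lseek(s->fd, 0, SEEK_END);

    /* Clusters that were written but never linked are merely leaked;
     * entries past EOF are dropped, so there is no data corruption. */
    repair_l2_tables(s, file_size);

    /* Make sure the repair hits the disk before the cleared flag does. */
    if (fdatasync(s->fd) < 0) {
        return -1;
    }
    s->features &= ~QED_F_NEED_CHECK;
    pwrite(s->fd, &s->features, sizeof(s->features), 0);
    return fdatasync(s->fd);
}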



-- 

Cheers!
       吴兴博  Wu, Xingbo <address@hidden>


