
Re: [Qemu-devel] [RFC] qcow2 journalling draft


From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC] qcow2 journalling draft
Date: Thu, 5 Sep 2013 17:20:02 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 05.09.2013 at 16:55, Stefan Hajnoczi wrote:
> On Thu, Sep 5, 2013 at 1:18 PM, Kevin Wolf <address@hidden> wrote:
> > On 05.09.2013 at 11:21, Stefan Hajnoczi wrote:
> >> On Wed, Sep 04, 2013 at 11:39:51AM +0200, Kevin Wolf wrote:
> >> > > > +A journal is organised in journal blocks, all of which have a reference count
> >> > > > +of exactly 1. It starts with a block containing the following journal header:
> >> > > > +
> >> > > > +    Byte  0 -  7:   Magic ("qjournal" ASCII string)
> >> > > > +
> >> > > > +          8 - 11:   Journal size in bytes, including the header
> >> > > > +
> >> > > > +         12 - 15:   Journal block size order (block size in bytes = 1 << order)
> >> > > > +                    The block size must be at least 512 bytes and must not
> >> > > > +                    exceed the cluster size.
> >> > > > +
> >> > > > +         16 - 19:   Journal block index of the descriptor for the last
> >> > > > +                    transaction that has been synced, starting with 1 for the
> >> > > > +                    journal block after the header. 0 is used for empty
> >> > > > +                    journals.
> >> > > > +
> >> > > > +         20 - 23:   Sequence number of the last transaction that has been
> >> > > > +                    synced. 0 is recommended as the initial value.
> >> > > > +
> >> > > > +         24 - 27:   Sequence number of the last transaction that has been
> >> > > > +                    committed. When replaying a journal, all transactions
> >> > > > +                    after the last synced one up to the last commit one must be
> >> > > > +                    synced. Note that this may include a wraparound of sequence
> >> > > > +                    numbers.
> >> > > > +
> >> > > > +         28 -  31:  Checksum (one's complement of the sum of all bytes in the
> >> > > > +                    header journal block except those of the checksum field)
> >> > > > +
> >> > > > +         32 - 511:  Reserved (set to 0)
> >> > >
> >> > > I'm not sure if these fields are necessary.  They require updates (and
> >> > > maybe flush) after every commit and sync.
> >> > >
> >> > > The fewer metadata updates, the better, not just for performance but
> >> > > also to reduce the risk of data loss.  If any metadata required to
> >> > > access the journal is corrupted, the image will be unavailable.
> >> > >
> >> > > It should be possible to determine this information by scanning the
> >> > > journal transactions.
> >> >
> >> > This is rather handwavy. Can you elaborate how this would work in detail?
> >> >
> >> >
> >> > For example, let's assume we get to read this journal (a journal can be
> >> > rather large, too, so I'm not sure if we want to read it in completely):
> >> >
> >> >  - Descriptor, seq 42, 2 data blocks
> >> >  - Data block
> >> >  - Data block
> >> >  - Data block starting with "qjbk"
> >> >  - Data block
> >> >  - Descriptor, seq 7, 0 data blocks
> >> >  - Descriptor, seq 8, 1 data block
> >> >  - Data block
> >> >
> >> > Which of these have already been synced? Which have been committed?
> >
> > So what's your algorithm for this?
> 
> Scan the journal to find unsynced transactions, if they exist:
> 
> last_sync_seq = 0
> last_seqno = 0
> while True:
>     block = journal[(i++) % journal_nblocks]
>     if i >= journal_nblocks * 2:
>         break # avoid infinite loop
>     if block.magic != 'qjbk':
>         continue

Important implication: This doesn't allow data blocks starting with
'qjbk'. Otherwise you're not even guaranteed to find a descriptor block
to start your search with.

The second time you make this assumption is when there are stale data
blocks in the unused area between the head and tail of the journal.
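
To illustrate the problem, here is a minimal sketch of a magic-only scan
over a journal shaped like my earlier example. The descriptor layout (4-byte
magic followed by a 32-bit big-endian sequence number), the 512-byte block
size and all helper names are assumptions made up for this illustration;
they are not taken from the draft.

```python
import struct

BLOCK_SIZE = 512          # assumed block size for the illustration
MAGIC = b"qjbk"           # descriptor magic, as used in the pseudocode


def make_descriptor(seqno):
    # Hypothetical layout: magic, then a 32-bit big-endian sequence number.
    return (MAGIC + struct.pack(">I", seqno)).ljust(BLOCK_SIZE, b"\0")


def make_data(payload):
    # A data block carries arbitrary guest-controlled bytes.
    return payload.ljust(BLOCK_SIZE, b"\0")


def find_descriptors_by_magic(blocks):
    """Indices of all blocks that a magic-only scan treats as descriptors."""
    return [i for i, block in enumerate(blocks) if block[:4] == MAGIC]


journal = [
    make_descriptor(42),
    make_data(b"guest data"),
    make_data(b"more guest data"),
    make_data(MAGIC + b"\xff\xff\xff\xff looks like a descriptor"),
    make_data(b"stale data"),
    make_descriptor(7),
    make_descriptor(8),
    make_data(b"guest data"),
]

real_descriptors = [0, 5, 6]
found = find_descriptors_by_magic(journal)
false_positives = [i for i in found if i not in real_descriptors]
print("found:", found, "false positives:", false_positives)
```

The scan cannot tell block 3 apart from a real descriptor, and it would read
a bogus sequence number from it.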

>     if block.seqno < last_seqno:
>         # Wrapped around to oldest transaction
>         break

Why can you stop here? There might be transactions in the second half of
the journal that aren't synced yet.
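
A minimal sketch of what I mean, assuming a journal that has wrapped so that
the newest descriptors sit at the start. Only the descriptors are modelled,
the sequence numbers and last_sync_seq values are invented for the example,
and the loop is a simplified version of the quoted one:

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    seqno: int           # sequence number of this transaction
    last_sync_seq: int   # last synced transaction when this one was written


def scan_with_early_break(descriptors):
    """Simplified form of the quoted loop: stop at the first lower seqno."""
    visited = []
    last_seqno = 0
    last_sync_seq = 0
    for d in descriptors:
        if d.seqno < last_seqno:
            break  # "wrapped around to oldest transaction"
        last_sync_seq = d.last_sync_seq
        last_seqno = d.seqno
        visited.append(d.seqno)
    return visited, last_sync_seq


# Physical order: transactions 10 and 11 were written after the journal
# wrapped, 5..9 are still in the second half. Only 1..6 have been synced.
journal = [
    Descriptor(10, 6), Descriptor(11, 6),
    Descriptor(5, 3), Descriptor(6, 3), Descriptor(7, 4),
    Descriptor(8, 5), Descriptor(9, 6),
]

visited, last_sync_seq = scan_with_early_break(journal)
missed = [d.seqno for d in journal
          if d.seqno > last_sync_seq and d.seqno not in visited]
print("visited:", visited, "unsynced but never visited:", missed)
```

Transactions 7 to 9 are unsynced, but the loop breaks before it ever looks
at them.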

>     elif block.seqno == seqno:
>         # Corrupt journal, sequence number should be
>         # monotonically increasing
>         raise InvalidJournalException
>     if block.last_sync_seq != last_sync_seq:
>         last_sync_seq = block.last_sync_seq

The 'if' doesn't add anything here, so you end up using the
last_sync_seq field of the last valid descriptor.

>     last_seqno = block.seqno
> 
> print 'First unsynced block seq no:', last_sync_seq
> print 'Last block seq no:', last_seqno
> 
> This is broken pseudocode, but hopefully the idea makes sense.

One additional thought that might make the thing a bit more interesting:
Sequence numbers can wrap around as well.
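
For illustration, one common way to compare wrapping counters is serial
number arithmetic in the style of RFC 1982. This is not something the draft
specifies; the sketch below assumes 32-bit sequence numbers (the header
fields are 4 bytes wide) and that fewer than 2^31 transactions are ever
outstanding at once. The function names are made up.

```python
SEQ_BITS = 32
SEQ_MOD = 1 << SEQ_BITS
SEQ_HALF = 1 << (SEQ_BITS - 1)


def seq_after(a, b):
    """True if a is logically later than b, tolerating wraparound.

    Assumes the two values are less than 2**31 transactions apart.
    """
    return a != b and ((a - b) % SEQ_MOD) < SEQ_HALF


def seq_range(first, last):
    """Yield first..last inclusive, following the counter across a wrap."""
    count = (last - first) % SEQ_MOD
    for k in range(count + 1):
        yield (first + k) % SEQ_MOD


last_synced = 0xFFFFFFFD
last_committed = 1

# A plain '<' gets this wrong: 1 is numerically smaller but logically later.
print(seq_after(last_committed, last_synced))
# Transactions to replay: everything after the last synced one.
print([hex(s) for s in seq_range((last_synced + 1) % SEQ_MOD, last_committed)])
```

A check like 'block.seqno < last_seqno' would have to be replaced by
something of this kind, and it only works as long as the window of live
transactions stays well below half the sequence number space.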

Kevin


