Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
From: Benoît Canet
Subject: Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
Date: Tue, 2 Jul 2013 23:23:56 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
> > +QCOW2 can use one or more instance of a metadata journal.
>
> s/instance/instances/
>
> Is there a reason to use multiple journals rather than a single journal
> for all entry types? The single journal area avoids seeks.
Here are the main reasons for this:
For deduplication, some patterns such as cycles of insertion/deletion could
leave the hash table almost empty while filling the journal.
If the journal is full and the hash table is empty, a packing operation is
started.
Basically, a new journal is created and only the entries present in the hash
table are reinserted.
This is why I want to keep the deduplication journal apart from the regular
qcow2 journal: to avoid interference between a pack operation and regular qcow2
journal entries.
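
To make the packing step concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the journal is modeled as a list of hash digests and the hash table as a set, and the name pack_journal is hypothetical, not something from the patch series.

```python
def pack_journal(old_journal, hash_table):
    """Build a new journal holding only the digests still in the hash table.

    old_journal: list of digests (bytes) in append order.
    hash_table:  set of digests that are still live.
    """
    packed = []
    seen = set()
    for digest in old_journal:
        # Drop entries that were deleted since they were journaled,
        # and keep a single copy of digests inserted several times.
        if digest in hash_table and digest not in seen:
            packed.append(digest)
            seen.add(digest)
    return packed
```

With an insertion/deletion cycle the old journal can be long while the packed one stays tiny, which is the case described above.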
The other thing is that freezing the log store would need a replay of regular
qcow2 entries, as it triggers a reset of the journal.
Also, since deduplication will not work on spinning disks, I discarded the seek
time factor.
Maybe committing the dedupe journal in erase-block-sized chunks would be a good
idea to reduce random writes to the SSD.
An additional reason for having multiple journals is that the SILT paper
proposes a mode where a prefix of the hash is used to dispatch insertions to
multiple stores, and that is easier to do with multiple journals.
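
As a rough sketch of the prefix dispatch idea (in Python, for illustration only): the top bits of the hash select one of 2^prefix_bits stores, each of which could own its journal. The function name store_index and the prefix_bits parameter are assumptions of this sketch, not names from SILT or from the patches.

```python
def store_index(digest: bytes, prefix_bits: int) -> int:
    """Return the index of the store selected by the top prefix_bits of digest."""
    total_bits = 8 * len(digest)
    if not 0 <= prefix_bits <= total_bits:
        raise ValueError("prefix_bits out of range for this digest")
    if prefix_bits == 0:
        return 0  # a single store receives everything
    # Interpret the digest as a big-endian integer and keep its top bits.
    value = int.from_bytes(digest, "big")
    return value >> (total_bits - prefix_bits)
```

For example, with prefix_bits = 2 there would be four stores, and a digest whose first byte is 0b10110000 would go to store 2.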
>
> > +
> > +A journal is a sequential log of journal entries appended on a previously
> > +allocated and reseted area.
>
> I think you say "previously reset area" instead of "reseted". Another
> option is "initialized area".
>
> > +A journal is designed like a linked list with each entry pointing to the next
> > +so it's easy to iterate over entries.
> > +
> > +A journal uses the following constants to denote the type of each entry
> > +
> > +TYPE_NONE = 0xFF   default value of any bytes in a reseted journal
> > +TYPE_END  = 1      the entry ends a journal cluster and point to the next
> > +                   cluster
> > +TYPE_HASH = 2      the entry contains a deduplication hash
> > +
> > +QCOW2 journal entry:
> > +
> > + Byte 0 : Size of the entry: size = 2 + n with size <= 254
>
> This is not clear. I'm wondering if the +2 is included in the byte
> value or not. I'm also wondering what a byte value of zero means and
> what a byte value of 255 means.
I am counting the journal entry header in the size. So yes, the +2 is in the byte
value.
A byte value of 0, 1 or 255 is an error.
Maybe this design is bogus and I should count only the payload size in the size
field. It would leave fewer tricky cases.
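
Here is a small sketch of how the size byte works in the current design, written in Python only for illustration. The helper names encode_entry and decode_entry are hypothetical. The size byte counts the two header bytes plus the payload, so a 32-byte hash gives a size byte of 34.

```python
ENTRY_HEADER_SIZE = 2   # byte 0: size, byte 1: type
MAX_ENTRY_SIZE = 254    # limit stated in the specification
TYPE_HASH = 2

def encode_entry(entry_type, payload=b""):
    """Serialize one entry; the size byte includes the 2-byte header."""
    size = ENTRY_HEADER_SIZE + len(payload)
    if size > MAX_ENTRY_SIZE:
        raise ValueError("entry too large")
    return bytes([size, entry_type]) + payload

def decode_entry(buf, offset=0):
    """Parse one entry; return (type, payload, offset of the next entry)."""
    size = buf[offset]
    # A size byte of 0, 1 or 255 is treated as an error.
    if size < ENTRY_HEADER_SIZE or size > MAX_ENTRY_SIZE:
        raise ValueError("invalid size byte: %d" % size)
    if offset + size > len(buf):
        raise ValueError("truncated entry")
    entry_type = buf[offset + 1]
    payload = buf[offset + ENTRY_HEADER_SIZE:offset + size]
    return entry_type, payload, offset + size
```

An entry with no payload is just the two header bytes, so its size byte is 2.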
>
> Please include an example to illustrate how this field works.
>
> > +
> > + 1 : Type of the entry
> > +
> > + 2 - size : The optional n bytes structure carried by entry
> > +
> > +A journal is divided into clusters and no journal entry can be spilled on two
> > +clusters. This avoid having to read more than one cluster to get a single entry.
> > +
> > +For this purpose an entry with the end type is added at the end of a journal
> > +cluster before starting to write in the next cluster.
> > +The size of such an entry is set so the entry points to the next cluster.
> > +
> > +As any journal cluster must be ended with an end entry the size of regular
> > +journal entries is limited to 254 bytes in order to always left room for an end
> > +entry which mimimal size is two bytes.
> > +
> > +The only cases where size > 254 are none entries where size = 255.
> > +
> > +The replay of a journal stop when the first end none entry is reached.
>
> s/stop/stops/
>
> > +The journal cluster size is 4096 bytes.
>
> Questions about this layout:
>
> 1. Journal entries have no integrity mechanism, which is especially
> important if they span physical sectors where cheap disks may perform
> a partial write. This would leave a corrupt journal. If the last
> bytes are a checksum then you can get some confidence that the entry
> was fully written and is valid.
I will add a checksum mechanism.
Do you have any preference regarding the checksum function?
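
To give an idea of what a trailing checksum could look like, here is a sketch in Python. CRC32 (zlib.crc32) is only a placeholder choice since no function has been picked yet. Storing the checksum in the last 4 bytes and counting it in the size byte are assumptions of this sketch, as are the names seal_entry and entry_is_valid.

```python
import struct
import zlib

HEADER_SIZE = 2
CRC_SIZE = 4          # assumption: 32-bit checksum stored in the last bytes
MAX_ENTRY_SIZE = 254

def seal_entry(entry_type, payload):
    """Serialize an entry followed by a CRC32 of its header and payload."""
    size = HEADER_SIZE + len(payload) + CRC_SIZE
    if size > MAX_ENTRY_SIZE:
        raise ValueError("entry too large")
    body = bytes([size, entry_type]) + payload
    return body + struct.pack("<I", zlib.crc32(body))

def entry_is_valid(entry):
    """Check that the size byte and the trailing CRC32 match the entry."""
    if len(entry) < HEADER_SIZE + CRC_SIZE or entry[0] != len(entry):
        return False
    body, stored = entry[:-CRC_SIZE], entry[-CRC_SIZE:]
    return struct.pack("<I", zlib.crc32(body)) == stored
```

On replay, an entry failing this check would be treated as not fully written, which is the confidence the review asks for.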
>
> Did I miss something?
>
> 2. Byte-granularity means that read-modify-write is necessary to append
> entries to the journal. Therefore a failure could destroy previously
> committed entries.
It's designed to be committed in 4KB blocks.
>
> Any ideas how existing journals handle this?
>