Re: [Qemu-devel] [RFC] qcow2 journalling draft

From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC] qcow2 journalling draft
Date: Thu, 5 Sep 2013 13:50:28 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 05.09.2013 at 11:35, Stefan Hajnoczi wrote:
> Although we are still discussing details of the on-disk layout, the
> general design is clear enough to discuss how the journal will be used.
> Today qcow2 uses Qcow2Cache to do lazy, ordered metadata updates.  The
> performance is pretty good with two exceptions that I can think of:
> 1. The delayed CoW problem that Kevin has been working on.  Guests
>    perform sequential writes that are smaller than a qcow2 cluster.  The
>    first write triggers a copy-on-write of the full cluster.  Later
>    writes then overwrite the copied data.  It would be more efficient to
>    anticipate sequential writes and hold off on CoW where possible.

To be clear, "more efficient" can mean a gain of 50% or more. COW
overhead is the only major overhead compared to raw when looking at
normal cluster allocations. So this is something that is really
important for cluster allocation performance.

The patches that I posted a while ago showed that it's possible to do
this without a journal; however, the flush operation became very complex
(which we all found rather scary) and required that the COW be completed
before signalling flush completion.

With a journal, the only thing that you need to do on a flush is to
commit all transactions, i.e. write them out and bdrv_flush(bs->file).
The actual data copy of the COW (i.e. the sync) can be further delayed
and doesn't have to happen at commit time as it would have without a
journal.
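
As a rough illustration of that split between commit and sync, here is
a toy Python model. It is not QEMU code: the class ToyJournal, its
methods and the file_flushes counter are all made up for this sketch,
and file_flushes merely stands in for calls to bdrv_flush(bs->file).

```python
class ToyJournal:
    """Toy model: a flush only commits journal records; the sync is deferred."""

    def __init__(self):
        self.pending = []      # transactions not yet written to the journal
        self.committed = []    # transactions durable in the journal, not yet synced
        self.image = {}        # cluster index -> data applied to the image proper
        self.file_flushes = 0  # hypothetical stand-in for bdrv_flush(bs->file)

    def add_cow(self, cluster, data):
        # Record the intent to copy; no cluster data is copied yet.
        self.pending.append((cluster, data))

    def flush(self):
        # Commit: write out all pending transactions, then flush the file once.
        self.committed.extend(self.pending)
        self.pending.clear()
        self.file_flushes += 1

    def sync(self):
        # Some time later: apply the committed transactions to the image.
        for cluster, data in self.committed:
            self.image[cluster] = data
        self.committed.clear()


journal = ToyJournal()
journal.add_cow(7, b"old cluster contents")
journal.flush()
# State right after the flush: one committed record, image still untouched.
flushed_state = (len(journal.committed), dict(journal.image), journal.file_flushes)
journal.sync()
```

The point of the sketch is only that flush() returns before any COW
data has been copied; the copy happens in sync(), whenever that is
convenient.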

> 2. Lazy metadata updates lead to bursty behavior and expensive flushes.
>    We do not take advantage of disk bandwidth since metadata updates
>    stay in the Qcow2Cache until the last possible second.  When the
>    guest issues a flush we must write out dirty Qcow2Cache entries and
>    possibly fsync between them if dependencies have been set (e.g.
>    refcount before L2).

Hm, have we ever measured the impact of this?

I don't think a journal can make a fundamental difference here - either
you write only at the last possible second (today flush, with a journal
commit), or you write out more data than strictly necessary.

> How will the journal change this situation?  Writes that go through the
> journal are doubled - they must first be journalled, fsync, and then
> they can be applied to the actual image.
> How do we benefit by using the journal?

I believe Delayed COW is a pretty strong one. But there are more cases
in which performance isn't that great.

I think you are referring to the simple case with a normal empty image where new
clusters are allocated, which is pretty good indeed if we ignore COW.
Trouble starts when you also free clusters, which happens for example
with internal COW (internal snapshots, compressed images) or discard.
Deduplication as well in the future, I suppose.

Then you very quickly get alternating sequences of "L2 depends on
refcount update" (for allocation) and "refcount update depends on L2
update" (for freeing), which means that Qcow2Cache starts flushing all
the time without accumulating many requests. These are cases that would
benefit as well from the atomicity of journal transactions.
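
To make the alternating-dependency effect concrete, here is a toy
Python model of two metadata caches with a "must be written first"
dependency between them. It is a simplification and not the actual
Qcow2Cache logic: ToyCache, set_dependency and count_forced_flushes
are names invented for this sketch, and the only rule modelled is that
a cache which already depends on something gets flushed before anything
may depend on it.

```python
class ToyCache:
    """Toy metadata cache with at most one ordering dependency."""

    def __init__(self, name):
        self.name = name
        self.depends = None      # cache that must be written out before this one
        self.dirty = 0           # number of accumulated, unwritten updates
        self.forced_flushes = 0  # flushes triggered by a dependency conflict

    def flush(self):
        # Write out the dependency first, then this cache.
        if self.depends is not None:
            self.depends.flush()
            self.depends = None
        self.dirty = 0

    def set_dependency(self, other):
        # Avoid a cycle: if 'other' already depends on something, flush it now.
        if other.depends is not None:
            other.flush()
            self.forced_flushes += 1
        self.depends = other


def count_forced_flushes(ops):
    l2, refcount = ToyCache("l2"), ToyCache("refcount")
    for op in ops:
        if op == "alloc":
            # Allocation: the L2 update depends on the refcount update.
            refcount.dirty += 1
            l2.set_dependency(refcount)
            l2.dirty += 1
        else:
            # Freeing: the refcount update depends on the L2 update.
            l2.dirty += 1
            refcount.set_dependency(l2)
            refcount.dirty += 1
    return l2.forced_flushes + refcount.forced_flushes


only_allocations = count_forced_flushes(["alloc"] * 6)
alternating = count_forced_flushes(["alloc", "free"] * 3)
```

In this model a run of allocations never forces a flush, while every
switch between allocating and freeing does, so updates cannot
accumulate.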

And then, of course, we still leak clusters on failed operations. With a
journal, this wouldn't happen any more and the image would always stay
consistent (instead of only corruption-free).
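
A minimal Python sketch of the leak, again a toy model rather than
QEMU code: refcounts and the L2 table are plain dicts, and
CrashBetweenUpdates, allocate_unjournalled, allocate_journalled and
leaked_clusters are hypothetical names. The journalled variant assumes
that a transaction is applied only once it is complete.

```python
class CrashBetweenUpdates(Exception):
    """Stands in for an I/O error or crash in the middle of an operation."""


def allocate_unjournalled(refcounts, l2_table, guest, host, fail=False):
    # Refcount first, L2 entry second: a failure in between cannot corrupt
    # the image, but it leaves a cluster that is counted and unreferenced.
    refcounts[host] = refcounts.get(host, 0) + 1
    if fail:
        raise CrashBetweenUpdates()
    l2_table[guest] = host


def allocate_journalled(refcounts, l2_table, guest, host, fail=False):
    # Both updates are staged as one transaction and applied only once the
    # transaction is complete, so a failure before that point changes nothing.
    transaction = [(refcounts, host, refcounts.get(host, 0) + 1),
                   (l2_table, guest, host)]
    if fail:
        raise CrashBetweenUpdates()
    for table, key, value in transaction:
        table[key] = value


def leaked_clusters(refcounts, l2_table):
    # A cluster is leaked if it has a refcount but no L2 entry points to it.
    referenced = set(l2_table.values())
    return sorted(c for c, n in refcounts.items() if n > 0 and c not in referenced)


def run(allocate):
    refcounts, l2_table = {}, {}
    try:
        allocate(refcounts, l2_table, guest=0, host=5, fail=True)
    except CrashBetweenUpdates:
        pass
    return leaked_clusters(refcounts, l2_table)


leaks_without_journal = run(allocate_unjournalled)
leaks_with_journal = run(allocate_journalled)
```

The unjournalled path ends up with a referenced-by-nobody cluster after
the failure; the journalled path leaves both tables as they were.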

