From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC PATCH v2 19/23] qcow2: Add error handling to the l2meta coroutine
Date: Thu, 21 Feb 2013 11:17:51 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Feb 21, 2013 at 10:35:42AM +0100, Kevin Wolf wrote:
> On Mon, Feb 18, 2013 at 04:42:55PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Feb 13, 2013 at 02:22:09PM +0100, Kevin Wolf wrote:
> > > diff --git a/block/qcow2.c b/block/qcow2.c
> > > index 57552aa..2819336 100644
> > > --- a/block/qcow2.c
> > > +++ b/block/qcow2.c
> > > @@ -774,11 +774,33 @@ static void coroutine_fn process_l2meta(void *opaque)
> > > m->sleeping = false;
> > > }
> > >
> > > +again:
> > > qemu_co_mutex_lock(&s->lock);
> > >
> > > ret = qcow2_alloc_cluster_link_l2(bs, m);
> > > if (ret < 0) {
> > > - /* FIXME */
> > > + /*
> > > + * This is a nasty situation: We have already completed the allocation
> > > + * write request and returned success, so just failing it isn't
> > > + * possible. We need to make sure to return an error during the next
> > > + * flush.
> > > + *
> > > + * However, we still can't drop the l2meta because we want I/O errors
> > > + * to be recoverable e.g. after the block device has been grown or the
> > > + * network connection restored. Sleep until the next flush comes and
> > > + * then retry.
> > > + */
> >
> > A failed flush is live migrated by hw/virtio-blk.c but what happens when
> > we fail during drain?
>
> That's a very good question. Looks like things become rather hairy...
> This would be a case where we really need a VMState for block drivers
> (which is in fact how the whole rerror/werror handling would have been
> implemented best).
>
> Juan, any chance to introduce such a thing without breaking everything?
> Is there something like optional top-level sections?
In fact, don't we have the same problem today, when flushing the image
on the source fails during completion of the migration? With this series
it just becomes much more likely to happen in practice.
Kevin
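
For readers following the thread, here is a minimal sketch in plain C of the scheme the quoted comment describes: the write has already completed, so a failed metadata update is remembered, reported at the next flush, and then retried. This is not the patch's code. `PendingUpdate`, `try_link()`, `complete_write()` and `flush_pending()` are hypothetical names, and `try_link()` only simulates a failing qcow2_alloc_cluster_link_l2() call; there are no coroutines or locks here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an l2meta whose L2 update is still pending. */
typedef struct PendingUpdate {
    int failures_left;   /* simulation only: remaining attempts that fail */
    int deferred_error;  /* negative errno to report at the next flush, or 0 */
    bool linked;         /* true once the metadata update has succeeded */
} PendingUpdate;

/* Simulated metadata update; fails with -EIO while failures_left > 0. */
static int try_link(PendingUpdate *u)
{
    assert(u != NULL);
    if (u->failures_left > 0) {
        u->failures_left--;
        return -EIO;
    }
    u->linked = true;
    return 0;
}

/*
 * Runs after the guest write was already reported as successful, so a
 * failure here can only be remembered, not returned to the writer.
 */
static void complete_write(PendingUpdate *u)
{
    int ret = try_link(u);
    if (ret < 0) {
        u->deferred_error = ret;
    }
}

/*
 * Flush: hand the remembered error to the caller, then retry the update.
 * The pending update is kept (not dropped) until a retry succeeds.
 */
static int flush_pending(PendingUpdate *u)
{
    int err = u->deferred_error;

    if (err == 0) {
        return 0;  /* nothing failed since the last flush */
    }

    u->deferred_error = 0;
    int ret = try_link(u);
    if (ret < 0) {
        u->deferred_error = ret;  /* still failing: report again next time */
    }
    return err;
}
```

With `failures_left = 2`, complete_write() records -EIO, the next two flushes each return -EIO, and the retry triggered by the second flush succeeds, so a third flush returns 0. The sketch says nothing about the open question in this thread, namely what to do when the failure happens during drain or migration completion rather than during a guest flush.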