From: Kevin Wolf
Subject: [Qemu-devel] [RFC PATCH v2 19/23] qcow2: Add error handling to the l2meta coroutine
Date: Wed, 13 Feb 2013 14:22:09 +0100
Not exactly bisectable, but one large patch isn't much better either :-(
m->error is used to allow bdrv_drain() to stop with l2meta in error
state rather than go into an endless loop.
Signed-off-by: Kevin Wolf <address@hidden>
---
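For readers who want the drain rule in isolation: the following is a minimal
sketch, not code from this patch. struct meta_model, should_wake() and
drain_is_busy() are hypothetical names; only the two flags (sleeping, error)
correspond to fields of QCowL2Meta. It assumes the list of in-flight requests
can be modelled as a plain array and leaves out the coroutine re-entry itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QCowL2Meta: only the two flags matter here. */
struct meta_model {
    bool sleeping;  /* worker has yielded and waits to be re-entered */
    bool error;     /* worker is parked after a failed L2 update */
};

/*
 * Drain only re-enters workers that sleep for a normal reason. A worker
 * parked in error state is left alone, otherwise drain would wake it,
 * see it fail and go back to sleep, and then wake it again forever.
 */
bool should_wake(const struct meta_model *m)
{
    assert(m != NULL);
    return m->sleeping && !m->error;
}

/*
 * After the wake-up pass, a request counts as busy only if it is not
 * sleeping. Requests still asleep are the ones parked in error state;
 * they become active again only on the next flush.
 */
bool drain_is_busy(const struct meta_model *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!list[i].sleeping) {
            return true;
        }
    }
    return false;
}
```

With a single entry that has both sleeping and error set, should_wake() is
false and drain_is_busy() is false, so a drain loop built on these two
predicates terminates instead of spinning.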
block/qcow2.c | 44 ++++++++++++++++++++++++++++++++++++++++----
block/qcow2.h | 3 +++
2 files changed, 43 insertions(+), 4 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 57552aa..2819336 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -774,11 +774,33 @@ static void coroutine_fn process_l2meta(void *opaque)
m->sleeping = false;
}
+again:
qemu_co_mutex_lock(&s->lock);
ret = qcow2_alloc_cluster_link_l2(bs, m);
if (ret < 0) {
- /* FIXME */
+ /*
+ * This is a nasty situation: We have already completed the allocation
+ * write request and returned success, so just failing it isn't
+ * possible. We need to make sure to return an error during the next
+ * flush.
+ *
+ * However, we still can't drop the l2meta because we want I/O errors
+ * to be recoverable e.g. after the block device has been grown or the
+ * network connection restored. Sleep until the next flush comes and
+ * then retry.
+ */
+ s->flush_error = ret;
+
+ qemu_co_mutex_unlock(&s->lock);
+ qemu_co_rwlock_unlock(&s->l2meta_flush);
+ m->sleeping = true;
+ m->error = true;
+ qemu_coroutine_yield();
+ m->error = false;
+ m->sleeping = false;
+ qemu_co_rwlock_rdlock(&s->l2meta_flush);
+ goto again;
}
qemu_co_mutex_unlock(&s->lock);
@@ -801,11 +823,12 @@ static bool qcow2_drain(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
QCowL2Meta *m;
+ bool busy = false;
s->in_l2meta_flush = true;
again:
QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
- if (m->sleeping) {
+ if (m->sleeping && !m->error) {
qemu_coroutine_enter(m->co, NULL);
/* next_in_flight link could have become invalid */
goto again;
@@ -813,7 +836,19 @@ again:
}
s->in_l2meta_flush = false;
- return !QLIST_EMPTY(&s->cluster_allocs);
+ /*
+ * If there's still a sleeping l2meta, then an error must have occurred.
+ * Don't consider l2metas in this state as busy, they only get active on
+ * flushes.
+ */
+ QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
+ if (!m->sleeping) {
+ busy = true;
+ break;
+ }
+ }
+
+ return busy;
}
static inline coroutine_fn void stop_l2meta(BlockDriverState *bs)
@@ -1683,7 +1718,8 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
}
}
- ret = 0;
+ ret = s->flush_error;
+ s->flush_error = 0;
fail:
qemu_co_mutex_unlock(&s->lock);
resume_l2meta(bs);
diff --git a/block/qcow2.h b/block/qcow2.h
index 1d7cdab..504f10f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -171,6 +171,8 @@ typedef struct BDRVQcowState {
CoRwlock l2meta_flush;
bool in_l2meta_flush;
+ int flush_error;
+
uint32_t crypt_method; /* current crypt method, 0 if no key yet */
uint32_t crypt_method_header;
AES_KEY aes_encrypt_key;
@@ -250,6 +252,7 @@ typedef struct QCowL2Meta
* be reentered in order to cancel the timer.
*/
bool sleeping;
+ bool error;
/** Coroutine that handles delayed COW and updates L2 entry */
Coroutine *co;
--
1.7.6.5
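
The deferred error reporting in this patch (store the failure in
s->flush_error, return and clear it in qcow2_co_flush_to_os()) can be
sketched on its own as below. This is a simplified model under stated
assumptions, not the qcow2 code: struct state_model,
record_background_error() and flush_model() are hypothetical names, and
locking and the coroutine retry are left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the flush_error field in BDRVQcowState. */
struct state_model {
    int flush_error;  /* 0, or the negative errno of a failed update */
};

/*
 * Called by the background worker when its metadata update fails. The
 * guest write has already been completed successfully, so the error can
 * only be remembered for later.
 */
void record_background_error(struct state_model *s, int ret)
{
    assert(ret < 0);
    s->flush_error = ret;
}

/*
 * The flush reports the remembered error exactly once and then clears
 * it, so that a later flush can succeed once the retried update works.
 */
int flush_model(struct state_model *s)
{
    int ret = s->flush_error;
    s->flush_error = 0;
    return ret;
}
```

In this model a flush after record_background_error(&s, -EIO) returns -EIO,
and the flush after that returns 0 unless another failure was recorded in
between.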