From: Kevin Wolf
Subject: Re: [Qemu-devel] fixing qemu-0.1X endless loop in qcow2_alloc_cluster_offset
Date: Mon, 14 May 2012 14:20:47 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1
On 13.05.2012 10:03, Zhouyi Zhou wrote:
> Hi all,
>
> Sometimes qemu/kvm-0.1x hangs in an endless loop in
> qcow2_alloc_cluster_offset.
> After some investigation, I found the following
> in the function posix_aio_process_queue(void *opaque):
> 440 ret = qemu_paio_error(acb);
> 441 if (ret == ECANCELED) {
> 442 /* remove the request */
> 443 *pacb = acb->next;
> 444 qemu_aio_release(acb);
> 445 result = 1;
> 446 } else if (ret != EINPROGRESS) {
> In line 444, acb is released but acb->common.opaque is not.
> The latter is later released by the guest OS via ide_dma_cancel,
> which in turn calls qcow_aio_cancel, and qcow_aio_cancel does not
> check whether its argument is on the in-flight list.
> The fix is as follows (against Debian 6's qemu-kvm-0.12.5):
> #######################################
> --- block/qcow2.h~ 2010-07-27 08:43:53.000000000 +0800
> +++ block/qcow2.h 2012-05-13 15:51:39.000000000 +0800
> @@ -143,6 +143,7 @@
> QLIST_HEAD(QCowAioDependencies, QCowAIOCB) dependent_requests;
>
> QLIST_ENTRY(QCowL2Meta) next_in_flight;
> + int inflight;
> } QCowL2Meta;
> --- block/qcow2.c~ 2012-05-13 15:57:09.000000000 +0800
> +++ block/qcow2.c 2012-05-13 15:57:24.000000000 +0800
> @@ -349,6 +349,10 @@
> QCowAIOCB *acb = (QCowAIOCB *)blockacb;
> if (acb->hd_aiocb)
> bdrv_aio_cancel(acb->hd_aiocb);
> + if (acb->l2meta.inflight) {
> + QLIST_REMOVE(&acb->l2meta, next_in_flight);
> + acb->l2meta.inflight = 0;
> + }
> qemu_aio_release(acb);
> }
>
> @@ -506,6 +510,7 @@
> acb->n = 0;
> acb->cluster_offset = 0;
> acb->l2meta.nb_clusters = 0;
> + acb->l2meta.inflight = 0;
> QLIST_INIT(&acb->l2meta.dependent_requests);
> return acb;
> }
> @@ -534,6 +539,7 @@
> /* Take the request off the list of running requests */
> if (m->nb_clusters != 0) {
> QLIST_REMOVE(m, next_in_flight);
> + m->inflight = 0;
> }
>
> /*
> @@ -632,6 +638,7 @@
> fail:
> if (acb->l2meta.nb_clusters != 0) {
> QLIST_REMOVE(&acb->l2meta, next_in_flight);
> + acb->l2meta.inflight = 0;
> }
> done:
> if (acb->qiov->niov > 1)
> --- block/qcow2-cluster.c~ 2010-07-27 08:43:53.000000000 +0800
> +++ block/qcow2-cluster.c 2012-05-13 15:53:53.000000000 +0800
> @@ -827,6 +827,7 @@
> m->offset = offset;
> m->n_start = n_start;
> m->nb_clusters = nb_clusters;
> + m->inflight = 1;
>
> out:
> m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end);
>
> Thanks for investigating,
> Zhouyi
The patch looks reasonable to me. Note, however, that while it fixes the
hang, it still leaks clusters. I'm not sure whether anyone is
interested in picking this up for the old stable releases. Andreas, I think
you were going to take 0.15? The first version that doesn't have the
problem is 1.0.
Kevin