[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [question] savevm/delvm: Is it necessary to perform bdr
From: |
Kevin Wolf |
Subject: |
Re: [Qemu-devel] [question] savevm/delvm: Is it necessary to perform bdrv_drain_all before savevm and delvm? |
Date: |
Tue, 21 Oct 2014 09:19:49 +0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
Am 21.10.2014 um 03:13 hat Zhang Haoyu geschrieben:
> >> >> Hi,
> >> >>
> >> >> I noticed that bdrv_drain_all is performed in load_vmstate before
> >> bdrv_snapshot_goto,
> >> >> and bdrv_drain_all is performed in qmp_transaction before
> >> internal_snapshot_prepare,
> >> >> so is it also neccesary to perform bdrv_drain_all in savevm and delvm?
> >> >
> >> >Definitely yes for savevm. do_savevm() calls it indirectly via
> >> >vm_stop(), so that part looks okay.
> >> >
> >> Yes, you are right.
> >>
> >> >delvm doesn't affect the currently running VM, and therefore doesn't
> >> >interfere with guest requests that are in flight. So I think that a
> >> >bdrv_drain_all() isn't needed there.
> >> >
> >> I'm worried about that there are still pending IOs while deleting snapshot,
> >> then is it possible that there is concurrency problem between the
> >> process of deleting snapshot
> >> and the coroutine of io read/write(bdrv_co_do_rw) invoked by the
> >> pending IOs?
> >> This coroutine is also in main thread.
> >> Am I missing something?
> >
> >What kind of concurrency problem are you thinking of?
> >
> I have encountered two problem,
> 1) double-free of Qcow2DiscardRegion in qcow2_process_discards
> please see the discussing mail: [PATCH] qcow2: fix double-free of
> Qcow2DiscardRegion in qcow2_process_discards
> 2) qcow2 image is truncated to very huge size, but the size shown by qemu-img
> info is normal
> please see the discussing mail:
> [question] is it possible that big-endian l1 table offset referenced by
> other I/O while updating l1 table offset in qcow2_update_snapshot_refcount?
Did you verify that the invalid value actually makes sense if
byteswapped? For example, that there is no reserved bit set then?
> I suspect that both of the two problems are concurrency problem mentioned
> above,
> please see the discussing mail.
>
>
> >I do see that there might be a chance of concurrency, but that doesn't
> >automatically mean the requests are conflicting.
> >
> >Would you feel better with taking s->lock in qcow2_snapshot_delete()?
> Both deleting snapshot and the coroutine of pending io
> read/write(bdrv_co_do_rw)
> are performed in main thread, could BDRVQcowState.lock work?
Yes. s->lock is not a mutex for threads, but a coroutine based one.
The problem, however, is that qcow2_snapshot_delete() isn't execute in a
coroutine, so we can't take s->lock here. We would either need to move
it into a coroutine or add a bdrv_drain_all() indeed.
This also means that we probably need to review all other cases where
non-coroutine callbacks from BlockDriver might interfere with running
requests. The original assumption that they are safe as long as they are
not running in a coroutine seems to be wrong.
Kevin