From: Paolo Bonzini
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
Date: Wed, 17 Sep 2014 17:53:49 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0

On 17/09/2014 17:04, Stefan Hajnoczi wrote:
> On Wed, Sep 17, 2014 at 10:25 AM, Paolo Bonzini <address@hidden> wrote:
>> On 17/09/2014 11:06, Stefan Hajnoczi wrote:
>>> I think the fundamental problem here is that the mirror block job
>>> on the source host does not synchronize with live migration.
>>>
>>> Remember the mirror block job iterates on the dirty bitmap
>>> whenever it feels like.
>>>
>>> There is no guarantee that the mirror block job has quiesced before
>>> migration handover takes place, right?
>>
>> Libvirt does that. Migration is started only once storage mirroring
>> is out of the bulk phase, and the handover looks like:
>>
>> 1) migration completes
>>
>> 2) because the source VM is stopped, the disk has quiesced on the source
>
> But the mirror block job might still be writing out dirty blocks.

Right, but it quiesces after (3).

>> 3) libvirt sends block-job-complete
>
> No, it sends block-job-cancel after the source QEMU's migration has
> completed. See the qemuMigrationCancelDriveMirror() call in
> src/qemu/qemu_migration.c:qemuMigrationRun().

No problem, block-job-cancel and block-job-complete are the same except
for pivoting to the destination.

>> 4) libvirt receives BLOCK_JOB_COMPLETED. The disk has now quiesced on
>> the destination as well.
>
> I don't see where this happens in the libvirt source code. Libvirt
> doesn't care about block job events for drive-mirror during migration.
>
> And that's why there could still be I/O going on (since
> block-job-cancel is asynchronous).

Oops, this would be a bug! block-job-complete and block-job-cancel are
asynchronous. CCing Michal Privoznik who wrote the libvirt code.

Paolo

>> 5) the VM is started on the destination
>>
>> 6) the NBD server is stopped on the destination and the source VM is quit.
>>
>> It is actually a feature that storage migration is completed
>> asynchronously with respect to RAM migration. The problem is that
>> qcow2_invalidate_cache happens between (3) and (5), and it doesn't
>> like the concurrent I/O received by the NBD server.
>
> I agree that qcow2_invalidate_cache() (and any other invalidate cache
> implementations) need to allow concurrent I/O requests.
>
> Either I'm misreading the libvirt code or libvirt is not actually
> ensuring that the block job on the source has cancelled/completed
> before the guest is resumed on the destination. So I think there is
> still a bug, maybe Eric can verify this?
>
> Stefan
>
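
The point of contention above is that block-job-cancel and
block-job-complete only request the end of the job, so the management
layer has to wait for the terminating event before step (5). A minimal
sketch of that ordering follows. It is not libvirt's code: the function
names, the "drive0" device name in the usage note, and the plain iterable
of decoded QMP event dictionaries are all assumptions made for
illustration. BLOCK_JOB_COMPLETED is the event named in step (4);
BLOCK_JOB_CANCELLED is accepted as well on the assumption that a cancel
issued before the mirror is ready ends the job with that event instead.

```python
from typing import Callable, Iterable, Optional

# QMP events assumed to mark the end of a block job (see lead-in).
JOB_END_EVENTS = {"BLOCK_JOB_COMPLETED", "BLOCK_JOB_CANCELLED"}


def wait_for_job_end(events: Iterable[dict], device: str) -> Optional[dict]:
    """Consume events until the block job on `device` has ended.

    `events` is a hypothetical stream of already-decoded QMP events.
    Returns the terminating event, or None if the stream runs out first.
    """
    for event in events:
        data = event.get("data", {})
        if event.get("event") in JOB_END_EVENTS and data.get("device") == device:
            return event
    return None


def handover(events: Iterable[dict], device: str,
             resume_destination: Callable[[], None]) -> dict:
    """Resume the destination only after the source mirror job has ended.

    Hypothetical helper: the caller is assumed to have already sent
    block-job-cancel (or block-job-complete) for `device`.
    """
    end = wait_for_job_end(events, device)
    if end is None:
        # Never saw the job end, so the mirror may still be doing I/O.
        raise RuntimeError("block job on %s did not finish" % device)
    resume_destination()
    return end
```

Called as handover(event_stream, "drive0", start_destination_vm), the
destination is started only once the job's final event has been seen,
which is the guarantee Stefan is asking about.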