Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task
From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist
Date: Tue, 05 Nov 2013 10:15:45 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130923 Thunderbird/17.0.9
On 05/11/2013 03:23, Zhanghaoyu (A) wrote:
>>> Avoid starting a new migration task while the previous one still exists.
>>
>> Can you explain how to reproduce the problem?
>>
> When the network between source and destination was disconnected, the
> migration thread got stuck in the following stack:
> #0 0x00007f07e96c8288 in writev () from /lib64/libc.so.6
> #1 0x00007f07eb9bf11d in unix_writev_buffer (opaque=0x7f07eca2de80, iov=0x7f07ede9b1e0, iovcnt=64, pos=259870577)
>     at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:354
> #2 0x00007f07eb9bf999 in qemu_fflush (f=0x7f07ede931b0)
>     at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:600
> #3 0x00007f07eb9c011f in add_to_iovec (f=0x7f07ede931b0, buf=0x7f000ee23000 "", size=4096)
>     at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:756
> #4 0x00007f07eb9c01c0 in qemu_put_buffer_async (f=0x7f07ede931b0, buf=0x7f000ee23000 "", size=4096)
>     at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:772
> #5 0x00007f07eb92ad2f in ram_save_block (f=0x7f07ede931b0, last_stage=false)
>     at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/arch_init.c:493
> #6 0x00007f07eb92b30c in ram_save_iterate (f=0x7f07ede931b0, opaque=0x0)
>     at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/arch_init.c:654
> #7 0x00007f07eb9c2e12 in qemu_savevm_state_iterate (f=0x7f07ede931b0)
>     at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:1914
> #8 0x00007f07eb8975e1 in migration_thread (opaque=0x7f07ebf53300 <current_migration.25325>)
>     at migration.c:578
> Then I cancelled the migration task; QEMU set the migration state to
> MIG_STATE_CANCELLED, so the migration job in libvirt quit.
> Then I started the migration again. By then the network had reconnected
> successfully, but because TCP was still waiting on retransmission timeouts,
> the stack above did not return immediately, so two migration tasks existed
> at the same time.
> Even worse, the source QEMU crashed on a NULL pointer dereference in the
> qemu_bh_schedule(s->cleanup_bh); statement of the later migration task,
> because s->cleanup_bh had already been deleted by the previous migration
> task.
Thanks for explaining. CANCELLING looks like a useful addition.
Why do you need both CANCELLING and COMPLETING? The COMPLETED state
should be set only after all I/O is done.
I agree with Eric that the CANCELLING state should not be exposed via QMP.
"info migrate" and "query-migrate" can keep showing "active" for maximum
backwards compatibility.
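To make the compatibility point concrete, here is a minimal sketch assuming a hypothetical enum that mirrors the MIG_STATE_* names (numeric values are illustrative) and a hypothetical helper mig_status_str(). The internal CANCELLING state is folded into "active", so a management tool reading "info migrate" or "query-migrate" never sees a new status string. The other strings assume the status values query-migrate reported at the time.

```c
/* Hypothetical mirror of the MIG_STATE_* values with the proposed
 * CANCELLING state added; numeric values are illustrative only. */
enum mig_state {
    MIG_STATE_ERROR = -1,
    MIG_STATE_NONE,
    MIG_STATE_SETUP,
    MIG_STATE_CANCELLING,
    MIG_STATE_CANCELLED,
    MIG_STATE_ACTIVE,
    MIG_STATE_COMPLETED,
};

/* Hypothetical helper: map the internal state to the status string a
 * QMP client would see. CANCELLING shares the "active" return so that
 * existing clients keep working unchanged. */
static const char *mig_status_str(enum mig_state s)
{
    switch (s) {
    case MIG_STATE_SETUP:
        return "setup";
    case MIG_STATE_ACTIVE:
    case MIG_STATE_CANCELLING:   /* internal only, never reported as such */
        return "active";
    case MIG_STATE_COMPLETED:
        return "completed";
    case MIG_STATE_CANCELLED:
        return "cancelled";
    case MIG_STATE_ERROR:
        return "failed";
    case MIG_STATE_NONE:
    default:
        return "none";
    }
}
```

With this mapping, mig_status_str(MIG_STATE_CANCELLING) and mig_status_str(MIG_STATE_ACTIVE) return the same "active" string.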
More comments below.
> - if (s->state != MIG_STATE_COMPLETED) {
> + if (s->state != MIG_STATE_COMPLETING) {
> qemu_savevm_state_cancel();
> + if (s->state == MIG_STATE_CANCELLING) {
> + migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED);
> + }
I think you can remove the "if" and unconditionally call migrate_set_state.
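Dropping the "if" is safe because the transition is a compare-and-swap: it only takes effect when the current state equals the expected old state, so calling it in any other state is a no-op. A minimal C11 sketch, assuming a hypothetical struct mig in place of MigrationState and <stdatomic.h> in place of QEMU's atomic_cmpxchg(). Note that a value-returning cmpxchg such as GCC's __sync_val_compare_and_swap returns the previous value, so success corresponds to a result equal to old_state; C11's atomic_compare_exchange_strong() reports the same thing as a bool.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values mirroring the MIG_STATE_* names. */
enum { ST_SETUP, ST_ACTIVE, ST_CANCELLING, ST_CANCELLED, ST_COMPLETED };

/* Hypothetical stand-in for MigrationState: only the state field. */
struct mig {
    _Atomic int state;
};

/* Move from old_state to new_state only if the current state is old_state.
 * atomic_compare_exchange_strong() returns true on success; on failure it
 * stores the observed value in 'expected' and leaves s->state untouched. */
static bool mig_set_state(struct mig *s, int old_state, int new_state)
{
    int expected = old_state;
    return atomic_compare_exchange_strong(&s->state, &expected, new_state);
}

/* Cleanup path for a migration that did not complete, with the "if"
 * removed: the call changes nothing unless a cancel left us in CANCELLING. */
static void mig_cleanup_not_completed(struct mig *s)
{
    mig_set_state(s, ST_CANCELLING, ST_CANCELLED);
}
```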
> + }else {
> + migrate_set_state(s, MIG_STATE_COMPLETING, MIG_STATE_COMPLETED);
> }
>
> notifier_list_notify(&migration_state_notifiers, s);
> }
>
> -static void migrate_set_state(MigrationState *s, int old_state, int new_state)
> -{
> - if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
> - trace_migrate_set_state(new_state);
> - }
> -}
> -
> void migrate_fd_error(MigrationState *s)
> {
> DPRINTF("setting error state\n");
> @@ -328,7 +337,7 @@ static void migrate_fd_cancel(MigrationState *s)
> {
> DPRINTF("cancelling migration\n");
>
> - migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
> + migrate_set_state(s, s->state, MIG_STATE_CANCELLING);
Here we probably want something like

    int old_state;

    do {
        old_state = s->state;
        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
            break;
        }
        migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
    } while (s->state != MIG_STATE_CANCELLING);
to avoid a bogus COMPLETED->CANCELLED transition. Please split the patch into
two parts:
(1) the first uses the above code, with CANCELLED instead of CANCELLING;
(2) the second, similar to the one you have posted, introduces the new
CANCELLING state.
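The cancel loop above can be sketched as self-contained C11; the names are hypothetical stand-ins (struct mig for MigrationState, ST_* for MIG_STATE_*), and atomic_compare_exchange_strong() plays the role of migrate_set_state(). The target state is a parameter so the same function covers step (1), which uses CANCELLED, and step (2), which uses CANCELLING. The point is that a migration already in COMPLETED (or ERROR) is left alone.

```c
#include <stdatomic.h>

/* Hypothetical state values mirroring the MIG_STATE_* names. */
enum { ST_ERROR = -1, ST_NONE, ST_SETUP, ST_ACTIVE, ST_CANCELLING,
       ST_CANCELLED, ST_COMPLETED };

/* Hypothetical stand-in for MigrationState. */
struct mig {
    _Atomic int state;
};

/* Cancel only a migration that is still in SETUP or ACTIVE. If another
 * thread changes the state between the load and the compare-and-swap,
 * the CAS fails and the loop re-examines the new state; any state other
 * than SETUP/ACTIVE (e.g. COMPLETED) makes it give up, so there is no
 * COMPLETED->CANCELLED transition. */
static void mig_cancel(struct mig *s, int target)
{
    int old_state;

    do {
        old_state = atomic_load(&s->state);
        if (old_state != ST_SETUP && old_state != ST_ACTIVE) {
            break;
        }
        atomic_compare_exchange_strong(&s->state, &old_state, target);
    } while (atomic_load(&s->state) != target);
}
```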
Thanks,
Paolo
> }
>
> void add_migration_state_change_notifier(Notifier *notify)
> @@ -405,7 +414,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
> params.blk = has_blk && blk;
> params.shared = has_inc && inc;
>
> - if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
> + if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
> + s->state == MIG_STATE_COMPLETING || s->state == MIG_STATE_CANCELLING) {
> error_set(errp, QERR_MIGRATION_ACTIVE);
> return;
> }
> @@ -594,7 +604,7 @@ static void *migration_thread(void *opaque)
> }
>
> if (!qemu_file_get_error(s->file)) {
> - migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
> + migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETING);
> break;
> }
> }
> @@ -634,7 +644,7 @@ static void *migration_thread(void *opaque)
> }
>
> qemu_mutex_lock_iothread();
> - if (s->state == MIG_STATE_COMPLETED) {
> + if (s->state == MIG_STATE_COMPLETING) {
> int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> s->total_time = end_time - s->total_time;
> s->downtime = end_time - start_time;
>
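A single-threaded simulation of the reported sequence (cancel while the sender is stuck in writev(), then migrate again) shows why the qmp_migrate() guard has to cover CANCELLING: a new migration is refused until the cleanup path moves CANCELLING to CANCELLED, which this sketch assumes happens only after the migration thread has exited. All names are hypothetical; it models only the state machine (CANCELLING only, not COMPLETING), not threads or sockets.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values mirroring the MIG_STATE_* names. */
enum { ST_NONE, ST_SETUP, ST_ACTIVE, ST_CANCELLING, ST_CANCELLED, ST_COMPLETED };

/* Hypothetical stand-in for MigrationState. */
struct mig {
    _Atomic int state;
    bool thread_running;     /* models the still-blocked migration thread */
};

/* Models the qmp_migrate() guard: refuse while an earlier migration is
 * still in flight, including one that is being cancelled. Single-threaded
 * model, so a plain check-then-store is enough here. */
static bool mig_start(struct mig *s)
{
    int st = atomic_load(&s->state);

    if (st == ST_SETUP || st == ST_ACTIVE || st == ST_CANCELLING) {
        return false;        /* QERR_MIGRATION_ACTIVE in the patch */
    }
    atomic_store(&s->state, ST_ACTIVE);
    s->thread_running = true;
    return true;
}

/* Models a cancel request (ACTIVE only, for brevity): only the state
 * changes; the thread may still be stuck in writev(). */
static void mig_cancel(struct mig *s)
{
    int expected = ST_ACTIVE;
    atomic_compare_exchange_strong(&s->state, &expected, ST_CANCELLING);
}

/* Models the thread finally returning and the cleanup bottom half
 * running afterwards: only now does the migration become CANCELLED. */
static void mig_thread_exit_and_cleanup(struct mig *s)
{
    int expected = ST_CANCELLING;

    s->thread_running = false;
    atomic_compare_exchange_strong(&s->state, &expected, ST_CANCELLED);
}
```

Without the CANCELLING state, a cancel would go straight to CANCELLED, so a second mig_start() would succeed while thread_running is still true, which is the two-tasks situation described in the report.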
Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist, Eric Blake, 2013/11/04