Re: s390 migration crash
From: Dr. David Alan Gilbert
Subject: Re: s390 migration crash
Date: Wed, 22 Mar 2023 14:05:06 +0000
User-agent: Mutt/2.2.9 (2022-11-12)
* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Mar 21, 2023 at 08:24:37PM +0000, Dr. David Alan Gilbert wrote:
> > Hi Peters,
> > Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > it:
> > * On an s390 host
>
> How easy is it to reproduce?
Pretty much every time when run as:
make check -j 4
> > * only as part of a make check - running migration-test by itself
> > doesn't trigger for me.
> > * It looks like it's postcopy preempt
> >
> > (gdb) bt full
> > #0 iov_size (iov=iov@entry=0x2aa00e60670, iov_cnt=<optimized out>) at
> > ../util/iov.c:88
> > len = 13517923312037845750
> > i = 17305
> > #1 0x000002aa004d068c in qemu_fflush (f=0x2aa00e58630) at
> > ../migration/qemu-file.c:307
> > local_error = 0x0
> > #2 0x000002aa004d0e04 in qemu_fflush (f=<optimized out>) at
> > ../migration/qemu-file.c:297
> > #3 0x000002aa00613962 in postcopy_preempt_shutdown_file
> > (s=s@entry=0x2aa00d1b4e0) at ../migration/ram.c:4657
> > #4 0x000002aa004e12b4 in migration_completion (s=0x2aa00d1b4e0) at
> > ../migration/migration.c:3469
> > ret = <optimized out>
> > current_active_state = 5
> > must_precopy = 0
> > can_postcopy = 0
> > in_postcopy = true
> > pending_size = 0
> > __func__ = "migration_iteration_run"
> > iter_state = <optimized out>
> > s = 0x2aa00d1b4e0
> > thread = <optimized out>
> > setup_start = <optimized out>
> > thr_error = <optimized out>
> > urgent = <optimized out>
> > #5 migration_iteration_run (s=0x2aa00d1b4e0) at
> > ../migration/migration.c:3882
> > must_precopy = 0
> > can_postcopy = 0
> > in_postcopy = true
> > pending_size = 0
> > __func__ = "migration_iteration_run"
> > iter_state = <optimized out>
> > s = 0x2aa00d1b4e0
> > thread = <optimized out>
> > setup_start = <optimized out>
> > thr_error = <optimized out>
> > urgent = <optimized out>
> > #6 migration_thread (opaque=opaque@entry=0x2aa00d1b4e0) at
> > ../migration/migration.c:4124
> > iter_state = <optimized out>
> > s = 0x2aa00d1b4e0
> > thread = <optimized out>
> > setup_start = <optimized out>
> > thr_error = <optimized out>
> > urgent = <optimized out>
> > #7 0x000002aa00819b8c in qemu_thread_start (args=<optimized out>) at
> > ../util/qemu-thread-posix.c:541
> > __cancel_buf =
> > {__cancel_jmp_buf = {{__cancel_jmp_buf = {{__gregs =
> > {4396782422080, 4393751543808, 4397299389454, 4396844235904, 2929182727824,
> > 2929182933488, 4396843986792, 4397299389455, 33679382915066768,
> > 33678512846981306}, __fpregs = {4396774031360, 8392704, 2929182933488, 0,
> > 4396782422272, 2929172491858, 4396774031360, 1}}}, __mask_was_saved = 0}},
> > __pad = {0x3ffb4a77a60, 0x0, 0x0, 0x0}}
> > __cancel_routine = 0x2aa00819bf0 <qemu_thread_atexit_notify>
> > __not_first_call = <optimized out>
> > start_routine = 0x2aa004e08f0 <migration_thread>
> > arg = 0x2aa00d1b4e0
> > r = <optimized out>
> > #8 0x000003ffb7b1e2e6 in start_thread () at /lib64/libc.so.6
> > #9 0x000003ffb7aafdbe in thread_start () at /lib64/libc.so.6
> >
> > It looks like it's in the preempt test:
> >
> > (gdb) where
> > #0 0x000003ffb17a0126 in __pthread_kill_implementation () from
> > /lib64/libc.so.6
> > #1 0x000003ffb1750890 in raise () from /lib64/libc.so.6
> > #2 0x000003ffb172a340 in abort () from /lib64/libc.so.6
> > #3 0x000002aa0041c130 in qtest_check_status (s=<optimized out>) at
> > ../tests/qtest/libqtest.c:194
> > #4 0x000003ffb1a3b5de in g_hook_list_invoke () from /lib64/libglib-2.0.so.0
> > #5 <signal handler called>
> > #6 0x000003ffb17a0126 in __pthread_kill_implementation () from
> > /lib64/libc.so.6
> > #7 0x000003ffb1750890 in raise () from /lib64/libc.so.6
> > #8 0x000003ffb172a340 in abort () from /lib64/libc.so.6
> > #9 0x000002aa00420318 in qmp_fd_receive (fd=<optimized out>) at
> > ../tests/qtest/libqmp.c:80
> > #10 0x000002aa0041d5ee in qtest_qmp_receive_dict (s=0x2aa01eb2700) at
> > ../tests/qtest/libqtest.c:713
> > #11 qtest_qmp_receive (s=0x2aa01eb2700) at ../tests/qtest/libqtest.c:701
> > #12 qtest_vqmp (s=s@entry=0x2aa01eb2700, fmt=fmt@entry=0x2aa00487100 "{
> > 'execute': 'query-migrate' }", ap=ap@entry=0x3ffc247cc68)
> > at ../tests/qtest/libqtest.c:765
> > #13 0x000002aa00413f1e in wait_command (who=who@entry=0x2aa01eb2700,
> > command=command@entry=0x2aa00487100 "{ 'execute': 'query-migrate' }")
> > at ../tests/qtest/migration-helpers.c:73
> > #14 0x000002aa00414078 in migrate_query (who=who@entry=0x2aa01eb2700) at
> > ../tests/qtest/migration-helpers.c:139
> > #15 migrate_query_status (who=who@entry=0x2aa01eb2700) at
> > ../tests/qtest/migration-helpers.c:161
> > #16 0x000002aa00414480 in check_migration_status (ungoals=0x0,
> > goal=0x2aa00495c7e "completed", who=0x2aa01eb2700) at
> > ../tests/qtest/migration-helpers.c:177
> > #17 wait_for_migration_status (who=0x2aa01eb2700, goal=<optimized out>,
> > ungoals=0x0) at ../tests/qtest/migration-helpers.c:202
> > #18 0x000002aa0041300e in migrate_postcopy_complete
> > (from=from@entry=0x2aa01eb2700, to=to@entry=0x2aa01eb3000,
> > args=args@entry=0x3ffc247cf48)
> > at ../tests/qtest/migration-test.c:1137
> > #19 0x000002aa004131a4 in test_postcopy_common (args=0x3ffc247cf48) at
> > ../tests/qtest/migration-test.c:1162
> > #20 test_postcopy_preempt () at ../tests/qtest/migration-test.c:1178
> >
> > Looking at the iov and file it's garbage; so it makes me think this is
> > something like a flush on a closed file.
>
> I didn't figure out how that could be closed, but I think there's indeed a
> possible race that the qemufile can be accessed by both the return path
> thread and the migration thread concurrently, while qemufile is not thread
> safe on that.
>
> What postcopy_preempt_shutdown_file() does is this: the src uses an EOS to
> kick the dest QEMU preempt thread out of the migration and shut it off.
> After some thought I think this is unnecessary complexity: postcopy should
> end at the point where dest has received all the data, after which dest
> sends a SHUT to src. So it's not good to have dest rely on anything from
> src to shut anything down (the preempt thread here), because it's the dest
> qemu that makes the final decision to finish. Ideally the preempt thread
> on dest should be able to shut itself down.
>
> The trick here is that the preempt thread will be blocked in read() (aka,
> recvmsg()) on the channel at that point, and the only way to kick it out
> is a shutdown() on dest. I attached a patch that does it. I'm not 100%
> sure whether it'll resolve our problem, but it's worth trying. This also
> made me notice that we forgot to enable the SHUTDOWN feature on the TLS
> server side when I was running patch 1 with qtest, so two patches are
> needed.
Well, it seems to fix it:
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Will this break interaction with an old qemu that's still waiting for
the EOS?
Dave
> --
> Peter Xu
> From 0e317fa78e9671c119f6be78a0e0a36201517dc2 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Tue, 21 Mar 2023 19:58:42 -0400
> Subject: [PATCH 1/2] io: tls: Inherit QIO_CHANNEL_FEATURE_SHUTDOWN on server side
>
> A TLS iochannel will inherit io_shutdown() from the master ioc; however,
> we missed doing that on the server side.
>
> This will e.g. allow qemu_file_shutdown() to work for migration on the
> dest QEMU too.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> io/channel-tls.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/io/channel-tls.c b/io/channel-tls.c
> index 5a7a3d48d6..9805dd0a3f 100644
> --- a/io/channel-tls.c
> +++ b/io/channel-tls.c
> @@ -74,6 +74,9 @@ qio_channel_tls_new_server(QIOChannel *master,
> ioc = QIO_CHANNEL_TLS(object_new(TYPE_QIO_CHANNEL_TLS));
>
> ioc->master = master;
> + if (qio_channel_has_feature(master, QIO_CHANNEL_FEATURE_SHUTDOWN)) {
> + qio_channel_set_feature(QIO_CHANNEL(ioc),
> + QIO_CHANNEL_FEATURE_SHUTDOWN);
> + }
> object_ref(OBJECT(master));
>
> ioc->session = qcrypto_tls_session_new(
> --
> 2.39.1
>
> From 73b267c17e689d0dafbde1d93c0125b81a43184e Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Tue, 21 Mar 2023 19:11:22 -0400
> Subject: [PATCH 2/2] migration: Fix potential race on postcopy_qemufile_src
>
> The postcopy_qemufile_src object should be owned by one thread at a time:
> either the main thread (e.g. at the beginning or the end of migration),
> or the return path thread (during a preempt-enabled postcopy migration).
> If that's not the case, access to the object can be racy.
>
> postcopy_preempt_shutdown_file() is potentially racy, because it's called
> in the end phase of migration on the main thread, at which point the
> return path thread hasn't yet been recycled; the recycling happens later,
> in await_return_path_close_on_source().
>
> That means it's logically possible for the main thread and the return
> path thread to operate on the same qemufile concurrently, while qemufile
> is not thread safe at all.
>
> postcopy_preempt_shutdown_file() used to be needed because that's where
> we sent an EOS to dest so that dest could safely shut down the preempt
> thread.
>
> To avoid the possible race, remove the only place where the race can
> happen, and instead find another way to safely close the preempt thread
> on dest.
>
> The core idea for deciding "when to stop" during postcopy is that dest
> sends a postcopy SHUT message to src, telling src that all the data has
> arrived. Hence it's better to shut the dest preempt thread down directly
> on the dest node.
>
> This patch does that by changing postcopy_prio_thread_created into a
> PreemptThreadStatus, so that we kick the preempt thread on dest qemu
> with the sequence:
>
> mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
> qemu_file_shutdown(mis->postcopy_qemufile_dst);
>
> Here shutdown() is probably the easiest way to kick the preempt thread
> out of a blocked qemu_get_be64(). The thread then reads
> preempt_thread_status to make sure this wasn't a network failure but a
> request to quit the thread.
>
> We could have avoided the extra status and relied on the migration
> status instead, but postcopy_ram_incoming_cleanup() is called early
> enough that we're still in POSTCOPY_ACTIVE no matter what, so keep it
> simple and introduce the status.
>
> Fixes: 36f62f11e4 ("migration: Postcopy preemption preparation on channel creation")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/migration.c | 7 -------
> migration/migration.h | 13 ++++++++++++-
> migration/postcopy-ram.c | 20 +++++++++++++++-----
> migration/ram.c | 6 ------
> migration/ram.h | 1 -
> 5 files changed, 27 insertions(+), 20 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index ae2025d9d8..bbc36100de 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3459,16 +3459,9 @@ static void migration_completion(MigrationState *s)
> }
> } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> trace_migration_completion_postcopy_end();
> -
> qemu_mutex_lock_iothread();
> qemu_savevm_state_complete_postcopy(s->to_dst_file);
> qemu_mutex_unlock_iothread();
> -
> - /* Shutdown the postcopy fast path thread */
> - if (migrate_postcopy_preempt()) {
> - postcopy_preempt_shutdown_file(s);
> - }
> -
> trace_migration_completion_postcopy_end_after_complete();
> } else {
> goto fail;
> diff --git a/migration/migration.h b/migration/migration.h
> index 2da2f8a164..7228163cc8 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -65,6 +65,12 @@ typedef struct {
> bool all_zero;
> } PostcopyTmpPage;
>
> +typedef enum {
> + PREEMPT_THREAD_NONE = 0,
> + PREEMPT_THREAD_CREATED,
> + PREEMPT_THREAD_QUIT,
> +} PreemptThreadStatus;
> +
> /* State for the incoming migration */
> struct MigrationIncomingState {
> QEMUFile *from_src_file;
> @@ -124,7 +130,12 @@ struct MigrationIncomingState {
> QemuSemaphore postcopy_qemufile_dst_done;
> /* Postcopy priority thread is used to receive postcopy requested pages */
> QemuThread postcopy_prio_thread;
> - bool postcopy_prio_thread_created;
> + /*
> + * Always set by the main vm load thread only, but can be read by the
> + * postcopy preempt thread. "volatile" makes sure all reads will be
> + * uptodate across cores.
> + */
> + volatile PreemptThreadStatus preempt_thread_status;
> /*
> * Used to sync between the ram load main thread and the fast ram load
> * thread. It protects postcopy_qemufile_dst, which is the postcopy
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 41c0713650..263bab75ec 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -568,9 +568,14 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> {
> trace_postcopy_ram_incoming_cleanup_entry();
>
> - if (mis->postcopy_prio_thread_created) {
> + if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
> + /* Notify the fast load thread to quit */
> + mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
> + if (mis->postcopy_qemufile_dst) {
> + qemu_file_shutdown(mis->postcopy_qemufile_dst);
> + }
> qemu_thread_join(&mis->postcopy_prio_thread);
> - mis->postcopy_prio_thread_created = false;
> + mis->preempt_thread_status = PREEMPT_THREAD_NONE;
> }
>
> if (mis->have_fault_thread) {
> @@ -1203,7 +1208,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
> */
> postcopy_thread_create(mis, &mis->postcopy_prio_thread, "fault-fast",
> postcopy_preempt_thread,
> QEMU_THREAD_JOINABLE);
> - mis->postcopy_prio_thread_created = true;
> + mis->preempt_thread_status = PREEMPT_THREAD_CREATED;
> }
>
> trace_postcopy_ram_enable_notify();
> @@ -1652,6 +1657,11 @@ static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
> trace_postcopy_pause_fast_load_continued();
> }
>
> +static bool preempt_thread_should_run(MigrationIncomingState *mis)
> +{
> + return mis->preempt_thread_status != PREEMPT_THREAD_QUIT;
> +}
> +
> void *postcopy_preempt_thread(void *opaque)
> {
> MigrationIncomingState *mis = opaque;
> @@ -1671,11 +1681,11 @@ void *postcopy_preempt_thread(void *opaque)
>
> /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */
> qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
> - while (1) {
> + while (preempt_thread_should_run(mis)) {
> ret = ram_load_postcopy(mis->postcopy_qemufile_dst,
> RAM_CHANNEL_POSTCOPY);
> /* If error happened, go into recovery routine */
> - if (ret) {
> + if (ret && preempt_thread_should_run(mis)) {
> postcopy_pause_ram_fast_load(mis);
> } else {
> /* We're done */
> diff --git a/migration/ram.c b/migration/ram.c
> index 96e8a19a58..8c316ffed2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -4651,12 +4651,6 @@ static int ram_resume_prepare(MigrationState *s, void *opaque)
> return 0;
> }
>
> -void postcopy_preempt_shutdown_file(MigrationState *s)
> -{
> - qemu_put_be64(s->postcopy_qemufile_src, RAM_SAVE_FLAG_EOS);
> - qemu_fflush(s->postcopy_qemufile_src);
> -}
> -
> static SaveVMHandlers savevm_ram_handlers = {
> .save_setup = ram_save_setup,
> .save_live_iterate = ram_save_iterate,
> diff --git a/migration/ram.h b/migration/ram.h
> index 81cbb0947c..93b071a1a7 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> int64_t ramblock_recv_bitmap_send(QEMUFile *file, const char *block_name);
> int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
> bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
> -void postcopy_preempt_shutdown_file(MigrationState *s);
> void *postcopy_preempt_thread(void *opaque);
>
> /* ram cache */
> --
> 2.39.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK