From: Peter Maydell
Subject: [Qemu-commits] [qemu/qemu] 06152b: migration: prevent migration when VM has poisoned ...
Date: Fri, 09 Feb 2024 08:14:50 -0800

  Branch: refs/heads/master
  Home:   https://github.com/qemu/qemu
  Commit: 06152b89db64bc5ccec1e54576706ba891654df9
      https://github.com/qemu/qemu/commit/06152b89db64bc5ccec1e54576706ba891654df9
  Author: William Roche <william.roche@oracle.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M accel/kvm/kvm-all.c
    M accel/stubs/kvm-stub.c
    M include/sysemu/kvm.h
    M migration/migration.c

  Log Message:
  -----------
  migration: prevent migration when VM has poisoned memory

A memory page poisoned from the hypervisor level is no longer readable.
Migrating a VM will crash QEMU when it reads the memory address space
and stumbles on the poisoned page, with a stack trace similar to:

Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page
#5  ram_save_target_page_legacy
#6  ram_save_host_page
#7  ram_find_and_save_block
#8  ram_save_iterate
#9  qemu_savevm_state_iterate
#10 migration_iteration_run
#11 migration_thread
#12 qemu_thread_start

To avoid this crash during migration, prevent the migration when a known
hardware poison exists on the VM.

Signed-off-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
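
For illustration, the shape of such a guard is roughly the sketch below.
This is a hedged approximation in portable C, not the actual patch; the
helper bodies and the error message are assumptions (the real change also
stubs the KVM query for non-KVM builds):

    #include <stdbool.h>

    /* Illustrative stand-in for the KVM-side accounting (assumption). */
    static unsigned long hwpoison_page_count;

    /* True if the hypervisor has marked any guest page as poisoned. */
    static bool kvm_hwpoisoned_mem(void)
    {
        return hwpoison_page_count > 0;
    }

    /* Hypothetical migration entry check: refuse to start rather than
     * letting the RAM scan SIGBUS on an unreadable page. */
    static bool migrate_prepare_check(const char **errp)
    {
        if (kvm_hwpoisoned_mem()) {
            *errp = "cannot migrate a VM with hardware-poisoned memory";
            return false;
        }
        return true;
    }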


  Commit: 8888a552bf7af200e36ff123772547dfb4f133c4
      https://github.com/qemu/qemu/commit/8888a552bf7af200e36ff123772547dfb4f133c4
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Drop stale comment for multifd zero copy

We've already done that with multifd_flush_after_each_section, for multifd
in general.  Drop the stale "TODO-like" comment.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 48c0f5d56fd2ff0a0cda23301637b742c690f59a
      https://github.com/qemu/qemu/commit/48c0f5d56fd2ff0a0cda23301637b742c690f59a
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: multifd_send_kick_main()

When a multifd sender thread hits an error, it always needs to kick the
main thread by posting all the semaphores the main thread may be waiting on.

Provide a helper for it and deduplicate the code.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
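
A minimal sketch of what such a helper boils down to, using POSIX
semaphores in place of QEMU's qemu_sem_* wrappers (the struct layout here
is an assumption, trimmed to the two waits that matter):

    #include <semaphore.h>

    typedef struct {
        sem_t sem_sync;          /* main thread waits for per-channel sync */
        sem_t *channels_ready;   /* main thread waits for a free channel */
    } SenderChannelSketch;

    /* Wake the main thread from any semaphore it may be blocked on, so
     * an error in this channel cannot leave it waiting forever. */
    static void multifd_send_kick_main(SenderChannelSketch *p)
    {
        sem_post(&p->sem_sync);
        sem_post(p->channels_ready);
    }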


  Commit: 15f3f21d598148895c33b6fc41e29777cf6ad992
      https://github.com/qemu/qemu/commit/15f3f21d598148895c33b6fc41e29777cf6ad992
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths

Multifd send side has two fields to indicate error quits:

  - MultiFDSendParams.quit
  - &multifd_send_state->exiting

Merge them into the global one.  The replacement is done by changing all
p->quit checks into checks of the global variable.  The global check
doesn't need any lock.

A few more things done on top of this altogether:

  - multifd_send_terminate_threads()

    Move the xchg() of &multifd_send_state->exiting earlier, so that it
    covers the tracepoint, migrate_set_error() and migrate_set_state().

  - multifd_send_sync_main()

    In the 2nd loop, add one more check of the global variable to make sure
    we don't keep looping if QEMU has already decided to quit.

  - multifd_tls_outgoing_handshake()

    Use multifd_send_terminate_threads() to set the error state.  That has
    the benefit of also updating MigrationState.error, so we persist the
    first error we hit in that specific channel.

  - multifd_new_send_channel_async()

    Take a similar approach to the above: drop the migrate_set_error(),
    because multifd_send_terminate_threads() already covers it.  Unwrap the
    helper multifd_new_send_channel_cleanup() along the way; it is not
    really needed.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
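
In sketch form, the resulting pattern is a single one-way flag read
locklessly everywhere and set with an exchange so termination work runs
only once.  This is a portable-C approximation of the qatomic_* usage
described above, with the surrounding bookkeeping elided:

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int multifd_exiting;   /* the one global quit flag */

    static inline bool multifd_send_should_exit(void)
    {
        /* Lockless read; safe because the flag only ever goes 0 -> 1. */
        return atomic_load(&multifd_exiting) != 0;
    }

    static void multifd_send_terminate_threads(void)
    {
        /* Flip the flag first so everything below is covered by it, and
         * let the exchange ensure termination work runs only once. */
        if (atomic_exchange(&multifd_exiting, 1)) {
            return;
        }
        /* ... tracepoint, migrate_set_error(), migrate_set_state(),
         *     kick the sender threads ... */
    }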


  Commit: 836eca47f62f9f6d5b8e9b6fedfc3539775c4e2e
      https://github.com/qemu/qemu/commit/836eca47f62f9f6d5b8e9b6fedfc3539775c4e2e
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Postpone reset of MultiFDPages_t

Currently we reset the MultiFDPages_t object in the multifd sender thread
in the middle of the sending job.  That's not necessary, because the
"*pages" struct will not be reused anyway until pending_job is cleared.

Move the reset to the end, after the job is completed, and provide a
helper to reset a "*pages" object.  Use the same helper when freeing the
object too.

This prepares us to keep using p->pages in the follow-up patches, where we
may drop p->normal[].

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
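
The helper itself is tiny; a sketch under assumed field names (the real
struct lives in multifd.h and differs in detail):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint32_t num;        /* pages currently queued */
        void *block;         /* ramblock the offsets belong to */
        uint64_t *offset;    /* per-page offsets within the block */
    } PagesSketch;

    /* Reset for reuse; offset[] contents are meaningless once num is 0,
     * so only the bookkeeping needs clearing.  Also called right before
     * the object is freed. */
    static void multifd_pages_reset(PagesSketch *pages)
    {
        pages->num = 0;
        pages->block = NULL;
    }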


  Commit: efd8c5439db7eaf00f35adc0fcc4f01d916e8619
      https://github.com/qemu/qemu/commit/efd8c5439db7eaf00f35adc0fcc4f01d916e8619
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd-zlib.c
    M migration/multifd-zstd.c
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Drop MultiFDSendParams.normal[] array

This array is redundant when p->pages exists.  Now that we have extended
the lifetime of p->pages to the whole period where pending_job is set, it
should be safe to always use p->pages->offset[] rather than p->normal[].
Drop the array.

Alongside it, normal_num is also redundant, being the same as
p->pages->num.

This doesn't apply to the recv side: there is no extra buffering there, so
the p->normal[] array is still needed.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: f5f48a7891cf6664a920ba52f6f4dea1646049a4
      https://github.com/qemu/qemu/commit/f5f48a7891cf6664a920ba52f6f4dea1646049a4
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Separate SYNC request with normal jobs

Multifd provides a threaded model for processing jobs.  On the sender
side, there can be two kinds of job: (1) a list of pages to send, or (2) a
sync request.

The sync request is a very special kind of job.  It never contains a page
array, only a multifd packet telling the dest side to synchronize with the
sent pages.

Before this patch, both kinds of request use the pending_job field:
whatever the request is, it increments pending_job, and the multifd sender
thread decrements it after finishing one job.

However, this is racy, because SYNC is special in that it needs to set
p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request.
Consider a sequence of operations where:

  - the migration thread enqueues a job to send some pages, pending_job++
    (0->1)

  - [...before the selected multifd sender thread wakes up...]

  - the migration thread enqueues another job to sync, pending_job++
    (1->2), and sets up p->flags=MULTIFD_FLAG_SYNC

  - the multifd sender thread wakes up and finds pending_job==2
    - it sends the 1st packet with MULTIFD_FLAG_SYNC and the list of pages
    - it sends the 2nd packet with flags==0 and no pages

This is not expected, because MULTIFD_FLAG_SYNC should be handled only
after all the pages are received.  Meanwhile, the 2nd packet is completely
useless: it contains zero information.

I didn't verify the above, but I think the issue is still benign in that,
at least on the recv side, we always receive pages before handling
MULTIFD_FLAG_SYNC.  However, that's not always guaranteed and just tricky.

Another reason to separate the two is that using p->flags to communicate
between the two threads is not clearly defined; it is very hard to read
and understand why accessing p->flags is always safe.  See the current
implementation of multifd_send_thread(), where we tried to cache only
p->flags: it doesn't need to be that complicated.

This patch introduces pending_sync, a separate flag just to show that the
requester needs a sync.  Alongside, we remove the tricky caching of
p->flags, because after this patch p->flags is only used by the multifd
sender thread, which makes it crystal clear that accessing p->flags is
always thread safe.

With that, we can also safely convert pending_job into a boolean, because
we don't support more than one pending job anyway.

Always use atomic ops to access both flags, to make sure there are no
stale reads.  While at it, drop the initial setting of "pending_job = 0",
because the struct is always allocated using g_new0().

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-7-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
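
Sketching the split in portable C (QEMU uses its qatomic_* wrappers; the
field and function names here are illustrative assumptions):

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct {
        /* Both flags: set by the migration thread, cleared by the
         * sender thread, always accessed with atomics. */
        atomic_bool pending_job;    /* "send the queued page array" */
        atomic_bool pending_sync;   /* "send one MULTIFD_FLAG_SYNC packet" */
    } SenderChannelSketch;

    /* One dispatch round of the sender thread, trimmed to the part that
     * matters: a sync request can no longer be folded into a page job. */
    static void sender_dispatch_one(SenderChannelSketch *p)
    {
        if (atomic_load(&p->pending_job)) {
            /* ... fill and send a packet for the queued pages ... */
            atomic_store(&p->pending_job, false);
        } else if (atomic_load(&p->pending_sync)) {
            /* ... send an empty packet carrying only the SYNC flag ... */
            atomic_store(&p->pending_sync, false);
        }
    }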


  Commit: e3cce9af10b06c51434ced4e1a6686f1ce43e124
      https://github.com/qemu/qemu/commit/e3cce9af10b06c51434ced4e1a6686f1ce43e124
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Simplify locking in sender thread

The sender thread yields p->mutex before IO starts, trying not to block
the requester thread.  This may be an unnecessary lock optimization,
because the requester can already read pending_job safely even without the
lock: it is currently the only one that can assign a task.

Drop that lock complication on both sides:

  (1) in the sender thread, always hold the mutex until the job is done
  (2) in the requester thread, check that pending_job is clear locklessly

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-8-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 83c560fb4249ee5698652249e0c1730c3d611a9b
      https://github.com/qemu/qemu/commit/83c560fb4249ee5698652249e0c1730c3d611a9b
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Drop pages->num check in sender thread

Now with a split SYNC handler, we always have pages->num set for
pending_job==true.  Assert it instead.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 05b7ec1890158471afb8537a6817a7e0d0a6c938
      https://github.com/qemu/qemu/commit/05b7ec1890158471afb8537a6817a7e0d0a6c938
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Rename p->num_packets and clean it up

This field, no matter whether on src or dest, is only used for debugging
purposes.

They could even be removed already, except that they still more or less
provide some accounting of how many packets are sent/received by this
thread.  The other, more important one is called packet_num, which is
embedded in the multifd packet headers (MultiFDPacket_t).

So let's keep them for now, but make them much easier to understand, by
doing the below:

  - Rename both of them to packets_sent / packets_recved; the old
  name (num_packets) is way too confusing when we already have
  MultiFDPacket_t.packet_num.

  - Avoid worrying about the "initial packet": we know we will send it,
  and that's good enough.  For the accounting it hardly matters whether we
  start at 0 or at 1.

  - Move them to where we send/recv the packets.  They're:

    - multifd_send_fill_packet() for senders.
    - multifd_recv_unfill_packet() for receivers.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-10-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: db7e1cc5103137743394a939045a17fa2b30a0dc
      https://github.com/qemu/qemu/commit/db7e1cc5103137743394a939045a17fa2b30a0dc
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Move total_normal_pages accounting

Just like the previous patch, move the accounting for total_normal_pages on
both src/dst sides into the packet fill/unfill procedures.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-11-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 8a9ef1738037e1d1132f9e1bd3e2f1102bde719f
      https://github.com/qemu/qemu/commit/8a9ef1738037e1d1132f9e1bd3e2f1102bde719f
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Move trace_multifd_send|recv()

Move them into the fill/unfill of packets.  With that, we can further
clean up the send/recv thread procedures and remove one more temp var.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-12-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 452b205702335ddd45554aaf0eb37baf50bdfa00
      https://github.com/qemu/qemu/commit/452b205702335ddd45554aaf0eb37baf50bdfa00
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: multifd_send_prepare_header()

Introduce a helper, multifd_send_prepare_header(), to set up the header
packet for the multifd sender.

It's fine to set up IOV[0] _before_ send_prepare(), because the packet
buffer is already allocated even if its content is still to be filled in.

With this helper, we can already slightly clean up the zero copy path.

Note that I explicitly put it into multifd.h, because I want it inlined
directly into multifd*.c where necessary later.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-13-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
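
The helper is essentially a two-line inline; a sketch with assumed field
names (the real MultiFDSendParams differs in detail):

    #include <stddef.h>
    #include <sys/uio.h>

    typedef struct {
        void *packet;             /* pre-allocated header buffer */
        size_t packet_len;
        struct iovec iov[64];     /* iov[0] is reserved for the header */
        unsigned int iovs_num;
    } SenderChannelSketch;

    /* Point IOV[0] at the header buffer up front; the header content can
     * be filled in later because the buffer itself already exists. */
    static inline void multifd_send_prepare_header(SenderChannelSketch *p)
    {
        p->iov[0].iov_base = p->packet;
        p->iov[0].iov_len = p->packet_len;
        p->iovs_num = 1;
    }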


  Commit: 25a1f8787597f6906b151b2f73ae6cc92a31de57
      https://github.com/qemu/qemu/commit/25a1f8787597f6906b151b2f73ae6cc92a31de57
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd-zlib.c
    M migration/multifd-zstd.c
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Move header prepare/fill into send_prepare()

This patch redefines the interface of ->send_prepare().  It further
simplifies multifd_send_thread(), especially on zero copy.

With the new interface, we require the hook to do all the work of
preparing the IOVs to send.  Once it completes, the IOVs are ready to be
dumped into the specific multifd QIOChannel.

So now the API looks like:

  p->pages ----------->  send_prepare() -------------> IOVs

This also prepares for the case where the input can be extended beyond
p->pages.  But that's for later.

This patch achieves a similar goal to what Fabiano proposed here:

https://lore.kernel.org/r/20240126221943.26628-1-farosas@suse.de

However, the send() interface may not be necessary.  I'm boldly attaching
a "Co-developed-by" for Fabiano.

Co-developed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-14-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
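
In sketch form, a no-compression hook under the new contract would build
the full IOV vector itself, header included; the types below are
assumptions standing in for MultiFDSendParams:

    #include <stddef.h>
    #include <sys/uio.h>

    enum { SKETCH_MAX_IOV = 64 };

    typedef struct {
        void *packet;             /* header buffer */
        size_t packet_len;
        size_t pages_num;         /* input: number of queued pages */
        void **page_ptrs;         /* input: host pointers of the pages */
        size_t page_size;
        struct iovec iov[SKETCH_MAX_IOV];
        unsigned int iovs_num;    /* output: ready-to-send vector */
    } SenderChannelSketch;

    /* On return the IOVs can be written straight to the channel with no
     * further massaging by the sender thread. */
    static int sketch_nocomp_send_prepare(SenderChannelSketch *p)
    {
        p->iov[0].iov_base = p->packet;
        p->iov[0].iov_len = p->packet_len;
        p->iovs_num = 1;
        for (size_t i = 0; i < p->pages_num; i++) {
            p->iov[p->iovs_num].iov_base = p->page_ptrs[i];
            p->iov[p->iovs_num].iov_len = p->page_size;
            p->iovs_num++;
        }
        return 0;
    }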


  Commit: 859ebaf346e8b5dece6cf255c604fe953d8ec9ab
      https://github.com/qemu/qemu/commit/859ebaf346e8b5dece6cf255c604fe953d8ec9ab
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Forbid spurious wakeups

Multifd's logic is now designed to have no spurious wakeups.  I still
remember a conversation with Juan; he seemed to agree we should drop the
tolerance for them, and if my memory is right it was only there because
multifd used to hit spurious wakeups during early debugging.

Let's drop it and see what explodes; there is still time before
soft-freeze if anything does.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-15-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 3ab4441d97af59ea09ee015d68c4770704b2b34f
      https://github.com/qemu/qemu/commit/3ab4441d97af59ea09ee015d68c4770704b2b34f
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c
    M migration/trace-events

  Log Message:
  -----------
  migration/multifd: Split multifd_send_terminate_threads()

Split multifd_send_terminate_threads() into two functions:

  - multifd_send_set_error(): used when an error happens on the sender
    side; it sets the error and quit state only.

  - multifd_send_terminate_threads(): used only by the main thread, to
    kick all multifd send threads out of sleep for the final recycling.

Use multifd_send_set_error() in the three old call sites where only the
error needs to be set.

Use multifd_send_terminate_threads() in the remaining call site, where the
main thread kicks the multifd threads one last time, in
multifd_save_cleanup().

Both helpers need to set quitting=1.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
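
The split, in sketch form (portable C standing in for QEMU's
qatomic/qemu_sem primitives; all names except the two helpers are
assumptions):

    #include <stdatomic.h>
    #include <semaphore.h>

    #define SKETCH_CHANNELS 8

    static atomic_int multifd_quitting;
    static sem_t channel_wakeup[SKETCH_CHANNELS];

    /* Callable from any sender-side error path: record the error and
     * flip the quit state, nothing more. */
    static void multifd_send_set_error(const char *err)
    {
        atomic_store(&multifd_quitting, 1);
        (void)err;  /* recorded via migrate_set_error() in the real code */
    }

    /* Main thread only: make sure every sender thread wakes up, sees the
     * quit flag and exits, ahead of the final join/recycle. */
    static void multifd_send_terminate_threads(void)
    {
        atomic_store(&multifd_quitting, 1);
        for (int i = 0; i < SKETCH_CHANNELS; i++) {
            sem_post(&channel_wakeup[i]);
        }
    }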


  Commit: d6556d174a6b9fc443f2320193f18e71eb67052a
      https://github.com/qemu/qemu/commit/d6556d174a6b9fc443f2320193f18e71eb67052a
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c
    M migration/multifd.h
    M migration/ram.c

  Log Message:
  -----------
  migration/multifd: Change retval of multifd_queue_page()

Using int is overkill when there are only two options.  Change it to a
boolean.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-17-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 3b40964a863d69121733c8b9794a02347ed0000b
      https://github.com/qemu/qemu/commit/3b40964a863d69121733c8b9794a02347ed0000b
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Change retval of multifd_send_pages()

Using int is overkill when there are only two options.  Change it to a
boolean.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-18-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: f88f86c4ee3fe673b34873e27af2de0a16fe01fd
      https://github.com/qemu/qemu/commit/f88f86c4ee3fe673b34873e27af2de0a16fe01fd
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Rewrite multifd_queue_page()

The current multifd_queue_page() is not easy to read and follow.  It is
not good for a few reasons:

  - There is no helper at all to show what exactly a condition means; in
  short, readability is low.

  - It relies on pages->ramblock being cleared to detect an empty queue.
  That slightly overloads the ramblock pointer, per Fabiano [1], with
  which I agree.

  - It contains a self-recursion, even though one is not necessary.

Rewrite this function, adding some comments to make it even clearer what
it does.

[1] https://lore.kernel.org/r/87wmrpjzew.fsf@suse.de

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-19-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 12808db3b8c22d26c9bc3da6f41756890ce882e4
      https://github.com/qemu/qemu/commit/12808db3b8c22d26c9bc3da6f41756890ce882e4
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Cleanup multifd_save_cleanup()

Shrink the function by moving the relevant work into helpers: move the
thread join()s into multifd_send_terminate_threads(), then create two more
helpers to cover the channel/state cleanups.

Add a TODO entry for the thread-terminate process, because p->running is
still buggy.  We need to fix it at some point, but that is not covered
here.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-20-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 5e6ea8a1d64e72e648b5a5277f08ec7fb09c3b8e
      https://github.com/qemu/qemu/commit/5e6ea8a1d64e72e648b5a5277f08ec7fb09c3b8e
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Cleanup multifd_load_cleanup()

Use similar logic to clean up the recv side.

Note that multifd_recv_terminate_threads() may need some similar rework
like the sender side, but let's leave that for later.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-21-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: cde85c37ca54e4a2dbee8653181938499887f6be
      https://github.com/qemu/qemu/commit/cde85c37ca54e4a2dbee8653181938499887f6be
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/migration.c
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Stick with send/recv on function names

Most of the multifd code uses send/recv to represent the two sides, but
some rare cases use save/load.

Since send/recv is the majority, replace the save/load use cases with
send/recv globally, so we now have a consensus on the naming.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 98ea497d8b8a5076be7b6ceb0dcc4a475373eb76
      https://github.com/qemu/qemu/commit/98ea497d8b8a5076be7b6ceb0dcc4a475373eb76
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-05 (Mon, 05 Feb 2024)

  Changed paths:
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Fix MultiFDSendParams.packet_num race

As reported correctly by Fabiano [1] (who, per his note, traced it back to
Elena's initial report in Oct 2023), MultiFDSendParams.packet_num is buggy
in how it is assigned and stored.  Consider two consecutive operations:
(1) queue a job into multifd send thread X, then (2) queue another sync
request to the same send thread X.  MultiFDSendParams.packet_num will be
assigned twice, and the first assignment can be lost.

To avoid that, move the packet_num assignment out of p->packet_num and
into the point where the thread fills in the packet.  Use atomic
operations to protect the field, making sure there's no race.

Note that an atomic fetch_add() may not be good for scaling purposes;
however, multifd should be fine, as the number of threads normally doesn't
go beyond 16.  Let's leave that concern for later, but fix the issue
first.

There's also the question of how to make a uint64_t packet number always
work on 32-bit hosts.  Switch to uintptr_t for now to simplify the case.
It makes the packet number overflow more easily on 32-bit, but that
shouldn't be a major concern, as 32-bit systems are not the main audience
for the kind of performance that multifd targets.

We also need to move the multifd_send_state definition earlier, so that
multifd_send_fill_packet() can reference it.

[1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de

Reported-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-23-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
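
A sketch of the fix in portable C11 (the real code uses QEMU's qatomic
wrappers; the function name here is an assumption):

    #include <stdatomic.h>
    #include <stdint.h>

    /* uintptr_t so 32-bit hosts still get a native atomic fetch-add;
     * the trade-off is earlier wraparound there, as noted above. */
    static atomic_uintptr_t multifd_packet_num;

    /* Assign the packet number where the packet is filled, not where the
     * job is queued, so two back-to-back requests to the same channel
     * can't overwrite each other's number. */
    static uint64_t multifd_next_packet_num(void)
    {
        return (uint64_t)atomic_fetch_add(&multifd_packet_num, 1);
    }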


  Commit: 488c84acb465c21b716c3fd14de27ab5ce388c85
      https://github.com/qemu/qemu/commit/488c84acb465c21b716c3fd14de27ab5ce388c85
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-06 (Tue, 06 Feb 2024)

  Changed paths:
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Optimize sender side to be lockless

When reviewing my attempt to refactor send_prepare(), Fabiano suggested we
try dropping the mutex in the multifd code [1].

I had thought about that before but never tried to change the code.  Now
maybe it's time to give it a stab.  This only optimizes the sender side.

The trick here is that multifd has a clear provider/consumer model: the
migration main thread publishes requests (either pending_job or
pending_sync), while the multifd sender threads are the consumers.  There
isn't a lot of complicated data sharing, and the jobs can logically be
submitted locklessly.

Arm the code with atomic weapons.  Two things worth mentioning:

  - For multifd_send_pages(): we could use qatomic_load_acquire() when
  trying to find a free channel, but that's expensive if we attach one
  ACQUIRE per channel.  Instead, keep the qatomic_read() on the
  pending_job flag as we do already, and use one smp_mb_acquire() after
  the loop to guarantee the memory ordering.

  - For pending_sync: it doesn't carry any extra data, and since p->flags
  is never touched now, it should be safe not to use a memory barrier.
  That's different from pending_job.

Provide rich comments for all the lockless operations to state how they are
paired.  With that, we can remove the mutex.

[1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-24-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
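
The channel-search half of that scheme, sketched with C11 atomics standing
in for qatomic_read()/smp_mb_acquire() (structure and names are
assumptions):

    #include <stdatomic.h>
    #include <stdbool.h>

    #define SKETCH_CHANNELS 8

    typedef struct {
        atomic_bool pending_job;
        /* ... page array and other per-channel state ... */
    } SenderChannelSketch;

    static SenderChannelSketch channels[SKETCH_CHANNELS];

    /* Find a free channel without a mutex.  Relaxed reads keep the scan
     * cheap; the single acquire fence after the scan pairs with the
     * sender thread's release store that clears pending_job, so our
     * later reads of the channel's data are properly ordered. */
    static SenderChannelSketch *find_free_channel(void)
    {
        for (;;) {
            for (int i = 0; i < SKETCH_CHANNELS; i++) {
                if (!atomic_load_explicit(&channels[i].pending_job,
                                          memory_order_relaxed)) {
                    atomic_thread_fence(memory_order_acquire);
                    return &channels[i];
                }
            }
            /* ... block on the channels_ready semaphore, then rescan ... */
        }
    }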


  Commit: 3205bebd4fc6dd501fb8b10c93ddce9da18e09db
      https://github.com/qemu/qemu/commit/3205bebd4fc6dd501fb8b10c93ddce9da18e09db
  Author: Avihai Horon <avihaih@nvidia.com>
  Date:   2024-02-07 (Wed, 07 Feb 2024)

  Changed paths:
    M migration/migration.c

  Log Message:
  -----------
  migration: Fix logic of channels and transport compatibility check

The commit in the Fixes line mistakenly modified the channels/transport
compatibility check logic, so it now checks multi-channel support only for
the socket transport type.
Thus, running multifd migration using a transport other than socket that
is incompatible with multi-channels (such as "exec") would lead to a
segmentation fault instead of an error message.
For example:
  (qemu) migrate_set_capability multifd on
  (qemu) migrate -d "exec:cat > /tmp/vm_state"
  Segmentation fault (core dumped)

Fix it by checking multi-channel compatibility for all transport types.

Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: d95533e1cdcc ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240125162528.7552-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: e1921f10d9afe651f4887284e85f6789b37e67d3
      https://github.com/qemu/qemu/commit/e1921f10d9afe651f4887284e85f6789b37e67d3
  Author: Fabiano Rosas <farosas@suse.de>
  Date:   2024-02-07 (Wed, 07 Feb 2024)

  Changed paths:
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Join the TLS thread

We're currently leaking the resources of the TLS thread by not joining it
and by overwriting the p->thread pointer altogether.

Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake")
Cc: qemu-stable <qemu-stable@nongnu.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: a2a63c4abd52f4e3ff4046dcb67fe44ebf0bb8de
      https://github.com/qemu/qemu/commit/a2a63c4abd52f4e3ff4046dcb67fe44ebf0bb8de
  Author: Fabiano Rosas <farosas@suse.de>
  Date:   2024-02-07 (Wed, 07 Feb 2024)

  Changed paths:
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Remove p->running

We currently only need p->running to avoid calling qemu_thread_join() on a
non-existent thread, if the thread has never been created.

However, there are at least two bugs in this logic:

1) On the sending side, p->running is set too early, and
qemu_thread_create() can be skipped due to an error during the TLS
handshake, leaving the flag set and leading to a crash when
multifd_send_cleanup() calls qemu_thread_join().

2) During exit, the multifd thread clears the flag while holding the
channel lock.  The counterpart in multifd_send_cleanup() reads the flag
outside of the lock and might free the mutex while the multifd thread
still has it locked.

Fix the first issue by setting the flag right before creating the thread.
Rename it from p->running to p->thread_created to clarify its usage.

Fix the second issue by not clearing the flag at multifd thread exit; we
have no use for that.

Note that these bugs are straightforward logic issues and not race
conditions.  There is still a gap for races to affect this code, because
multifd_send_cleanup() is allowed to run concurrently with the thread
creation loop.  That issue is solved in the next patches.

Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
Reported-by: Avihai Horon <avihaih@nvidia.com>
Reported-by: chenyuhui5@huawei.com
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
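
Both fixes, in sketch form (pthreads standing in for qemu_thread_*; the
struct layout is an assumption):

    #include <pthread.h>
    #include <stdbool.h>

    typedef struct {
        pthread_t thread;
        bool thread_created;   /* renamed from "running" to match its use */
    } SenderChannelSketch;

    static void *sender_thread(void *opaque)
    {
        (void)opaque;
        /* ... the send loop; note it never clears thread_created ... */
        return NULL;
    }

    /* Fix 1: flip the flag only when the thread really comes to exist,
     * so an earlier TLS-handshake failure leaves it false.  (QEMU's
     * qemu_thread_create() aborts on failure, so the real patch sets the
     * flag just before the call; raw pthreads can fail, hence the check.) */
    static int channel_start(SenderChannelSketch *p)
    {
        int ret = pthread_create(&p->thread, NULL, sender_thread, p);
        if (ret == 0) {
            p->thread_created = true;
        }
        return ret;
    }

    /* Fix 2: only the cleanup path consumes the flag; the thread itself
     * no longer touches it, so no lock-ordering question arises. */
    static void channel_cleanup(SenderChannelSketch *p)
    {
        if (p->thread_created) {
            pthread_join(p->thread, NULL);
            p->thread_created = false;
        }
    }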


  Commit: bd8b0a8f82d8fc17aa285ab963ba75675c2fbe7a
      https://github.com/qemu/qemu/commit/bd8b0a8f82d8fc17aa285ab963ba75675c2fbe7a
  Author: Fabiano Rosas <farosas@suse.de>
  Date:   2024-02-07 (Wed, 07 Feb 2024)

  Changed paths:
    M migration/migration.c
    M migration/multifd.c
    M migration/multifd.h

  Log Message:
  -----------
  migration/multifd: Move multifd_send_setup error handling in to the function

Hide the error handling inside multifd_send_setup to make it cleaner
for the next patch to move the function around.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: dd904bc13f2af0c605c3fe72f118ea4e27a6610c
      https://github.com/qemu/qemu/commit/dd904bc13f2af0c605c3fe72f118ea4e27a6610c
  Author: Fabiano Rosas <farosas@suse.de>
  Date:   2024-02-07 (Wed, 07 Feb 2024)

  Changed paths:
    M migration/migration.c

  Log Message:
  -----------
  migration/multifd: Move multifd_send_setup into migration thread

We currently have an unfavorable situation around multifd channel creation
and the migration thread's execution.

We create the multifd channels with qio_channel_socket_connect_async ->
qio_task_run_in_thread, but only connect them in the
multifd_new_send_channel_async callback, called from qio_task_complete,
which is registered as a glib event.

So at multifd_send_setup() we create the channels, but they only become
usable after the whole multifd_send_setup() call stack returns to the main
loop.  This means the migration thread is already up and running with no
possibility of the multifd channels being ready in time.

We currently rely on the channels-ready semaphore blocking
multifd_send_sync_main() until channels start to come up and release it.
However, bugs have recently been found where a channel's creation fails
and multifd_send_cleanup() is allowed to run while other channels are
still being created.

Let's start to organize this situation by moving the multifd_send_setup()
call into the migration thread.  That way we unblock the main loop to
dispatch the completion callbacks, and actually have a chance of getting
the multifd channels ready by the time the migration thread needs them.

The next patches will deal with the synchronization aspects.

Note that this takes multifd_send_setup() out of the BQL.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 2576ae488ef9aa692486157df7d8b410919cd219
      https://github.com/qemu/qemu/commit/2576ae488ef9aa692486157df7d8b410919cd219
  Author: Fabiano Rosas <farosas@suse.de>
  Date:   2024-02-07 (Wed, 07 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Unify multifd and TLS connection paths

During multifd channel creation (multifd_send_new_channel_async) when TLS
is enabled, the multifd_channel_connect function is called twice: once to
create the TLS handshake thread, and again after the asynchronous TLS
handshake has finished.

This creates a slightly confusing call stack where
multifd_channel_connect() is called more times than the number of
channels.  It also splits the error handling between the two callers of
multifd_channel_connect(), causing some code duplication.  Lastly, it gets
in the way of having a single point to determine whether all channel
creation tasks have been initiated.

Refactor the code to move the reentrancy one level up, to the
multifd_new_send_channel_async() level, de-duplicating the error handling
and allowing the next patch to introduce a synchronization point common to
all the multifd channel creation, regardless of TLS.

Note that the previous code would never fail once p->c had been set.  This
patch changes that assumption, which affects refcounting, so add comments
around object_unref to explain the situation.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 93fa9dc2e0522c54b813dee0898a5feb98b624c9
      https://github.com/qemu/qemu/commit/93fa9dc2e0522c54b813dee0898a5feb98b624c9
  Author: Fabiano Rosas <farosas@suse.de>
  Date:   2024-02-07 (Wed, 07 Feb 2024)

  Changed paths:
    M migration/multifd.c

  Log Message:
  -----------
  migration/multifd: Add a synchronization point for channel creation

It is possible for one of the multifd channels to fail to be created at
multifd_new_send_channel_async() while the rest of the channel creation
tasks are still in flight.

This could lead to multifd_save_cleanup() executing the
qemu_thread_join() loop too early, not waiting for the threads which
haven't been created yet, and freeing resources that the newly created
threads will then try to access, causing a crash.

Add a synchronization point after which there will be no more attempts at
thread creation, so that calling multifd_save_cleanup() past that point
ensures it properly waits for the threads.

A note about performance: prior to this patch, if a channel took too long
to be established, other channels could finish connecting first and
already start taking load.  Now we're bounded by the slowest-connecting
channel.

Reported-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-7-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
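
The synchronization point amounts to a counted rendezvous; a sketch with
POSIX semaphores (names are assumptions; the real code uses qemu_sem_* and
also records per-channel errors):

    #include <semaphore.h>
    #include <stdbool.h>

    static sem_t channels_created;   /* posted once per creation attempt */
    static int num_channels;

    /* Every creation callback posts exactly once, success or failure, so
     * the count below always converges. */
    static void channel_creation_done(bool ok)
    {
        (void)ok;  /* per-channel success/error recorded in the real code */
        sem_post(&channels_created);
    }

    /* After this returns, no new thread can appear, so a subsequent
     * cleanup's join loop sees every thread that will ever exist. */
    static void wait_for_channel_creation(void)
    {
        for (int i = 0; i < num_channels; i++) {
            sem_wait(&channels_created);
        }
    }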


  Commit: bdb0ade663c73270ceaec719c62f59bf049afbbe
      https://github.com/qemu/qemu/commit/bdb0ade663c73270ceaec719c62f59bf049afbbe
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-07 (Wed, 07 Feb 2024)

  Changed paths:
    M tests/qtest/migration-test.c

  Log Message:
  -----------
  tests/migration-test: Stick with gicv3 in aarch64 test

Recently we introduced a cross-binary migration test.  We always want
migration-test to use a stable guest ABI for both QEMU binaries in this
case, so that the two binaries are compatible on the migration stream with
the cmdline specified.

Switch to a static GIC version "3" rather than version "max", so that the
GIC stays stable across any future QEMU binaries for migration-test.

The version here can actually be anything, as long as the ABI is stable.
We choose "3" because it's the majority of what we already use in QEMU
while still being new enough: "git grep gic-version=3" shows 6 hits, while
version 4 has no direct user yet besides "max".

Note that even with this change, aarch64 won't yet work with the
cross-binary migration test; the only remaining missing piece will be a
stable CPU model.

Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240207005403.242235-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 70704779028fb2bda8963c39a120e22dc07e66b9
      https://github.com/qemu/qemu/commit/70704779028fb2bda8963c39a120e22dc07e66b9
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-07 (Wed, 07 Feb 2024)

  Changed paths:
    M .gitlab-ci.d/buildtest.yml

  Log Message:
  -----------
  ci: Remove tag dependency for build-previous-qemu

The new build-previous-qemu job relies on the QEMU release tag being
present, which may not always be true for personal git repositories, since
tags are not pushed by default.  The job can fail on such CI runs, as
reported by Peter Maydell.

Fix it by fetching the tags remotely from the official repository, as
suggested by Dan [1].

[1] https://lore.kernel.org/r/ZcC9ScKJ7VvqektA@redhat.com

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Suggested-by: "Daniel P. Berrangé" <berrange@redhat.com>
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240207005403.242235-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 940bf8ff1ca82aa458c553d9aa9dd7671ed15a4d
      https://github.com/qemu/qemu/commit/940bf8ff1ca82aa458c553d9aa9dd7671ed15a4d
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-02-07 (Wed, 07 Feb 2024)

  Changed paths:
    M .gitlab-ci.d/buildtest.yml

  Log Message:
  -----------
  ci: Update comment for migration-compat-aarch64

It turns out that we may not be able to enable this test even for the
upcoming v9.0.  Document what we're still missing.

Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240207005403.242235-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>


  Commit: 5d1fc614413b10dd94858b07a1b2e26b1aa0296c
      https://github.com/qemu/qemu/commit/5d1fc614413b10dd94858b07a1b2e26b1aa0296c
  Author: Peter Maydell <peter.maydell@linaro.org>
  Date:   2024-02-09 (Fri, 09 Feb 2024)

  Changed paths:
    M .gitlab-ci.d/buildtest.yml
    M accel/kvm/kvm-all.c
    M accel/stubs/kvm-stub.c
    M include/sysemu/kvm.h
    M migration/migration.c
    M migration/multifd-zlib.c
    M migration/multifd-zstd.c
    M migration/multifd.c
    M migration/multifd.h
    M migration/ram.c
    M migration/trace-events
    M tests/qtest/migration-test.c

  Log Message:
  -----------
  Merge tag 'migration-staging-pull-request' of https://gitlab.com/peterx/qemu into staging

Migration pull

- William's fix on hwpoison migration which used to crash QEMU
- Peter's multifd cleanup + bugfix + optimizations
- Avihai's fix on multifd crash over non-socket channels
- Fabiano's multifd thread-race fix
- Peter's CI fix series

# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZcREtRIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wacrwEAl2aeQkh51h/e+OKX7MG4/4Y6Edf6Oz7o
# IJLk/cyrUFQA/2exo2lOdv5zHNOJKwAYj8HYDraezrC/MK1eED4Wji0M
# =k53l
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 08 Feb 2024 03:04:21 GMT
# gpg:                using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg:                issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal]
# gpg:                 aka "Peter Xu <peterx@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D  D1A9 3B5F CCCD F3AB D706

* tag 'migration-staging-pull-request' of https://gitlab.com/peterx/qemu: (34 commits)
  ci: Update comment for migration-compat-aarch64
  ci: Remove tag dependency for build-previous-qemu
  tests/migration-test: Stick with gicv3 in aarch64 test
  migration/multifd: Add a synchronization point for channel creation
  migration/multifd: Unify multifd and TLS connection paths
  migration/multifd: Move multifd_send_setup into migration thread
  migration/multifd: Move multifd_send_setup error handling in to the function
  migration/multifd: Remove p->running
  migration/multifd: Join the TLS thread
  migration: Fix logic of channels and transport compatibility check
  migration/multifd: Optimize sender side to be lockless
  migration/multifd: Fix MultiFDSendParams.packet_num race
  migration/multifd: Stick with send/recv on function names
  migration/multifd: Cleanup multifd_load_cleanup()
  migration/multifd: Cleanup multifd_save_cleanup()
  migration/multifd: Rewrite multifd_queue_page()
  migration/multifd: Change retval of multifd_send_pages()
  migration/multifd: Change retval of multifd_queue_page()
  migration/multifd: Split multifd_send_terminate_threads()
  migration/multifd: Forbid spurious wakeups
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>


Compare: https://github.com/qemu/qemu/compare/e2beaf7bad96...5d1fc614413b


