From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PULL 18/19] migration: Split log_clear() into smaller chunks
Date: Thu, 11 Jul 2019 12:51:53 +0100
User-agent: Mutt/1.12.0 (2019-05-25)

* Juan Quintela (address@hidden) wrote:
> From: Peter Xu <address@hidden>
>
> Currently we do log_clear() right after log_sync(), which mostly
> keeps the old behavior from when log_clear() was still part of log_sync().
>
> This patch tries to further optimize the migration log_clear() code
> path to split huge log_clear()s into smaller chunks.
>
> We do this by splitting the whole guest memory region into memory
> chunks, whose size is decided by MigrationState.clear_bitmap_shift (an
> example is given below). With that, we don't do the dirty bitmap
> clear operation on the remote node (e.g., KVM) when we fetch the dirty
> bitmap; instead, we explicitly clear the dirty bitmap for a memory
> chunk the first time we send a page in that chunk.
>
> Here is an example.
>
> Assume the guest has 64G of memory. Before this patch, the KVM ioctl
> KVM_CLEAR_DIRTY_LOG would be a single call covering all 64G. After
> the patch, with a clear bitmap shift of 18, the memory chunk size on
> x86_64 becomes 1UL<<18 * 4K = 1GB. So instead of one big 64G ioctl,
> we send 64 small ioctls, each covering 1G of guest memory. Each of
> the 64 small ioctls is only issued if any page in that small chunk is
> about to be sent right away.
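
(To sanity-check the arithmetic above: a standalone C sketch, not QEMU
code; the constants are assumptions matching the example of 4K pages,
shift 18 and a 64G guest.)

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      const unsigned page_bits = 12;          /* 4K guest pages on x86_64 */
      const unsigned shift = 18;              /* clear bitmap shift */
      const uint64_t guest_mem = 64ULL << 30; /* 64G guest */

      uint64_t chunk = (1ULL << shift) << page_bits; /* bytes per chunk: 1G */
      uint64_t ioctls = guest_mem / chunk;           /* at most 64 small ioctls */

      printf("chunk: %" PRIu64 " MB, ioctls: %" PRIu64 "\n",
             chunk >> 20, ioctls);
      return 0;
  }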
>
> Signed-off-by: Peter Xu <address@hidden>
> Reviewed-by: Juan Quintela <address@hidden>
> Message-Id: <address@hidden>
> Signed-off-by: Juan Quintela <address@hidden>
> ---
> include/exec/memory.h.rej | 26 +
> include/exec/ram_addr.h | 76 +-
> include/exec/ram_addr.h.orig | 488 ++++
> memory.c.rej | 17 +
> migration/migration.c | 4 +
> migration/migration.h | 27 +
> migration/migration.h.orig | 315 +++
> migration/ram.c | 44 +
> migration/ram.c.orig | 4599 ++++++++++++++++++++++++++++++++++
> migration/ram.c.rej | 33 +
> migration/trace-events | 1 +
> migration/trace-events.orig | 297 +++
It looks like this patch has had an accident with 'patch' and git commit -
the diffstat shows .orig and .rej files being added to the tree.
Dave
> 12 files changed, 5925 insertions(+), 2 deletions(-)
> create mode 100644 include/exec/memory.h.rej
> create mode 100644 include/exec/ram_addr.h.orig
> create mode 100644 memory.c.rej
> create mode 100644 migration/migration.h.orig
> create mode 100644 migration/ram.c.orig
> create mode 100644 migration/ram.c.rej
> create mode 100644 migration/trace-events.orig
>
> diff --git a/include/exec/memory.h.rej b/include/exec/memory.h.rej
> new file mode 100644
> index 0000000000..66aa66616a
> --- /dev/null
> +++ b/include/exec/memory.h.rej
> @@ -0,0 +1,26 @@
> +--- include/exec/memory.h
> ++++ include/exec/memory.h
> +@@ -1254,23 +1254,6 @@ void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize,
> + */
> + void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
> +
> +-/**
> +- * memory_region_get_dirty: Check whether a range of bytes is dirty
> +- * for a specified client.
> +- *
> +- * Checks whether a range of bytes has been written to since the last
> +- * call to memory_region_reset_dirty() with the same @client. Dirty logging
> +- * must be enabled.
> +- *
> +- * @mr: the memory region being queried.
> +- * @addr: the address (relative to the start of the region) being queried.
> +- * @size: the size of the range being queried.
> +- * @client: the user of the logging information; %DIRTY_MEMORY_MIGRATION or
> +- * %DIRTY_MEMORY_VGA.
> +- */
> +-bool memory_region_get_dirty(MemoryRegion *mr, hwaddr addr,
> +- hwaddr size, unsigned client);
> +-
> + /**
> + * memory_region_set_dirty: Mark a range of bytes as dirty in a memory region.
> + *
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 222b4338fb..b7b2e60ff6 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -51,8 +51,70 @@ struct RAMBlock {
> unsigned long *unsentmap;
> /* bitmap of already received pages in postcopy */
> unsigned long *receivedmap;
> +
> + /*
> + * Bitmap to track the already-cleared dirty bitmap. When a bit is
> + * set, it means the corresponding memory chunk needs a log-clear.
> + * Set this to non-NULL to enable the capability to postpone and
> + * split the clearing of the dirty bitmap on the remote node (e.g.,
> + * KVM). The bitmap will be set only when doing a global sync.
> + *
> + * NOTE: this bitmap differs from the other bitmaps in that one bit
> + * can represent multiple guest pages (the ratio is decided by the
> + * `clear_bmap_shift' variable below). On the destination side,
> + * this should always be NULL, and the variable `clear_bmap_shift'
> + * is meaningless.
> + */
> + unsigned long *clear_bmap;
> + uint8_t clear_bmap_shift;
> };
>
> +/**
> + * clear_bmap_size: calculate clear bitmap size
> + *
> + * @pages: number of guest pages
> + * @shift: guest page number shift
> + *
> + * Returns: number of bits for the clear bitmap
> + */
> +static inline long clear_bmap_size(uint64_t pages, uint8_t shift)
> +{
> + return DIV_ROUND_UP(pages, 1UL << shift);
> +}
> +
> +/**
> + * clear_bmap_set: set clear bitmap for the page range
> + *
> + * @rb: the ramblock to operate on
> + * @start: the start page number
> + * @npages: number of pages to set in the bitmap
> + *
> + * Returns: None
> + */
> +static inline void clear_bmap_set(RAMBlock *rb, uint64_t start,
> + uint64_t npages)
> +{
> + uint8_t shift = rb->clear_bmap_shift;
> +
> + bitmap_set_atomic(rb->clear_bmap, start >> shift,
> + clear_bmap_size(npages, shift));
> +}
> +
> +/**
> + * clear_bmap_test_and_clear: test clear bitmap for the page, clear if set
> + *
> + * @rb: the ramblock to operate on
> + * @page: the page number to check
> + *
> + * Returns: true if the bit was set, false otherwise
> + */
> +static inline bool clear_bmap_test_and_clear(RAMBlock *rb, uint64_t page)
> +{
> + uint8_t shift = rb->clear_bmap_shift;
> +
> + return bitmap_test_and_clear_atomic(rb->clear_bmap, page >> shift, 1);
> +}
> +
> static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
> {
> return (b && b->host && offset < b->used_length) ? true : false;
> @@ -463,8 +525,18 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
> }
> }
>
> - /* TODO: split the huge bitmap into smaller chunks */
> - memory_region_clear_dirty_bitmap(rb->mr, start, length);
> + if (rb->clear_bmap) {
> + /*
> + * Postpone the dirty bitmap clear to the point right before we
> + * really send the pages; we also split the clear-dirty procedure
> + * into smaller chunks.
> + */
> + clear_bmap_set(rb, start >> TARGET_PAGE_BITS,
> + length >> TARGET_PAGE_BITS);
> + } else {
> + /* Slow path - still do that in a huge chunk */
> + memory_region_clear_dirty_bitmap(rb->mr, start, length);
> + }
> } else {
> ram_addr_t offset = rb->offset;
>
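
To make the deferred-clear flow concrete: a minimal standalone model of
how the two helpers above cooperate (the atomic bitmap ops are reduced
to a plain bool array; this is an illustration, not the actual QEMU
implementation, and the toy_* names are made up):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define SHIFT 6                  /* CLEAR_BITMAP_SHIFT_MIN: 64 pages/chunk */
  #define NPAGES 1024              /* toy guest: 1024 pages */
  #define NCHUNKS ((NPAGES + (1u << SHIFT) - 1) >> SHIFT)

  static bool clear_bmap[NCHUNKS]; /* stands in for rb->clear_bmap */

  /* sync time: mark every chunk covering [start, start + npages) */
  static void toy_clear_bmap_set(uint64_t start, uint64_t npages)
  {
      for (uint64_t c = start >> SHIFT; c <= (start + npages - 1) >> SHIFT; c++) {
          clear_bmap[c] = true;
      }
  }

  /* send time: the first page sent from a chunk triggers the real log-clear */
  static bool toy_clear_bmap_test_and_clear(uint64_t page)
  {
      bool was_set = clear_bmap[page >> SHIFT];
      clear_bmap[page >> SHIFT] = false;
      return was_set;
  }

  int main(void)
  {
      toy_clear_bmap_set(0, NPAGES);                    /* global dirty sync */
      printf("%d\n", toy_clear_bmap_test_and_clear(3)); /* 1: clear chunk 0 now */
      printf("%d\n", toy_clear_bmap_test_and_clear(5)); /* 0: already cleared */
      return 0;
  }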
> diff --git a/include/exec/ram_addr.h.orig b/include/exec/ram_addr.h.orig
> new file mode 100644
> index 0000000000..222b4338fb
> --- /dev/null
> +++ b/include/exec/ram_addr.h.orig
> @@ -0,0 +1,488 @@
> +/*
> + * Declarations for cpu physical memory functions
> + *
> + * Copyright 2011 Red Hat, Inc. and/or its affiliates
> + *
> + * Authors:
> + * Avi Kivity <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later. See the COPYING file in the top-level directory.
> + *
> + */
> +
> +/*
> + * This header is for use by exec.c and memory.c ONLY. Do not include it.
> + * The functions declared here will be removed soon.
> + */
> +
> +#ifndef RAM_ADDR_H
> +#define RAM_ADDR_H
> +
> +#ifndef CONFIG_USER_ONLY
> +#include "hw/xen/xen.h"
> +#include "sysemu/tcg.h"
> +#include "exec/ramlist.h"
> +
> +struct RAMBlock {
> + struct rcu_head rcu;
> + struct MemoryRegion *mr;
> + uint8_t *host;
> + uint8_t *colo_cache; /* For colo, VM's ram cache */
> + ram_addr_t offset;
> + ram_addr_t used_length;
> + ram_addr_t max_length;
> + void (*resized)(const char*, uint64_t length, void *host);
> + uint32_t flags;
> + /* Protected by iothread lock. */
> + char idstr[256];
> + /* RCU-enabled, writes protected by the ramlist lock */
> + QLIST_ENTRY(RAMBlock) next;
> + QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
> + int fd;
> + size_t page_size;
> + /* dirty bitmap used during migration */
> + unsigned long *bmap;
> + /* bitmap of pages that haven't been sent even once;
> + * only maintained and used in postcopy at the moment,
> + * where it's used to send the dirty map at the start
> + * of the postcopy phase
> + */
> + unsigned long *unsentmap;
> + /* bitmap of already received pages in postcopy */
> + unsigned long *receivedmap;
> +};
> +
> +static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
> +{
> + return (b && b->host && offset < b->used_length) ? true : false;
> +}
> +
> +static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
> +{
> + assert(offset_in_ramblock(block, offset));
> + return (char *)block->host + offset;
> +}
> +
> +static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
> + RAMBlock *rb)
> +{
> + uint64_t host_addr_offset =
> + (uint64_t)(uintptr_t)(host_addr - (void *)rb->host);
> + return host_addr_offset >> TARGET_PAGE_BITS;
> +}
> +
> +bool ramblock_is_pmem(RAMBlock *rb);
> +
> +long qemu_minrampagesize(void);
> +long qemu_maxrampagesize(void);
> +
> +/**
> + * qemu_ram_alloc_from_file,
> + * qemu_ram_alloc_from_fd: Allocate a ram block from the specified backing
> + * file or device
> + *
> + * Parameters:
> + * @size: the size in bytes of the ram block
> + * @mr: the memory region where the ram block is
> + * @ram_flags: specify the properties of the ram block, which can be one
> + * or bit-or of following values
> + * - RAM_SHARED: mmap the backing file or device with MAP_SHARED
> + * - RAM_PMEM: the backend @mem_path or @fd is persistent memory
> + * Other bits are ignored.
> + * @mem_path or @fd: specify the backing file or device
> + * @errp: pointer to Error*, to store an error if it happens
> + *
> + * Return:
> + * On success, return a pointer to the ram block.
> + * On failure, return NULL.
> + */
> +RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
> + uint32_t ram_flags, const char *mem_path,
> + Error **errp);
> +RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
> + uint32_t ram_flags, int fd,
> + Error **errp);
> +
> +RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
> + MemoryRegion *mr, Error **errp);
> +RAMBlock *qemu_ram_alloc(ram_addr_t size, bool share, MemoryRegion *mr,
> + Error **errp);
> +RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t max_size,
> + void (*resized)(const char*,
> + uint64_t length,
> + void *host),
> + MemoryRegion *mr, Error **errp);
> +void qemu_ram_free(RAMBlock *block);
> +
> +int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp);
> +
> +#define DIRTY_CLIENTS_ALL ((1 << DIRTY_MEMORY_NUM) - 1)
> +#define DIRTY_CLIENTS_NOCODE (DIRTY_CLIENTS_ALL & ~(1 << DIRTY_MEMORY_CODE))
> +
> +void tb_invalidate_phys_range(ram_addr_t start, ram_addr_t end);
> +
> +static inline bool cpu_physical_memory_get_dirty(ram_addr_t start,
> + ram_addr_t length,
> + unsigned client)
> +{
> + DirtyMemoryBlocks *blocks;
> + unsigned long end, page;
> + unsigned long idx, offset, base;
> + bool dirty = false;
> +
> + assert(client < DIRTY_MEMORY_NUM);
> +
> + end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
> + page = start >> TARGET_PAGE_BITS;
> +
> + rcu_read_lock();
> +
> + blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
> +
> + idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> + offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> + base = page - offset;
> + while (page < end) {
> + unsigned long next = MIN(end, base + DIRTY_MEMORY_BLOCK_SIZE);
> + unsigned long num = next - base;
> + unsigned long found = find_next_bit(blocks->blocks[idx], num, offset);
> + if (found < num) {
> + dirty = true;
> + break;
> + }
> +
> + page = next;
> + idx++;
> + offset = 0;
> + base += DIRTY_MEMORY_BLOCK_SIZE;
> + }
> +
> + rcu_read_unlock();
> +
> + return dirty;
> +}
> +
> +static inline bool cpu_physical_memory_all_dirty(ram_addr_t start,
> + ram_addr_t length,
> + unsigned client)
> +{
> + DirtyMemoryBlocks *blocks;
> + unsigned long end, page;
> + unsigned long idx, offset, base;
> + bool dirty = true;
> +
> + assert(client < DIRTY_MEMORY_NUM);
> +
> + end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
> + page = start >> TARGET_PAGE_BITS;
> +
> + rcu_read_lock();
> +
> + blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
> +
> + idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> + offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> + base = page - offset;
> + while (page < end) {
> + unsigned long next = MIN(end, base + DIRTY_MEMORY_BLOCK_SIZE);
> + unsigned long num = next - base;
> + unsigned long found = find_next_zero_bit(blocks->blocks[idx], num, offset);
> + if (found < num) {
> + dirty = false;
> + break;
> + }
> +
> + page = next;
> + idx++;
> + offset = 0;
> + base += DIRTY_MEMORY_BLOCK_SIZE;
> + }
> +
> + rcu_read_unlock();
> +
> + return dirty;
> +}
> +
> +static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr,
> + unsigned client)
> +{
> + return cpu_physical_memory_get_dirty(addr, 1, client);
> +}
> +
> +static inline bool cpu_physical_memory_is_clean(ram_addr_t addr)
> +{
> + bool vga = cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_VGA);
> + bool code = cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_CODE);
> + bool migration =
> + cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_MIGRATION);
> + return !(vga && code && migration);
> +}
> +
> +static inline uint8_t cpu_physical_memory_range_includes_clean(ram_addr_t start,
> + ram_addr_t length,
> + uint8_t mask)
> +{
> + uint8_t ret = 0;
> +
> + if (mask & (1 << DIRTY_MEMORY_VGA) &&
> + !cpu_physical_memory_all_dirty(start, length, DIRTY_MEMORY_VGA)) {
> + ret |= (1 << DIRTY_MEMORY_VGA);
> + }
> + if (mask & (1 << DIRTY_MEMORY_CODE) &&
> + !cpu_physical_memory_all_dirty(start, length, DIRTY_MEMORY_CODE)) {
> + ret |= (1 << DIRTY_MEMORY_CODE);
> + }
> + if (mask & (1 << DIRTY_MEMORY_MIGRATION) &&
> + !cpu_physical_memory_all_dirty(start, length, DIRTY_MEMORY_MIGRATION)) {
> + ret |= (1 << DIRTY_MEMORY_MIGRATION);
> + }
> + return ret;
> +}
> +
> +static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr,
> + unsigned client)
> +{
> + unsigned long page, idx, offset;
> + DirtyMemoryBlocks *blocks;
> +
> + assert(client < DIRTY_MEMORY_NUM);
> +
> + page = addr >> TARGET_PAGE_BITS;
> + idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> + offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> +
> + rcu_read_lock();
> +
> + blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
> +
> + set_bit_atomic(offset, blocks->blocks[idx]);
> +
> + rcu_read_unlock();
> +}
> +
> +static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
> + ram_addr_t length,
> + uint8_t mask)
> +{
> + DirtyMemoryBlocks *blocks[DIRTY_MEMORY_NUM];
> + unsigned long end, page;
> + unsigned long idx, offset, base;
> + int i;
> +
> + if (!mask && !xen_enabled()) {
> + return;
> + }
> +
> + end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
> + page = start >> TARGET_PAGE_BITS;
> +
> + rcu_read_lock();
> +
> + for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
> + blocks[i] = atomic_rcu_read(&ram_list.dirty_memory[i]);
> + }
> +
> + idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> + offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> + base = page - offset;
> + while (page < end) {
> + unsigned long next = MIN(end, base + DIRTY_MEMORY_BLOCK_SIZE);
> +
> + if (likely(mask & (1 << DIRTY_MEMORY_MIGRATION))) {
> + bitmap_set_atomic(blocks[DIRTY_MEMORY_MIGRATION]->blocks[idx],
> + offset, next - page);
> + }
> + if (unlikely(mask & (1 << DIRTY_MEMORY_VGA))) {
> + bitmap_set_atomic(blocks[DIRTY_MEMORY_VGA]->blocks[idx],
> + offset, next - page);
> + }
> + if (unlikely(mask & (1 << DIRTY_MEMORY_CODE))) {
> + bitmap_set_atomic(blocks[DIRTY_MEMORY_CODE]->blocks[idx],
> + offset, next - page);
> + }
> +
> + page = next;
> + idx++;
> + offset = 0;
> + base += DIRTY_MEMORY_BLOCK_SIZE;
> + }
> +
> + rcu_read_unlock();
> +
> + xen_hvm_modified_memory(start, length);
> +}
> +
> +#if !defined(_WIN32)
> +static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
> + ram_addr_t start,
> + ram_addr_t pages)
> +{
> + unsigned long i, j;
> + unsigned long page_number, c;
> + hwaddr addr;
> + ram_addr_t ram_addr;
> + unsigned long len = (pages + HOST_LONG_BITS - 1) / HOST_LONG_BITS;
> + unsigned long hpratio = getpagesize() / TARGET_PAGE_SIZE;
> + unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
> +
> + /* start address is aligned at the start of a word? */
> + if ((((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) &&
> + (hpratio == 1)) {
> + unsigned long **blocks[DIRTY_MEMORY_NUM];
> + unsigned long idx;
> + unsigned long offset;
> + long k;
> + long nr = BITS_TO_LONGS(pages);
> +
> + idx = (start >> TARGET_PAGE_BITS) / DIRTY_MEMORY_BLOCK_SIZE;
> + offset = BIT_WORD((start >> TARGET_PAGE_BITS) %
> + DIRTY_MEMORY_BLOCK_SIZE);
> +
> + rcu_read_lock();
> +
> + for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
> + blocks[i] = atomic_rcu_read(&ram_list.dirty_memory[i])->blocks;
> + }
> +
> + for (k = 0; k < nr; k++) {
> + if (bitmap[k]) {
> + unsigned long temp = leul_to_cpu(bitmap[k]);
> +
> + atomic_or(&blocks[DIRTY_MEMORY_VGA][idx][offset], temp);
> +
> + if (global_dirty_log) {
> + atomic_or(&blocks[DIRTY_MEMORY_MIGRATION][idx][offset],
> + temp);
> + }
> +
> + if (tcg_enabled()) {
> + atomic_or(&blocks[DIRTY_MEMORY_CODE][idx][offset], temp);
> + }
> + }
> +
> + if (++offset >= BITS_TO_LONGS(DIRTY_MEMORY_BLOCK_SIZE)) {
> + offset = 0;
> + idx++;
> + }
> + }
> +
> + rcu_read_unlock();
> +
> + xen_hvm_modified_memory(start, pages << TARGET_PAGE_BITS);
> + } else {
> + uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : DIRTY_CLIENTS_NOCODE;
> +
> + if (!global_dirty_log) {
> + clients &= ~(1 << DIRTY_MEMORY_MIGRATION);
> + }
> +
> + /*
> + * bitmap-traveling is faster than memory-traveling (for addr...)
> + * especially when most of the memory is not dirty.
> + */
> + for (i = 0; i < len; i++) {
> + if (bitmap[i] != 0) {
> + c = leul_to_cpu(bitmap[i]);
> + do {
> + j = ctzl(c);
> + c &= ~(1ul << j);
> + page_number = (i * HOST_LONG_BITS + j) * hpratio;
> + addr = page_number * TARGET_PAGE_SIZE;
> + ram_addr = start + addr;
> + cpu_physical_memory_set_dirty_range(ram_addr,
> + TARGET_PAGE_SIZE * hpratio, clients);
> + } while (c != 0);
> + }
> + }
> + }
> +}
> +#endif /* not _WIN32 */
> +
> +bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
> + ram_addr_t length,
> + unsigned client);
> +
> +DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
> + (MemoryRegion *mr, hwaddr offset, hwaddr length, unsigned client);
> +
> +bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
> + ram_addr_t start,
> + ram_addr_t length);
> +
> +static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
> + ram_addr_t length)
> +{
> + cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_MIGRATION);
> + cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_VGA);
> + cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_CODE);
> +}
> +
> +
> +/* Called with RCU critical section */
> +static inline
> +uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
> + ram_addr_t start,
> + ram_addr_t length,
> + uint64_t *real_dirty_pages)
> +{
> + ram_addr_t addr;
> + unsigned long word = BIT_WORD((start + rb->offset) >> TARGET_PAGE_BITS);
> + uint64_t num_dirty = 0;
> + unsigned long *dest = rb->bmap;
> +
> + /* start address and length is aligned at the start of a word? */
> + if (((word * BITS_PER_LONG) << TARGET_PAGE_BITS) ==
> + (start + rb->offset) &&
> + !(length & ((BITS_PER_LONG << TARGET_PAGE_BITS) - 1))) {
> + int k;
> + int nr = BITS_TO_LONGS(length >> TARGET_PAGE_BITS);
> + unsigned long * const *src;
> + unsigned long idx = (word * BITS_PER_LONG) / DIRTY_MEMORY_BLOCK_SIZE;
> + unsigned long offset = BIT_WORD((word * BITS_PER_LONG) %
> + DIRTY_MEMORY_BLOCK_SIZE);
> + unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
> +
> + src = atomic_rcu_read(
> + &ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION])->blocks;
> +
> + for (k = page; k < page + nr; k++) {
> + if (src[idx][offset]) {
> + unsigned long bits = atomic_xchg(&src[idx][offset], 0);
> + unsigned long new_dirty;
> + *real_dirty_pages += ctpopl(bits);
> + new_dirty = ~dest[k];
> + dest[k] |= bits;
> + new_dirty &= bits;
> + num_dirty += ctpopl(new_dirty);
> + }
> +
> + if (++offset >= BITS_TO_LONGS(DIRTY_MEMORY_BLOCK_SIZE)) {
> + offset = 0;
> + idx++;
> + }
> + }
> +
> + /* TODO: split the huge bitmap into smaller chunks */
> + memory_region_clear_dirty_bitmap(rb->mr, start, length);
> + } else {
> + ram_addr_t offset = rb->offset;
> +
> + for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
> + if (cpu_physical_memory_test_and_clear_dirty(
> + start + addr + offset,
> + TARGET_PAGE_SIZE,
> + DIRTY_MEMORY_MIGRATION)) {
> + *real_dirty_pages += 1;
> + long k = (start + addr) >> TARGET_PAGE_BITS;
> + if (!test_and_set_bit(k, dest)) {
> + num_dirty++;
> + }
> + }
> + }
> + }
> +
> + return num_dirty;
> +}
> +#endif
> +#endif
> diff --git a/memory.c.rej b/memory.c.rej
> new file mode 100644
> index 0000000000..bb1c1d0360
> --- /dev/null
> +++ b/memory.c.rej
> @@ -0,0 +1,17 @@
> +--- memory.c
> ++++ memory.c
> +@@ -2027,14 +2027,6 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
> + memory_region_transaction_commit();
> + }
> +
> +-bool memory_region_get_dirty(MemoryRegion *mr, hwaddr addr,
> +- hwaddr size, unsigned client)
> +-{
> +- assert(mr->ram_block);
> +- return cpu_physical_memory_get_dirty(memory_region_get_ram_addr(mr) + addr,
> +- size, client);
> +-}
> +-
> + void memory_region_set_dirty(MemoryRegion *mr, hwaddr addr,
> + hwaddr size)
> + {
> diff --git a/migration/migration.c b/migration/migration.c
> index 2865ae3fa9..8a607fe1e2 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3362,6 +3362,8 @@ void migration_global_dump(Monitor *mon)
> ms->send_section_footer ? "on" : "off");
> monitor_printf(mon, "decompress-error-check: %s\n",
> ms->decompress_error_check ? "on" : "off");
> + monitor_printf(mon, "clear-bitmap-shift: %u\n",
> + ms->clear_bitmap_shift);
> }
>
> #define DEFINE_PROP_MIG_CAP(name, x) \
> @@ -3376,6 +3378,8 @@ static Property migration_properties[] = {
> send_section_footer, true),
> DEFINE_PROP_BOOL("decompress-error-check", MigrationState,
> decompress_error_check, true),
> + DEFINE_PROP_UINT8("x-clear-bitmap-shift", MigrationState,
> + clear_bitmap_shift, CLEAR_BITMAP_SHIFT_DEFAULT),
>
> /* Migration parameters */
> DEFINE_PROP_UINT8("x-compress-level", MigrationState,
> diff --git a/migration/migration.h b/migration/migration.h
> index 5e8f09c6db..1fdd7b21fd 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -26,6 +26,23 @@ struct PostcopyBlocktimeContext;
>
> #define MIGRATION_RESUME_ACK_VALUE (1)
>
> +/*
> + * 1<<6=64 pages -> 256K chunk when page size is 4K. This guarantees
> + * that all chunks are aligned to 64 pages, so the bitmaps are always
> + * aligned to an unsigned long.
> + */
> +#define CLEAR_BITMAP_SHIFT_MIN 6
> +/*
> + * 1<<18=256K pages -> 1G chunk when page size is 4K. This is the
> + * default value to use if none is specified.
> + */
> +#define CLEAR_BITMAP_SHIFT_DEFAULT 18
> +/*
> + * 1<<31=2G pages -> 8T chunk when page size is 4K. This should be
> + * big enough that we won't overflow easily.
> + */
> +#define CLEAR_BITMAP_SHIFT_MAX 31
> +
> /* State for the incoming migration */
> struct MigrationIncomingState {
> QEMUFile *from_src_file;
> @@ -232,6 +249,16 @@ struct MigrationState
> * do not trigger spurious decompression errors.
> */
> bool decompress_error_check;
> +
> + /*
> + * This decides the size of the guest memory chunk used to track
> + * dirty bitmap clearing. The chunk size will be GUEST_PAGE_SIZE
> + * << N. For example, N=0 means we clear the dirty bitmap for
> + * each page we send (1<<0=1); N=10 means we clear the dirty
> + * bitmap only once per 1<<10=1K contiguous guest pages (a 4M
> + * chunk with 4K pages).
> + uint8_t clear_bitmap_shift;
> };
>
> void migrate_set_state(int *state, int old_state, int new_state);
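
For reference, with the 4K page size assumed in the comments above, the
three bounds map to chunk sizes as in this small standalone sketch
(illustration only, not QEMU code):

  #include <stdint.h>
  #include <stdio.h>

  /* chunk bytes = page_size << shift, as described in the comments above */
  static uint64_t chunk_bytes(unsigned shift)
  {
      return 4096ULL << shift;
  }

  int main(void)
  {
      printf("min (6):      %llu KB\n", (unsigned long long)(chunk_bytes(6) >> 10));  /* 256 KB */
      printf("default (18): %llu MB\n", (unsigned long long)(chunk_bytes(18) >> 20)); /* 1024 MB = 1G */
      printf("max (31):     %llu GB\n", (unsigned long long)(chunk_bytes(31) >> 30)); /* 8192 GB = 8T */
      return 0;
  }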
> diff --git a/migration/migration.h.orig b/migration/migration.h.orig
> new file mode 100644
> index 0000000000..5e8f09c6db
> --- /dev/null
> +++ b/migration/migration.h.orig
> @@ -0,0 +1,315 @@
> +/*
> + * QEMU live migration
> + *
> + * Copyright IBM, Corp. 2008
> + *
> + * Authors:
> + * Anthony Liguori <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef QEMU_MIGRATION_H
> +#define QEMU_MIGRATION_H
> +
> +#include "qapi/qapi-types-migration.h"
> +#include "qemu/thread.h"
> +#include "exec/cpu-common.h"
> +#include "qemu/coroutine_int.h"
> +#include "hw/qdev.h"
> +#include "io/channel.h"
> +#include "net/announce.h"
> +
> +struct PostcopyBlocktimeContext;
> +
> +#define MIGRATION_RESUME_ACK_VALUE (1)
> +
> +/* State for the incoming migration */
> +struct MigrationIncomingState {
> + QEMUFile *from_src_file;
> +
> + /*
> + * Free at the start of the main state load, set as the main thread finishes
> + * loading state.
> + */
> + QemuEvent main_thread_load_event;
> +
> + /* For network announces */
> + AnnounceTimer announce_timer;
> +
> + size_t largest_page_size;
> + bool have_fault_thread;
> + QemuThread fault_thread;
> + QemuSemaphore fault_thread_sem;
> + /* Set this when we want the fault thread to quit */
> + bool fault_thread_quit;
> +
> + bool have_listen_thread;
> + QemuThread listen_thread;
> + QemuSemaphore listen_thread_sem;
> +
> + /* For the kernel to send us notifications */
> + int userfault_fd;
> + /* To notify the fault_thread to wake, e.g., when need to quit */
> + int userfault_event_fd;
> + QEMUFile *to_src_file;
> + QemuMutex rp_mutex; /* We send replies from multiple threads */
> + /* RAMBlock of last request sent to source */
> + RAMBlock *last_rb;
> + void *postcopy_tmp_page;
> + void *postcopy_tmp_zero_page;
> + /* PostCopyFD's for external userfaultfds & handlers of shared memory */
> + GArray *postcopy_remote_fds;
> +
> + QEMUBH *bh;
> +
> + int state;
> +
> + bool have_colo_incoming_thread;
> + QemuThread colo_incoming_thread;
> + /* The coroutine we should enter (back) after failover */
> + Coroutine *migration_incoming_co;
> + QemuSemaphore colo_incoming_sem;
> +
> + /*
> + * PostcopyBlocktimeContext to keep information for postcopy
> + * live migration, to calculate vCPU block time
> + * */
> + struct PostcopyBlocktimeContext *blocktime_ctx;
> +
> + /* notify PAUSED postcopy incoming migrations to try to continue */
> + bool postcopy_recover_triggered;
> + QemuSemaphore postcopy_pause_sem_dst;
> + QemuSemaphore postcopy_pause_sem_fault;
> +
> + /* List of listening socket addresses */
> + SocketAddressList *socket_address_list;
> +};
> +
> +MigrationIncomingState *migration_incoming_get_current(void);
> +void migration_incoming_state_destroy(void);
> +/*
> + * Functions to work with blocktime context
> + */
> +void fill_destination_postcopy_migration_info(MigrationInfo *info);
> +
> +#define TYPE_MIGRATION "migration"
> +
> +#define MIGRATION_CLASS(klass) \
> + OBJECT_CLASS_CHECK(MigrationClass, (klass), TYPE_MIGRATION)
> +#define MIGRATION_OBJ(obj) \
> + OBJECT_CHECK(MigrationState, (obj), TYPE_MIGRATION)
> +#define MIGRATION_GET_CLASS(obj) \
> + OBJECT_GET_CLASS(MigrationClass, (obj), TYPE_MIGRATION)
> +
> +typedef struct MigrationClass {
> + /*< private >*/
> + DeviceClass parent_class;
> +} MigrationClass;
> +
> +struct MigrationState
> +{
> + /*< private >*/
> + DeviceState parent_obj;
> +
> + /*< public >*/
> + size_t bytes_xfer;
> + QemuThread thread;
> + QEMUBH *cleanup_bh;
> + QEMUFile *to_dst_file;
> + /*
> + * Protects to_dst_file pointer. We need to make sure we won't
> + * yield or hang during the critical section, since this lock will
> + * be used in OOB command handler.
> + */
> + QemuMutex qemu_file_lock;
> +
> + /*
> + * Used to allow urgent requests to override rate limiting.
> + */
> + QemuSemaphore rate_limit_sem;
> +
> + /* pages already sent at the beginning of current iteration */
> + uint64_t iteration_initial_pages;
> +
> + /* pages transferred per second */
> + double pages_per_second;
> +
> + /* bytes already sent at the beginning of current iteration */
> + uint64_t iteration_initial_bytes;
> + /* time at the start of current iteration */
> + int64_t iteration_start_time;
> + /*
> + * The final stage happens when the remaining data is smaller than
> + * this threshold; it's calculated from the requested downtime and
> + * measured bandwidth
> + */
> + int64_t threshold_size;
> +
> + /* params from 'migrate-set-parameters' */
> + MigrationParameters parameters;
> +
> + int state;
> +
> + /* State related to return path */
> + struct {
> + QEMUFile *from_dst_file;
> + QemuThread rp_thread;
> + bool error;
> + QemuSemaphore rp_sem;
> + } rp_state;
> +
> + double mbps;
> + /* Timestamp when recent migration starts (ms) */
> + int64_t start_time;
> + /* Total time used by latest migration (ms) */
> + int64_t total_time;
> + /* Timestamp when VM is down (ms) to migrate the last stuff */
> + int64_t downtime_start;
> + int64_t downtime;
> + int64_t expected_downtime;
> + bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> + int64_t setup_time;
> + /*
> + * Whether guest was running when we enter the completion stage.
> + * If migration is interrupted for any reason, we need to continue
> + * running the guest on source.
> + */
> + bool vm_was_running;
> +
> + /* Flag set once the migration has been asked to enter postcopy */
> + bool start_postcopy;
> + /* Flag set after postcopy has sent the device state */
> + bool postcopy_after_devices;
> +
> + /* Flag set once the migration thread is running (and needs joining) */
> + bool migration_thread_running;
> +
> + /* Flag set once the migration thread called bdrv_inactivate_all */
> + bool block_inactive;
> +
> + /* Migration is paused due to pause-before-switchover */
> + QemuSemaphore pause_sem;
> +
> + /* The semaphore is used to notify COLO thread that failover is finished */
> + QemuSemaphore colo_exit_sem;
> +
> + /* The semaphore is used to notify COLO thread to do checkpoint */
> + QemuSemaphore colo_checkpoint_sem;
> + int64_t colo_checkpoint_time;
> + QEMUTimer *colo_delay_timer;
> +
> + /* The first error that has occurred.
> + We used the mutex to be able to return the 1st error message */
> + Error *error;
> + /* mutex to protect errp */
> + QemuMutex error_mutex;
> +
> + /* Do we have to clean up -b/-i from old migrate parameters */
> + /* This feature is deprecated and will be removed */
> + bool must_remove_block_options;
> +
> + /*
> + * Global switch on whether we need to store the global state
> + * during migration.
> + */
> + bool store_global_state;
> +
> + /* Whether we send QEMU_VM_CONFIGURATION during migration */
> + bool send_configuration;
> + /* Whether we send section footer during migration */
> + bool send_section_footer;
> +
> + /* Needed by postcopy-pause state */
> + QemuSemaphore postcopy_pause_sem;
> + QemuSemaphore postcopy_pause_rp_sem;
> + /*
> + * Whether we abort the migration if decompression errors are
> + * detected at the destination. It is left at false for qemu
> + * older than 3.0, since only newer qemu sends streams that
> + * do not trigger spurious decompression errors.
> + */
> + bool decompress_error_check;
> +};
> +
> +void migrate_set_state(int *state, int old_state, int new_state);
> +
> +void migration_fd_process_incoming(QEMUFile *f);
> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
> +void migration_incoming_process(void);
> +
> +bool migration_has_all_channels(void);
> +
> +uint64_t migrate_max_downtime(void);
> +
> +void migrate_set_error(MigrationState *s, const Error *error);
> +void migrate_fd_error(MigrationState *s, const Error *error);
> +
> +void migrate_fd_connect(MigrationState *s, Error *error_in);
> +
> +bool migration_is_setup_or_active(int state);
> +
> +void migrate_init(MigrationState *s);
> +bool migration_is_blocked(Error **errp);
> +/* True if outgoing migration has entered postcopy phase */
> +bool migration_in_postcopy(void);
> +MigrationState *migrate_get_current(void);
> +
> +bool migrate_postcopy(void);
> +
> +bool migrate_release_ram(void);
> +bool migrate_postcopy_ram(void);
> +bool migrate_zero_blocks(void);
> +bool migrate_dirty_bitmaps(void);
> +bool migrate_ignore_shared(void);
> +
> +bool migrate_auto_converge(void);
> +bool migrate_use_multifd(void);
> +bool migrate_pause_before_switchover(void);
> +int migrate_multifd_channels(void);
> +
> +int migrate_use_xbzrle(void);
> +int64_t migrate_xbzrle_cache_size(void);
> +bool migrate_colo_enabled(void);
> +
> +bool migrate_use_block(void);
> +bool migrate_use_block_incremental(void);
> +int migrate_max_cpu_throttle(void);
> +bool migrate_use_return_path(void);
> +
> +uint64_t ram_get_total_transferred_pages(void);
> +
> +bool migrate_use_compression(void);
> +int migrate_compress_level(void);
> +int migrate_compress_threads(void);
> +int migrate_compress_wait_thread(void);
> +int migrate_decompress_threads(void);
> +bool migrate_use_events(void);
> +bool migrate_postcopy_blocktime(void);
> +
> +/* Sending on the return path - generic and then for each message type */
> +void migrate_send_rp_shut(MigrationIncomingState *mis,
> + uint32_t value);
> +void migrate_send_rp_pong(MigrationIncomingState *mis,
> + uint32_t value);
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
> + ram_addr_t start, size_t len);
> +void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> + char *block_name);
> +void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
> +
> +void dirty_bitmap_mig_before_vm_start(void);
> +void init_dirty_bitmap_incoming_migration(void);
> +void migrate_add_address(SocketAddress *address);
> +
> +int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
> +
> +#define qemu_ram_foreach_block \
> + #warning "Use foreach_not_ignored_block in migration code"
> +
> +void migration_make_urgent_request(void);
> +void migration_consume_urgent_request(void);
> +
> +#endif
> diff --git a/migration/ram.c b/migration/ram.c
> index 48969db84b..8a6ad61d3d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1664,6 +1664,33 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
> bool ret;
>
> qemu_mutex_lock(&rs->bitmap_mutex);
> +
> + /*
> + * Clear the dirty bitmap if needed. This _must_ be called before
> + * we send any of the pages in the chunk, because we need to make
> + * sure we can capture further page content changes when we sync
> + * the dirty log next time. So as long as we are going to send any
> + * of the pages in the chunk, we clear the remote dirty bitmap for
> + * all of them. Clearing it earlier won't be a problem; too late will.
> + */
> + if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
> + uint8_t shift = rb->clear_bmap_shift;
> + hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
> + hwaddr start = (page << TARGET_PAGE_BITS) & (-size);
> +
> + /*
> + * CLEAR_BITMAP_SHIFT_MIN should always guarantee this. It makes
> + * things easier since the start address of each small chunk is
> + * then always aligned to 64 pages, so the bitmap is always
> + * aligned to unsigned long. We could probably even remove this
> + * restriction, but it is kept for simplicity.
> + */
> + assert(shift >= 6);
> + trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
> + memory_region_clear_dirty_bitmap(rb->mr, start, size);
> + }
> +
> ret = test_and_clear_bit(page, rb->bmap);
>
> if (ret) {
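
As an aside, the start/size computation in the hunk above just rounds
the page's address down to its chunk boundary; a standalone sketch of
that masking, using the same expressions as the patch (the sample
values here are made up):

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  #define TARGET_PAGE_BITS 12 /* 4K pages, as in the example above */

  int main(void)
  {
      unsigned shift = 18;                                    /* clear_bmap_shift */
      uint64_t page = 300000;                                 /* some dirty page number */
      uint64_t size = 1ULL << (TARGET_PAGE_BITS + shift);     /* 1G chunk */
      uint64_t start = (page << TARGET_PAGE_BITS) & (-size);  /* chunk-aligned start */

      printf("clear [0x%" PRIx64 ", +0x%" PRIx64 ")\n", start, size);
      return 0;
  }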
> @@ -2687,6 +2714,8 @@ static void ram_save_cleanup(void *opaque)
> memory_global_dirty_log_stop();
>
> RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + g_free(block->clear_bmap);
> + block->clear_bmap = NULL;
> g_free(block->bmap);
> block->bmap = NULL;
> g_free(block->unsentmap);
> @@ -3197,11 +3226,24 @@ static int ram_state_init(RAMState **rsp)
>
> static void ram_list_init_bitmaps(void)
> {
> + MigrationState *ms = migrate_get_current();
> RAMBlock *block;
> unsigned long pages;
> + uint8_t shift;
>
> /* Skip setting bitmap if there is no RAM */
> if (ram_bytes_total()) {
> + shift = ms->clear_bitmap_shift;
> + if (shift > CLEAR_BITMAP_SHIFT_MAX) {
> + error_report("clear_bitmap_shift (%u) too big, using "
> + "max value (%u)", shift, CLEAR_BITMAP_SHIFT_MAX);
> + shift = CLEAR_BITMAP_SHIFT_MAX;
> + } else if (shift < CLEAR_BITMAP_SHIFT_MIN) {
> + error_report("clear_bitmap_shift (%u) too small, using "
> + "min value (%u)", shift, CLEAR_BITMAP_SHIFT_MIN);
> + shift = CLEAR_BITMAP_SHIFT_MIN;
> + }
> +
> RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> pages = block->max_length >> TARGET_PAGE_BITS;
> /*
> @@ -3214,6 +3256,8 @@ static void ram_list_init_bitmaps(void)
> * Here setting RAMBlock.bmap would be fine too but not necessary.
> */
> block->bmap = bitmap_new(pages);
> + block->clear_bmap_shift = shift;
> + block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
> if (migrate_postcopy_ram()) {
> block->unsentmap = bitmap_new(pages);
> bitmap_set(block->unsentmap, 0, pages);
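
For a sense of the overhead: the clear bitmap allocated above is tiny
next to the per-page dirty bitmap, as a quick back-of-the-envelope
shows (standalone C, reusing the 64G/4K/shift-18 numbers from the
commit message):

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t pages = (64ULL << 30) >> 12; /* 64G guest, 4K pages: 16M pages */
      unsigned shift = 18;                  /* default clear bitmap shift */
      uint64_t bmap_bits = pages;           /* bmap: one bit per page */
      uint64_t clear_bits = (pages + (1ULL << shift) - 1) >> shift; /* one bit per chunk */

      printf("bmap: %" PRIu64 " KB, clear_bmap: %" PRIu64 " bits\n",
             bmap_bits / 8 / 1024, clear_bits); /* 2048 KB vs 64 bits */
      return 0;
  }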
> diff --git a/migration/ram.c.orig b/migration/ram.c.orig
> new file mode 100644
> index 0000000000..48969db84b
> --- /dev/null
> +++ b/migration/ram.c.orig
> @@ -0,0 +1,4599 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + * Copyright (c) 2011-2015 Red Hat Inc
> + *
> + * Authors:
> + * Juan Quintela <address@hidden>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "cpu.h"
> +#include <zlib.h>
> +#include "qemu/cutils.h"
> +#include "qemu/bitops.h"
> +#include "qemu/bitmap.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/pmem.h"
> +#include "xbzrle.h"
> +#include "ram.h"
> +#include "migration.h"
> +#include "socket.h"
> +#include "migration/register.h"
> +#include "migration/misc.h"
> +#include "qemu-file.h"
> +#include "postcopy-ram.h"
> +#include "page_cache.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-events-migration.h"
> +#include "qapi/qmp/qerror.h"
> +#include "trace.h"
> +#include "exec/ram_addr.h"
> +#include "exec/target_page.h"
> +#include "qemu/rcu_queue.h"
> +#include "migration/colo.h"
> +#include "block.h"
> +#include "sysemu/sysemu.h"
> +#include "qemu/uuid.h"
> +#include "savevm.h"
> +#include "qemu/iov.h"
> +
> +/***********************************************************/
> +/* ram save/restore */
> +
> +/* RAM_SAVE_FLAG_ZERO used to be named RAM_SAVE_FLAG_COMPRESS, it
> + * worked for pages that were filled with the same char. We switched
> + * it to only search for the zero value. And to avoid confusion with
> + * RAM_SAVE_FLAG_COMPRESS_PAGE just rename it.
> + */
> +
> +#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
> +#define RAM_SAVE_FLAG_ZERO 0x02
> +#define RAM_SAVE_FLAG_MEM_SIZE 0x04
> +#define RAM_SAVE_FLAG_PAGE 0x08
> +#define RAM_SAVE_FLAG_EOS 0x10
> +#define RAM_SAVE_FLAG_CONTINUE 0x20
> +#define RAM_SAVE_FLAG_XBZRLE 0x40
> +/* 0x80 is reserved in migration.h start with 0x100 next */
> +#define RAM_SAVE_FLAG_COMPRESS_PAGE 0x100
> +
> +static inline bool is_zero_range(uint8_t *p, uint64_t size)
> +{
> + return buffer_is_zero(p, size);
> +}
> +
> +XBZRLECacheStats xbzrle_counters;
> +
> +/* struct contains XBZRLE cache and a static page
> + used by the compression */
> +static struct {
> + /* buffer used for XBZRLE encoding */
> + uint8_t *encoded_buf;
> + /* buffer for storing page content */
> + uint8_t *current_buf;
> + /* Cache for XBZRLE, Protected by lock. */
> + PageCache *cache;
> + QemuMutex lock;
> + /* it will store a page full of zeros */
> + uint8_t *zero_target_page;
> + /* buffer used for XBZRLE decoding */
> + uint8_t *decoded_buf;
> +} XBZRLE;
> +
> +static void XBZRLE_cache_lock(void)
> +{
> + if (migrate_use_xbzrle())
> + qemu_mutex_lock(&XBZRLE.lock);
> +}
> +
> +static void XBZRLE_cache_unlock(void)
> +{
> + if (migrate_use_xbzrle())
> + qemu_mutex_unlock(&XBZRLE.lock);
> +}
> +
> +/**
> + * xbzrle_cache_resize: resize the xbzrle cache
> + *
> + * This function is called from qmp_migrate_set_cache_size in main
> + * thread, possibly while a migration is in progress. A running
> + * migration may be using the cache and might finish during this call,
> + * hence changes to the cache are protected by XBZRLE.lock().
> + *
> + * Returns 0 for success or -1 for error
> + *
> + * @new_size: new cache size
> + * @errp: set *errp if the check failed, with reason
> + */
> +int xbzrle_cache_resize(int64_t new_size, Error **errp)
> +{
> + PageCache *new_cache;
> + int64_t ret = 0;
> +
> + /* Check for truncation */
> + if (new_size != (size_t)new_size) {
> + error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
> + "exceeding address space");
> + return -1;
> + }
> +
> + if (new_size == migrate_xbzrle_cache_size()) {
> + /* nothing to do */
> + return 0;
> + }
> +
> + XBZRLE_cache_lock();
> +
> + if (XBZRLE.cache != NULL) {
> + new_cache = cache_init(new_size, TARGET_PAGE_SIZE, errp);
> + if (!new_cache) {
> + ret = -1;
> + goto out;
> + }
> +
> + cache_fini(XBZRLE.cache);
> + XBZRLE.cache = new_cache;
> + }
> +out:
> + XBZRLE_cache_unlock();
> + return ret;
> +}
> +
> +static bool ramblock_is_ignored(RAMBlock *block)
> +{
> + return !qemu_ram_is_migratable(block) ||
> + (migrate_ignore_shared() && qemu_ram_is_shared(block));
> +}
> +
> +/* Should be holding either ram_list.mutex, or the RCU lock. */
> +#define RAMBLOCK_FOREACH_NOT_IGNORED(block) \
> + INTERNAL_RAMBLOCK_FOREACH(block) \
> + if (ramblock_is_ignored(block)) {} else
> +
> +#define RAMBLOCK_FOREACH_MIGRATABLE(block) \
> + INTERNAL_RAMBLOCK_FOREACH(block) \
> + if (!qemu_ram_is_migratable(block)) {} else
> +
> +#undef RAMBLOCK_FOREACH
> +
> +int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque)
> +{
> + RAMBlock *block;
> + int ret = 0;
> +
> + rcu_read_lock();
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + ret = func(block, opaque);
> + if (ret) {
> + break;
> + }
> + }
> + rcu_read_unlock();
> + return ret;
> +}
> +
> +static void ramblock_recv_map_init(void)
> +{
> + RAMBlock *rb;
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
> + assert(!rb->receivedmap);
> + rb->receivedmap = bitmap_new(rb->max_length >> qemu_target_page_bits());
> + }
> +}
> +
> +int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr)
> +{
> + return test_bit(ramblock_recv_bitmap_offset(host_addr, rb),
> + rb->receivedmap);
> +}
> +
> +bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset)
> +{
> + return test_bit(byte_offset >> TARGET_PAGE_BITS, rb->receivedmap);
> +}
> +
> +void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr)
> +{
> + set_bit_atomic(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
> +}
> +
> +void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr,
> + size_t nr)
> +{
> + bitmap_set_atomic(rb->receivedmap,
> + ramblock_recv_bitmap_offset(host_addr, rb),
> + nr);
> +}
> +
> +#define RAMBLOCK_RECV_BITMAP_ENDING (0x0123456789abcdefULL)
> +
> +/*
> + * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes).
> + *
> + * Returns the number of bytes sent (>0) on success, or <0 on error.
> + */
> +int64_t ramblock_recv_bitmap_send(QEMUFile *file,
> + const char *block_name)
> +{
> + RAMBlock *block = qemu_ram_block_by_name(block_name);
> + unsigned long *le_bitmap, nbits;
> + uint64_t size;
> +
> + if (!block) {
> + error_report("%s: invalid block name: %s", __func__, block_name);
> + return -1;
> + }
> +
> + nbits = block->used_length >> TARGET_PAGE_BITS;
> +
> + /*
> + * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
> + * machines we may need 4 more bytes for padding (see below
> + * comment). So extend it a bit beforehand.
> + */
> + le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
> +
> + /*
> + * Always use little endian when sending the bitmap. This is
> + * required when the source and destination VMs do not use the
> + * same endianness. (Note: big endian won't work.)
> + */
> + bitmap_to_le(le_bitmap, block->receivedmap, nbits);
> +
> + /* Size of the bitmap, in bytes */
> + size = DIV_ROUND_UP(nbits, 8);
> +
> + /*
> + * size is always aligned to 8 bytes for 64bit machines, but it
> + * may not be true for 32bit machines. We need this padding to
> + * make sure the migration can survive even between 32bit and
> + * 64bit machines.
> + */
> + size = ROUND_UP(size, 8);
> +
> + qemu_put_be64(file, size);
> + qemu_put_buffer(file, (const uint8_t *)le_bitmap, size);
> + /*
> + * Mark as an end, in case the middle part is screwed up due to
> + * some "misterious" reason.
> + */
> + qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING);
> + qemu_fflush(file);
> +
> + g_free(le_bitmap);
> +
> + if (qemu_file_get_error(file)) {
> + return qemu_file_get_error(file);
> + }
> +
> + return size + sizeof(size);
> +}
> +
> +/*
> + * An outstanding page request, on the source, having been received
> + * and queued
> + */
> +struct RAMSrcPageRequest {
> + RAMBlock *rb;
> + hwaddr offset;
> + hwaddr len;
> +
> + QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
> +};
> +
> +/* State of RAM for migration */
> +struct RAMState {
> + /* QEMUFile used for this migration */
> + QEMUFile *f;
> + /* Last block that we have visited searching for dirty pages */
> + RAMBlock *last_seen_block;
> + /* Last block from where we have sent data */
> + RAMBlock *last_sent_block;
> + /* Last dirty target page we have sent */
> + ram_addr_t last_page;
> + /* last ram version we have seen */
> + uint32_t last_version;
> + /* We are in the first round */
> + bool ram_bulk_stage;
> + /* The free page optimization is enabled */
> + bool fpo_enabled;
> + /* How many times we have dirty too many pages */
> + int dirty_rate_high_cnt;
> + /* these variables are used for bitmap sync */
> + /* last time we did a full bitmap_sync */
> + int64_t time_last_bitmap_sync;
> + /* bytes transferred at start_time */
> + uint64_t bytes_xfer_prev;
> + /* number of dirty pages since start_time */
> + uint64_t num_dirty_pages_period;
> + /* xbzrle misses since the beginning of the period */
> + uint64_t xbzrle_cache_miss_prev;
> +
> + /* compression statistics since the beginning of the period */
> + /* number of times there was no free thread to compress data */
> + uint64_t compress_thread_busy_prev;
> + /* number of bytes after compression */
> + uint64_t compressed_size_prev;
> + /* number of compressed pages */
> + uint64_t compress_pages_prev;
> +
> + /* total handled target pages at the beginning of period */
> + uint64_t target_page_count_prev;
> + /* total handled target pages since start */
> + uint64_t target_page_count;
> + /* number of dirty bits in the bitmap */
> + uint64_t migration_dirty_pages;
> + /* Protects modification of the bitmap and migration dirty pages */
> + QemuMutex bitmap_mutex;
> + /* The RAMBlock used in the last src_page_requests */
> + RAMBlock *last_req_rb;
> + /* Queue of outstanding page requests from the destination */
> + QemuMutex src_page_req_mutex;
> + QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
> +};
> +typedef struct RAMState RAMState;
> +
> +static RAMState *ram_state;
> +
> +static NotifierWithReturnList precopy_notifier_list;
> +
> +void precopy_infrastructure_init(void)
> +{
> + notifier_with_return_list_init(&precopy_notifier_list);
> +}
> +
> +void precopy_add_notifier(NotifierWithReturn *n)
> +{
> + notifier_with_return_list_add(&precopy_notifier_list, n);
> +}
> +
> +void precopy_remove_notifier(NotifierWithReturn *n)
> +{
> + notifier_with_return_remove(n);
> +}
> +
> +int precopy_notify(PrecopyNotifyReason reason, Error **errp)
> +{
> + PrecopyNotifyData pnd;
> + pnd.reason = reason;
> + pnd.errp = errp;
> +
> + return notifier_with_return_list_notify(&precopy_notifier_list, &pnd);
> +}
> +
> +void precopy_enable_free_page_optimization(void)
> +{
> + if (!ram_state) {
> + return;
> + }
> +
> + ram_state->fpo_enabled = true;
> +}
> +
> +uint64_t ram_bytes_remaining(void)
> +{
> + return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
> + 0;
> +}
> +
> +MigrationStats ram_counters;
> +
> +/* used by the search for pages to send */
> +struct PageSearchStatus {
> + /* Current block being searched */
> + RAMBlock *block;
> + /* Current page to search from */
> + unsigned long page;
> + /* Set once we wrap around */
> + bool complete_round;
> +};
> +typedef struct PageSearchStatus PageSearchStatus;
> +
> +CompressionStats compression_counters;
> +
> +struct CompressParam {
> + bool done;
> + bool quit;
> + bool zero_page;
> + QEMUFile *file;
> + QemuMutex mutex;
> + QemuCond cond;
> + RAMBlock *block;
> + ram_addr_t offset;
> +
> + /* internally used fields */
> + z_stream stream;
> + uint8_t *originbuf;
> +};
> +typedef struct CompressParam CompressParam;
> +
> +struct DecompressParam {
> + bool done;
> + bool quit;
> + QemuMutex mutex;
> + QemuCond cond;
> + void *des;
> + uint8_t *compbuf;
> + int len;
> + z_stream stream;
> +};
> +typedef struct DecompressParam DecompressParam;
> +
> +static CompressParam *comp_param;
> +static QemuThread *compress_threads;
> +/* comp_done_cond is used to wake up the migration thread when
> + * one of the compression threads has finished the compression.
> + * comp_done_lock is used to co-work with comp_done_cond.
> + */
> +static QemuMutex comp_done_lock;
> +static QemuCond comp_done_cond;
> +/* The empty QEMUFileOps will be used by file in CompressParam */
> +static const QEMUFileOps empty_ops = { };
> +
> +static QEMUFile *decomp_file;
> +static DecompressParam *decomp_param;
> +static QemuThread *decompress_threads;
> +static QemuMutex decomp_done_lock;
> +static QemuCond decomp_done_cond;
> +
> +static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
> + ram_addr_t offset, uint8_t *source_buf);
> +
> +static void *do_data_compress(void *opaque)
> +{
> + CompressParam *param = opaque;
> + RAMBlock *block;
> + ram_addr_t offset;
> + bool zero_page;
> +
> + qemu_mutex_lock(&param->mutex);
> + while (!param->quit) {
> + if (param->block) {
> + block = param->block;
> + offset = param->offset;
> + param->block = NULL;
> + qemu_mutex_unlock(&param->mutex);
> +
> + zero_page = do_compress_ram_page(param->file, &param->stream,
> + block, offset, param->originbuf);
> +
> + qemu_mutex_lock(&comp_done_lock);
> + param->done = true;
> + param->zero_page = zero_page;
> + qemu_cond_signal(&comp_done_cond);
> + qemu_mutex_unlock(&comp_done_lock);
> +
> + qemu_mutex_lock(&param->mutex);
> + } else {
> + qemu_cond_wait(&param->cond, &param->mutex);
> + }
> + }
> + qemu_mutex_unlock(&param->mutex);
> +
> + return NULL;
> +}
> +
> +static void compress_threads_save_cleanup(void)
> +{
> + int i, thread_count;
> +
> + if (!migrate_use_compression() || !comp_param) {
> + return;
> + }
> +
> + thread_count = migrate_compress_threads();
> + for (i = 0; i < thread_count; i++) {
> + /*
> + * we use it as an indicator which shows if the thread is
> + * properly init'd or not
> + */
> + if (!comp_param[i].file) {
> + break;
> + }
> +
> + qemu_mutex_lock(&comp_param[i].mutex);
> + comp_param[i].quit = true;
> + qemu_cond_signal(&comp_param[i].cond);
> + qemu_mutex_unlock(&comp_param[i].mutex);
> +
> + qemu_thread_join(compress_threads + i);
> + qemu_mutex_destroy(&comp_param[i].mutex);
> + qemu_cond_destroy(&comp_param[i].cond);
> + deflateEnd(&comp_param[i].stream);
> + g_free(comp_param[i].originbuf);
> + qemu_fclose(comp_param[i].file);
> + comp_param[i].file = NULL;
> + }
> + qemu_mutex_destroy(&comp_done_lock);
> + qemu_cond_destroy(&comp_done_cond);
> + g_free(compress_threads);
> + g_free(comp_param);
> + compress_threads = NULL;
> + comp_param = NULL;
> +}
> +
> +static int compress_threads_save_setup(void)
> +{
> + int i, thread_count;
> +
> + if (!migrate_use_compression()) {
> + return 0;
> + }
> + thread_count = migrate_compress_threads();
> + compress_threads = g_new0(QemuThread, thread_count);
> + comp_param = g_new0(CompressParam, thread_count);
> + qemu_cond_init(&comp_done_cond);
> + qemu_mutex_init(&comp_done_lock);
> + for (i = 0; i < thread_count; i++) {
> + comp_param[i].originbuf = g_try_malloc(TARGET_PAGE_SIZE);
> + if (!comp_param[i].originbuf) {
> + goto exit;
> + }
> +
> + if (deflateInit(&comp_param[i].stream,
> + migrate_compress_level()) != Z_OK) {
> + g_free(comp_param[i].originbuf);
> + goto exit;
> + }
> +
> + /* comp_param[i].file is just used as a dummy buffer to save data,
> + * set its ops to empty.
> + */
> + comp_param[i].file = qemu_fopen_ops(NULL, &empty_ops);
> + comp_param[i].done = true;
> + comp_param[i].quit = false;
> + qemu_mutex_init(&comp_param[i].mutex);
> + qemu_cond_init(&comp_param[i].cond);
> + qemu_thread_create(compress_threads + i, "compress",
> + do_data_compress, comp_param + i,
> + QEMU_THREAD_JOINABLE);
> + }
> + return 0;
> +
> +exit:
> + compress_threads_save_cleanup();
> + return -1;
> +}
> +
> +/* Multiple fd's */
> +
> +#define MULTIFD_MAGIC 0x11223344U
> +#define MULTIFD_VERSION 1
> +
> +#define MULTIFD_FLAG_SYNC (1 << 0)
> +
> +/* This value needs to be a multiple of qemu_target_page_size() */
> +#define MULTIFD_PACKET_SIZE (512 * 1024)
> +
> +typedef struct {
> + uint32_t magic;
> + uint32_t version;
> + unsigned char uuid[16]; /* QemuUUID */
> + uint8_t id;
> + uint8_t unused1[7]; /* Reserved for future use */
> + uint64_t unused2[4]; /* Reserved for future use */
> +} __attribute__((packed)) MultiFDInit_t;
> +
> +typedef struct {
> + uint32_t magic;
> + uint32_t version;
> + uint32_t flags;
> + /* maximum number of allocated pages */
> + uint32_t pages_alloc;
> + uint32_t pages_used;
> + /* size of the next packet that contains pages */
> + uint32_t next_packet_size;
> + uint64_t packet_num;
> + uint64_t unused[4]; /* Reserved for future use */
> + char ramblock[256];
> + uint64_t offset[];
> +} __attribute__((packed)) MultiFDPacket_t;
> +
> +typedef struct {
> + /* number of used pages */
> + uint32_t used;
> + /* number of allocated pages */
> + uint32_t allocated;
> + /* global number of generated multifd packets */
> + uint64_t packet_num;
> + /* offset of each page */
> + ram_addr_t *offset;
> + /* pointer to each page */
> + struct iovec *iov;
> + RAMBlock *block;
> +} MultiFDPages_t;
> +
> +typedef struct {
> + /* these fields are not changed once the thread is created */
> + /* channel number */
> + uint8_t id;
> + /* channel thread name */
> + char *name;
> + /* channel thread id */
> + QemuThread thread;
> + /* communication channel */
> + QIOChannel *c;
> + /* sem where to wait for more work */
> + QemuSemaphore sem;
> + /* this mutex protects the following parameters */
> + QemuMutex mutex;
> + /* is this channel thread running */
> + bool running;
> + /* should this thread finish */
> + bool quit;
> + /* thread has work to do */
> + int pending_job;
> + /* array of pages to send */
> + MultiFDPages_t *pages;
> + /* packet allocated len */
> + uint32_t packet_len;
> + /* pointer to the packet */
> + MultiFDPacket_t *packet;
> + /* multifd flags for each packet */
> + uint32_t flags;
> + /* size of the next packet that contains pages */
> + uint32_t next_packet_size;
> + /* global number of generated multifd packets */
> + uint64_t packet_num;
> + /* thread local variables */
> + /* packets sent through this channel */
> + uint64_t num_packets;
> + /* pages sent through this channel */
> + uint64_t num_pages;
> +} MultiFDSendParams;
> +
> +typedef struct {
> +    /* these fields are not changed once the thread is created */
> + /* channel number */
> + uint8_t id;
> + /* channel thread name */
> + char *name;
> + /* channel thread id */
> + QemuThread thread;
> + /* communication channel */
> + QIOChannel *c;
> + /* this mutex protects the following parameters */
> + QemuMutex mutex;
> + /* is this channel thread running */
> + bool running;
> + /* array of pages to receive */
> + MultiFDPages_t *pages;
> + /* packet allocated len */
> + uint32_t packet_len;
> + /* pointer to the packet */
> + MultiFDPacket_t *packet;
> + /* multifd flags for each packet */
> + uint32_t flags;
> + /* global number of generated multifd packets */
> + uint64_t packet_num;
> + /* thread local variables */
> + /* size of the next packet that contains pages */
> + uint32_t next_packet_size;
> +    /* packets received through this channel */
> +    uint64_t num_packets;
> +    /* pages received through this channel */
> +    uint64_t num_pages;
> + /* syncs main thread and channels */
> + QemuSemaphore sem_sync;
> +} MultiFDRecvParams;
> +
> +static int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp)
> +{
> + MultiFDInit_t msg;
> + int ret;
> +
> + msg.magic = cpu_to_be32(MULTIFD_MAGIC);
> + msg.version = cpu_to_be32(MULTIFD_VERSION);
> + msg.id = p->id;
> + memcpy(msg.uuid, &qemu_uuid.data, sizeof(msg.uuid));
> +
> + ret = qio_channel_write_all(p->c, (char *)&msg, sizeof(msg), errp);
> + if (ret != 0) {
> + return -1;
> + }
> + return 0;
> +}
> +
> +static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
> +{
> + MultiFDInit_t msg;
> + int ret;
> +
> + ret = qio_channel_read_all(c, (char *)&msg, sizeof(msg), errp);
> + if (ret != 0) {
> + return -1;
> + }
> +
> + msg.magic = be32_to_cpu(msg.magic);
> + msg.version = be32_to_cpu(msg.version);
> +
> + if (msg.magic != MULTIFD_MAGIC) {
> + error_setg(errp, "multifd: received packet magic %x "
> + "expected %x", msg.magic, MULTIFD_MAGIC);
> + return -1;
> + }
> +
> + if (msg.version != MULTIFD_VERSION) {
> + error_setg(errp, "multifd: received packet version %d "
> + "expected %d", msg.version, MULTIFD_VERSION);
> + return -1;
> + }
> +
> + if (memcmp(msg.uuid, &qemu_uuid, sizeof(qemu_uuid))) {
> + char *uuid = qemu_uuid_unparse_strdup(&qemu_uuid);
> +        char *msg_uuid =
> +            qemu_uuid_unparse_strdup((const QemuUUID *)msg.uuid);
> +
> + error_setg(errp, "multifd: received uuid '%s' and expected "
> + "uuid '%s' for channel %hhd", msg_uuid, uuid, msg.id);
> + g_free(uuid);
> + g_free(msg_uuid);
> + return -1;
> + }
> +
> +    if (msg.id > migrate_multifd_channels()) {
> +        error_setg(errp, "multifd: received channel id %d is "
> +                   "greater than number of channels %d",
> +                   msg.id, migrate_multifd_channels());
> +        return -1;
> +    }
> +
> + return msg.id;
> +}
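
Both ends exchange the magic/version in big-endian, so the checks above
are host-byte-order independent. A tiny round trip of that conversion,
using htonl/ntohl as stand-ins for cpu_to_be32()/be32_to_cpu():

  #include <arpa/inet.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint32_t magic = 0x11223344u;   /* MULTIFD_MAGIC */
      uint32_t wire = htonl(magic);   /* sender: cpu_to_be32() */
      uint32_t back = ntohl(wire);    /* receiver: be32_to_cpu() */

      printf("magic 0x%x -> wire 0x%x -> back 0x%x\n", magic, wire, back);
      return back == magic ? 0 : 1;
  }
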
> +
> +static MultiFDPages_t *multifd_pages_init(size_t size)
> +{
> + MultiFDPages_t *pages = g_new0(MultiFDPages_t, 1);
> +
> + pages->allocated = size;
> + pages->iov = g_new0(struct iovec, size);
> + pages->offset = g_new0(ram_addr_t, size);
> +
> + return pages;
> +}
> +
> +static void multifd_pages_clear(MultiFDPages_t *pages)
> +{
> + pages->used = 0;
> + pages->allocated = 0;
> + pages->packet_num = 0;
> + pages->block = NULL;
> + g_free(pages->iov);
> + pages->iov = NULL;
> + g_free(pages->offset);
> + pages->offset = NULL;
> + g_free(pages);
> +}
> +
> +static void multifd_send_fill_packet(MultiFDSendParams *p)
> +{
> + MultiFDPacket_t *packet = p->packet;
> + uint32_t page_max = MULTIFD_PACKET_SIZE / qemu_target_page_size();
> + int i;
> +
> + packet->magic = cpu_to_be32(MULTIFD_MAGIC);
> + packet->version = cpu_to_be32(MULTIFD_VERSION);
> + packet->flags = cpu_to_be32(p->flags);
> + packet->pages_alloc = cpu_to_be32(page_max);
> + packet->pages_used = cpu_to_be32(p->pages->used);
> + packet->next_packet_size = cpu_to_be32(p->next_packet_size);
> + packet->packet_num = cpu_to_be64(p->packet_num);
> +
> + if (p->pages->block) {
> + strncpy(packet->ramblock, p->pages->block->idstr, 256);
> + }
> +
> + for (i = 0; i < p->pages->used; i++) {
> + packet->offset[i] = cpu_to_be64(p->pages->offset[i]);
> + }
> +}
> +
> +static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
> +{
> + MultiFDPacket_t *packet = p->packet;
> + uint32_t pages_max = MULTIFD_PACKET_SIZE / qemu_target_page_size();
> + RAMBlock *block;
> + int i;
> +
> + packet->magic = be32_to_cpu(packet->magic);
> + if (packet->magic != MULTIFD_MAGIC) {
> + error_setg(errp, "multifd: received packet "
> + "magic %x and expected magic %x",
> + packet->magic, MULTIFD_MAGIC);
> + return -1;
> + }
> +
> + packet->version = be32_to_cpu(packet->version);
> + if (packet->version != MULTIFD_VERSION) {
> + error_setg(errp, "multifd: received packet "
> + "version %d and expected version %d",
> + packet->version, MULTIFD_VERSION);
> + return -1;
> + }
> +
> + p->flags = be32_to_cpu(packet->flags);
> +
> + packet->pages_alloc = be32_to_cpu(packet->pages_alloc);
> + /*
> +     * If we received a packet that is 100 times bigger than expected,
> +     * just stop migration. 100 is a magic number.
> + */
> + if (packet->pages_alloc > pages_max * 100) {
> + error_setg(errp, "multifd: received packet "
> + "with size %d and expected a maximum size of %d",
> +                   packet->pages_alloc, pages_max * 100);
> + return -1;
> + }
> + /*
> + * We received a packet that is bigger than expected but inside
> + * reasonable limits (see previous comment). Just reallocate.
> + */
> + if (packet->pages_alloc > p->pages->allocated) {
> + multifd_pages_clear(p->pages);
> + p->pages = multifd_pages_init(packet->pages_alloc);
> + }
> +
> + p->pages->used = be32_to_cpu(packet->pages_used);
> + if (p->pages->used > packet->pages_alloc) {
> + error_setg(errp, "multifd: received packet "
> +                   "with %d pages and expected a maximum of %d pages",
> +                   p->pages->used, packet->pages_alloc);
> + return -1;
> + }
> +
> + p->next_packet_size = be32_to_cpu(packet->next_packet_size);
> + p->packet_num = be64_to_cpu(packet->packet_num);
> +
> + if (p->pages->used) {
> + /* make sure that ramblock is 0 terminated */
> + packet->ramblock[255] = 0;
> + block = qemu_ram_block_by_name(packet->ramblock);
> + if (!block) {
> + error_setg(errp, "multifd: unknown ram block %s",
> + packet->ramblock);
> + return -1;
> + }
> + }
> +
> + for (i = 0; i < p->pages->used; i++) {
> + ram_addr_t offset = be64_to_cpu(packet->offset[i]);
> +
> + if (offset > (block->used_length - TARGET_PAGE_SIZE)) {
> + error_setg(errp, "multifd: offset too long " RAM_ADDR_FMT
> + " (max " RAM_ADDR_FMT ")",
> +                       offset, block->used_length);
> + return -1;
> + }
> + p->pages->iov[i].iov_base = block->host + offset;
> + p->pages->iov[i].iov_len = TARGET_PAGE_SIZE;
> + }
> +
> + return 0;
> +}
> +
> +struct {
> + MultiFDSendParams *params;
> +    /* array of pages to send */
> + MultiFDPages_t *pages;
> + /* syncs main thread and channels */
> + QemuSemaphore sem_sync;
> + /* global number of generated multifd packets */
> + uint64_t packet_num;
> + /* send channels ready */
> + QemuSemaphore channels_ready;
> +} *multifd_send_state;
> +
> +/*
> + * How do we use multifd_send_state->pages and channel->pages?
> + *
> + * We create a "pages" struct for each channel, plus a main one. Each
> + * time we need to send a batch of pages we interchange the one in
> + * multifd_send_state with the one in the channel that is sending it.
> + * There are two reasons for that:
> + * - to avoid doing so many mallocs during migration
> + * - to make it easier to know what to free at the end of migration
> + *
> + * This way we always know who is the owner of each "pages" struct,
> + * and we don't need any locking. It belongs either to the migration
> + * thread or to the channel thread. Switching is safe because the
> + * migration thread holds the channel mutex when changing it, and the
> + * channel thread has to have finished with its own, otherwise
> + * pending_job couldn't be false.
> + */
> +
> +static void multifd_send_pages(void)
> +{
> + int i;
> + static int next_channel;
> +    MultiFDSendParams *p = NULL; /* make gcc happy */
> + MultiFDPages_t *pages = multifd_send_state->pages;
> + uint64_t transferred;
> +
> + qemu_sem_wait(&multifd_send_state->channels_ready);
> + for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
> + p = &multifd_send_state->params[i];
> +
> + qemu_mutex_lock(&p->mutex);
> + if (!p->pending_job) {
> + p->pending_job++;
> + next_channel = (i + 1) % migrate_multifd_channels();
> + break;
> + }
> + qemu_mutex_unlock(&p->mutex);
> + }
> + p->pages->used = 0;
> +
> + p->packet_num = multifd_send_state->packet_num++;
> + p->pages->block = NULL;
> + multifd_send_state->pages = p->pages;
> + p->pages = pages;
> +    transferred = ((uint64_t) pages->used) * TARGET_PAGE_SIZE
> +                  + p->packet_len;
> +    ram_counters.multifd_bytes += transferred;
> +    ram_counters.transferred += transferred;
> + qemu_mutex_unlock(&p->mutex);
> + qemu_sem_post(&p->sem);
> +}
> +
> +static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
> +{
> + MultiFDPages_t *pages = multifd_send_state->pages;
> +
> + if (!pages->block) {
> + pages->block = block;
> + }
> +
> + if (pages->block == block) {
> + pages->offset[pages->used] = offset;
> + pages->iov[pages->used].iov_base = block->host + offset;
> + pages->iov[pages->used].iov_len = TARGET_PAGE_SIZE;
> + pages->used++;
> +
> + if (pages->used < pages->allocated) {
> + return;
> + }
> + }
> +
> + multifd_send_pages();
> +
> + if (pages->block != block) {
> + multifd_queue_page(block, offset);
> + }
> +}
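
The control flow above is easier to see in isolation: pages accumulate
into the current batch until either the batch fills up or the RAMBlock
changes, and only a block change needs the recursive requeue. A rough
standalone model of just that flow (batch size and block names invented
for illustration):

  #include <stdio.h>

  #define BATCH 4                     /* stand-in for pages->allocated */

  static const char *cur_block;       /* stand-in for pages->block */
  static int used;                    /* stand-in for pages->used */

  static void send_batch(void)        /* stand-in for multifd_send_pages() */
  {
      printf("flush %d page(s) of %s\n", used, cur_block);
      used = 0;
      cur_block = NULL;
  }

  static void queue_page(const char *block, long offset)
  {
      (void)offset;                   /* offsets not tracked in this sketch */
      if (!cur_block) {
          cur_block = block;
      }
      if (cur_block == block) {
          used++;                     /* page fits in the current batch */
          if (used < BATCH) {
              return;
          }
          send_batch();               /* batch full: flush, page included */
          return;
      }
      send_batch();                   /* block changed: flush the old batch */
      queue_page(block, offset);      /* start a new batch with this page */
  }

  int main(void)
  {
      long i;

      for (i = 0; i < 5; i++) {
          queue_page("pc.ram", i * 4096);
      }
      queue_page("vga.vram", 0);      /* block change forces a flush */
      return 0;
  }
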
> +
> +static void multifd_send_terminate_threads(Error *err)
> +{
> + int i;
> +
> + if (err) {
> + MigrationState *s = migrate_get_current();
> + migrate_set_error(s, err);
> + if (s->state == MIGRATION_STATUS_SETUP ||
> + s->state == MIGRATION_STATUS_PRE_SWITCHOVER ||
> + s->state == MIGRATION_STATUS_DEVICE ||
> + s->state == MIGRATION_STATUS_ACTIVE) {
> + migrate_set_state(&s->state, s->state,
> + MIGRATION_STATUS_FAILED);
> + }
> + }
> +
> + for (i = 0; i < migrate_multifd_channels(); i++) {
> + MultiFDSendParams *p = &multifd_send_state->params[i];
> +
> + qemu_mutex_lock(&p->mutex);
> + p->quit = true;
> + qemu_sem_post(&p->sem);
> + qemu_mutex_unlock(&p->mutex);
> + }
> +}
> +
> +void multifd_save_cleanup(void)
> +{
> + int i;
> +
> + if (!migrate_use_multifd()) {
> + return;
> + }
> + multifd_send_terminate_threads(NULL);
> + for (i = 0; i < migrate_multifd_channels(); i++) {
> + MultiFDSendParams *p = &multifd_send_state->params[i];
> +
> + if (p->running) {
> + qemu_thread_join(&p->thread);
> + }
> + socket_send_channel_destroy(p->c);
> + p->c = NULL;
> + qemu_mutex_destroy(&p->mutex);
> + qemu_sem_destroy(&p->sem);
> + g_free(p->name);
> + p->name = NULL;
> + multifd_pages_clear(p->pages);
> + p->pages = NULL;
> + p->packet_len = 0;
> + g_free(p->packet);
> + p->packet = NULL;
> + }
> + qemu_sem_destroy(&multifd_send_state->channels_ready);
> + qemu_sem_destroy(&multifd_send_state->sem_sync);
> + g_free(multifd_send_state->params);
> + multifd_send_state->params = NULL;
> + multifd_pages_clear(multifd_send_state->pages);
> + multifd_send_state->pages = NULL;
> + g_free(multifd_send_state);
> + multifd_send_state = NULL;
> +}
> +
> +static void multifd_send_sync_main(void)
> +{
> + int i;
> +
> + if (!migrate_use_multifd()) {
> + return;
> + }
> + if (multifd_send_state->pages->used) {
> + multifd_send_pages();
> + }
> + for (i = 0; i < migrate_multifd_channels(); i++) {
> + MultiFDSendParams *p = &multifd_send_state->params[i];
> +
> + trace_multifd_send_sync_main_signal(p->id);
> +
> + qemu_mutex_lock(&p->mutex);
> +
> + p->packet_num = multifd_send_state->packet_num++;
> + p->flags |= MULTIFD_FLAG_SYNC;
> + p->pending_job++;
> + qemu_mutex_unlock(&p->mutex);
> + qemu_sem_post(&p->sem);
> + }
> + for (i = 0; i < migrate_multifd_channels(); i++) {
> + MultiFDSendParams *p = &multifd_send_state->params[i];
> +
> + trace_multifd_send_sync_main_wait(p->id);
> + qemu_sem_wait(&multifd_send_state->sem_sync);
> + }
> + trace_multifd_send_sync_main(multifd_send_state->packet_num);
> +}
> +
> +static void *multifd_send_thread(void *opaque)
> +{
> + MultiFDSendParams *p = opaque;
> + Error *local_err = NULL;
> + int ret;
> +
> + trace_multifd_send_thread_start(p->id);
> + rcu_register_thread();
> +
> + if (multifd_send_initial_packet(p, &local_err) < 0) {
> + goto out;
> + }
> + /* initial packet */
> + p->num_packets = 1;
> +
> + while (true) {
> + qemu_sem_wait(&p->sem);
> + qemu_mutex_lock(&p->mutex);
> +
> + if (p->pending_job) {
> + uint32_t used = p->pages->used;
> + uint64_t packet_num = p->packet_num;
> + uint32_t flags = p->flags;
> +
> + p->next_packet_size = used * qemu_target_page_size();
> + multifd_send_fill_packet(p);
> + p->flags = 0;
> + p->num_packets++;
> + p->num_pages += used;
> + p->pages->used = 0;
> + qemu_mutex_unlock(&p->mutex);
> +
> + trace_multifd_send(p->id, packet_num, used, flags,
> + p->next_packet_size);
> +
> + ret = qio_channel_write_all(p->c, (void *)p->packet,
> + p->packet_len, &local_err);
> + if (ret != 0) {
> + break;
> + }
> +
> + if (used) {
> + ret = qio_channel_writev_all(p->c, p->pages->iov,
> + used, &local_err);
> + if (ret != 0) {
> + break;
> + }
> + }
> +
> + qemu_mutex_lock(&p->mutex);
> + p->pending_job--;
> + qemu_mutex_unlock(&p->mutex);
> +
> + if (flags & MULTIFD_FLAG_SYNC) {
> + qemu_sem_post(&multifd_send_state->sem_sync);
> + }
> + qemu_sem_post(&multifd_send_state->channels_ready);
> + } else if (p->quit) {
> + qemu_mutex_unlock(&p->mutex);
> + break;
> + } else {
> + qemu_mutex_unlock(&p->mutex);
> + /* sometimes there are spurious wakeups */
> + }
> + }
> +
> +out:
> + if (local_err) {
> + multifd_send_terminate_threads(local_err);
> + }
> +
> + qemu_mutex_lock(&p->mutex);
> + p->running = false;
> + qemu_mutex_unlock(&p->mutex);
> +
> + rcu_unregister_thread();
> + trace_multifd_send_thread_end(p->id, p->num_packets, p->num_pages);
> +
> + return NULL;
> +}
> +
> +static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> +{
> + MultiFDSendParams *p = opaque;
> + QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
> + Error *local_err = NULL;
> +
> + if (qio_task_propagate_error(task, &local_err)) {
> + migrate_set_error(migrate_get_current(), local_err);
> + multifd_save_cleanup();
> + } else {
> + p->c = QIO_CHANNEL(sioc);
> + qio_channel_set_delay(p->c, false);
> + p->running = true;
> + qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
> + QEMU_THREAD_JOINABLE);
> + }
> +}
> +
> +int multifd_save_setup(void)
> +{
> + int thread_count;
> + uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
> + uint8_t i;
> +
> + if (!migrate_use_multifd()) {
> + return 0;
> + }
> + thread_count = migrate_multifd_channels();
> + multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
> + multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
> + multifd_send_state->pages = multifd_pages_init(page_count);
> + qemu_sem_init(&multifd_send_state->sem_sync, 0);
> + qemu_sem_init(&multifd_send_state->channels_ready, 0);
> +
> + for (i = 0; i < thread_count; i++) {
> + MultiFDSendParams *p = &multifd_send_state->params[i];
> +
> + qemu_mutex_init(&p->mutex);
> + qemu_sem_init(&p->sem, 0);
> + p->quit = false;
> + p->pending_job = 0;
> + p->id = i;
> + p->pages = multifd_pages_init(page_count);
> + p->packet_len = sizeof(MultiFDPacket_t)
> + + sizeof(ram_addr_t) * page_count;
> + p->packet = g_malloc0(p->packet_len);
> + p->name = g_strdup_printf("multifdsend_%d", i);
> + socket_send_channel_create(multifd_new_send_channel_async, p);
> + }
> + return 0;
> +}
> +
> +struct {
> + MultiFDRecvParams *params;
> + /* number of created threads */
> + int count;
> + /* syncs main thread and channels */
> + QemuSemaphore sem_sync;
> + /* global number of generated multifd packets */
> + uint64_t packet_num;
> +} *multifd_recv_state;
> +
> +static void multifd_recv_terminate_threads(Error *err)
> +{
> + int i;
> +
> + if (err) {
> + MigrationState *s = migrate_get_current();
> + migrate_set_error(s, err);
> + if (s->state == MIGRATION_STATUS_SETUP ||
> + s->state == MIGRATION_STATUS_ACTIVE) {
> + migrate_set_state(&s->state, s->state,
> + MIGRATION_STATUS_FAILED);
> + }
> + }
> +
> + for (i = 0; i < migrate_multifd_channels(); i++) {
> + MultiFDRecvParams *p = &multifd_recv_state->params[i];
> +
> + qemu_mutex_lock(&p->mutex);
> + /* We could arrive here for two reasons:
> + - normal quit, i.e. everything went fine, just finished
> + - error quit: We close the channels so the channel threads
> + finish the qio_channel_read_all_eof() */
> + qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> + qemu_mutex_unlock(&p->mutex);
> + }
> +}
> +
> +int multifd_load_cleanup(Error **errp)
> +{
> + int i;
> + int ret = 0;
> +
> + if (!migrate_use_multifd()) {
> + return 0;
> + }
> + multifd_recv_terminate_threads(NULL);
> + for (i = 0; i < migrate_multifd_channels(); i++) {
> + MultiFDRecvParams *p = &multifd_recv_state->params[i];
> +
> + if (p->running) {
> + qemu_thread_join(&p->thread);
> + }
> + object_unref(OBJECT(p->c));
> + p->c = NULL;
> + qemu_mutex_destroy(&p->mutex);
> + qemu_sem_destroy(&p->sem_sync);
> + g_free(p->name);
> + p->name = NULL;
> + multifd_pages_clear(p->pages);
> + p->pages = NULL;
> + p->packet_len = 0;
> + g_free(p->packet);
> + p->packet = NULL;
> + }
> + qemu_sem_destroy(&multifd_recv_state->sem_sync);
> + g_free(multifd_recv_state->params);
> + multifd_recv_state->params = NULL;
> + g_free(multifd_recv_state);
> + multifd_recv_state = NULL;
> +
> + return ret;
> +}
> +
> +static void multifd_recv_sync_main(void)
> +{
> + int i;
> +
> + if (!migrate_use_multifd()) {
> + return;
> + }
> + for (i = 0; i < migrate_multifd_channels(); i++) {
> + MultiFDRecvParams *p = &multifd_recv_state->params[i];
> +
> + trace_multifd_recv_sync_main_wait(p->id);
> + qemu_sem_wait(&multifd_recv_state->sem_sync);
> + }
> + for (i = 0; i < migrate_multifd_channels(); i++) {
> + MultiFDRecvParams *p = &multifd_recv_state->params[i];
> +
> + qemu_mutex_lock(&p->mutex);
> + if (multifd_recv_state->packet_num < p->packet_num) {
> + multifd_recv_state->packet_num = p->packet_num;
> + }
> + qemu_mutex_unlock(&p->mutex);
> + trace_multifd_recv_sync_main_signal(p->id);
> + qemu_sem_post(&p->sem_sync);
> + }
> + trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
> +}
> +
> +static void *multifd_recv_thread(void *opaque)
> +{
> + MultiFDRecvParams *p = opaque;
> + Error *local_err = NULL;
> + int ret;
> +
> + trace_multifd_recv_thread_start(p->id);
> + rcu_register_thread();
> +
> + while (true) {
> + uint32_t used;
> + uint32_t flags;
> +
> + ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
> + p->packet_len, &local_err);
> + if (ret == 0) { /* EOF */
> + break;
> + }
> + if (ret == -1) { /* Error */
> + break;
> + }
> +
> + qemu_mutex_lock(&p->mutex);
> + ret = multifd_recv_unfill_packet(p, &local_err);
> + if (ret) {
> + qemu_mutex_unlock(&p->mutex);
> + break;
> + }
> +
> + used = p->pages->used;
> + flags = p->flags;
> + trace_multifd_recv(p->id, p->packet_num, used, flags,
> + p->next_packet_size);
> + p->num_packets++;
> + p->num_pages += used;
> + qemu_mutex_unlock(&p->mutex);
> +
> + if (used) {
> + ret = qio_channel_readv_all(p->c, p->pages->iov,
> + used, &local_err);
> + if (ret != 0) {
> + break;
> + }
> + }
> +
> + if (flags & MULTIFD_FLAG_SYNC) {
> + qemu_sem_post(&multifd_recv_state->sem_sync);
> + qemu_sem_wait(&p->sem_sync);
> + }
> + }
> +
> + if (local_err) {
> + multifd_recv_terminate_threads(local_err);
> + }
> + qemu_mutex_lock(&p->mutex);
> + p->running = false;
> + qemu_mutex_unlock(&p->mutex);
> +
> + rcu_unregister_thread();
> + trace_multifd_recv_thread_end(p->id, p->num_packets, p->num_pages);
> +
> + return NULL;
> +}
> +
> +int multifd_load_setup(void)
> +{
> + int thread_count;
> + uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
> + uint8_t i;
> +
> + if (!migrate_use_multifd()) {
> + return 0;
> + }
> + thread_count = migrate_multifd_channels();
> + multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state));
> + multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
> + atomic_set(&multifd_recv_state->count, 0);
> + qemu_sem_init(&multifd_recv_state->sem_sync, 0);
> +
> + for (i = 0; i < thread_count; i++) {
> + MultiFDRecvParams *p = &multifd_recv_state->params[i];
> +
> + qemu_mutex_init(&p->mutex);
> + qemu_sem_init(&p->sem_sync, 0);
> + p->id = i;
> + p->pages = multifd_pages_init(page_count);
> + p->packet_len = sizeof(MultiFDPacket_t)
> + + sizeof(ram_addr_t) * page_count;
> + p->packet = g_malloc0(p->packet_len);
> + p->name = g_strdup_printf("multifdrecv_%d", i);
> + }
> + return 0;
> +}
> +
> +bool multifd_recv_all_channels_created(void)
> +{
> + int thread_count = migrate_multifd_channels();
> +
> + if (!migrate_use_multifd()) {
> + return true;
> + }
> +
> + return thread_count == atomic_read(&multifd_recv_state->count);
> +}
> +
> +/*
> + * Try to receive all multifd channels to get ready for the migration.
> + * - Return true and do not set @errp when correctly receiving all channels;
> + * - Return false and do not set @errp when correctly receiving the current
> + *   one;
> + * - Return false and set @errp when failing to receive the current channel.
> + */
> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
> +{
> + MultiFDRecvParams *p;
> + Error *local_err = NULL;
> + int id;
> +
> + id = multifd_recv_initial_packet(ioc, &local_err);
> + if (id < 0) {
> + multifd_recv_terminate_threads(local_err);
> + error_propagate_prepend(errp, local_err,
> + "failed to receive packet"
> + " via multifd channel %d: ",
> + atomic_read(&multifd_recv_state->count));
> + return false;
> + }
> +
> + p = &multifd_recv_state->params[id];
> + if (p->c != NULL) {
> +        error_setg(&local_err, "multifd: received id '%d' already setup",
> + id);
> + multifd_recv_terminate_threads(local_err);
> + error_propagate(errp, local_err);
> + return false;
> + }
> + p->c = ioc;
> + object_ref(OBJECT(ioc));
> + /* initial packet */
> + p->num_packets = 1;
> +
> + p->running = true;
> + qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
> + QEMU_THREAD_JOINABLE);
> + atomic_inc(&multifd_recv_state->count);
> + return atomic_read(&multifd_recv_state->count) ==
> + migrate_multifd_channels();
> +}
> +
> +/**
> + * save_page_header: write page header to wire
> + *
> + * If this is the 1st block, it also writes the block identification
> + *
> + * Returns the number of bytes written
> + *
> + * @rs: current RAM state
> + * @f: QEMUFile where to send the data
> + * @block: block that contains the page we want to send
> + * @offset: offset inside the block for the page;
> + *          the lower bits contain migration flags
> + */
> +static size_t save_page_header(RAMState *rs, QEMUFile *f, RAMBlock *block,
> + ram_addr_t offset)
> +{
> + size_t size, len;
> +
> + if (block == rs->last_sent_block) {
> + offset |= RAM_SAVE_FLAG_CONTINUE;
> + }
> + qemu_put_be64(f, offset);
> + size = 8;
> +
> + if (!(offset & RAM_SAVE_FLAG_CONTINUE)) {
> + len = strlen(block->idstr);
> + qemu_put_byte(f, len);
> + qemu_put_buffer(f, (uint8_t *)block->idstr, len);
> + size += 1 + len;
> + rs->last_sent_block = block;
> + }
> + return size;
> +}
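
To put numbers on the RAM_SAVE_FLAG_CONTINUE optimization: the first page
of a block named "pc.ram" costs 8 + 1 + 6 = 15 bytes of header, and every
later page from the same block costs only the 8-byte offset word. A
sketch of the same accounting:

  #include <stdio.h>
  #include <string.h>

  /* mirrors the size accounting in save_page_header() */
  static size_t header_size(const char *idstr, int same_block_as_last)
  {
      size_t size = 8;                /* be64 offset word + flag bits */

      if (!same_block_as_last) {
          size += 1 + strlen(idstr);  /* idstr length byte + idstr */
      }
      return size;
  }

  int main(void)
  {
      printf("first page of pc.ram: %zu bytes\n", header_size("pc.ram", 0));
      printf("following pages:      %zu bytes\n", header_size("pc.ram", 1));
      return 0;
  }
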
> +
> +/**
> + * mig_throttle_guest_down: throttle down the guest
> + *
> + * Reduce amount of guest cpu execution to hopefully slow down memory
> + * writes. If guest dirty memory rate is reduced below the rate at
> + * which we can transfer pages to the destination then we should be
> + * able to complete migration. Some workloads dirty memory way too
> + * fast and will not effectively converge, even with auto-converge.
> + */
> +static void mig_throttle_guest_down(void)
> +{
> + MigrationState *s = migrate_get_current();
> + uint64_t pct_initial = s->parameters.cpu_throttle_initial;
> +    uint64_t pct_increment = s->parameters.cpu_throttle_increment;
> + int pct_max = s->parameters.max_cpu_throttle;
> +
> + /* We have not started throttling yet. Let's start it. */
> + if (!cpu_throttle_active()) {
> + cpu_throttle_set(pct_initial);
> + } else {
> + /* Throttling already on, just increase the rate */
> +        cpu_throttle_set(MIN(cpu_throttle_get_percentage() + pct_increment,
> + pct_max));
> + }
> +}
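
Assuming the usual parameter defaults (cpu-throttle-initial=20,
cpu-throttle-increment=10, max-cpu-throttle=99; treat those values as
assumptions, not gospel), repeated calls step the throttle like this.
Just the arithmetic, sketched:

  #include <stdio.h>

  #define MIN(a, b) ((a) < (b) ? (a) : (b))

  int main(void)
  {
      /* assumed defaults: start at 20%, +10% per step, capped at 99% */
      int pct = 0, initial = 20, increment = 10, max = 99;
      int step;

      for (step = 0; step < 10; step++) {
          pct = pct ? MIN(pct + increment, max) : initial;
          printf("step %d: throttle %d%%\n", step, pct);
      }
      return 0;   /* 20, 30, ..., 90, 99, 99 */
  }
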
> +
> +/**
> + * xbzrle_cache_zero_page: insert a zero page in the XBZRLE cache
> + *
> + * @rs: current RAM state
> + * @current_addr: address for the zero page
> + *
> + * Update the xbzrle cache to reflect a page that's been sent as all 0.
> + * The important thing is that a stale (not-yet-0'd) page be replaced
> + * by the new data.
> + * As a bonus, if the page wasn't in the cache it gets added so that
> + * when a small write is made into the 0'd page it gets XBZRLE sent.
> + */
> +static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
> +{
> + if (rs->ram_bulk_stage || !migrate_use_xbzrle()) {
> + return;
> + }
> +
> + /* We don't care if this fails to allocate a new cache page
> + * as long as it updated an old one */
> + cache_insert(XBZRLE.cache, current_addr, XBZRLE.zero_target_page,
> + ram_counters.dirty_sync_count);
> +}
> +
> +#define ENCODING_FLAG_XBZRLE 0x1
> +
> +/**
> + * save_xbzrle_page: compress and send current page
> + *
> + * Returns: 1 means that we wrote the page
> + * 0 means that page is identical to the one already sent
> + * -1 means that xbzrle would be longer than normal
> + *
> + * @rs: current RAM state
> + * @current_data: pointer to the address of the page contents
> + * @current_addr: addr of the page
> + * @block: block that contains the page we want to send
> + * @offset: offset inside the block for the page
> + * @last_stage: if we are at the completion stage
> + */
> +static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
> + ram_addr_t current_addr, RAMBlock *block,
> + ram_addr_t offset, bool last_stage)
> +{
> + int encoded_len = 0, bytes_xbzrle;
> + uint8_t *prev_cached_page;
> +
> + if (!cache_is_cached(XBZRLE.cache, current_addr,
> + ram_counters.dirty_sync_count)) {
> + xbzrle_counters.cache_miss++;
> + if (!last_stage) {
> + if (cache_insert(XBZRLE.cache, current_addr, *current_data,
> + ram_counters.dirty_sync_count) == -1) {
> + return -1;
> + } else {
> + /* update *current_data when the page has been
> + inserted into cache */
> + *current_data = get_cached_data(XBZRLE.cache, current_addr);
> + }
> + }
> + return -1;
> + }
> +
> + prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
> +
> + /* save current buffer into memory */
> + memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
> +
> + /* XBZRLE encoding (if there is no overflow) */
> + encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
> + TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
> + TARGET_PAGE_SIZE);
> +
> + /*
> + * Update the cache contents, so that it corresponds to the data
> + * sent, in all cases except where we skip the page.
> + */
> + if (!last_stage && encoded_len != 0) {
> + memcpy(prev_cached_page, XBZRLE.current_buf, TARGET_PAGE_SIZE);
> + /*
> + * In the case where we couldn't compress, ensure that the caller
> + * sends the data from the cache, since the guest might have
> + * changed the RAM since we copied it.
> + */
> + *current_data = prev_cached_page;
> + }
> +
> + if (encoded_len == 0) {
> + trace_save_xbzrle_page_skipping();
> + return 0;
> + } else if (encoded_len == -1) {
> + trace_save_xbzrle_page_overflow();
> + xbzrle_counters.overflow++;
> + return -1;
> + }
> +
> + /* Send XBZRLE based compressed page */
> + bytes_xbzrle = save_page_header(rs, rs->f, block,
> + offset | RAM_SAVE_FLAG_XBZRLE);
> + qemu_put_byte(rs->f, ENCODING_FLAG_XBZRLE);
> + qemu_put_be16(rs->f, encoded_len);
> + qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_len);
> + bytes_xbzrle += encoded_len + 1 + 2;
> + xbzrle_counters.pages++;
> + xbzrle_counters.bytes += bytes_xbzrle;
> + ram_counters.transferred += bytes_xbzrle;
> +
> + return 1;
> +}
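
The accounting above means an XBZRLE page costs header + 1 flag byte +
2 length bytes + encoded_len on the wire, so it only pays off when the
delta encoding comes out appreciably smaller than the page. A sketch of
the break-even arithmetic (4 KiB target page and the 8-byte CONTINUE
header assumed):

  #include <stdio.h>

  int main(void)
  {
      int target_page = 4096;         /* assumed target page size */
      int header = 8;                 /* RAM_SAVE_FLAG_CONTINUE header */
      int encoded_len;

      for (encoded_len = 1024; encoded_len <= 4096; encoded_len += 1024) {
          /* header + ENCODING_FLAG_XBZRLE byte + be16 len + data */
          int cost = header + 1 + 2 + encoded_len;

          printf("encoded_len %4d -> %4d bytes (%s a plain page)\n",
                 encoded_len, cost,
                 cost < header + target_page ? "beats" : "loses to");
      }
      return 0;
  }
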
> +
> +/**
> + * migration_bitmap_find_dirty: find the next dirty page from start
> + *
> + * Returns the page offset within memory region of the start of a dirty page
> + *
> + * @rs: current RAM state
> + * @rb: RAMBlock where to search for dirty pages
> + * @start: page where we start the search
> + */
> +static inline
> +unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
> + unsigned long start)
> +{
> + unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
> + unsigned long *bitmap = rb->bmap;
> + unsigned long next;
> +
> + if (ramblock_is_ignored(rb)) {
> + return size;
> + }
> +
> + /*
> +     * When the free page optimization is enabled, we need to check the
> +     * bitmap to send the non-free pages rather than all the pages in
> +     * the bulk stage.
> + */
> + if (!rs->fpo_enabled && rs->ram_bulk_stage && start > 0) {
> + next = start + 1;
> + } else {
> + next = find_next_bit(bitmap, size, start);
> + }
> +
> + return next;
> +}
> +
> +static inline bool migration_bitmap_clear_dirty(RAMState *rs,
> + RAMBlock *rb,
> + unsigned long page)
> +{
> + bool ret;
> +
> + qemu_mutex_lock(&rs->bitmap_mutex);
> + ret = test_and_clear_bit(page, rb->bmap);
> +
> + if (ret) {
> + rs->migration_dirty_pages--;
> + }
> + qemu_mutex_unlock(&rs->bitmap_mutex);
> +
> + return ret;
> +}
> +
> +/* Called with RCU critical section */
> +static void migration_bitmap_sync_range(RAMState *rs, RAMBlock *rb,
> + ram_addr_t length)
> +{
> + rs->migration_dirty_pages +=
> + cpu_physical_memory_sync_dirty_bitmap(rb, 0, length,
> + &rs->num_dirty_pages_period);
> +}
> +
> +/**
> + * ram_pagesize_summary: calculate all the pagesizes of a VM
> + *
> + * Returns a summary bitmap of the page sizes of all RAMBlocks
> + *
> + * For VMs with just normal pages this is equivalent to the host page
> + * size. If it's got some huge pages then it's the OR of all the
> + * different page sizes.
> + */
> +uint64_t ram_pagesize_summary(void)
> +{
> + RAMBlock *block;
> + uint64_t summary = 0;
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + summary |= block->page_size;
> + }
> +
> + return summary;
> +}
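
Because page sizes are powers of two, the OR gives one distinct bit per
size present; e.g. 4 KiB pages plus 2 MiB hugepages yields 0x201000. A
trivial check:

  #include <stdio.h>

  int main(void)
  {
      unsigned long summary = 0;

      summary |= 4096;                /* 4 KiB normal pages */
      summary |= 2 * 1024 * 1024;     /* 2 MiB hugepages */
      printf("summary = 0x%lx\n", summary);   /* prints 0x201000 */
      return 0;
  }
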
> +
> +uint64_t ram_get_total_transferred_pages(void)
> +{
> + return ram_counters.normal + ram_counters.duplicate +
> + compression_counters.pages + xbzrle_counters.pages;
> +}
> +
> +static void migration_update_rates(RAMState *rs, int64_t end_time)
> +{
> + uint64_t page_count = rs->target_page_count - rs->target_page_count_prev;
> + double compressed_size;
> +
> + /* calculate period counters */
> + ram_counters.dirty_pages_rate = rs->num_dirty_pages_period * 1000
> + / (end_time - rs->time_last_bitmap_sync);
> +
> + if (!page_count) {
> + return;
> + }
> +
> + if (migrate_use_xbzrle()) {
> +        xbzrle_counters.cache_miss_rate =
> +            (double)(xbzrle_counters.cache_miss -
> +                     rs->xbzrle_cache_miss_prev) / page_count;
> + rs->xbzrle_cache_miss_prev = xbzrle_counters.cache_miss;
> + }
> +
> + if (migrate_use_compression()) {
> + compression_counters.busy_rate = (double)(compression_counters.busy -
> + rs->compress_thread_busy_prev) / page_count;
> + rs->compress_thread_busy_prev = compression_counters.busy;
> +
> + compressed_size = compression_counters.compressed_size -
> + rs->compressed_size_prev;
> + if (compressed_size) {
> +            double uncompressed_size = (compression_counters.pages -
> +                                        rs->compress_pages_prev) *
> +                                       TARGET_PAGE_SIZE;
> +
> + /* Compression-Ratio = Uncompressed-size / Compressed-size */
> + compression_counters.compression_rate =
> + uncompressed_size / compressed_size;
> +
> + rs->compress_pages_prev = compression_counters.pages;
> + rs->compressed_size_prev = compression_counters.compressed_size;
> + }
> + }
> +}
> +
> +static void migration_bitmap_sync(RAMState *rs)
> +{
> + RAMBlock *block;
> + int64_t end_time;
> + uint64_t bytes_xfer_now;
> +
> + ram_counters.dirty_sync_count++;
> +
> + if (!rs->time_last_bitmap_sync) {
> + rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> + }
> +
> + trace_migration_bitmap_sync_start();
> + memory_global_dirty_log_sync();
> +
> + qemu_mutex_lock(&rs->bitmap_mutex);
> + rcu_read_lock();
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + migration_bitmap_sync_range(rs, block, block->used_length);
> + }
> + ram_counters.remaining = ram_bytes_remaining();
> + rcu_read_unlock();
> + qemu_mutex_unlock(&rs->bitmap_mutex);
> +
> + trace_migration_bitmap_sync_end(rs->num_dirty_pages_period);
> +
> + end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +
> +    /* more than 1 second = 1000 milliseconds */
> + if (end_time > rs->time_last_bitmap_sync + 1000) {
> + bytes_xfer_now = ram_counters.transferred;
> +
> + /* During block migration the auto-converge logic incorrectly detects
> + * that ram migration makes no progress. Avoid this by disabling the
> + * throttling logic during the bulk phase of block migration. */
> + if (migrate_auto_converge() && !blk_mig_bulk_active()) {
> + /* The following detection logic can be refined later. For now:
> +               Check to see if the dirtied bytes are 50% more than the
> +               approx. amount of bytes that just got transferred since
> +               the last time we were in this routine. If that happens
> +               twice, start or increase throttling. */
> +
> + if ((rs->num_dirty_pages_period * TARGET_PAGE_SIZE >
> + (bytes_xfer_now - rs->bytes_xfer_prev) / 2) &&
> + (++rs->dirty_rate_high_cnt >= 2)) {
> + trace_migration_throttle();
> + rs->dirty_rate_high_cnt = 0;
> + mig_throttle_guest_down();
> + }
> + }
> +
> + migration_update_rates(rs, end_time);
> +
> + rs->target_page_count_prev = rs->target_page_count;
> +
> + /* reset period counters */
> + rs->time_last_bitmap_sync = end_time;
> + rs->num_dirty_pages_period = 0;
> + rs->bytes_xfer_prev = bytes_xfer_now;
> + }
> + if (migrate_use_events()) {
> + qapi_event_send_migration_pass(ram_counters.dirty_sync_count);
> + }
> +}
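
To make the auto-converge trigger concrete: throttling is considered once
the bytes dirtied in the period exceed half the bytes transferred in the
same period, and acted on when that happens twice in a row. The bare
comparison, with invented numbers:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t page_size = 4096;           /* assumed target page size */
      uint64_t dirty_pages = 300000;       /* dirtied this period (invented) */
      uint64_t xfer_bytes = 2000000000;    /* sent this period (invented) */

      /* the same comparison migration_bitmap_sync() makes */
      if (dirty_pages * page_size > xfer_bytes / 2) {
          printf("dirtying too fast: bump dirty_rate_high_cnt\n");
      } else {
          printf("converging: no throttling needed\n");
      }
      return 0;
  }
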
> +
> +static void migration_bitmap_sync_precopy(RAMState *rs)
> +{
> + Error *local_err = NULL;
> +
> + /*
> +     * The current notifier usage is just an optimization for migration, so we
> + * don't stop the normal migration process in the error case.
> + */
> + if (precopy_notify(PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC, &local_err)) {
> + error_report_err(local_err);
> + }
> +
> + migration_bitmap_sync(rs);
> +
> + if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, &local_err)) {
> + error_report_err(local_err);
> + }
> +}
> +
> +/**
> + * save_zero_page_to_file: send the zero page to the file
> + *
> + * Returns the size of data written to the file, 0 means the page is not
> + * a zero page
> + *
> + * @rs: current RAM state
> + * @file: the file where the data is saved
> + * @block: block that contains the page we want to send
> + * @offset: offset inside the block for the page
> + */
> +static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
> + RAMBlock *block, ram_addr_t offset)
> +{
> + uint8_t *p = block->host + offset;
> + int len = 0;
> +
> + if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> +        len += save_page_header(rs, file, block,
> +                                offset | RAM_SAVE_FLAG_ZERO);
> + qemu_put_byte(file, 0);
> + len += 1;
> + }
> + return len;
> +}
> +
> +/**
> + * save_zero_page: send the zero page to the stream
> + *
> + * Returns the number of pages written.
> + *
> + * @rs: current RAM state
> + * @block: block that contains the page we want to send
> + * @offset: offset inside the block for the page
> + */
> +static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
> +{
> + int len = save_zero_page_to_file(rs, rs->f, block, offset);
> +
> + if (len) {
> + ram_counters.duplicate++;
> + ram_counters.transferred += len;
> + return 1;
> + }
> + return -1;
> +}
> +
> +static void ram_release_pages(const char *rbname, uint64_t offset, int pages)
> +{
> + if (!migrate_release_ram() || !migration_in_postcopy()) {
> + return;
> + }
> +
> + ram_discard_range(rbname, offset, pages << TARGET_PAGE_BITS);
> +}
> +
> +/*
> + * @pages: the number of pages written by the control path,
> + * < 0 - error
> + * > 0 - number of pages written
> + *
> + * Return true if the page has been saved, otherwise false is returned.
> + */
> +static bool control_save_page(RAMState *rs, RAMBlock *block,
> +                              ram_addr_t offset, int *pages)
> +{
> + uint64_t bytes_xmit = 0;
> + int ret;
> +
> + *pages = -1;
> +    ret = ram_control_save_page(rs->f, block->offset, offset,
> +                                TARGET_PAGE_SIZE, &bytes_xmit);
> + if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
> + return false;
> + }
> +
> + if (bytes_xmit) {
> + ram_counters.transferred += bytes_xmit;
> + *pages = 1;
> + }
> +
> + if (ret == RAM_SAVE_CONTROL_DELAYED) {
> + return true;
> + }
> +
> + if (bytes_xmit > 0) {
> + ram_counters.normal++;
> + } else if (bytes_xmit == 0) {
> + ram_counters.duplicate++;
> + }
> +
> + return true;
> +}
> +
> +/*
> + * directly send the page to the stream
> + *
> + * Returns the number of pages written.
> + *
> + * @rs: current RAM state
> + * @block: block that contains the page we want to send
> + * @offset: offset inside the block for the page
> + * @buf: the page to be sent
> + * @async: send the page asynchronously
> + */
> +static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
> + uint8_t *buf, bool async)
> +{
> +    ram_counters.transferred += save_page_header(rs, rs->f, block,
> +                                                 offset | RAM_SAVE_FLAG_PAGE);
> + if (async) {
> + qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
> +                              migrate_release_ram() &&
> + migration_in_postcopy());
> + } else {
> + qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
> + }
> + ram_counters.transferred += TARGET_PAGE_SIZE;
> + ram_counters.normal++;
> + return 1;
> +}
> +
> +/**
> + * ram_save_page: send the given page to the stream
> + *
> + * Returns the number of pages written.
> + * < 0 - error
> + * >=0 - Number of pages written - this might legally be 0
> + * if xbzrle noticed the page was the same.
> + *
> + * @rs: current RAM state
> + * @pss: data about the page we want to send
> + * @last_stage: if we are at the completion stage
> + */
> +static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
> +{
> + int pages = -1;
> + uint8_t *p;
> + bool send_async = true;
> + RAMBlock *block = pss->block;
> + ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
> + ram_addr_t current_addr = block->offset + offset;
> +
> + p = block->host + offset;
> + trace_ram_save_page(block->idstr, (uint64_t)offset, p);
> +
> + XBZRLE_cache_lock();
> + if (!rs->ram_bulk_stage && !migration_in_postcopy() &&
> + migrate_use_xbzrle()) {
> + pages = save_xbzrle_page(rs, &p, current_addr, block,
> + offset, last_stage);
> + if (!last_stage) {
> + /* Can't send this cached data async, since the cache page
> + * might get updated before it gets to the wire
> + */
> + send_async = false;
> + }
> + }
> +
> + /* XBZRLE overflow or normal page */
> + if (pages == -1) {
> + pages = save_normal_page(rs, block, offset, p, send_async);
> + }
> +
> + XBZRLE_cache_unlock();
> +
> + return pages;
> +}
> +
> +static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
> + ram_addr_t offset)
> +{
> + multifd_queue_page(block, offset);
> + ram_counters.normal++;
> +
> + return 1;
> +}
> +
> +static bool do_compress_ram_page(QEMUFile *f, z_stream *stream,
> +                                 RAMBlock *block, ram_addr_t offset,
> +                                 uint8_t *source_buf)
> +{
> + RAMState *rs = ram_state;
> + uint8_t *p = block->host + (offset & TARGET_PAGE_MASK);
> + bool zero_page = false;
> + int ret;
> +
> + if (save_zero_page_to_file(rs, f, block, offset)) {
> + zero_page = true;
> + goto exit;
> + }
> +
> + save_page_header(rs, f, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
> +
> + /*
> +     * Copy it to an internal buffer to avoid it being modified by the
> +     * VM so that we can catch errors during compression and
> +     * decompression.
> + */
> + memcpy(source_buf, p, TARGET_PAGE_SIZE);
> + ret = qemu_put_compression_data(f, stream, source_buf, TARGET_PAGE_SIZE);
> + if (ret < 0) {
> + qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
> + error_report("compressed data failed!");
> + return false;
> + }
> +
> +exit:
> + ram_release_pages(block->idstr, offset & TARGET_PAGE_MASK, 1);
> + return zero_page;
> +}
> +
> +static void
> +update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
> +{
> + ram_counters.transferred += bytes_xmit;
> +
> + if (param->zero_page) {
> + ram_counters.duplicate++;
> + return;
> + }
> +
> + /* 8 means a header with RAM_SAVE_FLAG_CONTINUE. */
> + compression_counters.compressed_size += bytes_xmit - 8;
> + compression_counters.pages++;
> +}
> +
> +static bool save_page_use_compression(RAMState *rs);
> +
> +static void flush_compressed_data(RAMState *rs)
> +{
> + int idx, len, thread_count;
> +
> + if (!save_page_use_compression(rs)) {
> + return;
> + }
> + thread_count = migrate_compress_threads();
> +
> + qemu_mutex_lock(&comp_done_lock);
> + for (idx = 0; idx < thread_count; idx++) {
> + while (!comp_param[idx].done) {
> + qemu_cond_wait(&comp_done_cond, &comp_done_lock);
> + }
> + }
> + qemu_mutex_unlock(&comp_done_lock);
> +
> + for (idx = 0; idx < thread_count; idx++) {
> + qemu_mutex_lock(&comp_param[idx].mutex);
> + if (!comp_param[idx].quit) {
> + len = qemu_put_qemu_file(rs->f, comp_param[idx].file);
> + /*
> + * it's safe to fetch zero_page without holding comp_done_lock
> + * as there is no further request submitted to the thread,
> +             * i.e., the thread should be waiting for a request at this point.
> + */
> + update_compress_thread_counts(&comp_param[idx], len);
> + }
> + qemu_mutex_unlock(&comp_param[idx].mutex);
> + }
> +}
> +
> +static inline void set_compress_params(CompressParam *param, RAMBlock *block,
> + ram_addr_t offset)
> +{
> + param->block = block;
> + param->offset = offset;
> +}
> +
> +static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
> + ram_addr_t offset)
> +{
> + int idx, thread_count, bytes_xmit = -1, pages = -1;
> + bool wait = migrate_compress_wait_thread();
> +
> + thread_count = migrate_compress_threads();
> + qemu_mutex_lock(&comp_done_lock);
> +retry:
> + for (idx = 0; idx < thread_count; idx++) {
> + if (comp_param[idx].done) {
> + comp_param[idx].done = false;
> + bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
> + qemu_mutex_lock(&comp_param[idx].mutex);
> + set_compress_params(&comp_param[idx], block, offset);
> + qemu_cond_signal(&comp_param[idx].cond);
> + qemu_mutex_unlock(&comp_param[idx].mutex);
> + pages = 1;
> + update_compress_thread_counts(&comp_param[idx], bytes_xmit);
> + break;
> + }
> + }
> +
> + /*
> +     * Wait for a free thread if the user specifies 'compress-wait-thread',
> +     * otherwise we will post the page out in the main thread as a normal
> +     * page.
> + */
> + if (pages < 0 && wait) {
> + qemu_cond_wait(&comp_done_cond, &comp_done_lock);
> + goto retry;
> + }
> + qemu_mutex_unlock(&comp_done_lock);
> +
> + return pages;
> +}
> +
> +/**
> + * find_dirty_block: find the next dirty page and update any state
> + * associated with the search process.
> + *
> + * Returns true if a page is found
> + *
> + * @rs: current RAM state
> + * @pss: data about the state of the current dirty page scan
> + * @again: set to false if the search has scanned the whole of RAM
> + */
> +static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
> +{
> + pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
> + if (pss->complete_round && pss->block == rs->last_seen_block &&
> + pss->page >= rs->last_page) {
> + /*
> + * We've been once around the RAM and haven't found anything.
> + * Give up.
> + */
> + *again = false;
> + return false;
> + }
> + if ((pss->page << TARGET_PAGE_BITS) >= pss->block->used_length) {
> + /* Didn't find anything in this RAM Block */
> + pss->page = 0;
> + pss->block = QLIST_NEXT_RCU(pss->block, next);
> + if (!pss->block) {
> + /*
> + * If memory migration starts over, we will meet a dirtied page
> +             * which may still exist in the compression threads' ring, so we
> + * should flush the compressed data to make sure the new page
> + * is not overwritten by the old one in the destination.
> + *
> +             * Also, if xbzrle is on, stop using the data compression at this
> + * point. In theory, xbzrle can do better than compression.
> + */
> + flush_compressed_data(rs);
> +
> + /* Hit the end of the list */
> + pss->block = QLIST_FIRST_RCU(&ram_list.blocks);
> + /* Flag that we've looped */
> + pss->complete_round = true;
> + rs->ram_bulk_stage = false;
> + }
> + /* Didn't find anything this time, but try again on the new block */
> + *again = true;
> + return false;
> + } else {
> + /* Can go around again, but... */
> + *again = true;
> + /* We've found something so probably don't need to */
> + return true;
> + }
> +}
> +
> +/**
> + * unqueue_page: gets a page off the queue
> + *
> + * Helper for 'get_queued_page' - gets a page off the queue
> + *
> + * Returns the block of the page (or NULL if none available)
> + *
> + * @rs: current RAM state
> + * @offset: used to return the offset within the RAMBlock
> + */
> +static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset)
> +{
> + RAMBlock *block = NULL;
> +
> + if (QSIMPLEQ_EMPTY_ATOMIC(&rs->src_page_requests)) {
> + return NULL;
> + }
> +
> + qemu_mutex_lock(&rs->src_page_req_mutex);
> + if (!QSIMPLEQ_EMPTY(&rs->src_page_requests)) {
> + struct RAMSrcPageRequest *entry =
> + QSIMPLEQ_FIRST(&rs->src_page_requests);
> + block = entry->rb;
> + *offset = entry->offset;
> +
> + if (entry->len > TARGET_PAGE_SIZE) {
> + entry->len -= TARGET_PAGE_SIZE;
> + entry->offset += TARGET_PAGE_SIZE;
> + } else {
> + memory_region_unref(block->mr);
> + QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
> + g_free(entry);
> + migration_consume_urgent_request();
> + }
> + }
> + qemu_mutex_unlock(&rs->src_page_req_mutex);
> +
> + return block;
> +}
> +
> +/**
> + * get_queued_page: unqueue a page from the postcopy requests
> + *
> + * Skips pages that are already sent (!dirty)
> + *
> + * Returns true if a queued page is found
> + *
> + * @rs: current RAM state
> + * @pss: data about the state of the current dirty page scan
> + */
> +static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
> +{
> + RAMBlock *block;
> + ram_addr_t offset;
> + bool dirty;
> +
> + do {
> + block = unqueue_page(rs, &offset);
> + /*
> + * We're sending this page, and since it's postcopy nothing else
> + * will dirty it, and we must make sure it doesn't get sent again
> + * even if this queue request was received after the background
> + * search already sent it.
> + */
> + if (block) {
> + unsigned long page;
> +
> + page = offset >> TARGET_PAGE_BITS;
> + dirty = test_bit(page, block->bmap);
> + if (!dirty) {
> +                trace_get_queued_page_not_dirty(
> +                    block->idstr, (uint64_t)offset, page,
> +                    test_bit(page, block->unsentmap));
> + } else {
> + trace_get_queued_page(block->idstr, (uint64_t)offset, page);
> + }
> + }
> +
> + } while (block && !dirty);
> +
> + if (block) {
> + /*
> + * As soon as we start servicing pages out of order, then we have
> + * to kill the bulk stage, since the bulk stage assumes
> +         * (in migration_bitmap_find_and_reset_dirty) that every page is
> +         * dirty, and that's no longer true.
> + */
> + rs->ram_bulk_stage = false;
> +
> + /*
> + * We want the background search to continue from the queued page
> + * since the guest is likely to want other pages near to the page
> + * it just requested.
> + */
> + pss->block = block;
> + pss->page = offset >> TARGET_PAGE_BITS;
> +
> + /*
> +         * This unqueued page would break the "one round" check, even if
> +         * it is really rare.
> + */
> + pss->complete_round = false;
> + }
> +
> + return !!block;
> +}
> +
> +/**
> + * migration_page_queue_free: drop any remaining pages in the ram
> + * request queue
> + *
> + * It should be empty at the end anyway, but in error cases there may
> + * be some left; in that case we drop them.
> + *
> + */
> +static void migration_page_queue_free(RAMState *rs)
> +{
> + struct RAMSrcPageRequest *mspr, *next_mspr;
> + /* This queue generally should be empty - but in the case of a failed
> +     * migration it might have some droppings left in it.
> + */
> + rcu_read_lock();
> +    QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
> + memory_region_unref(mspr->rb->mr);
> + QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
> + g_free(mspr);
> + }
> + rcu_read_unlock();
> +}
> +
> +/**
> + * ram_save_queue_pages: queue the page for transmission
> + *
> + * A request from postcopy destination for example.
> + *
> + * Returns zero on success or negative on error
> + *
> + * @rbname: Name of the RAMBlock of the request. NULL means the
> + *          same as the last one.
> + * @start: starting address from the start of the RAMBlock
> + * @len: length (in bytes) to send
> + */
> +int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
> +{
> + RAMBlock *ramblock;
> + RAMState *rs = ram_state;
> +
> + ram_counters.postcopy_requests++;
> + rcu_read_lock();
> + if (!rbname) {
> + /* Reuse last RAMBlock */
> + ramblock = rs->last_req_rb;
> +
> + if (!ramblock) {
> + /*
> + * Shouldn't happen, we can't reuse the last RAMBlock if
> + * it's the 1st request.
> + */
> + error_report("ram_save_queue_pages no previous block");
> + goto err;
> + }
> + } else {
> + ramblock = qemu_ram_block_by_name(rbname);
> +
> + if (!ramblock) {
> + /* We shouldn't be asked for a non-existent RAMBlock */
> + error_report("ram_save_queue_pages no block '%s'", rbname);
> + goto err;
> + }
> + rs->last_req_rb = ramblock;
> + }
> + trace_ram_save_queue_pages(ramblock->idstr, start, len);
> +    if (start + len > ramblock->used_length) {
> + error_report("%s request overrun start=" RAM_ADDR_FMT " len="
> + RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
> + __func__, start, len, ramblock->used_length);
> + goto err;
> + }
> +
> + struct RAMSrcPageRequest *new_entry =
> + g_malloc0(sizeof(struct RAMSrcPageRequest));
> + new_entry->rb = ramblock;
> + new_entry->offset = start;
> + new_entry->len = len;
> +
> + memory_region_ref(ramblock->mr);
> + qemu_mutex_lock(&rs->src_page_req_mutex);
> + QSIMPLEQ_INSERT_TAIL(&rs->src_page_requests, new_entry, next_req);
> + migration_make_urgent_request();
> + qemu_mutex_unlock(&rs->src_page_req_mutex);
> + rcu_read_unlock();
> +
> + return 0;
> +
> +err:
> + rcu_read_unlock();
> + return -1;
> +}
> +
> +static bool save_page_use_compression(RAMState *rs)
> +{
> + if (!migrate_use_compression()) {
> + return false;
> + }
> +
> + /*
> + * If xbzrle is on, stop using the data compression after first
> + * round of migration even if compression is enabled. In theory,
> + * xbzrle can do better than compression.
> + */
> + if (rs->ram_bulk_stage || !migrate_use_xbzrle()) {
> + return true;
> + }
> +
> + return false;
> +}
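
The decision above boils down to: compress only while compression is
enabled and either we are still in the bulk stage or xbzrle is off. A
quick truth table of the same predicate (a sketch, not QEMU code):

  #include <stdio.h>

  static int use_compression(int compress_on, int bulk_stage, int xbzrle_on)
  {
      if (!compress_on) {
          return 0;
      }
      /* after the first round, xbzrle takes over from compression */
      return bulk_stage || !xbzrle_on;
  }

  int main(void)
  {
      int c, b, x;

      for (c = 0; c < 2; c++) {
          for (b = 0; b < 2; b++) {
              for (x = 0; x < 2; x++) {
                  printf("compress=%d bulk=%d xbzrle=%d -> %d\n",
                         c, b, x, use_compression(c, b, x));
              }
          }
      }
      return 0;
  }
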
> +
> +/*
> + * try to compress the page before posting it out, return true if the page
> + * has been properly handled by compression, otherwise needs other
> + * paths to handle it
> + */
> +static bool save_compress_page(RAMState *rs, RAMBlock *block,
> +                               ram_addr_t offset)
> +{
> + if (!save_page_use_compression(rs)) {
> + return false;
> + }
> +
> + /*
> + * When starting the process of a new block, the first page of
> + * the block should be sent out before other pages in the same
> +     * block, and all the pages in the last block should have been sent
> +     * out; keeping this order is important, because the 'cont' flag
> + * is used to avoid resending the block name.
> + *
> +     * We post the first page as a normal page as compression will take
> + * much CPU resource.
> + */
> + if (block != rs->last_sent_block) {
> + flush_compressed_data(rs);
> + return false;
> + }
> +
> + if (compress_page_with_multi_thread(rs, block, offset) > 0) {
> + return true;
> + }
> +
> + compression_counters.busy++;
> + return false;
> +}
> +
> +/**
> + * ram_save_target_page: save one target page
> + *
> + * Returns the number of pages written
> + *
> + * @rs: current RAM state
> + * @pss: data about the page we want to send
> + * @last_stage: if we are at the completion stage
> + */
> +static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
> + bool last_stage)
> +{
> + RAMBlock *block = pss->block;
> + ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
> + int res;
> +
> + if (control_save_page(rs, block, offset, &res)) {
> + return res;
> + }
> +
> + if (save_compress_page(rs, block, offset)) {
> + return 1;
> + }
> +
> + res = save_zero_page(rs, block, offset);
> + if (res > 0) {
> + /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> + * page would be stale
> + */
> + if (!save_page_use_compression(rs)) {
> + XBZRLE_cache_lock();
> + xbzrle_cache_zero_page(rs, block->offset + offset);
> + XBZRLE_cache_unlock();
> + }
> + ram_release_pages(block->idstr, offset, res);
> + return res;
> + }
> +
> + /*
> + * do not use multifd for compression as the first page in the new
> + * block should be posted out before sending the compressed page
> + */
> + if (!save_page_use_compression(rs) && migrate_use_multifd()) {
> + return ram_save_multifd_page(rs, block, offset);
> + }
> +
> + return ram_save_page(rs, pss, last_stage);
> +}
> +
> +/**
> + * ram_save_host_page: save a whole host page
> + *
> + * Starting at pss->page, send pages up to the end of the current host
> + * page. It's valid for the initial page to point into the middle of
> + * a host page, in which case the remainder of the host page is sent.
> + * Only dirty target pages are sent. Note that the host page size may
> + * be a huge page for this block.
> + * The saving stops at the boundary of the used_length of the block
> + * if the RAMBlock isn't a multiple of the host page size.
> + *
> + * Returns the number of pages written or negative on error
> + *
> + * @rs: current RAM state
> + * @pss: data about the page we want to send
> + * @last_stage: if we are at the completion stage
> + */
> +static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
> + bool last_stage)
> +{
> + int tmppages, pages = 0;
> + size_t pagesize_bits =
> + qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> +
> + if (ramblock_is_ignored(pss->block)) {
> +        error_report("block %s should not be migrated!", pss->block->idstr);
> + return 0;
> + }
> +
> + do {
> +        /* Check if the page is dirty and if it is, send it */
> + if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> + pss->page++;
> + continue;
> + }
> +
> + tmppages = ram_save_target_page(rs, pss, last_stage);
> + if (tmppages < 0) {
> + return tmppages;
> + }
> +
> + pages += tmppages;
> + if (pss->block->unsentmap) {
> + clear_bit(pss->page, pss->block->unsentmap);
> + }
> +
> + pss->page++;
> + } while ((pss->page & (pagesize_bits - 1)) &&
> + offset_in_ramblock(pss->block, pss->page << TARGET_PAGE_BITS));
> +
> + /* The offset we leave with is the last one we looked at */
> + pss->page--;
> + return pages;
> +}
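
For hugepage-backed blocks the loop above walks every target page inside
one host page: with 4 KiB target pages a 2 MiB host page spans 512 of
them, and the (pss->page & (pagesize_bits - 1)) test is what detects the
host-page boundary. The boundary arithmetic in isolation:

  #include <stdio.h>

  int main(void)
  {
      unsigned long target_bits = 12;               /* 4 KiB target pages */
      unsigned long host_page = 2 * 1024 * 1024;    /* 2 MiB hugepage */
      unsigned long pages_per_host = host_page >> target_bits;   /* 512 */
      unsigned long page;

      /* start two target pages before the first host-page boundary */
      for (page = 510; page & (pages_per_host - 1); page++) {
          printf("would send target page %lu\n", page);
      }
      printf("host page boundary reached at page %lu\n", page);
      return 0;
  }
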
> +
> +/**
> + * ram_find_and_save_block: finds a dirty page and sends it to f
> + *
> + * Called within an RCU critical section.
> + *
> + * Returns the number of pages written where zero means no dirty pages,
> + * or negative on error
> + *
> + * @rs: current RAM state
> + * @last_stage: if we are at the completion stage
> + *
> + * On systems where host-page-size > target-page-size it will send all the
> + * pages in a host page that are dirty.
> + */
> +
> +static int ram_find_and_save_block(RAMState *rs, bool last_stage)
> +{
> + PageSearchStatus pss;
> + int pages = 0;
> + bool again, found;
> +
> + /* No dirty page as there is zero RAM */
> + if (!ram_bytes_total()) {
> + return pages;
> + }
> +
> + pss.block = rs->last_seen_block;
> + pss.page = rs->last_page;
> + pss.complete_round = false;
> +
> + if (!pss.block) {
> + pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
> + }
> +
> + do {
> + again = true;
> + found = get_queued_page(rs, &pss);
> +
> + if (!found) {
> + /* priority queue empty, so just search for something dirty */
> + found = find_dirty_block(rs, &pss, &again);
> + }
> +
> + if (found) {
> + pages = ram_save_host_page(rs, &pss, last_stage);
> + }
> + } while (!pages && again);
> +
> + rs->last_seen_block = pss.block;
> + rs->last_page = pss.page;
> +
> + return pages;
> +}
> +
> +void acct_update_position(QEMUFile *f, size_t size, bool zero)
> +{
> + uint64_t pages = size / TARGET_PAGE_SIZE;
> +
> + if (zero) {
> + ram_counters.duplicate += pages;
> + } else {
> + ram_counters.normal += pages;
> + ram_counters.transferred += size;
> + qemu_update_position(f, size);
> + }
> +}
> +
> +static uint64_t ram_bytes_total_common(bool count_ignored)
> +{
> + RAMBlock *block;
> + uint64_t total = 0;
> +
> + rcu_read_lock();
> + if (count_ignored) {
> + RAMBLOCK_FOREACH_MIGRATABLE(block) {
> + total += block->used_length;
> + }
> + } else {
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + total += block->used_length;
> + }
> + }
> + rcu_read_unlock();
> + return total;
> +}
> +
> +uint64_t ram_bytes_total(void)
> +{
> + return ram_bytes_total_common(false);
> +}
> +
> +static void xbzrle_load_setup(void)
> +{
> + XBZRLE.decoded_buf = g_malloc(TARGET_PAGE_SIZE);
> +}
> +
> +static void xbzrle_load_cleanup(void)
> +{
> + g_free(XBZRLE.decoded_buf);
> + XBZRLE.decoded_buf = NULL;
> +}
> +
> +static void ram_state_cleanup(RAMState **rsp)
> +{
> + if (*rsp) {
> + migration_page_queue_free(*rsp);
> + qemu_mutex_destroy(&(*rsp)->bitmap_mutex);
> + qemu_mutex_destroy(&(*rsp)->src_page_req_mutex);
> + g_free(*rsp);
> + *rsp = NULL;
> + }
> +}
> +
> +static void xbzrle_cleanup(void)
> +{
> + XBZRLE_cache_lock();
> + if (XBZRLE.cache) {
> + cache_fini(XBZRLE.cache);
> + g_free(XBZRLE.encoded_buf);
> + g_free(XBZRLE.current_buf);
> + g_free(XBZRLE.zero_target_page);
> + XBZRLE.cache = NULL;
> + XBZRLE.encoded_buf = NULL;
> + XBZRLE.current_buf = NULL;
> + XBZRLE.zero_target_page = NULL;
> + }
> + XBZRLE_cache_unlock();
> +}
> +
> +static void ram_save_cleanup(void *opaque)
> +{
> + RAMState **rsp = opaque;
> + RAMBlock *block;
> +
> + /* caller have hold iothread lock or is in a bh, so there is
> + * no writing race against the migration bitmap
> + */
> + memory_global_dirty_log_stop();
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + g_free(block->bmap);
> + block->bmap = NULL;
> + g_free(block->unsentmap);
> + block->unsentmap = NULL;
> + }
> +
> + xbzrle_cleanup();
> + compress_threads_save_cleanup();
> + ram_state_cleanup(rsp);
> +}
> +
> +static void ram_state_reset(RAMState *rs)
> +{
> + rs->last_seen_block = NULL;
> + rs->last_sent_block = NULL;
> + rs->last_page = 0;
> + rs->last_version = ram_list.version;
> + rs->ram_bulk_stage = true;
> + rs->fpo_enabled = false;
> +}
> +
> +#define MAX_WAIT 50 /* ms, half buffered_file limit */
> +
> +/*
> + * 'expected' is the value you expect the bitmap mostly to be full
> + * of; it won't bother printing lines that are all this value.
> + */
> +void ram_debug_dump_bitmap(unsigned long *todump, bool expected,
> + unsigned long pages)
> +{
> + int64_t cur;
> + int64_t linelen = 128;
> + char linebuf[129];
> +
> + for (cur = 0; cur < pages; cur += linelen) {
> + int64_t curb;
> + bool found = false;
> + /*
> + * Last line; catch the case where the line length
> + * is longer than remaining ram
> + */
> + if (cur + linelen > pages) {
> + linelen = pages - cur;
> + }
> + for (curb = 0; curb < linelen; curb++) {
> + bool thisbit = test_bit(cur + curb, todump);
> + linebuf[curb] = thisbit ? '1' : '.';
> + found = found || (thisbit != expected);
> + }
> + if (found) {
> + linebuf[curb] = '\0';
> + fprintf(stderr, "0x%08" PRIx64 " : %s\n", cur, linebuf);
> + }
> + }
> +}
> +
> +/* **** functions for postcopy ***** */
> +
> +void ram_postcopy_migrated_memory_release(MigrationState *ms)
> +{
> + struct RAMBlock *block;
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + unsigned long *bitmap = block->bmap;
> + unsigned long range = block->used_length >> TARGET_PAGE_BITS;
> + unsigned long run_start = find_next_zero_bit(bitmap, range, 0);
> +
> + while (run_start < range) {
> + unsigned long run_end = find_next_bit(bitmap, range, run_start + 1);
> + ram_discard_range(block->idstr, run_start << TARGET_PAGE_BITS,
> + (run_end - run_start) << TARGET_PAGE_BITS);
> + run_start = find_next_zero_bit(bitmap, range, run_end + 1);
> + }
> + }
> +}
> +
> +/**
> + * postcopy_send_discard_bm_ram: discard a RAMBlock
> + *
> + * Returns zero on success
> + *
> + * Callback from postcopy_each_ram_send_discard for each RAMBlock
> + * Note: At this point the 'unsentmap' is the processed bitmap combined
> + * with the dirtymap; so a '1' means it's either dirty or unsent.
> + *
> + * @ms: current migration state
> + * @pds: state for postcopy
> + * @block: RAMBlock that is being processed
> + */
> +static int postcopy_send_discard_bm_ram(MigrationState *ms,
> + PostcopyDiscardState *pds,
> + RAMBlock *block)
> +{
> + unsigned long end = block->used_length >> TARGET_PAGE_BITS;
> + unsigned long current;
> + unsigned long *unsentmap = block->unsentmap;
> +
> + for (current = 0; current < end; ) {
> + unsigned long one = find_next_bit(unsentmap, end, current);
> +
> + if (one <= end) {
> + unsigned long zero = find_next_zero_bit(unsentmap, end, one + 1);
> + unsigned long discard_length;
> +
> + if (zero >= end) {
> + discard_length = end - one;
> + } else {
> + discard_length = zero - one;
> + }
> + if (discard_length) {
> + postcopy_discard_send_range(ms, pds, one, discard_length);
> + }
> + current = one + discard_length;
> + } else {
> + current = one;
> + }
> + }
> +
> + return 0;
> +}
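
For what it's worth, this is the classic run extraction over a bitmap:
find a 1, find the next 0, emit (start, length). A self-contained toy
model, using a byte-per-bit array in place of the kernel-style bit
helpers (all names below are mine, not QEMU's):

  #include <stdio.h>

  static unsigned long find_next(const unsigned char *bm, unsigned long end,
                                 unsigned long from, int val)
  {
      while (from < end && bm[from] != val) {
          from++;
      }
      return from;    /* == end when nothing further matches */
  }

  int main(void)
  {
      /* 1 == dirty-or-unsent, as in the combined unsentmap */
      unsigned char unsentmap[] = {0, 1, 1, 1, 0, 0, 1, 0, 1, 1};
      unsigned long end = sizeof(unsentmap), current = 0;

      while (current < end) {
          unsigned long one = find_next(unsentmap, end, current, 1);
          if (one >= end) {
              break;
          }
          unsigned long zero = find_next(unsentmap, end, one + 1, 0);
          /* stands in for postcopy_discard_send_range() */
          printf("discard start=%lu len=%lu\n", one, zero - one);
          current = zero;
      }
      return 0;
  }

This prints the runs 1/3, 6/1 and 8/2, which is exactly the set of ranges
the destination would be told to drop.
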
> +
> +/**
> + * postcopy_each_ram_send_discard: discard all RAMBlocks
> + *
> + * Returns 0 for success or negative for error
> + *
> + * Utility for the outgoing postcopy code.
> + * Calls postcopy_send_discard_bm_ram for each RAMBlock
> + * passing it bitmap indexes and name.
> + * (qemu_ram_foreach_block ends up passing unscaled lengths
> + * which would mean postcopy code would have to deal with target page)
> + *
> + * @ms: current migration state
> + */
> +static int postcopy_each_ram_send_discard(MigrationState *ms)
> +{
> + struct RAMBlock *block;
> + int ret;
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + PostcopyDiscardState *pds =
> + postcopy_discard_send_init(ms, block->idstr);
> +
> + /*
> + * Postcopy sends chunks of bitmap over the wire, but it
> + * just needs indexes at this point, avoids it having
> + * target page specific code.
> + */
> + ret = postcopy_send_discard_bm_ram(ms, pds, block);
> + postcopy_discard_send_finish(ms, pds);
> + if (ret) {
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * postcopy_chunk_hostpages_pass: canonicalize bitmap in hostpages
> + *
> + * Helper for postcopy_chunk_hostpages; it's called twice to
> + * canonicalize the two bitmaps, that are similar, but one is
> + * inverted.
> + *
> + * Postcopy requires that all target pages in a hostpage are dirty or
> + * clean, not a mix. This function canonicalizes the bitmaps.
> + *
> + * @ms: current migration state
> + * @unsent_pass: if true we need to canonicalize partially unsent host pages
> + * otherwise we need to canonicalize partially dirty host pages
> + * @block: block that contains the page we want to canonicalize
> + * @pds: state for postcopy
> + */
> +static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
> + RAMBlock *block,
> + PostcopyDiscardState *pds)
> +{
> + RAMState *rs = ram_state;
> + unsigned long *bitmap = block->bmap;
> + unsigned long *unsentmap = block->unsentmap;
> + unsigned int host_ratio = block->page_size / TARGET_PAGE_SIZE;
> + unsigned long pages = block->used_length >> TARGET_PAGE_BITS;
> + unsigned long run_start;
> +
> + if (block->page_size == TARGET_PAGE_SIZE) {
> + /* Easy case - TPS==HPS for a non-huge page RAMBlock */
> + return;
> + }
> +
> + if (unsent_pass) {
> + /* Find a sent page */
> + run_start = find_next_zero_bit(unsentmap, pages, 0);
> + } else {
> + /* Find a dirty page */
> + run_start = find_next_bit(bitmap, pages, 0);
> + }
> +
> + while (run_start < pages) {
> + bool do_fixup = false;
> + unsigned long fixup_start_addr;
> + unsigned long host_offset;
> +
> + /*
> + * If the start of this run of pages is in the middle of a host
> + * page, then we need to fixup this host page.
> + */
> + host_offset = run_start % host_ratio;
> + if (host_offset) {
> + do_fixup = true;
> + run_start -= host_offset;
> + fixup_start_addr = run_start;
> + /* For the next pass */
> + run_start = run_start + host_ratio;
> + } else {
> + /* Find the end of this run */
> + unsigned long run_end;
> + if (unsent_pass) {
> + run_end = find_next_bit(unsentmap, pages, run_start + 1);
> + } else {
> + run_end = find_next_zero_bit(bitmap, pages, run_start + 1);
> + }
> + /*
> + * If the end isn't at the start of a host page, then the
> + * run doesn't finish at the end of a host page
> + * and we need to discard.
> + */
> + host_offset = run_end % host_ratio;
> + if (host_offset) {
> + do_fixup = true;
> + fixup_start_addr = run_end - host_offset;
> + /*
> + * This host page has gone, the next loop iteration starts
> + * from after the fixup
> + */
> + run_start = fixup_start_addr + host_ratio;
> + } else {
> + /*
> + * No discards on this iteration, next loop starts from
> + * next sent/dirty page
> + */
> + run_start = run_end + 1;
> + }
> + }
> +
> + if (do_fixup) {
> + unsigned long page;
> +
> + /* Tell the destination to discard this page */
> + if (unsent_pass || !test_bit(fixup_start_addr, unsentmap)) {
> + /* For the unsent_pass we:
> + * discard partially sent pages
> + * For the !unsent_pass (dirty) we:
> + * discard partially dirty pages that were sent
> + * (any partially sent pages were already discarded
> + * by the previous unsent_pass)
> + */
> + postcopy_discard_send_range(ms, pds, fixup_start_addr,
> + host_ratio);
> + }
> +
> + /* Clean up the bitmap */
> + for (page = fixup_start_addr;
> + page < fixup_start_addr + host_ratio; page++) {
> + /* All pages in this host page are now not sent */
> + set_bit(page, unsentmap);
> +
> + /*
> + * Remark them as dirty, updating the count for any pages
> + * that weren't previously dirty.
> + */
> + rs->migration_dirty_pages += !test_and_set_bit(page, bitmap);
> + }
> + }
> +
> + if (unsent_pass) {
> + /* Find the next sent page for the next iteration */
> + run_start = find_next_zero_bit(unsentmap, pages, run_start);
> + } else {
> + /* Find the next dirty page for the next iteration */
> + run_start = find_next_bit(bitmap, pages, run_start);
> + }
> + }
> +}
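
The fixup arithmetic is easier to see with concrete numbers plugged in.
A toy sketch assuming a host_ratio of 4 (e.g. 16K host pages over 4K
target pages); the values are illustrative only:

  #include <stdio.h>

  int main(void)
  {
      unsigned int host_ratio = 4;       /* target pages per host page */
      unsigned long run_start = 9;       /* run begins mid-hostpage */
      unsigned long host_offset = run_start % host_ratio;

      if (host_offset) {
          /* round down to the hostpage boundary and fix the whole page */
          unsigned long fixup_start = run_start - host_offset;
          printf("fixup hostpage covering pages %lu..%lu\n",
                 fixup_start, fixup_start + host_ratio - 1);   /* 8..11 */
      }
      return 0;
  }
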
> +
> +/**
> + * postcopy_chunk_hostpages: discard any partially sent host page
> + *
> + * Utility for the outgoing postcopy code.
> + *
> + * Discard any partially sent host-page size chunks, mark any partially
> + * dirty host-page size chunks as all dirty. In this case the host-page
> + * is the host-page for the particular RAMBlock, i.e. it might be a huge page
> + *
> + * Returns zero on success
> + *
> + * @ms: current migration state
> + * @block: block we want to work with
> + */
> +static int postcopy_chunk_hostpages(MigrationState *ms, RAMBlock *block)
> +{
> + PostcopyDiscardState *pds =
> + postcopy_discard_send_init(ms, block->idstr);
> +
> + /* First pass: Discard all partially sent host pages */
> + postcopy_chunk_hostpages_pass(ms, true, block, pds);
> + /*
> + * Second pass: Ensure that all partially dirty host pages are made
> + * fully dirty.
> + */
> + postcopy_chunk_hostpages_pass(ms, false, block, pds);
> +
> + postcopy_discard_send_finish(ms, pds);
> + return 0;
> +}
> +
> +/**
> + * ram_postcopy_send_discard_bitmap: transmit the discard bitmap
> + *
> + * Returns zero on success
> + *
> + * Transmit the set of pages to be discarded after precopy to the target;
> + * these are pages that:
> + * a) Have been previously transmitted but are now dirty again
> + * b) Pages that have never been transmitted; this ensures that
> + * any pages on the destination that have been mapped by background
> + * tasks get discarded (transparent huge pages are the specific concern)
> + * Hopefully this is pretty sparse
> + *
> + * @ms: current migration state
> + */
> +int ram_postcopy_send_discard_bitmap(MigrationState *ms)
> +{
> + RAMState *rs = ram_state;
> + RAMBlock *block;
> + int ret;
> +
> + rcu_read_lock();
> +
> + /* This should be our last sync, the src is now paused */
> + migration_bitmap_sync(rs);
> +
> + /* Easiest way to make sure we don't resume in the middle of a host-page */
> + rs->last_seen_block = NULL;
> + rs->last_sent_block = NULL;
> + rs->last_page = 0;
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + unsigned long pages = block->used_length >> TARGET_PAGE_BITS;
> + unsigned long *bitmap = block->bmap;
> + unsigned long *unsentmap = block->unsentmap;
> +
> + if (!unsentmap) {
> + /* We don't have a safe way to resize the sentmap, so
> + * if the bitmap was resized it will be NULL at this
> + * point.
> + */
> + error_report("migration ram resized during precopy phase");
> + rcu_read_unlock();
> + return -EINVAL;
> + }
> + /* Deal with TPS != HPS and huge pages */
> + ret = postcopy_chunk_hostpages(ms, block);
> + if (ret) {
> + rcu_read_unlock();
> + return ret;
> + }
> +
> + /*
> + * Update the unsentmap to be unsentmap = unsentmap | dirty
> + */
> + bitmap_or(unsentmap, unsentmap, bitmap, pages);
> +#ifdef DEBUG_POSTCOPY
> + ram_debug_dump_bitmap(unsentmap, true, pages);
> +#endif
> + }
> + trace_ram_postcopy_send_discard_bitmap();
> +
> + ret = postcopy_each_ram_send_discard(ms);
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
> +/**
> + * ram_discard_range: discard dirtied pages at the beginning of postcopy
> + *
> + * Returns zero on success
> + *
> + * @rbname: name of the RAMBlock of the request. NULL means the
> + * same as the last one.
> + * @start: RAMBlock starting page
> + * @length: RAMBlock size
> + */
> +int ram_discard_range(const char *rbname, uint64_t start, size_t length)
> +{
> + int ret = -1;
> +
> + trace_ram_discard_range(rbname, start, length);
> +
> + rcu_read_lock();
> + RAMBlock *rb = qemu_ram_block_by_name(rbname);
> +
> + if (!rb) {
> + error_report("ram_discard_range: Failed to find block '%s'", rbname);
> + goto err;
> + }
> +
> + /*
> + * On source VM, we don't need to update the received bitmap since
> + * we don't even have one.
> + */
> + if (rb->receivedmap) {
> + bitmap_clear(rb->receivedmap, start >> qemu_target_page_bits(),
> + length >> qemu_target_page_bits());
> + }
> +
> + ret = ram_block_discard_range(rb, start, length);
> +
> +err:
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
> +/*
> + * For every allocation, we will try not to crash the VM if the
> + * allocation fails.
> + */
> +static int xbzrle_init(void)
> +{
> + Error *local_err = NULL;
> +
> + if (!migrate_use_xbzrle()) {
> + return 0;
> + }
> +
> + XBZRLE_cache_lock();
> +
> + XBZRLE.zero_target_page = g_try_malloc0(TARGET_PAGE_SIZE);
> + if (!XBZRLE.zero_target_page) {
> + error_report("%s: Error allocating zero page", __func__);
> + goto err_out;
> + }
> +
> + XBZRLE.cache = cache_init(migrate_xbzrle_cache_size(),
> + TARGET_PAGE_SIZE, &local_err);
> + if (!XBZRLE.cache) {
> + error_report_err(local_err);
> + goto free_zero_page;
> + }
> +
> + XBZRLE.encoded_buf = g_try_malloc0(TARGET_PAGE_SIZE);
> + if (!XBZRLE.encoded_buf) {
> + error_report("%s: Error allocating encoded_buf", __func__);
> + goto free_cache;
> + }
> +
> + XBZRLE.current_buf = g_try_malloc(TARGET_PAGE_SIZE);
> + if (!XBZRLE.current_buf) {
> + error_report("%s: Error allocating current_buf", __func__);
> + goto free_encoded_buf;
> + }
> +
> + /* We are all good */
> + XBZRLE_cache_unlock();
> + return 0;
> +
> +free_encoded_buf:
> + g_free(XBZRLE.encoded_buf);
> + XBZRLE.encoded_buf = NULL;
> +free_cache:
> + cache_fini(XBZRLE.cache);
> + XBZRLE.cache = NULL;
> +free_zero_page:
> + g_free(XBZRLE.zero_target_page);
> + XBZRLE.zero_target_page = NULL;
> +err_out:
> + XBZRLE_cache_unlock();
> + return -ENOMEM;
> +}
> +
> +static int ram_state_init(RAMState **rsp)
> +{
> + *rsp = g_try_new0(RAMState, 1);
> +
> + if (!*rsp) {
> + error_report("%s: Init ramstate fail", __func__);
> + return -1;
> + }
> +
> + qemu_mutex_init(&(*rsp)->bitmap_mutex);
> + qemu_mutex_init(&(*rsp)->src_page_req_mutex);
> + QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
> +
> + /*
> + * This must match with the initial values of dirty bitmap.
> + * Currently we initialize the dirty bitmap to all zeros so
> + * here the total dirty page count is zero.
> + */
> + (*rsp)->migration_dirty_pages = 0;
> + ram_state_reset(*rsp);
> +
> + return 0;
> +}
> +
> +static void ram_list_init_bitmaps(void)
> +{
> + RAMBlock *block;
> + unsigned long pages;
> +
> + /* Skip setting bitmap if there is no RAM */
> + if (ram_bytes_total()) {
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + pages = block->max_length >> TARGET_PAGE_BITS;
> + /*
> + * The initial dirty bitmap for migration must be set with all
> + * ones to make sure we'll migrate every guest RAM page to
> + * destination.
> + * Here we didn't set RAMBlock.bmap simply because it is already
> + * set in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] in
> + * ram_block_add, and that's where we'll sync the dirty bitmaps.
> + * Here setting RAMBlock.bmap would be fine too but not necessary.
> + */
> + block->bmap = bitmap_new(pages);
> + if (migrate_postcopy_ram()) {
> + block->unsentmap = bitmap_new(pages);
> + bitmap_set(block->unsentmap, 0, pages);
> + }
> + }
> + }
> +}
> +
> +static void ram_init_bitmaps(RAMState *rs)
> +{
> + /* For memory_global_dirty_log_start below. */
> + qemu_mutex_lock_iothread();
> + qemu_mutex_lock_ramlist();
> + rcu_read_lock();
> +
> + ram_list_init_bitmaps();
> + memory_global_dirty_log_start();
> + migration_bitmap_sync_precopy(rs);
> +
> + rcu_read_unlock();
> + qemu_mutex_unlock_ramlist();
> + qemu_mutex_unlock_iothread();
> +}
> +
> +static int ram_init_all(RAMState **rsp)
> +{
> + if (ram_state_init(rsp)) {
> + return -1;
> + }
> +
> + if (xbzrle_init()) {
> + ram_state_cleanup(rsp);
> + return -1;
> + }
> +
> + ram_init_bitmaps(*rsp);
> +
> + return 0;
> +}
> +
> +static void ram_state_resume_prepare(RAMState *rs, QEMUFile *out)
> +{
> + RAMBlock *block;
> + uint64_t pages = 0;
> +
> + /*
> + * Postcopy is not using xbzrle/compression, so no need for that.
> + * Also, since the source is already halted, we don't need to care
> + * about dirty page logging either.
> + */
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + pages += bitmap_count_one(block->bmap,
> + block->used_length >> TARGET_PAGE_BITS);
> + }
> +
> + /* This may not be aligned with current bitmaps. Recalculate. */
> + rs->migration_dirty_pages = pages;
> +
> + rs->last_seen_block = NULL;
> + rs->last_sent_block = NULL;
> + rs->last_page = 0;
> + rs->last_version = ram_list.version;
> + /*
> + * Disable the bulk stage, otherwise we'll resend the whole RAM no
> + * matter what we have sent.
> + */
> + rs->ram_bulk_stage = false;
> +
> + /* Update RAMState cache of output QEMUFile */
> + rs->f = out;
> +
> + trace_ram_state_resume_prepare(pages);
> +}
> +
> +/*
> + * This function clears bits of the free pages reported by the caller from the
> + * migration dirty bitmap. @addr is the host address corresponding to the
> + * start of the contiguous guest free pages, and @len is the total bytes of
> + * those pages.
> + * those pages.
> + */
> +void qemu_guest_free_page_hint(void *addr, size_t len)
> +{
> + RAMBlock *block;
> + ram_addr_t offset;
> + size_t used_len, start, npages;
> + MigrationState *s = migrate_get_current();
> +
> + /* This function is currently expected to be used during live migration */
> + if (!migration_is_setup_or_active(s->state)) {
> + return;
> + }
> +
> + for (; len > 0; len -= used_len, addr += used_len) {
> + block = qemu_ram_block_from_host(addr, false, &offset);
> + if (unlikely(!block || offset >= block->used_length)) {
> + /*
> + * The implementation might not support RAMBlock resize during
> + * live migration, but it could happen in theory with future
> + * updates. So we add a check here to capture that case.
> + */
> + error_report_once("%s unexpected error", __func__);
> + return;
> + }
> +
> + if (len <= block->used_length - offset) {
> + used_len = len;
> + } else {
> + used_len = block->used_length - offset;
> + }
> +
> + start = offset >> TARGET_PAGE_BITS;
> + npages = used_len >> TARGET_PAGE_BITS;
> +
> + qemu_mutex_lock(&ram_state->bitmap_mutex);
> + ram_state->migration_dirty_pages -=
> + bitmap_count_one_with_offset(block->bmap, start, npages);
> + bitmap_clear(block->bmap, start, npages);
> + qemu_mutex_unlock(&ram_state->bitmap_mutex);
> + }
> +}
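
The offset/len-to-bitmap-range conversion above is plain shift
arithmetic. A quick standalone check with assumed numbers (a hint 12K
into the block, 32K long, 4K target pages):

  #include <stdio.h>
  #include <stdint.h>

  #define TARGET_PAGE_BITS 12

  int main(void)
  {
      uint64_t offset = 0x3000;      /* hint starts 12K into the block */
      uint64_t used_len = 0x8000;    /* 32K of reported free pages */
      uint64_t start = offset >> TARGET_PAGE_BITS;      /* page 3 */
      uint64_t npages = used_len >> TARGET_PAGE_BITS;   /* 8 pages */

      printf("clear dirty bits [%llu, %llu)\n",
             (unsigned long long)start,
             (unsigned long long)(start + npages));
      return 0;
  }
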
> +
> +/*
> + * Each of ram_save_setup, ram_save_iterate and ram_save_complete has a
> + * long-running RCU critical section. When RCU reclaims in the code
> + * start to become numerous it will be necessary to reduce the
> + * granularity of these critical sections.
> + */
> +
> +/**
> + * ram_save_setup: Setup RAM for migration
> + *
> + * Returns zero to indicate success and negative for error
> + *
> + * @f: QEMUFile where to send the data
> + * @opaque: RAMState pointer
> + */
> +static int ram_save_setup(QEMUFile *f, void *opaque)
> +{
> + RAMState **rsp = opaque;
> + RAMBlock *block;
> +
> + if (compress_threads_save_setup()) {
> + return -1;
> + }
> +
> + /* migration has already setup the bitmap, reuse it. */
> + if (!migration_in_colo_state()) {
> + if (ram_init_all(rsp) != 0) {
> + compress_threads_save_cleanup();
> + return -1;
> + }
> + }
> + (*rsp)->f = f;
> +
> + rcu_read_lock();
> +
> + qemu_put_be64(f, ram_bytes_total_common(true) | RAM_SAVE_FLAG_MEM_SIZE);
> +
> + RAMBLOCK_FOREACH_MIGRATABLE(block) {
> + qemu_put_byte(f, strlen(block->idstr));
> + qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
> + qemu_put_be64(f, block->used_length);
> + if (migrate_postcopy_ram() && block->page_size != qemu_host_page_size) {
> + qemu_put_be64(f, block->page_size);
> + }
> + if (migrate_ignore_shared()) {
> + qemu_put_be64(f, block->mr->addr);
> + qemu_put_byte(f, ramblock_is_ignored(block) ? 1 : 0);
> + }
> + }
> +
> + rcu_read_unlock();
> +
> + ram_control_before_iterate(f, RAM_CONTROL_SETUP);
> + ram_control_after_iterate(f, RAM_CONTROL_SETUP);
> +
> + multifd_send_sync_main();
> + qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> + qemu_fflush(f);
> +
> + return 0;
> +}
> +
> +/**
> + * ram_save_iterate: iterative stage for migration
> + *
> + * Returns zero to indicate success and negative for error
> + *
> + * @f: QEMUFile where to send the data
> + * @opaque: RAMState pointer
> + */
> +static int ram_save_iterate(QEMUFile *f, void *opaque)
> +{
> + RAMState **temp = opaque;
> + RAMState *rs = *temp;
> + int ret;
> + int i;
> + int64_t t0;
> + int done = 0;
> +
> + if (blk_mig_bulk_active()) {
> + /* Avoid transferring ram during bulk phase of block migration as
> + * the bulk phase will usually take a long time and transferring
> + * ram updates during that time is pointless. */
> + goto out;
> + }
> +
> + rcu_read_lock();
> + if (ram_list.version != rs->last_version) {
> + ram_state_reset(rs);
> + }
> +
> + /* Read version before ram_list.blocks */
> + smp_rmb();
> +
> + ram_control_before_iterate(f, RAM_CONTROL_ROUND);
> +
> + t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> + i = 0;
> + while ((ret = qemu_file_rate_limit(f)) == 0 ||
> + !QSIMPLEQ_EMPTY(&rs->src_page_requests)) {
> + int pages;
> +
> + if (qemu_file_get_error(f)) {
> + break;
> + }
> +
> + pages = ram_find_and_save_block(rs, false);
> + /* no more pages to send */
> + if (pages == 0) {
> + done = 1;
> + break;
> + }
> +
> + if (pages < 0) {
> + qemu_file_set_error(f, pages);
> + break;
> + }
> +
> + rs->target_page_count += pages;
> +
> + /* we want to check in the 1st loop, just in case it was the 1st time
> + and we had to sync the dirty bitmap.
> + qemu_clock_get_ns() is a bit expensive, so we only check every few
> + iterations
> + */
> + if ((i & 63) == 0) {
> + uint64_t t1 = (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - t0) / 1000000;
> + if (t1 > MAX_WAIT) {
> + trace_ram_save_iterate_big_wait(t1, i);
> + break;
> + }
> + }
> + i++;
> + }
> + rcu_read_unlock();
> +
> + /*
> + * Must occur before EOS (or any QEMUFile operation)
> + * because of RDMA protocol.
> + */
> + ram_control_after_iterate(f, RAM_CONTROL_ROUND);
> +
> +out:
> + multifd_send_sync_main();
> + qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> + qemu_fflush(f);
> + ram_counters.transferred += 8;
> +
> + ret = qemu_file_get_error(f);
> + if (ret < 0) {
> + return ret;
> + }
> +
> + return done;
> +}
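
The (i & 63) == 0 cadence above is the interesting bit: the clock is
sampled only every 64 iterations because qemu_clock_get_ns() is
comparatively expensive. A stripped-down sketch of the same pattern
using plain POSIX clocks (my own stand-in, not QEMU code; may need
-lrt on older glibc):

  #include <stdio.h>
  #include <stdint.h>
  #include <time.h>

  #define MAX_WAIT 50 /* ms, as in the patch */

  static int64_t now_ns(void)
  {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
  }

  int main(void)
  {
      volatile unsigned long work = 0;   /* stands in for sending pages */
      int64_t t0 = now_ns();

      for (int i = 0; ; i++) {
          work++;
          if ((i & 63) == 0) {           /* sample the clock sparingly */
              int64_t t1 = (now_ns() - t0) / 1000000;
              if (t1 > MAX_WAIT) {
                  printf("bailed after %d iterations, %lld ms\n",
                         i, (long long)t1);
                  break;
              }
          }
      }
      return 0;
  }
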
> +
> +/**
> + * ram_save_complete: function called to send the remaining amount of ram
> + *
> + * Returns zero to indicate success or negative on error
> + *
> + * Called with the iothread lock held
> + *
> + * @f: QEMUFile where to send the data
> + * @opaque: RAMState pointer
> + */
> +static int ram_save_complete(QEMUFile *f, void *opaque)
> +{
> + RAMState **temp = opaque;
> + RAMState *rs = *temp;
> + int ret = 0;
> +
> + rcu_read_lock();
> +
> + if (!migration_in_postcopy()) {
> + migration_bitmap_sync_precopy(rs);
> + }
> +
> + ram_control_before_iterate(f, RAM_CONTROL_FINISH);
> +
> + /* try transferring iterative blocks of memory */
> +
> + /* flush all remaining blocks regardless of rate limiting */
> + while (true) {
> + int pages;
> +
> + pages = ram_find_and_save_block(rs, !migration_in_colo_state());
> + /* no more blocks to send */
> + if (pages == 0) {
> + break;
> + }
> + if (pages < 0) {
> + ret = pages;
> + break;
> + }
> + }
> +
> + flush_compressed_data(rs);
> + ram_control_after_iterate(f, RAM_CONTROL_FINISH);
> +
> + rcu_read_unlock();
> +
> + multifd_send_sync_main();
> + qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> + qemu_fflush(f);
> +
> + return ret;
> +}
> +
> +static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> + uint64_t *res_precopy_only,
> + uint64_t *res_compatible,
> + uint64_t *res_postcopy_only)
> +{
> + RAMState **temp = opaque;
> + RAMState *rs = *temp;
> + uint64_t remaining_size;
> +
> + remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> +
> + if (!migration_in_postcopy() &&
> + remaining_size < max_size) {
> + qemu_mutex_lock_iothread();
> + rcu_read_lock();
> + migration_bitmap_sync_precopy(rs);
> + rcu_read_unlock();
> + qemu_mutex_unlock_iothread();
> + remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> + }
> +
> + if (migrate_postcopy_ram()) {
> + /* We can do postcopy, and all the data is postcopiable */
> + *res_compatible += remaining_size;
> + } else {
> + *res_precopy_only += remaining_size;
> + }
> +}
> +
> +static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
> +{
> + unsigned int xh_len;
> + int xh_flags;
> + uint8_t *loaded_data;
> +
> + /* extract RLE header */
> + xh_flags = qemu_get_byte(f);
> + xh_len = qemu_get_be16(f);
> +
> + if (xh_flags != ENCODING_FLAG_XBZRLE) {
> + error_report("Failed to load XBZRLE page - wrong compression!");
> + return -1;
> + }
> +
> + if (xh_len > TARGET_PAGE_SIZE) {
> + error_report("Failed to load XBZRLE page - len overflow!");
> + return -1;
> + }
> + loaded_data = XBZRLE.decoded_buf;
> + /* load data and decode */
> + /* it can change loaded_data to point to an internal buffer */
> + qemu_get_buffer_in_place(f, &loaded_data, xh_len);
> +
> + /* decode RLE */
> + if (xbzrle_decode_buffer(loaded_data, xh_len, host,
> + TARGET_PAGE_SIZE) == -1) {
> + error_report("Failed to load XBZRLE page - decode error!");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * ram_block_from_stream: read a RAMBlock id from the migration stream
> + *
> + * Must be called from within a rcu critical section.
> + *
> + * Returns a pointer from within the RCU-protected ram_list.
> + *
> + * @f: QEMUFile where to read the data from
> + * @flags: Page flags (mostly to see if it's a continuation of previous block)
> + */
> +static inline RAMBlock *ram_block_from_stream(QEMUFile *f, int flags)
> +{
> + static RAMBlock *block = NULL;
> + char id[256];
> + uint8_t len;
> +
> + if (flags & RAM_SAVE_FLAG_CONTINUE) {
> + if (!block) {
> + error_report("Ack, bad migration stream!");
> + return NULL;
> + }
> + return block;
> + }
> +
> + len = qemu_get_byte(f);
> + qemu_get_buffer(f, (uint8_t *)id, len);
> + id[len] = 0;
> +
> + block = qemu_ram_block_by_name(id);
> + if (!block) {
> + error_report("Can't find block %s", id);
> + return NULL;
> + }
> +
> + if (ramblock_is_ignored(block)) {
> + error_report("block %s should not be migrated !", id);
> + return NULL;
> + }
> +
> + return block;
> +}
> +
> +static inline void *host_from_ram_block_offset(RAMBlock *block,
> + ram_addr_t offset)
> +{
> + if (!offset_in_ramblock(block, offset)) {
> + return NULL;
> + }
> +
> + return block->host + offset;
> +}
> +
> +static inline void *colo_cache_from_block_offset(RAMBlock *block,
> + ram_addr_t offset)
> +{
> + if (!offset_in_ramblock(block, offset)) {
> + return NULL;
> + }
> + if (!block->colo_cache) {
> + error_report("%s: colo_cache is NULL in block :%s",
> + __func__, block->idstr);
> + return NULL;
> + }
> +
> + /*
> + * During a COLO checkpoint, we need a bitmap of these migrated pages.
> + * It helps us decide which pages in the RAM cache should be flushed
> + * into the VM's RAM later.
> + */
> + if (!test_and_set_bit(offset >> TARGET_PAGE_BITS, block->bmap)) {
> + ram_state->migration_dirty_pages++;
> + }
> + return block->colo_cache + offset;
> +}
> +
> +/**
> + * ram_handle_compressed: handle the zero page case
> + *
> + * If a page (or a whole RDMA chunk) has been
> + * determined to be zero, then zap it.
> + *
> + * @host: host address for the zero page
> + * @ch: what the page is filled from. We only support zero
> + * @size: size of the zero page
> + */
> +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
> +{
> + if (ch != 0 || !is_zero_range(host, size)) {
> + memset(host, ch, size);
> + }
> +}
> +
> +/* return the size after decompression, or negative value on error */
> +static int
> +qemu_uncompress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
> + const uint8_t *source, size_t source_len)
> +{
> + int err;
> +
> + err = inflateReset(stream);
> + if (err != Z_OK) {
> + return -1;
> + }
> +
> + stream->avail_in = source_len;
> + stream->next_in = (uint8_t *)source;
> + stream->avail_out = dest_len;
> + stream->next_out = dest;
> +
> + err = inflate(stream, Z_NO_FLUSH);
> + if (err != Z_STREAM_END) {
> + return -1;
> + }
> +
> + return stream->total_out;
> +}
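
The wrapper above is the usual reset-inflate-check zlib pattern. For
reference, a standalone round trip using nothing but zlib (build with
-lz; the helper name mirrors the one above but everything else is mine):

  #include <stdio.h>
  #include <string.h>
  #include <zlib.h>

  static int uncompress_data(z_stream *stream, Bytef *dest, uLong dest_len,
                             const Bytef *source, uLong source_len)
  {
      if (inflateReset(stream) != Z_OK) {
          return -1;
      }
      stream->avail_in = source_len;
      stream->next_in = (Bytef *)source;
      stream->avail_out = dest_len;
      stream->next_out = dest;
      if (inflate(stream, Z_NO_FLUSH) != Z_STREAM_END) {
          return -1;
      }
      return (int)stream->total_out;
  }

  int main(void)
  {
      Bytef page[4096], comp[8192], out[4096];
      uLongf comp_len = sizeof(comp);
      z_stream stream;

      memset(page, 'x', sizeof(page));
      memset(out, 0, sizeof(out));
      compress(comp, &comp_len, page, sizeof(page));

      memset(&stream, 0, sizeof(stream));
      inflateInit(&stream);
      int n = uncompress_data(&stream, out, sizeof(out), comp, comp_len);
      printf("decompressed %d bytes, match=%d\n", n,
             !memcmp(page, out, sizeof(page)));
      inflateEnd(&stream);
      return 0;
  }
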
> +
> +static void *do_data_decompress(void *opaque)
> +{
> + DecompressParam *param = opaque;
> + unsigned long pagesize;
> + uint8_t *des;
> + int len, ret;
> +
> + qemu_mutex_lock(&param->mutex);
> + while (!param->quit) {
> + if (param->des) {
> + des = param->des;
> + len = param->len;
> + param->des = 0;
> + qemu_mutex_unlock(&param->mutex);
> +
> + pagesize = TARGET_PAGE_SIZE;
> +
> + ret = qemu_uncompress_data(&param->stream, des, pagesize,
> + param->compbuf, len);
> + if (ret < 0 && migrate_get_current()->decompress_error_check) {
> + error_report("decompress data failed");
> + qemu_file_set_error(decomp_file, ret);
> + }
> +
> + qemu_mutex_lock(&decomp_done_lock);
> + param->done = true;
> + qemu_cond_signal(&decomp_done_cond);
> + qemu_mutex_unlock(&decomp_done_lock);
> +
> + qemu_mutex_lock(&param->mutex);
> + } else {
> + qemu_cond_wait(&param->cond, &param->mutex);
> + }
> + }
> + qemu_mutex_unlock(&param->mutex);
> +
> + return NULL;
> +}
> +
> +static int wait_for_decompress_done(void)
> +{
> + int idx, thread_count;
> +
> + if (!migrate_use_compression()) {
> + return 0;
> + }
> +
> + thread_count = migrate_decompress_threads();
> + qemu_mutex_lock(&decomp_done_lock);
> + for (idx = 0; idx < thread_count; idx++) {
> + while (!decomp_param[idx].done) {
> + qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> + }
> + }
> + qemu_mutex_unlock(&decomp_done_lock);
> + return qemu_file_get_error(decomp_file);
> +}
> +
> +static void compress_threads_load_cleanup(void)
> +{
> + int i, thread_count;
> +
> + if (!migrate_use_compression()) {
> + return;
> + }
> + thread_count = migrate_decompress_threads();
> + for (i = 0; i < thread_count; i++) {
> + /*
> + * we use it as an indicator of whether the thread is
> + * properly initialized or not
> + */
> + if (!decomp_param[i].compbuf) {
> + break;
> + }
> +
> + qemu_mutex_lock(&decomp_param[i].mutex);
> + decomp_param[i].quit = true;
> + qemu_cond_signal(&decomp_param[i].cond);
> + qemu_mutex_unlock(&decomp_param[i].mutex);
> + }
> + for (i = 0; i < thread_count; i++) {
> + if (!decomp_param[i].compbuf) {
> + break;
> + }
> +
> + qemu_thread_join(decompress_threads + i);
> + qemu_mutex_destroy(&decomp_param[i].mutex);
> + qemu_cond_destroy(&decomp_param[i].cond);
> + inflateEnd(&decomp_param[i].stream);
> + g_free(decomp_param[i].compbuf);
> + decomp_param[i].compbuf = NULL;
> + }
> + g_free(decompress_threads);
> + g_free(decomp_param);
> + decompress_threads = NULL;
> + decomp_param = NULL;
> + decomp_file = NULL;
> +}
> +
> +static int compress_threads_load_setup(QEMUFile *f)
> +{
> + int i, thread_count;
> +
> + if (!migrate_use_compression()) {
> + return 0;
> + }
> +
> + thread_count = migrate_decompress_threads();
> + decompress_threads = g_new0(QemuThread, thread_count);
> + decomp_param = g_new0(DecompressParam, thread_count);
> + qemu_mutex_init(&decomp_done_lock);
> + qemu_cond_init(&decomp_done_cond);
> + decomp_file = f;
> + for (i = 0; i < thread_count; i++) {
> + if (inflateInit(&decomp_param[i].stream) != Z_OK) {
> + goto exit;
> + }
> +
> + decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> + qemu_mutex_init(&decomp_param[i].mutex);
> + qemu_cond_init(&decomp_param[i].cond);
> + decomp_param[i].done = true;
> + decomp_param[i].quit = false;
> + qemu_thread_create(decompress_threads + i, "decompress",
> + do_data_decompress, decomp_param + i,
> + QEMU_THREAD_JOINABLE);
> + }
> + return 0;
> +exit:
> + compress_threads_load_cleanup();
> + return -1;
> +}
> +
> +static void decompress_data_with_multi_threads(QEMUFile *f,
> + void *host, int len)
> +{
> + int idx, thread_count;
> +
> + thread_count = migrate_decompress_threads();
> + qemu_mutex_lock(&decomp_done_lock);
> + while (true) {
> + for (idx = 0; idx < thread_count; idx++) {
> + if (decomp_param[idx].done) {
> + decomp_param[idx].done = false;
> + qemu_mutex_lock(&decomp_param[idx].mutex);
> + qemu_get_buffer(f, decomp_param[idx].compbuf, len);
> + decomp_param[idx].des = host;
> + decomp_param[idx].len = len;
> + qemu_cond_signal(&decomp_param[idx].cond);
> + qemu_mutex_unlock(&decomp_param[idx].mutex);
> + break;
> + }
> + }
> + if (idx < thread_count) {
> + break;
> + } else {
> + qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> + }
> + }
> + qemu_mutex_unlock(&decomp_done_lock);
> +}
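
The handoff between this feeder and the per-thread loop in
do_data_decompress() above is a one-job-mailbox worker pool: the feeder
marks a worker busy under done_lock, queues the job under the worker's
own lock, and sleeps on done_cond when everyone is busy. A compressed
pthread model of the same structure (entirely my own sketch -- plain
pthreads and printf instead of QemuThread and inflate; build with
-lpthread):

  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define NTHREADS 2
  #define NJOBS 6

  static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;

  static struct worker {
      pthread_t thread;
      pthread_mutex_t mutex;
      pthread_cond_t cond;
      bool done, quit;
      int job;                       /* 0 means "no job queued" */
  } workers[NTHREADS];

  static void *worker_fn(void *opaque)
  {
      struct worker *w = opaque;

      pthread_mutex_lock(&w->mutex);
      while (!w->quit) {
          if (w->job) {
              int job = w->job;
              w->job = 0;
              pthread_mutex_unlock(&w->mutex);

              printf("processed job %d\n", job);  /* the inflate() stand-in */

              pthread_mutex_lock(&done_lock);
              w->done = true;
              pthread_cond_signal(&done_cond);
              pthread_mutex_unlock(&done_lock);

              pthread_mutex_lock(&w->mutex);
          } else {
              pthread_cond_wait(&w->cond, &w->mutex);
          }
      }
      pthread_mutex_unlock(&w->mutex);
      return NULL;
  }

  int main(void)
  {
      for (int i = 0; i < NTHREADS; i++) {
          pthread_mutex_init(&workers[i].mutex, NULL);
          pthread_cond_init(&workers[i].cond, NULL);
          workers[i].done = true;
          pthread_create(&workers[i].thread, NULL, worker_fn, &workers[i]);
      }

      for (int job = 1; job <= NJOBS; job++) {
          int idx;
          pthread_mutex_lock(&done_lock);
          for (;;) {                 /* find an idle worker or wait */
              for (idx = 0; idx < NTHREADS; idx++) {
                  if (workers[idx].done) {
                      break;
                  }
              }
              if (idx < NTHREADS) {
                  break;
              }
              pthread_cond_wait(&done_cond, &done_lock);
          }
          workers[idx].done = false;
          pthread_mutex_unlock(&done_lock);

          pthread_mutex_lock(&workers[idx].mutex);
          workers[idx].job = job;
          pthread_cond_signal(&workers[idx].cond);
          pthread_mutex_unlock(&workers[idx].mutex);
      }

      pthread_mutex_lock(&done_lock);  /* drain, then tell workers to quit */
      for (int i = 0; i < NTHREADS; i++) {
          while (!workers[i].done) {
              pthread_cond_wait(&done_cond, &done_lock);
          }
      }
      pthread_mutex_unlock(&done_lock);

      for (int i = 0; i < NTHREADS; i++) {
          pthread_mutex_lock(&workers[i].mutex);
          workers[i].quit = true;
          pthread_cond_signal(&workers[i].cond);
          pthread_mutex_unlock(&workers[i].mutex);
          pthread_join(workers[i].thread, NULL);
      }
      return 0;
  }
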
> +
> +/*
> + * COLO cache: this is for the secondary VM; we cache the whole
> + * memory of the secondary VM. The global lock must be held
> + * to call this helper.
> + */
> +int colo_init_ram_cache(void)
> +{
> + RAMBlock *block;
> +
> + rcu_read_lock();
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + block->colo_cache = qemu_anon_ram_alloc(block->used_length,
> + NULL,
> + false);
> + if (!block->colo_cache) {
> + error_report("%s: Can't alloc memory for COLO cache of block %s, "
> + "size 0x" RAM_ADDR_FMT, __func__, block->idstr,
> + block->used_length);
> + goto out_locked;
> + }
> + memcpy(block->colo_cache, block->host, block->used_length);
> + }
> + rcu_read_unlock();
> + /*
> + * Record the dirty pages that are sent by the PVM; we use this dirty
> + * bitmap to decide which pages in the cache should be flushed into the
> + * SVM's RAM. Here we use the same name 'ram_bitmap' as for migration.
> + */
> + if (ram_bytes_total()) {
> + RAMBlock *block;
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
> +
> + block->bmap = bitmap_new(pages);
> + bitmap_set(block->bmap, 0, pages);
> + }
> + }
> + ram_state = g_new0(RAMState, 1);
> + ram_state->migration_dirty_pages = 0;
> + qemu_mutex_init(&ram_state->bitmap_mutex);
> + memory_global_dirty_log_start();
> +
> + return 0;
> +
> +out_locked:
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + if (block->colo_cache) {
> + qemu_anon_ram_free(block->colo_cache, block->used_length);
> + block->colo_cache = NULL;
> + }
> + }
> +
> + rcu_read_unlock();
> + return -errno;
> +}
> +
> +/* The global lock must be held to call this helper */
> +void colo_release_ram_cache(void)
> +{
> + RAMBlock *block;
> +
> + memory_global_dirty_log_stop();
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + g_free(block->bmap);
> + block->bmap = NULL;
> + }
> +
> + rcu_read_lock();
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + if (block->colo_cache) {
> + qemu_anon_ram_free(block->colo_cache, block->used_length);
> + block->colo_cache = NULL;
> + }
> + }
> +
> + rcu_read_unlock();
> + qemu_mutex_destroy(&ram_state->bitmap_mutex);
> + g_free(ram_state);
> + ram_state = NULL;
> +}
> +
> +/**
> + * ram_load_setup: Setup RAM for migration incoming side
> + *
> + * Returns zero to indicate success and negative for error
> + *
> + * @f: QEMUFile where to receive the data
> + * @opaque: RAMState pointer
> + */
> +static int ram_load_setup(QEMUFile *f, void *opaque)
> +{
> + if (compress_threads_load_setup(f)) {
> + return -1;
> + }
> +
> + xbzrle_load_setup();
> + ramblock_recv_map_init();
> +
> + return 0;
> +}
> +
> +static int ram_load_cleanup(void *opaque)
> +{
> + RAMBlock *rb;
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
> + if (ramblock_is_pmem(rb)) {
> + pmem_persist(rb->host, rb->used_length);
> + }
> + }
> +
> + xbzrle_load_cleanup();
> + compress_threads_load_cleanup();
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
> + g_free(rb->receivedmap);
> + rb->receivedmap = NULL;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * ram_postcopy_incoming_init: allocate postcopy data structures
> + *
> + * Returns 0 for success and negative if there was an error
> + *
> + * @mis: current migration incoming state
> + *
> + * Allocate data structures etc needed by incoming migration with
> + * postcopy-ram. postcopy-ram's similarly named
> + * postcopy_ram_incoming_init does the work.
> + */
> +int ram_postcopy_incoming_init(MigrationIncomingState *mis)
> +{
> + return postcopy_ram_incoming_init(mis);
> +}
> +
> +/**
> + * ram_load_postcopy: load a page in postcopy case
> + *
> + * Returns 0 for success or -errno in case of error
> + *
> + * Called in postcopy mode by ram_load().
> + * rcu_read_lock is taken prior to this being called.
> + *
> + * @f: QEMUFile where to read the data from
> + */
> +static int ram_load_postcopy(QEMUFile *f)
> +{
> + int flags = 0, ret = 0;
> + bool place_needed = false;
> + bool matches_target_page_size = false;
> + MigrationIncomingState *mis = migration_incoming_get_current();
> + /* Temporary page that is later 'placed' */
> + void *postcopy_host_page = postcopy_get_tmp_page(mis);
> + void *last_host = NULL;
> + bool all_zero = false;
> +
> + while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> + ram_addr_t addr;
> + void *host = NULL;
> + void *page_buffer = NULL;
> + void *place_source = NULL;
> + RAMBlock *block = NULL;
> + uint8_t ch;
> +
> + addr = qemu_get_be64(f);
> +
> + /*
> + * If qemu file error, we should stop here, and then "addr"
> + * may be invalid
> + */
> + ret = qemu_file_get_error(f);
> + if (ret) {
> + break;
> + }
> +
> + flags = addr & ~TARGET_PAGE_MASK;
> + addr &= TARGET_PAGE_MASK;
> +
> + trace_ram_load_postcopy_loop((uint64_t)addr, flags);
> + place_needed = false;
> + if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE)) {
> + block = ram_block_from_stream(f, flags);
> +
> + host = host_from_ram_block_offset(block, addr);
> + if (!host) {
> + error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> + ret = -EINVAL;
> + break;
> + }
> + matches_target_page_size = block->page_size == TARGET_PAGE_SIZE;
> + /*
> + * Postcopy requires that we place whole host pages atomically;
> + * these may be huge pages for RAMBlocks that are backed by
> + * hugetlbfs.
> + * To make it atomic, the data is read into a temporary page
> + * that's moved into place later.
> + * The migration protocol uses, possibly smaller, target-pages
> + * however the source ensures it always sends all the components
> + * of a host page in order.
> + */
> + page_buffer = postcopy_host_page +
> + ((uintptr_t)host & (block->page_size - 1));
> + /* If all TP are zero then we can optimise the place */
> + if (!((uintptr_t)host & (block->page_size - 1))) {
> + all_zero = true;
> + } else {
> + /* not the 1st TP within the HP */
> + if (host != (last_host + TARGET_PAGE_SIZE)) {
> + error_report("Non-sequential target page %p/%p",
> + host, last_host);
> + ret = -EINVAL;
> + break;
> + }
> + }
> +
> +
> + /*
> + * If it's the last part of a host page then we place the host
> + * page
> + */
> + place_needed = (((uintptr_t)host + TARGET_PAGE_SIZE) &
> + (block->page_size - 1)) == 0;
> + place_source = postcopy_host_page;
> + }
> + last_host = host;
> +
> + switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
> + case RAM_SAVE_FLAG_ZERO:
> + ch = qemu_get_byte(f);
> + memset(page_buffer, ch, TARGET_PAGE_SIZE);
> + if (ch) {
> + all_zero = false;
> + }
> + break;
> +
> + case RAM_SAVE_FLAG_PAGE:
> + all_zero = false;
> + if (!matches_target_page_size) {
> + /* For huge pages, we always use temporary buffer */
> + qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
> + } else {
> + /*
> + * For small pages that matches target page size, we
> + * avoid the qemu_file copy. Instead we directly use
> + * the buffer of QEMUFile to place the page. Note: we
> + * cannot do any QEMUFile operation before using that
> + * buffer to make sure the buffer is valid when
> + * placing the page.
> + */
> + qemu_get_buffer_in_place(f, (uint8_t **)&place_source,
> + TARGET_PAGE_SIZE);
> + }
> + break;
> + case RAM_SAVE_FLAG_EOS:
> + /* normal exit */
> + multifd_recv_sync_main();
> + break;
> + default:
> + error_report("Unknown combination of migration flags: %#x"
> + " (postcopy mode)", flags);
> + ret = -EINVAL;
> + break;
> + }
> +
> + /* Detect for any possible file errors */
> + if (!ret && qemu_file_get_error(f)) {
> + ret = qemu_file_get_error(f);
> + }
> +
> + if (!ret && place_needed) {
> + /* This gets called at the last target page in the host page */
> + void *place_dest = host + TARGET_PAGE_SIZE - block->page_size;
> +
> + if (all_zero) {
> + ret = postcopy_place_page_zero(mis, place_dest,
> + block);
> + } else {
> + ret = postcopy_place_page(mis, place_dest,
> + place_source, block);
> + }
> + }
> + }
> +
> + return ret;
> +}
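
The place_needed test is clearer with numbers plugged in. Toy sketch,
assuming 4K target pages inside a 16K host page (addresses made up):
only the last target page of the host page triggers the atomic place.

  #include <stdio.h>
  #include <stdint.h>

  #define TARGET_PAGE_SIZE 4096

  int main(void)
  {
      uintptr_t host_page_size = 16384;
      uintptr_t base = 0x10000;          /* 16K-aligned host address */

      for (uintptr_t host = base; host < base + host_page_size;
           host += TARGET_PAGE_SIZE) {
          int place_needed = ((host + TARGET_PAGE_SIZE) &
                              (host_page_size - 1)) == 0;
          printf("target page at %#lx: place_needed=%d\n",
                 (unsigned long)host, place_needed);
      }
      return 0;
  }
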
> +
> +static bool postcopy_is_advised(void)
> +{
> + PostcopyState ps = postcopy_state_get();
> + return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END;
> +}
> +
> +static bool postcopy_is_running(void)
> +{
> + PostcopyState ps = postcopy_state_get();
> + return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END;
> +}
> +
> +/*
> + * Flush content of RAM cache into SVM's memory.
> + * Only flush the pages that were dirtied by the PVM or SVM or both.
> + */
> +static void colo_flush_ram_cache(void)
> +{
> + RAMBlock *block = NULL;
> + void *dst_host;
> + void *src_host;
> + unsigned long offset = 0;
> +
> + memory_global_dirty_log_sync();
> + rcu_read_lock();
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + migration_bitmap_sync_range(ram_state, block, block->used_length);
> + }
> + rcu_read_unlock();
> +
> + trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
> + rcu_read_lock();
> + block = QLIST_FIRST_RCU(&ram_list.blocks);
> +
> + while (block) {
> + offset = migration_bitmap_find_dirty(ram_state, block, offset);
> +
> + if (offset << TARGET_PAGE_BITS >= block->used_length) {
> + offset = 0;
> + block = QLIST_NEXT_RCU(block, next);
> + } else {
> + migration_bitmap_clear_dirty(ram_state, block, offset);
> + dst_host = block->host + (offset << TARGET_PAGE_BITS);
> + src_host = block->colo_cache + (offset << TARGET_PAGE_BITS);
> + memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> + }
> + }
> +
> + rcu_read_unlock();
> + trace_colo_flush_ram_cache_end();
> +}
> +
> +static int ram_load(QEMUFile *f, void *opaque, int version_id)
> +{
> + int flags = 0, ret = 0, invalid_flags = 0;
> + static uint64_t seq_iter;
> + int len = 0;
> + /*
> + * If the system is running in postcopy mode, page inserts to host
> + * memory must be atomic
> + */
> + bool postcopy_running = postcopy_is_running();
> + /* ADVISE is earlier; it shows the source has the postcopy capability enabled */
> + bool postcopy_advised = postcopy_is_advised();
> +
> + seq_iter++;
> +
> + if (version_id != 4) {
> + ret = -EINVAL;
> + }
> +
> + if (!migrate_use_compression()) {
> + invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
> + }
> + /* This RCU critical section can be very long running.
> + * When RCU reclaims in the code start to become numerous,
> + * it will be necessary to reduce the granularity of this
> + * critical section.
> + */
> + rcu_read_lock();
> +
> + if (postcopy_running) {
> + ret = ram_load_postcopy(f);
> + }
> +
> + while (!postcopy_running && !ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> + ram_addr_t addr, total_ram_bytes;
> + void *host = NULL;
> + uint8_t ch;
> +
> + addr = qemu_get_be64(f);
> + flags = addr & ~TARGET_PAGE_MASK;
> + addr &= TARGET_PAGE_MASK;
> +
> + if (flags & invalid_flags) {
> + if (flags & invalid_flags & RAM_SAVE_FLAG_COMPRESS_PAGE) {
> + error_report("Received an unexpected compressed page");
> + }
> +
> + ret = -EINVAL;
> + break;
> + }
> +
> + if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
> + RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
> + RAMBlock *block = ram_block_from_stream(f, flags);
> +
> + /*
> + * After going into COLO, we should load the page into colo_cache.
> + */
> + if (migration_incoming_in_colo_state()) {
> + host = colo_cache_from_block_offset(block, addr);
> + } else {
> + host = host_from_ram_block_offset(block, addr);
> + }
> + if (!host) {
> + error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> + ret = -EINVAL;
> + break;
> + }
> +
> + if (!migration_incoming_in_colo_state()) {
> + ramblock_recv_bitmap_set(block, host);
> + }
> +
> + trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
> + }
> +
> + switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
> + case RAM_SAVE_FLAG_MEM_SIZE:
> + /* Synchronize RAM block list */
> + total_ram_bytes = addr;
> + while (!ret && total_ram_bytes) {
> + RAMBlock *block;
> + char id[256];
> + ram_addr_t length;
> +
> + len = qemu_get_byte(f);
> + qemu_get_buffer(f, (uint8_t *)id, len);
> + id[len] = 0;
> + length = qemu_get_be64(f);
> +
> + block = qemu_ram_block_by_name(id);
> + if (block && !qemu_ram_is_migratable(block)) {
> + error_report("block %s should not be migrated !", id);
> + ret = -EINVAL;
> + } else if (block) {
> + if (length != block->used_length) {
> + Error *local_err = NULL;
> +
> + ret = qemu_ram_resize(block, length,
> + &local_err);
> + if (local_err) {
> + error_report_err(local_err);
> + }
> + }
> + /* For postcopy we need to check hugepage sizes match */
> + if (postcopy_advised &&
> + block->page_size != qemu_host_page_size) {
> + uint64_t remote_page_size = qemu_get_be64(f);
> + if (remote_page_size != block->page_size) {
> + error_report("Mismatched RAM page size %s "
> + "(local) %zd != %" PRId64,
> + id, block->page_size,
> + remote_page_size);
> + ret = -EINVAL;
> + }
> + }
> + if (migrate_ignore_shared()) {
> + hwaddr addr = qemu_get_be64(f);
> + bool ignored = qemu_get_byte(f);
> + if (ignored != ramblock_is_ignored(block)) {
> + error_report("RAM block %s should %s be migrated",
> + id, ignored ? "" : "not");
> + ret = -EINVAL;
> + }
> + if (ramblock_is_ignored(block) &&
> + block->mr->addr != addr) {
> + error_report("Mismatched GPAs for block %s "
> + "%" PRId64 "!= %" PRId64,
> + id, (uint64_t)addr,
> + (uint64_t)block->mr->addr);
> + ret = -EINVAL;
> + }
> + }
> + ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
> + block->idstr);
> + } else {
> + error_report("Unknown ramblock \"%s\", cannot "
> + "accept migration", id);
> + ret = -EINVAL;
> + }
> +
> + total_ram_bytes -= length;
> + }
> + break;
> +
> + case RAM_SAVE_FLAG_ZERO:
> + ch = qemu_get_byte(f);
> + ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> + break;
> +
> + case RAM_SAVE_FLAG_PAGE:
> + qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> + break;
> +
> + case RAM_SAVE_FLAG_COMPRESS_PAGE:
> + len = qemu_get_be32(f);
> + if (len < 0 || len > compressBound(TARGET_PAGE_SIZE)) {
> + error_report("Invalid compressed data length: %d", len);
> + ret = -EINVAL;
> + break;
> + }
> + decompress_data_with_multi_threads(f, host, len);
> + break;
> +
> + case RAM_SAVE_FLAG_XBZRLE:
> + if (load_xbzrle(f, addr, host) < 0) {
> + error_report("Failed to decompress XBZRLE page at "
> + RAM_ADDR_FMT, addr);
> + ret = -EINVAL;
> + break;
> + }
> + break;
> + case RAM_SAVE_FLAG_EOS:
> + /* normal exit */
> + multifd_recv_sync_main();
> + break;
> + default:
> + if (flags & RAM_SAVE_FLAG_HOOK) {
> + ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
> + } else {
> + error_report("Unknown combination of migration flags: %#x",
> + flags);
> + ret = -EINVAL;
> + }
> + }
> + if (!ret) {
> + ret = qemu_file_get_error(f);
> + }
> + }
> +
> + ret |= wait_for_decompress_done();
> + rcu_read_unlock();
> + trace_ram_load_complete(ret, seq_iter);
> +
> + if (!ret && migration_incoming_in_colo_state()) {
> + colo_flush_ram_cache();
> + }
> + return ret;
> +}
> +
> +static bool ram_has_postcopy(void *opaque)
> +{
> + RAMBlock *rb;
> + RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
> + if (ramblock_is_pmem(rb)) {
> + info_report("Block: %s, host: %p is an nvdimm memory, postcopy "
> + "is not supported now!", rb->idstr, rb->host);
> + return false;
> + }
> + }
> +
> + return migrate_postcopy_ram();
> +}
> +
> +/* Sync all the dirty bitmap with destination VM. */
> +static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
> +{
> + RAMBlock *block;
> + QEMUFile *file = s->to_dst_file;
> + int ramblock_count = 0;
> +
> + trace_ram_dirty_bitmap_sync_start();
> +
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + qemu_savevm_send_recv_bitmap(file, block->idstr);
> + trace_ram_dirty_bitmap_request(block->idstr);
> + ramblock_count++;
> + }
> +
> + trace_ram_dirty_bitmap_sync_wait();
> +
> + /* Wait until all the ramblocks' dirty bitmaps are synced */
> + while (ramblock_count--) {
> + qemu_sem_wait(&s->rp_state.rp_sem);
> + }
> +
> + trace_ram_dirty_bitmap_sync_complete();
> +
> + return 0;
> +}
> +
> +static void ram_dirty_bitmap_reload_notify(MigrationState *s)
> +{
> + qemu_sem_post(&s->rp_state.rp_sem);
> +}
> +
> +/*
> + * Read the received bitmap, revert it as the initial dirty bitmap.
> + * This is only used when the postcopy migration is paused but wants
> + * to resume from a middle point.
> + */
> +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> +{
> + int ret = -EINVAL;
> + QEMUFile *file = s->rp_state.from_dst_file;
> + unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
> + uint64_t local_size = DIV_ROUND_UP(nbits, 8);
> + uint64_t size, end_mark;
> +
> + trace_ram_dirty_bitmap_reload_begin(block->idstr);
> +
> + if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> + error_report("%s: incorrect state %s", __func__,
> + MigrationStatus_str(s->state));
> + return -EINVAL;
> + }
> +
> + /*
> + * Note: see comments in ramblock_recv_bitmap_send() on why we
> + * need the endianness conversion, and the padding.
> + */
> + local_size = ROUND_UP(local_size, 8);
> +
> + /* Add padding */
> + le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
> +
> + size = qemu_get_be64(file);
> +
> + /* The size of the bitmap should match our ramblock */
> + if (size != local_size) {
> + error_report("%s: ramblock '%s' bitmap size mismatch "
> + "(0x%"PRIx64" != 0x%"PRIx64")", __func__,
> + block->idstr, size, local_size);
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
> + end_mark = qemu_get_be64(file);
> +
> + ret = qemu_file_get_error(file);
> + if (ret || size != local_size) {
> + error_report("%s: read bitmap failed for ramblock '%s': %d"
> + " (size 0x%"PRIx64", got: 0x%"PRIx64")",
> + __func__, block->idstr, ret, local_size, size);
> + ret = -EIO;
> + goto out;
> + }
> +
> + if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
> + error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIu64,
> + __func__, block->idstr, end_mark);
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + /*
> + * Endianness conversion. We are in postcopy (though paused).
> + * The dirty bitmap won't change. We can directly modify it.
> + */
> + bitmap_from_le(block->bmap, le_bitmap, nbits);
> +
> + /*
> + * What we received is "received bitmap". Revert it as the initial
> + * dirty bitmap for this ramblock.
> + */
> + bitmap_complement(block->bmap, block->bmap, nbits);
> +
> + trace_ram_dirty_bitmap_reload_complete(block->idstr);
> +
> + /*
> + * We succeeded in syncing the bitmap for the current ramblock. If this is
> + * the last one to sync, we need to notify the main send thread.
> + */
> + ram_dirty_bitmap_reload_notify(s);
> +
> + ret = 0;
> +out:
> + g_free(le_bitmap);
> + return ret;
> +}
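
The size checks above are just DIV_ROUND_UP/ROUND_UP arithmetic on the
page count. A standalone sanity check with an assumed 1GB RAMBlock and
4K target pages:

  #include <stdio.h>
  #include <stdint.h>

  #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
  #define ROUND_UP(n, d)     (DIV_ROUND_UP(n, d) * (d))

  int main(void)
  {
      uint64_t nbits = (1ULL << 30) >> 12;            /* 262144 pages */
      uint64_t local_size = DIV_ROUND_UP(nbits, 8);   /* 32768 bytes */

      local_size = ROUND_UP(local_size, 8);           /* pad to long */
      printf("nbits=%llu wire bytes=%llu\n",
             (unsigned long long)nbits, (unsigned long long)local_size);
      return 0;
  }
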
> +
> +static int ram_resume_prepare(MigrationState *s, void *opaque)
> +{
> + RAMState *rs = *(RAMState **)opaque;
> + int ret;
> +
> + ret = ram_dirty_bitmap_sync_all(s, rs);
> + if (ret) {
> + return ret;
> + }
> +
> + ram_state_resume_prepare(rs, s->to_dst_file);
> +
> + return 0;
> +}
> +
> +static SaveVMHandlers savevm_ram_handlers = {
> + .save_setup = ram_save_setup,
> + .save_live_iterate = ram_save_iterate,
> + .save_live_complete_postcopy = ram_save_complete,
> + .save_live_complete_precopy = ram_save_complete,
> + .has_postcopy = ram_has_postcopy,
> + .save_live_pending = ram_save_pending,
> + .load_state = ram_load,
> + .save_cleanup = ram_save_cleanup,
> + .load_setup = ram_load_setup,
> + .load_cleanup = ram_load_cleanup,
> + .resume_prepare = ram_resume_prepare,
> +};
> +
> +void ram_mig_init(void)
> +{
> + qemu_mutex_init(&XBZRLE.lock);
> + register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, &ram_state);
> +}
> diff --git a/migration/ram.c.rej b/migration/ram.c.rej
> new file mode 100644
> index 0000000000..1bcfb8066a
> --- /dev/null
> +++ b/migration/ram.c.rej
> @@ -0,0 +1,33 @@
> +--- migration/ram.c
> ++++ migration/ram.c
> +@@ -3224,15 +3253,30 @@ static int ram_state_init(RAMState **rsp)
> +
> + static void ram_list_init_bitmaps(void)
> + {
> ++ MigrationState *ms = migrate_get_current();
> + RAMBlock *block;
> + unsigned long pages;
> ++ uint8_t shift;
> +
> + /* Skip setting bitmap if there is no RAM */
> + if (ram_bytes_total()) {
> ++ shift = ms->clear_bitmap_shift;
> ++ if (shift > CLEAR_BITMAP_SHIFT_MAX) {
> ++ error_report("clear_bitmap_shift (%u) too big, using "
> ++ "max value (%u)", shift, CLEAR_BITMAP_SHIFT_MAX);
> ++ shift = CLEAR_BITMAP_SHIFT_MAX;
> ++ } else if (shift < CLEAR_BITMAP_SHIFT_MIN) {
> ++ error_report("clear_bitmap_shift (%u) too small, using "
> ++ "min value (%u)", shift, CLEAR_BITMAP_SHIFT_MIN);
> ++ shift = CLEAR_BITMAP_SHIFT_MIN;
> ++ }
> ++
> + RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> + pages = block->max_length >> TARGET_PAGE_BITS;
> + block->bmap = bitmap_new(pages);
> + bitmap_set(block->bmap, 0, pages);
> ++ block->clear_bmap_shift = shift;
> ++ block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
> + if (migrate_postcopy_ram()) {
> + block->unsentmap = bitmap_new(pages);
> + bitmap_set(block->unsentmap, 0, pages);
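
The clamp-and-size logic in that hunk is worth a worked example. In the
sketch below the MIN/MAX bounds and the clear_bmap_size() body are
hypothetical stand-ins (the hunk only shows that shift gets clamped and
that one clear_bmap bit covers 1 << shift dirty-bitmap pages):

  #include <stdio.h>

  #define SHIFT_MIN 6   /* hypothetical bound */
  #define SHIFT_MAX 31  /* hypothetical bound */

  static unsigned long clear_bmap_size(unsigned long pages, unsigned shift)
  {
      /* one clear_bmap bit per 1 << shift pages, rounded up */
      return (pages + (1UL << shift) - 1) >> shift;
  }

  int main(void)
  {
      unsigned shift = 18;
      unsigned long pages = 1UL << 22;   /* 4M pages, i.e. 16GB of 4K pages */

      if (shift > SHIFT_MAX) {
          shift = SHIFT_MAX;
      } else if (shift < SHIFT_MIN) {
          shift = SHIFT_MIN;
      }
      printf("shift=%u clear_bmap bits=%lu\n",
             shift, clear_bmap_size(pages, shift));
      return 0;
  }
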
> diff --git a/migration/trace-events b/migration/trace-events
> index cd50a1e659..d8e54c367a 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -79,6 +79,7 @@ get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_
> get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned long page_abs, int sent) "%s/0x%" PRIx64 " page_abs=0x%lx (sent=%d)"
> migration_bitmap_sync_start(void) ""
> migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
> +migration_bitmap_clear_dirty(char *str, uint64_t start, uint64_t size, unsigned long page) "rb %s start 0x%"PRIx64" size 0x%"PRIx64" page 0x%lx"
> migration_throttle(void) ""
> multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d flags 0x%x next packet size %d"
> multifd_recv_sync_main(long packet_num) "packet num %ld"
> diff --git a/migration/trace-events.orig b/migration/trace-events.orig
> new file mode 100644
> index 0000000000..cd50a1e659
> --- /dev/null
> +++ b/migration/trace-events.orig
> @@ -0,0 +1,297 @@
> +# See docs/devel/tracing.txt for syntax documentation.
> +
> +# savevm.c
> +qemu_loadvm_state_section(unsigned int section_type) "%d"
> +qemu_loadvm_state_section_command(int ret) "%d"
> +qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
> +qemu_loadvm_state_post_main(int ret) "%d"
> +qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
> +qemu_savevm_send_packaged(void) ""
> +loadvm_state_setup(void) ""
> +loadvm_state_cleanup(void) ""
> +loadvm_handle_cmd_packaged(unsigned int length) "%u"
> +loadvm_handle_cmd_packaged_main(int ret) "%d"
> +loadvm_handle_cmd_packaged_received(int ret) "%d"
> +loadvm_handle_recv_bitmap(char *s) "%s"
> +loadvm_postcopy_handle_advise(void) ""
> +loadvm_postcopy_handle_listen(void) ""
> +loadvm_postcopy_handle_run(void) ""
> +loadvm_postcopy_handle_run_cpu_sync(void) ""
> +loadvm_postcopy_handle_run_vmstart(void) ""
> +loadvm_postcopy_handle_resume(void) ""
> +loadvm_postcopy_ram_handle_discard(void) ""
> +loadvm_postcopy_ram_handle_discard_end(void) ""
> +loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
> +loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
> +loadvm_process_command_ping(uint32_t val) "0x%x"
> +postcopy_ram_listen_thread_exit(void) ""
> +postcopy_ram_listen_thread_start(void) ""
> +qemu_savevm_send_postcopy_advise(void) ""
> +qemu_savevm_send_postcopy_ram_discard(const char *id, uint16_t len) "%s: %ud"
> +savevm_command_send(uint16_t command, uint16_t len) "com=0x%x len=%d"
> +savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
> +savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
> +savevm_section_skip(const char *id, unsigned int section_id) "%s, section_id %u"
> +savevm_send_open_return_path(void) ""
> +savevm_send_ping(uint32_t val) "0x%x"
> +savevm_send_postcopy_listen(void) ""
> +savevm_send_postcopy_run(void) ""
> +savevm_send_postcopy_resume(void) ""
> +savevm_send_colo_enable(void) ""
> +savevm_send_recv_bitmap(char *name) "%s"
> +savevm_state_setup(void) ""
> +savevm_state_resume_prepare(void) ""
> +savevm_state_header(void) ""
> +savevm_state_iterate(void) ""
> +savevm_state_cleanup(void) ""
> +savevm_state_complete_precopy(void) ""
> +vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
> +vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
> +postcopy_pause_incoming(void) ""
> +postcopy_pause_incoming_continued(void) ""
> +
> +# vmstate.c
> +vmstate_load_field_error(const char *field, int ret) "field \"%s\" load failed, ret = %d"
> +vmstate_load_state(const char *name, int version_id) "%s v%d"
> +vmstate_load_state_end(const char *name, const char *reason, int val) "%s %s/%d"
> +vmstate_load_state_field(const char *name, const char *field) "%s:%s"
> +vmstate_n_elems(const char *name, int n_elems) "%s: %d"
> +vmstate_subsection_load(const char *parent) "%s"
> +vmstate_subsection_load_bad(const char *parent, const char *sub, const char *sub2) "%s: %s/%s"
> +vmstate_subsection_load_good(const char *parent) "%s"
> +vmstate_save_state_pre_save_res(const char *name, int res) "%s/%d"
> +vmstate_save_state_loop(const char *name, const char *field, int n_elems) "%s/%s[%d]"
> +vmstate_save_state_top(const char *idstr) "%s"
> +vmstate_subsection_save_loop(const char *name, const char *sub) "%s/%s"
> +vmstate_subsection_save_top(const char *idstr) "%s"
> +
> +# vmstate-types.c
> +get_qtailq(const char *name, int version_id) "%s v%d"
> +get_qtailq_end(const char *name, const char *reason, int val) "%s %s/%d"
> +put_qtailq(const char *name, int version_id) "%s v%d"
> +put_qtailq_end(const char *name, const char *reason) "%s %s"
> +
> +# qemu-file.c
> +qemu_file_fclose(void) ""
> +
> +# ram.c
> +get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> +get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned long page_abs, int sent) "%s/0x%" PRIx64 " page_abs=0x%lx (sent=%d)"
> +migration_bitmap_sync_start(void) ""
> +migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
> +migration_throttle(void) ""
> +multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d flags 0x%x next packet size %d"
> +multifd_recv_sync_main(long packet_num) "packet num %ld"
> +multifd_recv_sync_main_signal(uint8_t id) "channel %d"
> +multifd_recv_sync_main_wait(uint8_t id) "channel %d"
> +multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %d packets %" PRIu64 " pages %" PRIu64
> +multifd_recv_thread_start(uint8_t id) "%d"
> +multifd_send(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d flags 0x%x next packet size %d"
> +multifd_send_sync_main(long packet_num) "packet num %ld"
> +multifd_send_sync_main_signal(uint8_t id) "channel %d"
> +multifd_send_sync_main_wait(uint8_t id) "channel %d"
> +multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %d packets %" PRIu64 " pages %" PRIu64
> +multifd_send_thread_start(uint8_t id) "%d"
> +ram_discard_range(const char *rbname, uint64_t start, size_t len) "%s: start: %" PRIx64 " %zx"
> +ram_load_loop(const char *rbname, uint64_t addr, int flags, void *host) "%s: addr: 0x%" PRIx64 " flags: 0x%x host: %p"
> +ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
> +ram_postcopy_send_discard_bitmap(void) ""
> +ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
> +ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
> +ram_dirty_bitmap_request(char *str) "%s"
> +ram_dirty_bitmap_reload_begin(char *str) "%s"
> +ram_dirty_bitmap_reload_complete(char *str) "%s"
> +ram_dirty_bitmap_sync_start(void) ""
> +ram_dirty_bitmap_sync_wait(void) ""
> +ram_dirty_bitmap_sync_complete(void) ""
> +ram_state_resume_prepare(uint64_t v) "%" PRId64
> +colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
> +colo_flush_ram_cache_end(void) ""
> +save_xbzrle_page_skipping(void) ""
> +save_xbzrle_page_overflow(void) ""
> +ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
> +ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> +
> +# migration.c
> +await_return_path_close_on_source_close(void) ""
> +await_return_path_close_on_source_joining(void) ""
> +migrate_set_state(const char *new_state) "new state %s"
> +migrate_fd_cleanup(void) ""
> +migrate_fd_error(const char *error_desc) "error=%s"
> +migrate_fd_cancel(void) ""
> +migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
> +migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
> +migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
> +migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
> +migration_completion_file_err(void) ""
> +migration_completion_postcopy_end(void) ""
> +migration_completion_postcopy_end_after_complete(void) ""
> +migration_return_path_end_before(void) ""
> +migration_return_path_end_after(int rp_error) "%d"
> +migration_thread_after_loop(void) ""
> +migration_thread_file_err(void) ""
> +migration_thread_ratelimit_pre(int ms) "%d ms"
> +migration_thread_ratelimit_post(int urgent) "urgent: %d"
> +migration_thread_setup_complete(void) ""
> +open_return_path_on_source(void) ""
> +open_return_path_on_source_continue(void) ""
> +postcopy_start(void) ""
> +postcopy_pause_return_path(void) ""
> +postcopy_pause_return_path_continued(void) ""
> +postcopy_pause_continued(void) ""
> +postcopy_start_set_run(void) ""
> +source_return_path_thread_bad_end(void) ""
> +source_return_path_thread_end(void) ""
> +source_return_path_thread_entry(void) ""
> +source_return_path_thread_loop_top(void) ""
> +source_return_path_thread_pong(uint32_t val) "0x%x"
> +source_return_path_thread_shut(uint32_t val) "0x%x"
> +source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
> +migration_thread_low_pending(uint64_t pending) "%" PRIu64
> +migrate_transferred(uint64_t tranferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64
> +process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> +process_incoming_migration_co_postcopy_end_main(void) ""
> +
> +# channel.c
> +migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
> +migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname, void *err) "ioc=%p ioctype=%s hostname=%s err=%p"
> +
> +# global_state.c
> +migrate_state_too_big(void) ""
> +migrate_global_state_post_load(const char *state) "loaded state: %s"
> +migrate_global_state_pre_save(const char *state) "saved state: %s"
> +
> +# rdma.c
> +qemu_rdma_accept_incoming_migration(void) ""
> +qemu_rdma_accept_incoming_migration_accepted(void) ""
> +qemu_rdma_accept_pin_state(bool pin) "%d"
> +qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
> +qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char *gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
> +qemu_rdma_cleanup_disconnect(void) ""
> +qemu_rdma_close(void) ""
> +qemu_rdma_connect_pin_all_requested(void) ""
> +qemu_rdma_connect_pin_all_outcome(bool pin) "%d"
> +qemu_rdma_dest_init_trying(const char *host, const char *ip) "%s => %s"
> +qemu_rdma_dump_gid(const char *who, const char *src, const char *dst) "%s Source GID: %s, Dest GID: %s"
> +qemu_rdma_exchange_get_response_start(const char *desc) "CONTROL: %s receiving..."
> +qemu_rdma_exchange_get_response_none(const char *desc, int type) "Surprise: got %s (%d)"
> +qemu_rdma_exchange_send_issue_callback(void) ""
> +qemu_rdma_exchange_send_waiting(const char *desc) "Waiting for response %s"
> +qemu_rdma_exchange_send_received(const char *desc) "Response %s received."
> +qemu_rdma_fill(size_t control_len, size_t size) "RDMA %zd of %zd bytes already in buffer"
> +qemu_rdma_init_ram_blocks(int blocks) "Allocated %d local ram block structures"
> +qemu_rdma_poll_recv(const char *compstr, int64_t comp, int64_t id, int sent) "completion %s #%" PRId64 " received (%" PRId64 ") left %d"
> +qemu_rdma_poll_write(const char *compstr, int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %s (%" PRId64 ") left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
> +qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other completion %s (%" PRId64 ") received left %d"
> +qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
> +qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
> +qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64
> +qemu_rdma_registration_handle_finished(void) ""
> +qemu_rdma_registration_handle_ram_blocks(void) ""
> +qemu_rdma_registration_handle_ram_blocks_loop(const char *name, uint64_t offset, uint64_t length, void *local_host_addr, unsigned int src_index) "%s: @0x%" PRIx64 "/%" PRIu64 " host:@%p src_index: %u"
> +qemu_rdma_registration_handle_register(int requests) "%d requests"
> +qemu_rdma_registration_handle_register_loop(int req, int index, uint64_t addr, uint64_t chunks) "Registration request (%d): index %d, current_addr %" PRIu64 " chunks: %" PRIu64
> +qemu_rdma_registration_handle_register_rkey(int rkey) "0x%x"
> +qemu_rdma_registration_handle_unregister(int requests) "%d requests"
> +qemu_rdma_registration_handle_unregister_loop(int count, int index, uint64_t chunk) "Unregistration request (%d): index %d, chunk %" PRIu64
> +qemu_rdma_registration_handle_unregister_success(uint64_t chunk) "%" PRIu64
> +qemu_rdma_registration_handle_wait(void) ""
> +qemu_rdma_registration_start(uint64_t flags) "%" PRIu64
> +qemu_rdma_registration_stop(uint64_t flags) "%" PRIu64
> +qemu_rdma_registration_stop_ram(void) ""
> +qemu_rdma_resolve_host_trying(const char *host, const char *ip) "Trying %s => %s"
> +qemu_rdma_signal_unregister_append(uint64_t chunk, int pos) "Appending unregister chunk %" PRIu64 " at position %d"
> +qemu_rdma_signal_unregister_already(uint64_t chunk) "Unregister chunk %" PRIu64 " already in queue"
> +qemu_rdma_unregister_waiting_inflight(uint64_t chunk) "Cannot unregister inflight chunk: %" PRIu64
> +qemu_rdma_unregister_waiting_proc(uint64_t chunk, int pos) "Processing unregister for chunk: %" PRIu64 " at position %d"
> +qemu_rdma_unregister_waiting_send(uint64_t chunk) "Sending unregister for chunk: %" PRIu64
> +qemu_rdma_unregister_waiting_complete(uint64_t chunk) "Unregister for chunk: %" PRIu64 " complete."
> +qemu_rdma_write_flush(int sent) "sent total: %d"
> +qemu_rdma_write_one_block(int count, int block, uint64_t chunk, uint64_t current, uint64_t len, int nb_sent, int nb_chunks) "(%d) Not clobbering: block: %d chunk %" PRIu64 " current %" PRIu64 " len %" PRIu64 " %d %d"
> +qemu_rdma_write_one_post(uint64_t chunk, long addr, long remote, uint32_t len) "Posting chunk: %" PRIu64 ", addr: 0x%lx remote: 0x%lx, bytes %" PRIu32
> +qemu_rdma_write_one_queue_full(void) ""
> +qemu_rdma_write_one_recvregres(int mykey, int theirkey, uint64_t chunk) "Received registration result: my key: 0x%x their key 0x%x, chunk %" PRIu64
> +qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset) "Sending registration request chunk %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
> +qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
> +qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
> +rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> +rdma_block_notification_handle(const char *name, int index) "%s at %d"
> +rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %p, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> +rdma_start_incoming_migration(void) ""
> +rdma_start_incoming_migration_after_dest_init(void) ""
> +rdma_start_incoming_migration_after_rdma_listen(void) ""
> +rdma_start_outgoing_migration_after_rdma_connect(void) ""
> +rdma_start_outgoing_migration_after_rdma_source_init(void) ""
> +
> +# postcopy-ram.c
> +postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s mask words sent=%d in %d commands"
> +postcopy_discard_send_range(const char *ramblock, unsigned long start, unsigned long length) "%s:%lx/%lx"
> +postcopy_cleanup_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=0x%zx length=0x%zx"
> +postcopy_init_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=0x%zx length=0x%zx"
> +postcopy_nhp_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=0x%zx length=0x%zx"
> +postcopy_place_page(void *host_addr) "host=%p"
> +postcopy_place_page_zero(void *host_addr) "host=%p"
> +postcopy_ram_enable_notify(void) ""
> +mark_postcopy_blocktime_begin(uint64_t addr, void *dd, uint32_t time, int cpu, int received) "addr: 0x%" PRIx64 ", dd: %p, time: %u, cpu: %d, already_received: %d"
> +mark_postcopy_blocktime_end(uint64_t addr, void *dd, uint32_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, affected_cpu: %d"
> +postcopy_pause_fault_thread(void) ""
> +postcopy_pause_fault_thread_continued(void) ""
> +postcopy_ram_fault_thread_entry(void) ""
> +postcopy_ram_fault_thread_exit(void) ""
> +postcopy_ram_fault_thread_fds_core(int baseufd, int quitfd) "ufd: %d quitfd: %d"
> +postcopy_ram_fault_thread_fds_extra(size_t index, const char *name, int fd) "%zd/%s: %d"
> +postcopy_ram_fault_thread_quit(void) ""
> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx pid=%u"
> +postcopy_ram_incoming_cleanup_closeuf(void) ""
> +postcopy_ram_incoming_cleanup_entry(void) ""
> +postcopy_ram_incoming_cleanup_exit(void) ""
> +postcopy_ram_incoming_cleanup_join(void) ""
> +postcopy_ram_incoming_cleanup_blocktime(uint64_t total) "total blocktime %" PRIu64
> +postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_offset) "for %s in %s offset 0x%"PRIx64
> +postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
> +postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
> +
> +get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
> +
> +# exec.c
> +migration_exec_outgoing(const char *cmd) "cmd=%s"
> +migration_exec_incoming(const char *cmd) "cmd=%s"
> +
> +# fd.c
> +migration_fd_outgoing(int fd) "fd=%d"
> +migration_fd_incoming(int fd) "fd=%d"
> +
> +# socket.c
> +migration_socket_incoming_accepted(void) ""
> +migration_socket_outgoing_connected(const char *hostname) "hostname=%s"
> +migration_socket_outgoing_error(const char *err) "error=%s"
> +
> +# tls.c
> +migration_tls_outgoing_handshake_start(const char *hostname) "hostname=%s"
> +migration_tls_outgoing_handshake_error(const char *err) "err=%s"
> +migration_tls_outgoing_handshake_complete(void) ""
> +migration_tls_incoming_handshake_start(void) ""
> +migration_tls_incoming_handshake_error(const char *err) "err=%s"
> +migration_tls_incoming_handshake_complete(void) ""
> +
> +# colo.c
> +colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
> +colo_send_message(const char *msg) "Send '%s' message"
> +colo_receive_message(const char *msg) "Receive '%s' message"
> +
> +# colo-failover.c
> +colo_failover_set_state(const char *new_state) "new state %s"
> +
> +# block-dirty-bitmap.c
> +send_bitmap_header_enter(void) ""
> +send_bitmap_bits(uint32_t flags, uint64_t start_sector, uint32_t nr_sectors, uint64_t data_size) "flags: 0x%x, start_sector: %" PRIu64 ", nr_sectors: %" PRIu32 ", data_size: %" PRIu64
> +dirty_bitmap_save_iterate(int in_postcopy) "in postcopy: %d"
> +dirty_bitmap_save_complete_enter(void) ""
> +dirty_bitmap_save_complete_finish(void) ""
> +dirty_bitmap_save_pending(uint64_t pending, uint64_t max_size) "pending %" PRIu64 " max: %" PRIu64
> +dirty_bitmap_load_complete(void) ""
> +dirty_bitmap_load_bits_enter(uint64_t first_sector, uint32_t nr_sectors) "chunk: %" PRIu64 " %" PRIu32
> +dirty_bitmap_load_bits_zeroes(void) ""
> +dirty_bitmap_load_header(uint32_t flags) "flags 0x%x"
> +dirty_bitmap_load_enter(void) ""
> +dirty_bitmap_load_success(void) ""
> --
> 2.21.0
>
>
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK