Re: [Qemu-devel] [PATCH v9 00/56] Postcopy implementation

From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v9 00/56] Postcopy implementation
Date: Wed, 27 Jan 2016 17:47:01 +0300
Ok, thanks, I'll think about it.

On 27.01.2016 15:42, Dr. David Alan Gilbert wrote:
* Vladimir Sementsov-Ogievskiy (address@hidden) wrote:
Hello, Dr. Gilbert!
Hi Vladimir,

As I understand it, this is not a complete implementation of the post-copy stage of
migration, but only an implementation of RAM post-copy.
Yes, that's correct; RAM was the only one we'd had much interest in; but
I can understand wanting to help block as well.

I need to implement
post-copy migration of block-dirty-bitmaps, and it should work with or without
RAM post-copy. Did you plan for the possibility of post-copying something other
than RAM? If so, could you please help me with an interface? Should I implement a
separate thread like postcopy_ram_listen_thread and so on, or can
something be reused?
I think with the structure that's in, it should be possible to add
another device doing postcopy as well.  Although maybe it's worth checking
whether these dirty bitmaps behave similarly enough to a non-guest-visible
RAM block that the existing migrate code would handle them.

1) I think you're probably best reusing the postcopy_ram_listen_thread
for all postcopy data on the load side;  I think it's probably already
OK to do that; just rename it.  The migration stream, once in postcopy
mode, is still just a normal migration stream, so it can still have a mix
of RAM and disk or whatever.

2) The tricky bit I can think of is on the sending side, making sure
you interleave the RAM requests and disk requests sensibly; at the moment
when in postcopy we drop the bandwidth limit so it'll keep doing RAM transfers
and respond to RAM requests as soon as possible;  you'll need to persuade
it to somehow realise there's a disk request and get the RAM code to
drop out of its loop to give the disk code a chance.  There's already
the opportunity to improve the behaviour with just RAM to stop it flooding
the network buffers with background page requests which delay the responses
to the postcopy requests.
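
As a rough illustration of the interleaving in 2), here is a minimal
sketch, not QEMU code. It assumes a hypothetical PostcopySource struct
per data source (RAM, dirty bitmap, ...) and hypothetical helpers
pick_next() and send_one_chunk(). The idea is that the sender emits one
chunk at a time, always serves an outstanding request from the destination
first, and only then falls back to round-robin background transfer, so no
single source can sit in its own loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one postcopy data source (RAM, dirty bitmap, ...). */
typedef struct PostcopySource {
    const char *name;
    int urgent;      /* outstanding requests received from the destination */
    int background;  /* background chunks still to be sent */
} PostcopySource;

/*
 * Choose the next source to service.  Any source with an outstanding
 * request wins; otherwise background work is handed out round-robin,
 * starting at *cursor.  Returns NULL when nothing is left to send.
 */
static PostcopySource *pick_next(PostcopySource *srcs, size_t n,
                                 size_t *cursor, bool *was_urgent)
{
    for (size_t i = 0; i < n; i++) {
        if (srcs[i].urgent > 0) {
            *was_urgent = true;
            return &srcs[i];
        }
    }
    for (size_t i = 0; i < n; i++) {
        size_t idx = (*cursor + i) % n;
        if (srcs[idx].background > 0) {
            *cursor = (idx + 1) % n;
            *was_urgent = false;
            return &srcs[idx];
        }
    }
    return NULL;
}

/*
 * Send exactly one chunk and return, so that a request arriving in the
 * meantime is noticed before the next background chunk goes out.
 * Returns false when every source is drained.
 */
bool send_one_chunk(PostcopySource *srcs, size_t n, size_t *cursor,
                    const char **sent_from, bool *was_urgent)
{
    PostcopySource *s = pick_next(srcs, n, cursor, was_urgent);

    if (!s) {
        return false;
    }
    if (*was_urgent) {
        s->urgent--;
    } else {
        s->background--;
    }
    *sent_from = s->name;
    return true;
}
```

Keeping each call to a single chunk is the design choice that matters here:
it bounds how long a request from either source can wait behind background
data from the other.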

My knowledge of the block code is pretty minimal, so I've not had a chance
to follow much of your dirty-bitmap series.


On 05.11.2015 21:10, Dr. David Alan Gilbert (git) wrote:
From: "Dr. David Alan Gilbert" <address@hidden>

   This is the 9th cut of my version of postcopy.

The userfaultfd Linux kernel code is now in the upstream kernel
tree, and so 4.3 can be used without modification.

This QEMU series can be found at:
on the wp3-postcopy-v9 tag

Testing status:
   * Tested heavily on x86
   * Smoke tested on aarch64 (so it does work on different page sizes)

This work has been partially funded by the EU Orbit project:
   see http://www.orbitproject.eu/about/

   Almost all of the changes are changes from review comments, and most are

   The following are new patches, mostly split out from earlier patches
   (The exception being adding userfaultfd.h header back in - which was in v5
    but we took out, but is needed again due to 1842bdfd)

     04/56 Move page_size_init earlier
     08/56 qemu_ram_block_by_name
     12/56 Factor out host_from_stream_offset call and check
     15/56 Add Linux userfaultfd.h header back
     21/56 migration_is_setup_or_active
     30/56 migration_completion: Take current state
     34/56 Maintain unsentmap

   The previous patches 03,10,13,36/54 went in upstream already.

   Fix for assert using hotplug (Thanks Bharata for spotting that)
   Fix for migrate_cancel a second time
   Rework for migration_bitmap_rcu after Denis's deadlock fix
   Rework ram_load into a separate postcopy loop
   The 'sentmap' is now an 'unsentmap' - this saves a complement step at the end
      The unsentmap creation is now split into a separate patch
      The unsentmap is now stored in the RCU structure (although it can't
        really resize during the migrate)
   Fix for block migration
      still not a suggested combination.
      trace_savevm_send_open_return_path added
      split migration_is_active function into separate patch and made
      move file reads into migrate_handle_advise and migrate_handle_packaged
      use of MIN in send-packaged-chunk
      migration_thread_started -> migration_thread_running
      updated qemu_get_buffer_in_place with Juan's version (and size_t'ified it)
      split the host_from_stream_offset change into a separate patch
        'ram_load: Factor out host_from_stream_offset call and check'
      split out ram_find_block_by_id into a separate patch and now
        called qemu_ram_block_by_name
      postcopy_discard_send_range etc now take start/length rather than 
        (also added another trace)
      split ram_save_host_page into ram_save_host_page/ram_save_target_page
      split host page cleanup function into a core that handles both passes
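
To illustrate the 'sentmap' to 'unsentmap' item above, here is a small
sketch under stated assumptions; it is not the code from the series. It
models a 64-page bitmap as a single uint64_t, and all helper names are
hypothetical. A map that starts with every bit set and is cleared as pages
go out can be used directly as "pages not yet sent", whereas a map of sent
pages has to be complemented first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-page bitmap, one bit per page. */
typedef uint64_t PageMap;

/* 'sentmap' style: start empty, set a bit when a page is sent. */
PageMap sentmap_mark(PageMap sent, unsigned page)
{
    assert(page < 64);
    return sent | (UINT64_C(1) << page);
}

/* ...which needs an extra complement pass to obtain the unsent pages. */
PageMap sentmap_to_unsent(PageMap sent)
{
    return ~sent;
}

/* 'unsentmap' style: start with every page marked as unsent... */
PageMap unsentmap_init(void)
{
    return ~UINT64_C(0);
}

/* ...and clear a bit when a page is sent; no final pass is needed. */
PageMap unsentmap_clear(PageMap unsent, unsigned page)
{
    assert(page < 64);
    return unsent & ~(UINT64_C(1) << page);
}

/* Query helper: has this page still not been sent? */
bool page_is_unsent(PageMap unsent, unsigned page)
{
    assert(page < 64);
    return (unsent >> page) & 1;
}
```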


Dr. David Alan Gilbert (56):
   Add postcopy documentation
   Provide runtime Target page information
   Move configuration section writing
   Move page_size_init earlier
   Rename mis->file to from_src_file
   Add qemu_get_buffer_in_place to avoid copies some of the time
   Add wrapper for setting blocking status on a QEMUFile
   ram_debug_dump_bitmap: Dump a migration bitmap as text
   ram_load: Factor out host_from_stream_offset call and check
   migrate_init: Call from savevm
   Rename save_live_complete to save_live_complete_precopy
   Add Linux userfaultfd.h header
   Return path: Open a return path on QEMUFile for sockets
   Return path: socket_writev_buffer: Block even on non-blocking fd's
   Migration commands
   Return path: Control commands
   Return path: Send responses from destination to source
   Return path: Source handling of return path
   Rework loadvm path for subloops
   Add migration-capability boolean for postcopy-ram.
   Add wrappers and handlers for sending/receiving the postcopy-ram
     migration messages.
   MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
   Modify save_live_pending for postcopy
   postcopy: OS support test
   migrate_start_postcopy: Command to trigger transition to postcopy
   migration_completion: Take current state
   MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
   Avoid sending vmdescription during postcopy
   Add qemu_savevm_state_complete_postcopy
   Postcopy: Maintain unsentmap
   Postcopy: Calculate discard
   postcopy: Incoming initialisation
   postcopy: ram_enable_notify to switch on userfault
   Postcopy: Postcopy startup in migration thread
   Postcopy: End of iteration
   Page request:  Add MIG_RP_MSG_REQ_PAGES reverse command
   Page request: Process incoming page request
   Page request: Consume pages off the post-copy queue
   postcopy_ram.c: place_page and helpers
   Postcopy: Use helpers to map pages during migration
   postcopy: Check order of received target pages
   Don't sync dirty bitmaps in postcopy
   Don't iterate on precopy-only devices during postcopy
   Host page!=target page: Cleanup bitmaps
   Round up RAMBlock sizes to host page sizes
   Postcopy; Handle userfault requests
   Start up a postcopy/listener thread ready for incoming page data
   postcopy: Wire up loadvm_postcopy_handle_ commands
   Postcopy: Mark nohugepage before discard
   End of migration for postcopy
   Disable mlock around incoming postcopy
   Inhibit ballooning during postcopy
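
Several of the titles above deal with host pages that differ in size from
target pages (the aarch64 smoke test being one such case). As a hedged
sketch of the arithmetic involved, with hypothetical helper names that are
not taken from the series and assuming power-of-two page sizes:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a size up to the next multiple of page_size.
 * Assumes page_size is a non-zero power of two.
 */
uint64_t round_up_to_page(uint64_t size, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}

/*
 * Number of target pages that make up one host page.
 * Assumes the host page size is a whole multiple of the target page size.
 */
uint64_t target_pages_per_host_page(uint64_t host_page_size,
                                    uint64_t target_page_size)
{
    assert(target_page_size != 0);
    assert(host_page_size % target_page_size == 0);
    return host_page_size / target_page_size;
}
```

With a 64 KiB host page and a 4 KiB target page, for example, one host page
covers 16 target pages.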

Best regards,
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

Dr. David Alan Gilbert / address@hidden / Manchester, UK


