From: Ashi
Subject: [Qemu-devel] GSoC Application 2016 (Postcopy Migration: Recovery from a broken network connection)
Date: Thu, 17 Mar 2016 00:06:39 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

Hi all,

I have completed my GSoC application for the postcopy migration project idea. I would like feedback from the QEMU community before I submit it.

Proposal Timeline:

The progress will be tracked through weekly email updates and blog posts
documenting the work.


-> Before April 30th:

1. Familiarize myself with the QEMU source code for the postcopy migration process and identify the parts of the code that will need modification.

-> April 30th - May 23rd:

2. Familiarize myself with the live migration process as a whole and its
real-time statistics.
3. Stay in regular contact with the community and finalize the design.

-> May 23rd - June 28th:

1. Stage 1: Pausing the VM (10-14 days)
        On network failure, do not kill the destination by calling
qemu_file_shutdown; instead, pause the VMs and make the destination listen for a new connection, using some socket_listen and migration_incoming functions, to set up a different network. The source should try to remember its migration state.
2. Stage 2: Re-establish the network (10-12 days)
        Try to reconnect the source to the destination to carry on the transfer of the remaining requested pages.
3. Test the above changes and prepare a prototype of a backup migration file.
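Stages 1 and 2 above can be pictured as a small state machine. What follows is only a toy sketch in Python, not QEMU code: the class, the state names, and the `connect` callback are all hypothetical and exist purely to illustrate "pause instead of kill, remember the position, then retry".

```python
from enum import Enum, auto


class MigrationState(Enum):
    # Hypothetical state names for illustration only.
    POSTCOPY_ACTIVE = auto()
    POSTCOPY_PAUSED = auto()
    COMPLETED = auto()


class PostcopySource:
    """Toy model of the source side of a postcopy migration."""

    def __init__(self, total_pages):
        self.state = MigrationState.POSTCOPY_ACTIVE
        self.total_pages = total_pages
        self.next_page = 0  # position remembered across a pause

    def on_network_failure(self):
        # Stage 1: pause instead of tearing the migration down.
        if self.state is MigrationState.POSTCOPY_ACTIVE:
            self.state = MigrationState.POSTCOPY_PAUSED

    def try_resume(self, connect, max_attempts=3):
        # Stage 2: bounded reconnect attempts; `connect` returns True on success.
        if self.state is not MigrationState.POSTCOPY_PAUSED:
            return False
        for _ in range(max_attempts):
            if connect():
                self.state = MigrationState.POSTCOPY_ACTIVE
                return True
        return False

    def send_pages(self, count):
        # Pages only flow while the migration is active.
        if self.state is not MigrationState.POSTCOPY_ACTIVE:
            return 0
        sent = min(count, self.total_pages - self.next_page)
        self.next_page += sent
        if self.next_page == self.total_pages:
            self.state = MigrationState.COMPLETED
        return sent
```

The point of the sketch is that `next_page` survives the pause, so a successful `try_resume` continues from where the transfer stopped rather than starting over.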

-> June 21st - 28th:

Midterm evaluation of the work done so far. Make the required changes to the code to improve its functionality and fix bugs. Document the work done.

-> June 28th - July 20th:

1. Stage 3: Hunting the missing pages (14-20 days)
        While the connection is being re-established, start a recovery
thread, find_missing_pages, at the destination VM. It iterates over all
memory to find the missing pages via the page fault mechanism and requests them over the reverse communication channel when the connection resumes. Place the received pages into their slots atomically using remap_anon_pages.
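As a rough illustration of the Stage 3 bookkeeping, here is a Python sketch under a simplifying assumption: instead of probing memory through page faults, it tracks received pages in a plain set and computes what is still missing. The function `coalesce` and the `(start, length)` request format are hypothetical; they only show how the list sent back over the reverse channel could be kept short.

```python
def find_missing_pages(received, total_pages):
    """Return the indices of pages not yet received, in ascending order."""
    return [p for p in range(total_pages) if p not in received]


def coalesce(pages):
    """Merge sorted page indices into (start, length) runs."""
    runs = []
    for p in pages:
        if runs and runs[-1][0] + runs[-1][1] == p:
            # Page p directly follows the last run, so extend that run.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((p, 1))
    return runs
```

For example, with pages 0, 1 and 4 already received out of 8, the missing pages are 2, 3, 5, 6 and 7, which coalesce into two requests.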

-> July 20th - August 16th:

1. Stage 4: Backup migration file (20-25 days)
        Use a migration backup file for recovery if we fail to resume the network after several attempts. The file will be used only in an
emergency, after a certain time limit, so that we don't lose the device
state and can complete the migration. The tricky bit will be keeping the file size small.
2. Prepare the project documentation, write tests, clean up the code, and undergo the final evaluation by the mentor.
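One possible way to keep the Stage 4 backup file small is to write only the pages the destination has not received yet. The Python sketch below is an assumption-laden illustration: the record layout (a 64-bit page index followed by the page data), the 4096-byte page size, and both function names are hypothetical and are not an existing QEMU format.

```python
import io
import struct

PAGE_SIZE = 4096               # assumed page size
HEADER = struct.Struct("<Q")   # page index as little-endian 64-bit integer


def write_backup(out, pages, missing):
    """Write only the missing pages as (index, data) records."""
    for index in sorted(missing):
        data = pages[index]
        if len(data) != PAGE_SIZE:
            raise ValueError("page %d has the wrong size" % index)
        out.write(HEADER.pack(index))
        out.write(data)


def read_backup(inp):
    """Return a dict mapping page index to page data, read until end of file."""
    restored = {}
    while True:
        head = inp.read(HEADER.size)
        if not head:
            break
        if len(head) != HEADER.size:
            raise ValueError("truncated record header")
        (index,) = HEADER.unpack(head)
        data = inp.read(PAGE_SIZE)
        if len(data) != PAGE_SIZE:
            raise ValueError("truncated page")
        restored[index] = data
    return restored
```

With this layout the file grows with the number of missing pages rather than with total guest memory; device state would still need to be stored separately.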

-> August 16th - August 30th:
1. Most of the time will be used in rigorous testing and bug fixes.
2. Complete the documentation.
3. Final submission of the code, documentation and test results to Google.


Any suggestions are welcome and will be very helpful.

Thanks!

Ashijeet Acharya


