|
From: | Vladimir Sementsov-Ogievskiy |
Subject: | Re: [Qemu-devel] [PATCH] migration: invalidate cache before source start |
Date: | Tue, 26 Jun 2018 10:31:25 +0300 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 |
25.06.2018 21:03, John Snow wrote:
On 06/25/2018 01:50 PM, Dr. David Alan Gilbert wrote:* Dr. David Alan Gilbert (address@hidden) wrote:* Vladimir Sementsov-Ogievskiy (address@hidden) wrote:15.06.2018 15:06, Dr. David Alan Gilbert wrote:* Vladimir Sementsov-Ogievskiy (address@hidden) wrote:Invalidate cache before source start in case of failed migration. Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>Why doesn't the code at the bottom of migration_completion, fail_invalidate: and the code in migrate_fd_cancel handle this? What case did you see it in that those didn't handle? (Also I guess it probably should set s->block_inactive = false)on source I see: address@hidden:migrate_set_state new state 7 address@hidden:migration_thread_file_err address@hidden:migration_thread_after_loop so, we are leaving loop on if (qemu_file_get_error(s->to_dst_file)) { migrate_set_state(&s->state, current_active_state, MIGRATION_STATUS_FAILED); trace_migration_thread_file_err(); break; } and skip migration_completion()Yeh, OK; I'd seen soemthing else a few days ago, where a cancellation test that had previously ended with a 'cancelled' state has now ended up in 'failed' (which is the state 7 you have above). I suspect there's something else going on as well; I think what is supposed to happen in the case of 'cancel' is that it spins in 'cancelling' for a while in migrate_fd_cancel and then at the bottom of migrate_fd_cancel it does the recovery, but because it's going to failed instead, then it's jumping over that recovery.Going back and actually looking at the patch again; can I ask for 1 small change; Can you set s->block_inactive = false in the case where you don't get the local_err (Like we do at the bottom of migrate_fd_cancel) Does that make sense? Thanks, DaveVladimir, one more question for you because I'm not as familiar with this code: In the normal case we need to invalidate the qcow2 cache as a way to re-engage the disk (yes?) when we have failed during the late-migration steps. In this case, we seem to be observing a failure during the bulk transfer loop. Why is it important to invalidate the cache at this step -- would the disk have been inactivated yet? It shouldn't, because it's in the bulk transfer phase -- or am I missing something? I feel like this code is behaving in a way that's fairly surprising for a casual reader so I was hoping you could elaborate for me. --js
In my case, source is already in postcopy state, when error occured, so it is inactivated.
-- Best regards, Vladimir
[Prev in Thread] | Current Thread | [Next in Thread] |