qemu-devel

Re: [PATCH v5 13/21] migration: Postcopy recover with preempt enabled


From: manish.mishra
Subject: Re: [PATCH v5 13/21] migration: Postcopy recover with preempt enabled
Date: Mon, 16 May 2022 20:21:23 +0530


On 16/05/22 7:41 pm, Peter Xu wrote:
Hi, Manish,

On Mon, May 16, 2022 at 07:01:35PM +0530, manish.mishra wrote:
On 26/04/22 5:08 am, Peter Xu wrote:
LGTM,
Peter, I wanted to give a review tag for this and the earlier patch too. I am
new to the QEMU review process, so I am not sure how to give a review tag, and
I did not find any reference on Google either. Could you please let me know
how to do it?
It's here:

https://git.qemu.org/?p=qemu.git;a=blob;f=docs/devel/submitting-a-patch.rst;hb=HEAD#l492

Since afaict QEMU is mostly following what Linux does, you can also
refer to this one, which has more context:

https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

But since you still have questions regarding this patch, no rush on
providing your R-bs; let's finish the discussion first.

[...]

+static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_fast_load();
+    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
I may have misunderstood the synchronisation here, but I see a very rare
chance of a problem: as both threads are working independently, is it possible
that qemu_sem_post on postcopy_pause_sem_fast_load is called by the main
thread even before we reach qemu_sem_wait in the next line, causing some kind
of deadlock? That should not be possible, as I understand it requires manually
calling qmp_migration_recover, so the chances are almost nil. Just wanted to
confirm it.
Sorry I don't quite get the question, could you elaborate?  E.g., when the
described deadlock happens, what are the main thread and the preempt load
thread each doing?  What are they waiting on?

Note: we have already released the mutex before waiting on the sem.

What I meant here is that a deadlock could occur because we wait indefinitely
on qemu_sem_wait(&mis->postcopy_pause_sem_fast_load): the main thread has
already called post on postcopy_pause_sem_fast_load after recovery, even
before we reached qemu_sem_wait(&mis->postcopy_pause_sem_fast_load) in the
next line. Basically, the case where we miss a post on
postcopy_pause_sem_fast_load.

This is a nearly impossible case because it requires the full recovery path to
be completed before this thread executes just the next line. Also, recovery
needs to be called manually, so please ignore this.
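
For reference, a minimal sketch of the window described above, assuming qemu_sem_post/qemu_sem_wait behave like a counting semaphore such as POSIX sem_t. DemoState, demo_recover and demo_post_before_wait are hypothetical names for illustration only, not QEMU code. The sketch forces the post to complete before the wait starts, and under that assumption the wait still returns, because the post is remembered in the semaphore count:

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state discussed above; not QEMU's types. */
typedef struct {
    pthread_mutex_t lock;   /* plays the role of postcopy_prio_thread_mutex */
    sem_t pause_sem;        /* plays the role of postcopy_pause_sem_fast_load */
} DemoState;

/* "Main thread" side: signals that recovery has finished. */
static void *demo_recover(void *opaque)
{
    DemoState *s = opaque;

    sem_post(&s->pause_sem);
    return NULL;
}

/*
 * Returns true if the waiter wakes up even though the post happened
 * entirely inside the window between the unlock and the wait.
 */
static bool demo_post_before_wait(void)
{
    DemoState s;
    pthread_t recover_thread;
    int ret;

    pthread_mutex_init(&s.lock, NULL);
    sem_init(&s.pause_sem, 0, 0);

    pthread_mutex_lock(&s.lock);
    pthread_mutex_unlock(&s.lock);          /* mutex dropped before waiting */

    /* Worst case: the post fully completes before the wait begins. */
    if (pthread_create(&recover_thread, NULL, demo_recover, &s) != 0) {
        return false;
    }
    pthread_join(recover_thread, NULL);

    ret = sem_wait(&s.pause_sem);           /* count is 1, so no blocking */

    sem_destroy(&s.pause_sem);
    pthread_mutex_destroy(&s.lock);
    return ret == 0;
}
```

If the primitive were a bare condition variable with no predicate, a signal sent in that window could be lost; a counting semaphore does not have that property.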

Basically, I wanted to check whether we should use something like

int pthread_cond_wait(pthread_cond_t *restrict cond,
                      pthread_mutex_t *restrict mutex);

so that there is no race between releasing the mutex and calling wait.
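
A rough sketch of what the pthread_cond_wait alternative could look like, for comparison only. DemoPause and the demo_* functions are hypothetical and do not exist in QEMU. The key point is that the wait is guarded by a predicate checked under the mutex, so a resume that arrives before the waiter blocks is not missed:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical pause/resume object; not a QEMU structure. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool resumed;           /* predicate protected by lock */
} DemoPause;

static void demo_pause_init(DemoPause *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->resumed = false;
}

/* Resume side: set the predicate under the lock, then signal. */
static void demo_pause_resume(DemoPause *p)
{
    pthread_mutex_lock(&p->lock);
    p->resumed = true;
    pthread_cond_signal(&p->cond);
    pthread_mutex_unlock(&p->lock);
}

/* Wait side: pthread_cond_wait atomically unlocks and blocks. */
static void demo_pause_wait(DemoPause *p)
{
    pthread_mutex_lock(&p->lock);
    while (!p->resumed) {               /* also handles spurious wakeups */
        pthread_cond_wait(&p->cond, &p->lock);
    }
    p->resumed = false;                 /* consume the resume request */
    pthread_mutex_unlock(&p->lock);
}

static void *demo_resumer(void *opaque)
{
    demo_pause_resume(opaque);
    return NULL;
}

/* Returns true once the waiter has observed and consumed the resume. */
static bool demo_condvar_pause_resume(void)
{
    DemoPause p;
    pthread_t resumer;
    bool consumed;

    demo_pause_init(&p);
    if (pthread_create(&resumer, NULL, demo_resumer, &p) != 0) {
        return false;
    }
    demo_pause_wait(&p);                /* correct in either ordering */
    pthread_join(resumer, NULL);

    consumed = !p.resumed;
    pthread_cond_destroy(&p.cond);
    pthread_mutex_destroy(&p.lock);
    return consumed;
}
```

The boolean plus condition variable here does the same job as the count inside a semaphore, which is why the two approaches are equivalent for this pause/resume pattern.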

+    qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);
Just wanted to confirm why postcopy_pause_incoming is not called directly
here.
postcopy_pause_incoming() is only used in the main ram load thread, while
this function (postcopy_pause_ram_fast_load) is only called by the preempt
load thread.

OK, got it, thanks Peter. I meant whether we should close both channels as
soon as we realise there is some failure, instead of the main thread waiting
for an error event and then closing them and pausing postcopy. But I agree the
current approach is good.

Is it based on the assumption that if there is an error in either channel,
migration will eventually be paused on the source side, closing both channels,
with the result that postcopy_pause_incoming is called from the main thread on
the destination?
Yes.

Usually it should be good to call it as early as possible. Is it left to the
main thread's default path so that we do not have any synchronisation
overhead?
What's the sync overhead you mentioned? What we want to do here is simply
to put all the dest QEMU migration threads into a halted state rather than
quitting, so that they can be continued when necessary.

Also Peter, I was trying to understand the postcopy recovery model. Is the use
case of qmp_migrate_pause just for debugging purposes?
Yes.  It's also a way to cleanly stop using the network (compared to forcibly
unplugging the NIC ports?) for whatever reason, with a shutdown() syscall on
the socket.  I just don't know whether there's any real use case for that in
practice.
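
As a small illustration of the shutdown() behaviour mentioned above, here is a sketch using a local AF_UNIX socket pair rather than a real migration channel. demo_shutdown_stops_traffic is a hypothetical helper and QEMU's own channel layer is not involved. After shutdown(fd, SHUT_RDWR) the descriptor stays open, but both ends see end-of-file on read:

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns true if, after shutdown(SHUT_RDWR) on one end of a connected
 * stream socket pair, reads on both ends report end-of-file (0 bytes).
 */
static bool demo_shutdown_stops_traffic(void)
{
    int fds[2];
    char buf[8];
    ssize_t peer, self;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return false;
    }

    /* Stop traffic in both directions without closing the descriptor. */
    if (shutdown(fds[0], SHUT_RDWR) != 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    peer = read(fds[1], buf, sizeof(buf));  /* peer sees EOF */
    self = read(fds[0], buf, sizeof(buf));  /* local end sees EOF too */

    close(fds[0]);
    close(fds[1]);
    return peer == 0 && self == 0;
}
```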

Thanks,

