From: Peter Xu
Subject: Re: [PATCH v3 3/7] support UFFD write fault processing in ram_save_iterate()
Date: Tue, 24 Nov 2020 10:17:29 -0500

On Tue, Nov 24, 2020 at 11:02:09AM +0300, Andrey Gruzdev wrote:
> On 24.11.2020 00:34, Peter Xu wrote:
> > On Fri, Nov 20, 2020 at 07:53:34PM +0300, Andrey Gruzdev wrote:
> > > On 20.11.2020 19:43, Peter Xu wrote:
> > > > On Fri, Nov 20, 2020 at 07:15:07PM +0300, Andrey Gruzdev wrote:
> > > > > Yeah, I think we can re-use the postcopy queue code for faulting pages.
> > > > > I'm worrying a little about some additional overhead of dealing with the
> > > > > urgent request semaphore. Also, the code won't change a lot, something like:
> > > > > 
> > > > > [...]
> > > > >           /* In case of 'write-tracking' migration we first try to poll
> > > > >            * UFFD and see if we have a write page fault event */
> > > > >           poll_fault_page(rs);
> > > > > 
> > > > >           again = true;
> > > > >           found = get_queued_page(rs, &pss);
> > > > > 
> > > > >           if (!found) {
> > > > >               /* priority queue empty, so just search for something dirty */
> > > > >               found = find_dirty_block(rs, &pss, &again);
> > > > >           }
> > > > > [...]
> > > > 
> > > > Could I ask what's the "urgent request semaphore"?  Thanks,
> > > > 
> > > 
> > > These functions use it (the correct name is 'rate_limit_sem'):
> > > 
> > > void migration_make_urgent_request(void)
> > > {
> > >      qemu_sem_post(&migrate_get_current()->rate_limit_sem);
> > > }
> > > 
> > > void migration_consume_urgent_request(void)
> > > {
> > >      qemu_sem_wait(&migrate_get_current()->rate_limit_sem);
> > > }
> > > 
> > > They are called from ram_save_queue_pages and unqueue_page, respectively,
> > > to control the migration rate limiter.
> > > 
> > > bool migration_rate_limit(void)
> > > {
> > > [...]
> > >          /*
> > >           * Wait for a delay to do rate limiting OR
> > >           * something urgent to post the semaphore.
> > >           */
> > >          int ms = s->iteration_start_time + BUFFER_DELAY - now;
> > >          trace_migration_rate_limit_pre(ms);
> > >          if (qemu_sem_timedwait(&s->rate_limit_sem, ms) == 0) {
> > >              /*
> > >               * We were woken by one or more urgent things but
> > >               * the timedwait will have consumed one of them.
> > >               * The service routine for the urgent wake will dec
> > >               * the semaphore itself for each item it consumes,
> > >               * so add this one we just eat back.
> > >               */
> > >              qemu_sem_post(&s->rate_limit_sem);
> > >              urgent = true;
> > >          }
> > > [...]
> > > }
> > > 
> > 
> > Hmm... Why could its overhead be a problem?  If it's an overhead that can be
> > avoided, then postcopy might want that too.
> > 
> > The thing is, I really feel like the snapshot logic can leverage the whole
> > idea of existing postcopy (like get_queued_page/unqueue_page; it's probably
> > due to the fact that both of them want to "migrate some more urgent pages
> > than the background migration, due to either missing/wrprotected pages"),
> > but I might have something missing.
> > 
> > Thanks,
> > 
> 
> I don't think this overhead itself is a problem since it's negligible
> compared to other stuff. You're undoubtedly correct about using the postcopy
> idea in the case when wr-fault pages arrive from a separate thread or an
> external source. Then we really need to decouple that separate thread or
> external requestor from the migration thread.
> 
> In this patch series wr-faults arrive in the same migration loop as the
> normal scan, so I don't see a direct reason to pass them through the queue,
> but I might have missed something from your proposal.

I see your point.  For me, whether to use a thread is not extremely important -
actually we could have a standalone thread handle the vcpu faults too, just
like postcopy; it would just run on the src host.  My major point is that we
should avoid introducing the extra pss logic, because fundamentally it's doing
the same thing as get_queued_page() right now, unless I'm wrong...

So I think your previous poll_fault_page() call is okay; assuming the same poll
model as you used, I mean something like this:

------8<-------
diff --git a/migration/ram.c b/migration/ram.c
index 7811cde643..718dd276c7 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1473,6 +1473,14 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
 
     } while (block && !dirty);
 
+    if (!block) {
+        /*
+         * Set the block too if live snapshot is enabled; that's when we may
+         * have vcpus blocked by the wr-protected pages.
+         */
+        block = poll_fault_page(rs, &offset);
+    }
+
     if (block) {
         /*
          * As soon as we start servicing pages out of order, then we have
------8<-------

So as long as we set the block and offset, pss will be updated just like
postcopy.  That's exactly what we want, am I right?
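
Just for illustration (not the patch's actual code): a rough sketch of what such a
poll_fault_page() could look like, assuming the userfaultfd sits in a hypothetical
rs->uffdio_fd field and that we translate the faulting host address back to a
RAMBlock with qemu_ram_block_from_host():

static RAMBlock *poll_fault_page(RAMState *rs, ram_addr_t *offset)
{
    /* Needs <poll.h>, <unistd.h> and <linux/userfaultfd.h> */
    struct uffd_msg msg;
    struct pollfd pfd = { .fd = rs->uffdio_fd, .events = POLLIN };

    /* Non-blocking: bail out if no fault event is pending right now */
    if (poll(&pfd, 1, 0) <= 0 || !(pfd.revents & POLLIN)) {
        return NULL;
    }
    /* One uffd_msg per read; ignore anything but a pagefault event */
    if (read(rs->uffdio_fd, &msg, sizeof(msg)) != sizeof(msg) ||
        msg.event != UFFD_EVENT_PAGEFAULT) {
        return NULL;
    }

    /* Map the faulting host address back to its RAMBlock + in-block offset */
    return qemu_ram_block_from_host((void *)(uintptr_t)msg.arg.pagefault.address,
                                    false, offset);
}

Returning the block/offset pair directly is what lets the hunk above feed it into
pss with no extra logic.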

> 
> Do you mean that if we use the postcopy logic, we'll allocate memory, make
> copies of the faulting pages, and then immediately un-protect them?
> Or do we just put the faulting page address into the queued item and release
> protection after the page content has been saved?

I think the current approach would be fine, so we don't copy the page and we
unprotect it at a single place after the page is dumped to the precopy stream.
If you could measure the latencies, that would be even better; then we'll know
what to expect.
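
For reference, a minimal sketch of that "save first, then unprotect" step (the
helper name and fd parameter are assumptions, not the patch's code): dropping the
write protection is a single UFFDIO_WRITEPROTECT ioctl with the WP mode bit
cleared, which also wakes any vCPU blocked on the page:

static int uffd_unprotect_page(int uffd_fd, void *host_addr, uint64_t size)
{
    /* Needs <sys/ioctl.h>, <errno.h> and <linux/userfaultfd.h> */
    struct uffdio_writeprotect wp = {
        .range = {
            .start = (uint64_t)(uintptr_t)host_addr,
            .len   = size,
        },
        /*
         * No UFFDIO_WRITEPROTECT_MODE_WP: remove the protection.  Since
         * UFFDIO_WRITEPROTECT_MODE_DONTWAKE is not set either, any thread
         * faulting on this range is woken up by the ioctl.
         */
        .mode = 0,
    };

    return ioctl(uffd_fd, UFFDIO_WRITEPROTECT, &wp) ? -errno : 0;
}

So the blocked vCPU resumes as soon as the page has hit the stream and the ioctl
returns, and the latency to measure is roughly the save-one-page time plus the
ioctl.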

IIRC the "copy page first" idea came from Denis's initial version, in which I
thought as too aggresive since potentially it can eat twice the memory on the
host for the single guest, not to mention when live snapshot is taken for
mutliple guests on the same host.  It's just not something that can be directly
expected when the user wants to take a snapshot.  That's something we can do
upon this work, imho, if we'll have very high latency on resolving the page
faults. But that's still the last thing to try (or at least an "by default off"
option) so we can still think about some other solution before trying to eat
too much host mem.

Thanks,

-- 
Peter Xu



